A Case Study on Agile Estimating and Planning using Scrum


Introduction
Introducing agile methods into their development process represents an important challenge for many software companies. In the last few years several successful implementations of agile methods have been reported in the literature, e.g., [1–4]. According to the Agile Adoption Rate Survey [5] performed by Dr. Dobb's Journal in 2008, agile teams report significant improvements in productivity, quality, and stakeholder satisfaction, and reasonable improvements in cost. A similar survey conducted by VersionOne [6] additionally reports enhanced ability to manage changing priorities and significantly improved project visibility. For this reason, agile methods are especially suitable for development of information systems with changing and emergent user requirements, e.g., [7]. On the other hand, the same survey has revealed that the lack of experience with agile methods and the conflict between the company's culture and core agile values are the leading causes of failed agile projects.
In spite of the fact that Scrum [8,9] is the most widespread method in industry (according to [6] Scrum is used by 58% of respondents, Scrum/Extreme Programming hybrid by 17%, custom hybrid by 5%, Extreme Programming by 4%, etc.), a systematic review of empirical studies on agile software development [10] found only one study investigating Scrum. Consequently, one of the clear findings was that the coverage of the research area should be increased, placing more focus on management-oriented approaches such as Scrum, which Dingsøyr et al. [11] consider an example of an area where there is a large gap and which should be given priority.
In order to fill this gap, empirical studies with students as subjects can be helpful in further assessment of the applicability of Scrum before it is actually deployed in industrial software environments. A properly designed study [12] can provide preliminary evidence about its strengths and weaknesses, thus reducing risks accompanying its adoption in practice.
In this paper we describe a case study that was conducted at the University of Ljubljana with the aim of studying the behavior of development teams using Scrum for the first time, i.e., a situation typical for software companies preparing to introduce Scrum into their development process. Within the framework of the capstone course in software engineering, which (as recommended by [13]) students take in their last semester, 13 student teams were required to develop an almost real project strictly using Scrum. The data on project management activities were collected in order to measure the amount of work completed, compliance with the release and iteration plans, ability of effort estimation, etc., thus contributing to evidence-based assessment of the typical Scrum processes for possible use in software engineering practice.

Aims of the study and research questions
The aim of the study was twofold: (1) to analyze development teams' ability to adopt Scrum concepts (e.g., estimation of user stories, release and iteration planning, concept of a user story being 'done'), and (2) to gather their opinions regarding the importance of particular practices for a successful Scrum project.
Regarding the first aim, our hypothesis was that the estimates and plans would be less accurate at the beginning, but would improve from Sprint to Sprint. There is substantial evidence reported in the literature that expert estimates tend to be over-optimistic [14] and that the planning poker estimation technique used by Scrum does not completely eliminate the over-optimism [15]. Therefore, we expected our study to yield similar results. Considering our previous experience [16] and the results of a study on the behavior of Scrum teams [17] reporting the problem of unclear completion criteria, we also decided to pay special attention to the notion of 'done'. It was agreed that the Product Owner could accept only those stories that were fully tested and robust enough to survive an encounter with end users.
With regard to the second aim, a survey was conducted at the end of the study in order to find practices that contribute most to the success of a Scrum project. Practices were rated using a 5-point Likert scale, the grade 1 representing the lowest and the grade 5 the highest level of importance.
The remainder of the paper is organized as follows: In the next two sections we describe the case study design and its results. Then a description of students' opinions regarding the importance of particular Scrum practices follows. Finally, we discuss the limitations of the study that should be considered when applying the results in an industrial environment.

Case study design
The case study was conducted in the Summer term of the Academic Year 2009/10 as a part of the capstone software engineering course that lasted 15 weeks and was taken by 52 students who were divided into 13 groups. Each group played the role of a self-organizing and self-managing Scrum Team responsible for the development of a Web-based student records system covering enrollment, examination applications, examination records, some statistical surveys, and a special module for the maintenance of all data required for the proper functioning of the system (i.e., the maintenance of various code tables, lists of required and optional courses, data about teachers of each course, etc.).
The initial Product Backlog comprised 60 user stories and was the same for all teams. It was prepared by the teacher who had considerable experience in developing the University of Ljubljana student records information system [18,19], thus being able to play the role of the Product Owner. Of these, 55 stories described the required functionality for 4 different user roles (i.e., student records administrative staff, students, teachers, and data administrator), whereas five stories described constraints that had to be obeyed (e.g., the system had to enable remote access to data through the Internet, all outputs should also be printable, etc.). Each story contained a short description and a set of acceptance tests that had to be used to demonstrate that the story had been correctly and fully coded.
The Product Owner divided the stories into 4 groups on the basis of priority. There were 24 'must have', 5 'should have', and 4 'could have' stories required in the first release, which was to be finished by the end of the course. The remaining 27 'won't have this time' stories were specified merely to illustrate the desired functionality in the next release.
At the beginning of the course students were given 12 hours of formal lectures on agile principles, Scrum, and the use of user stories for requirements specification and iteration planning. The first three weeks also served as a preparatory Sprint (Sprint 0) before the start of the project. During Sprint 0 the development environment was prepared and students were given the aforementioned initial Product Backlog.
At the end of Sprint 0 each team was asked to estimate the stories of the first release using planning poker [20] and (considering its estimated velocity) prepare the release plan. A story point was treated as an ideal day of work and the estimates were constrained to specific predefined values of 0.5, 1, 2, 3, 5, 8, 13, and 20 as proposed by Cohn [21]. Initial estimates and release plans of all teams were recorded for further analysis.
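The estimation and release planning step described above can be illustrated with a minimal Python sketch. It assumes that a raw estimate is rounded up to the next allowed card and that a release plan simply divides the total story points by the estimated velocity; the helper names and the sample backlog are hypothetical and are not taken from the study.

```python
import math

# Planning poker values used in the study (one story point = one ideal day).
POKER_SCALE = [0.5, 1, 2, 3, 5, 8, 13, 20]


def snap_to_scale(raw_estimate):
    """Return the smallest card on the scale that is >= raw_estimate.

    Rounding up is an assumption of this sketch; estimates above the
    largest card are capped at 20.
    """
    for card in POKER_SCALE:
        if raw_estimate <= card:
            return card
    return POKER_SCALE[-1]


def sprints_needed(story_points, velocity):
    """Number of Sprints needed to deliver all stories at a given velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(sum(story_points) / velocity)


# Hypothetical raw estimates in ideal days (not data from the study).
estimates = [snap_to_scale(x) for x in [0.7, 2.5, 4, 6, 1]]
print(estimates)                     # [1, 3, 5, 8, 1]
print(sprints_needed(estimates, 8))  # 18 points at velocity 8 -> 3 Sprints
```

A team whose release plan needs more Sprints than are available would have to either raise its velocity estimate or negotiate the scope with the Product Owner.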
The rest of the study consisted of three Sprints, each of them lasting 4 weeks. Strictly following the Scrum method, each Sprint started with a Sprint planning meeting at which student teams negotiated the contents of the next iteration with the Product Owner, and developed the initial version of the Sprint Backlog. During the Sprint the teams had to meet regularly at the Daily Scrum meetings and maintain their Sprint Backlogs, decomposing the user stories into constituent tasks and assigning responsibility for each task. Each student individually estimated how many hours it would take to accomplish each task he/she had accepted. The instructors did not interfere in the distribution of tasks among team members and the estimation of effort, but merely paid attention that the process ran smoothly and everybody obeyed Scrum rules.
At the end of each Sprint the Sprint review and Sprint retrospective meetings took place. At the Sprint review meeting the students presented results of their work to instructors, while at the Sprint retrospective meeting students and instructors met to review the work in the previous Sprint, giving suggestions for improvements in the next one. After three Sprints the first release had to be completed and delivered to the customer.
Since it was impossible to expect students to work on the project every day, two Daily Scrum meetings per week were prescribed, one on Monday and the other on Thursday. At the Daily Scrum meeting each team member had to record the number of hours spent and the amount of work remaining for each task he/she was responsible for. When the team finished a story the Product Owner was asked to evaluate its implementation. The Product Owner strictly enforced the concept of 'done', rejecting all stories that did not conform to user requirements. If the shortcomings were not removed by the end of the Sprint, a new story was defined in the Product Backlog requiring the completion of missing features in one of the remaining Sprints.
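The bookkeeping done at each Daily Scrum meeting can be sketched as follows. This is only an illustration under assumed names: the `Task` class, its fields, and the sample backlog are hypothetical, and the study does not describe the tool the teams actually used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """One Sprint Backlog task owned by a single team member."""
    story: str
    owner: str
    estimated_hours: float
    spent_hours: float = 0.0
    remaining_hours: Optional[float] = None

    def __post_init__(self):
        # Before any work is reported, remaining work equals the estimate.
        if self.remaining_hours is None:
            self.remaining_hours = self.estimated_hours

    def report(self, spent, remaining):
        """Record one Daily Scrum update: hours spent since the last
        meeting and the owner's new estimate of hours remaining."""
        self.spent_hours += spent
        self.remaining_hours = remaining


def sprint_remaining(tasks):
    """Total hours remaining in the Sprint Backlog (one burndown point)."""
    return sum(t.remaining_hours for t in tasks)


# Hypothetical backlog, not data from the study.
backlog = [
    Task("enrollment form", "student A", estimated_hours=6),
    Task("enrollment tests", "student B", estimated_hours=4),
]
burndown = [sprint_remaining(backlog)]   # before the first meeting
backlog[0].report(spent=5, remaining=3)  # task turned out larger than estimated
backlog[1].report(spent=4, remaining=0)  # task finished
burndown.append(sprint_remaining(backlog))
print(burndown)  # [10, 3]
```

Keeping hours spent and hours remaining as separate values matters here, because a task can consume more hours than originally estimated while still having work left.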
At the end of each Sprint the actual velocity of each team was computed considering only the stories that were accepted by the Product Owner. The unstarted stories and stories that were either rejected or newly defined by the Product Owner were re-estimated in order to create a more realistic plan for subsequent iterations.
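The velocity rule above can be expressed in a few lines of Python. The sketch assumes each story carries a status label; the labels, identifiers, and point values are hypothetical and serve only to show that rejected or unstarted stories contribute nothing to the actual velocity.

```python
def actual_velocity(stories):
    """Sum the story points of stories accepted by the Product Owner."""
    return sum(s["points"] for s in stories if s["status"] == "accepted")


def to_reestimate(stories):
    """Stories that are re-estimated before the next Sprint:
    everything that was not accepted (unstarted, rejected, or new)."""
    return [s["id"] for s in stories if s["status"] != "accepted"]


# Hypothetical Sprint outcome (not data from the study).
sprint = [
    {"id": "S1", "points": 3, "status": "accepted"},
    {"id": "S2", "points": 5, "status": "rejected"},
    {"id": "S3", "points": 2, "status": "accepted"},
    {"id": "S4", "points": 8, "status": "unstarted"},
]
print(actual_velocity(sprint))  # 5
print(to_reestimate(sprint))    # ['S2', 'S4']
```

Note that the rejected story S2 earns no partial credit, which is consistent with the strict notion of 'done' enforced in the study.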

Case study results
Results of the study are presented for each Sprint separately in Tables 1 to 3. Data clearly confirm the hypothesis that the plans are less accurate at the beginning, but improve from iteration to iteration.
In the first Sprint the planned velocity estimates were too optimistic and only one team out of 13 (i.e., team T04) actually completed all functionality committed at the Sprint planning meeting. The actual velocity of all other teams was far behind the planned (mean value 11.00, median 8.00). The teams completed on average only 42% (median 35.71%) of story points planned and spent on average much more than one ideal day of work per story point (mean value 27.86, median 15.88 hrs/story point).
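The per-Sprint indicators quoted above (share of planned story points completed and hours spent per story point, each summarized by mean and median over teams) can be computed as in the following sketch. It assumes that the indicators are first computed per team and then averaged; the team labels and figures are hypothetical and are not the values from Tables 1 to 3.

```python
from statistics import mean, median

# Hypothetical per-team Sprint data (not data from the study):
# planned and actual velocity in story points, total hours spent.
teams = {
    "A": {"planned": 20, "actual": 8, "hours": 160},
    "B": {"planned": 25, "actual": 25, "hours": 200},
    "C": {"planned": 30, "actual": 12, "hours": 180},
}

# Percentage of planned story points actually completed, per team.
pct_completed = [100 * t["actual"] / t["planned"] for t in teams.values()]

# Hours spent per completed story point, skipping teams with no accepted work.
hours_per_point = [
    t["hours"] / t["actual"] for t in teams.values() if t["actual"] > 0
]

print(mean(pct_completed), median(pct_completed))  # 60.0 40.0
print(round(mean(hours_per_point), 2), median(hours_per_point))  # 14.33 15.0
```

As in the study's first Sprint, a single team that meets its plan pulls the mean well above the median, which is why both statistics are worth reporting.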
Analysis of results at the Sprint retrospective meeting revealed two important reasons for such a great difference between plans and actual achievement: (1) non-compliance with the concept of 'done' and (2) insufficient communication with the Product Owner on the part of students.
Many stories that teams declared completed were rejected either because of the Product Owner's strict insistence on providing fully tested, integrated and usable code or because they did not fully match the user requirements. Some teams complained that the non-compliance with user requirements was due to user stories not being precise enough in describing all the requirements details, instead of being aware that the details should be worked out in conversations with the Product Owner. Therefore, all teams were strongly encouraged to increase the communication with the Product Owner during the subsequent Sprints and submit their user stories for review as soon as they were completed, not waiting till the Sprint review meeting.
Strictly following the aforementioned recommendations, the difference between planned and actual achievement diminished significantly in the second Sprint. The actual velocity more than doubled and (in spite of the fact that the planned velocity was unreasonably high) the teams completed on average 75.18% (median 68.66%) of story points planned. They spent on average 6.85 (median 6.09) hours per story point, which was almost in line with the concept of a story point being equal to 6 hours of work. The initial problems and learning curves were to a great extent mastered, and those teams that established good cooperation among team members, improved testing and integration, and regularly delivered user stories for evaluation, fulfilled their plans completely.
In the third Sprint the teams estimated their velocity to be approximately the same as in the second Sprint, which proved to be the right decision (mean value 26.23, median 25.50). The actual achievement was very close to the plan (mean value 23.92, median 23.50). The teams completed on average 91.80% (median 95.83%) of story points planned and 5 teams achieved 100%. Two teams (T03 and T05) completed all the stories planned for the first release even before the Sprint. On the other hand, it became evident that the teams that had not established good internal communication remained far behind the plan (e.g., team T07).
The results of the study show that (in spite of overoptimistic and sometimes unrealistic initial estimates) the ability of estimating and planning quickly improves. Most teams were able to define almost accurate Sprint plans after three Sprints. In the third Sprint the velocity stabilized and the actual achievement almost completely matched the plan. Empirical data also show a continued increase of productivity. These findings can be considered when introducing Scrum into industrial software development.

Students' opinions regarding Scrum practices
Students' opinions regarding the importance of particular Scrum practices for a successful project are gathered in Table 4. Each practice was rated using a 5-point Likert scale, the grade 1 indicating the practice was not important and the grade 5 indicating the practice was very important. In order to test the extent to which the students' judgments are consistent, the intra-class correlation coefficient (ICC) was computed using the absolute agreement type of the two-way random effects model. The average measure reliability ICC value was 0.935, indicating that the survey data were reliable enough to be generalized. The one-sample t-test was used to determine how much students' ratings deviate from the null hypothesis that their opinions were neutral, having the arithmetic mean value of all questions equal to 3. Results in Table 4 show that all hypotheses were rejected; therefore, we can accept the alternative hypothesis that students considered all practices important.
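The two statistics used above can be sketched in plain Python. The sketch assumes that the rated practices form the rows and the students (raters) form the columns of the rating matrix, and it uses the standard average-measures formula for the two-way random effects, absolute agreement ICC, namely (MSR - MSE) / (MSR + (MSC - MSE) / n). The function names and sample data are hypothetical, and the sketch returns only the t statistic and degrees of freedom, not the p-value reported in Table 4.

```python
from math import sqrt
from statistics import mean, stdev


def icc_average_absolute(ratings):
    """Average-measures ICC, two-way random effects, absolute agreement.

    ratings[i][j] is the grade given to item i (row) by rater j (column).
    """
    n = len(ratings)      # rated items (assumed here: Scrum practices)
    k = len(ratings[0])   # raters (assumed here: students)
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def one_sample_t(sample, mu0=3.0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1


# Hypothetical data: two raters in perfect agreement on three items.
print(icc_average_absolute([[1, 1], [3, 3], [5, 5]]))  # 1.0

# Hypothetical grades for one practice, tested against the neutral grade 3.
t_stat, dof = one_sample_t([4, 5, 4, 5, 4, 5, 4, 5])
print(round(t_stat, 3), dof)  # 7.937 7
```

A large positive t statistic, as in the second example, corresponds to rejecting the neutral-opinion hypothesis in favor of the practice being considered important.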
Students rated highest team-work and good communication among team members. Student teams that established good communication and team-work indeed achieved far better results than teams that acted as a group of individuals.
Good communication with the Product Owner received the second highest grade, which was not a surprise since the Product Owner played a central role in students' projects. Projects' progress to a great extent depended on his timely answers to students' questions and prompt evaluation of user stories.
The concept of 'done' was also rated very highly, although we were afraid that the students would perceive the Product Owner's insistence on producing stable and highly reliable code as unnecessary pedantry. However, it seems that through the project work they recognized that only fully tested code that meets all user requirements can be used in practice.
Clarity of requirements specified in the Product Backlog was ranked fourth with an average grade of 4.28, indicating that students consider a well prepared and maintained Product Backlog an important factor affecting the success of the project. During the project students occasionally complained that the user stories should contain a more extensive description. However, we were trying to convince them that the essence of the agile approach is not in writing detailed requirements specifications, but in acquiring missing details through communication with the Product Owner and end users.
The high importance of Sprint review meetings can be deduced from the high grades accorded to communication with the Product Owner and the concept of 'done'. All these practices together enable customers to experience on-time delivery of increments and obtain frequent feedback on how the product really works.
The role of ScrumMaster was also considered important, but not as much as the role of Product Owner. We think this was because the teacher spent much more time playing the role of Product Owner than being the ScrumMaster. As a ScrumMaster he acted merely as a facilitator, giving student teams the freedom to self-manage and self-organize as proposed by Scrum. Although he took care that everybody followed Scrum and obeyed its rules, this role was less exposed than the role of Product Owner, thus giving an impression of less importance.
The importance of other meetings was rated between 3.72 and […], which means that these meetings are also considered important, but less than other Scrum practices. We can attribute the slightly lower grade of these meetings to the fact that students often perceive meetings as an unproductive waste of time.
Planning and estimation practices were rated least important (although still statistically significantly above average), which was somewhat of a surprise since the study paid a lot of attention to story estimation and release and Sprint planning. Although the purpose of agile planning is not to produce exact plans, we think that students underestimated the importance of this area. There may be several reasons for such opinions. A previous study on students' perceptions of agile methods [22] has shown that students feel least comfortable with planning activities and have low trust in their estimates. Many students also consider estimating and planning unproductive administrative work, not being fully aware of the importance of good estimates and plans.

Limitations of the study
From the standpoint of using the results in industry, the main limitation of the study is that it was conducted with students in an academic environment. However, in order to increase the degree of validity, every effort was made to simulate an industrial environment as closely as possible. User stories were defined on the basis of a real student records information system used at the University of Ljubljana and the study design strictly followed the checklist for integrating student empirical studies with research and teaching goals [12]. The Product Owner strictly enforced the concept of 'done', requiring students to produce fully tested and integrated code resistant to user errors. The study relied on senior students enrolled in their last semester, thus blurring the line between these students and novice professionals. A previous study [23] has shown that these students perform similarly to industry personnel.
Another possible threat to validity is that students (due to other courses) could not work a normal workday, but met for a Daily Scrum twice a week. Considering the even distribution of the total course workload over 15 weeks, each student was required to perform 6-8 hours (i.e., approximately one day) of work between two consecutive Daily Scrum meetings, thus simulating the real workload of a normal workday. The rest of the time the students could use for other academic duties. Regular execution of the Daily Scrum meetings worked fine, encouraging students to work consistently rather than procrastinate.
However, the 3-4 day interval between the meetings provided some room for reallocation of workload, allowing students to work more than 8 hours between two consecutive meetings, which could lead to an uneven distribution of effort over Sprints and, in extreme cases, skew the statistics concerning velocity. We noticed such an abuse on the part of team T03, which reallocated a substantial amount of work from Sprint 3 to Sprint 2 in order to complete the project before the end of the course, but this did not significantly affect the study results.
On the other hand, the results of the study depended to a great extent on the proper role of the Product Owner. A knowledgeable and responsive Product Owner contributed a lot to the smooth running of students' projects and consequently to better statistics regarding velocity and ability of planning. A non-responsive and/or insufficiently knowledgeable Product Owner could cause delays and unproductive working periods.

Conclusions
Empirical studies with students as subjects can help industry in providing evidence-driven assessment of new processes, methods, and tools before their introduction in software engineering practice. While most software companies cannot afford extensive experiments, it is not a problem to conduct a study with several teams working on an almost real project within the framework of a software engineering capstone course. In this paper we described an example of such a study that concentrated on (1) the assessment of abilities of estimating and planning when using Scrum for the first time, and (2) gathering students' opinions regarding the importance of particular Scrum practices.
Results of the study have shown that beginners are able to almost completely grasp Scrum's benefits after a couple of Sprints. Their ability of estimating and planning improved from Sprint to Sprint and after three Sprints almost all teams were able to define accurate iteration plans. The velocity also constantly grew, thus indicating the improvement in productivity.
The study has also revealed the importance of the role of Product Owner. Since the user stories serve merely as a reminder for conversation, all user requirements details should be clarified in communication with the Product Owner. In order to assure smooth running of a Scrum project it is important that the Product Owner provides timely answers to questions regarding details of user stories, and makes quick evaluations of work completed, strictly enforcing the concept of a user story being 'done'.
Students were overwhelmingly positive about the course because it enabled them to learn agile methods using a project-oriented approach, which has also proved to be successful in other areas of engineering, e.g., [24,25].

We describe a case study that was conducted at the University of Ljubljana with the aim of studying the behavior of development teams using Scrum for the first time, i.e., a situation typical for software companies trying to introduce Scrum into their development process. 13 student teams were required to develop an almost real project strictly using Scrum. The data on project management activities were collected in order to measure the amount of work completed, compliance with the release and iteration plans, and ability of effort estimation, thus contributing to evidence-based assessment of the typical Scrum processes. It was found that the initial plans and effort estimates were over-optimistic, but the abilities of estimating and planning improved from Sprint to Sprint. Most teams were able to define almost accurate Sprint plans after three Sprints. In the third Sprint the velocity stabilized and the actual achievement almost completely matched the plan. Bibl. 25, tabl. 4 (in English; abstracts in English and Lithuanian).

The Scrum project management framework is described. This framework is used at the University of Ljubljana to organize the software development process. Thirteen student teams were formed to carry out almost real projects. In order to assess compliance with Scrum processes, various data were collected (amount of work completed, compliance with plans, etc.). It was found that, on the fourth attempt, the process can be organized according to the Scrum methodology without errors.

Table 1. Planned and actual achievement in Sprint 1

Table 2. Planned and actual achievement in Sprint 2

Table 3. Planned and actual achievement in Sprint 3

Table 4. Students' opinions regarding the importance of Scrum practices (N=51)