The Sprint Methodology in Agile Project Management
''Developed by Sarah Bourdiaux Terp''
== Abstract ==
Working in sprints facilitates a quick and continuous review of results, allowing regular feedback and thereby keeping the project on the right path towards success for all parties involved. This also helps ensure that the project remains profitable and that the interest and momentum of stakeholders are maintained, which can be critical for the funding of the project <ref name=" Sprints " />. This rapid iteration challenges the traditional way of conducting projects, also known as waterfall project management. Traditional project management is characterized by more robustness and formality, involving very thorough preliminary planning as well as a protracted requirements definition. The long-lasting requirements definition often results in outdated requirements before project development has even begun. Agile project management (APM) was developed to meet the needs of a new type of project where the development phase is more crucial than the planning phase, namely information systems and technology projects <ref name=" Scrum " />.

Sprints are part of the APM framework scrum, which is typically applied within IT projects, software development projects and projects of a more exploratory nature like data science projects. These types of projects are affected by constant technological development, which makes sprints an appropriate way of managing them, as this methodology helps in responding to high levels of change and uncertainty <ref name=" PMBOK " /> <ref name=" PRINCE " />.

This article investigates relevant aspects of the sprint methodology within data science projects. It provides concrete, hands-on recommendations to project managers or scrum masters of data science projects who are about to plan and conduct sprints with their team.
==Motivation==
According to PRINCE 2, a project is "a temporary organization that is created for the purpose of delivering one or more business products according to an agreed Business Case." <ref name=" PRINCE " />. To deliver such business products, a project needs to be managed with respect to many aspects. PRINCE 2 defines project management as "the planning, delegating, monitoring and control of all aspects of the project, and the motivation of those involved, to achieve the project objectives within the expected performance targets for time, cost, quality, scope, benefits and risks." <ref name=" PRINCE " />.

Projects go through a series of phases, also known as a project's life cycle. These phases can differ from project to project but are typically broken down by deliverables or milestones. In general, according to the PMBOK® Guide, the phases can be summarized as starting the project, organizing and preparing, carrying out the project work and eventually closing the project <ref name=" PMBOK " />. In the traditional predictive life cycle, the product and the project's deliverables are clearly defined at the beginning, and changes to the scope of the project require careful management. The project scope, costs and time are thus defined as early in the process as possible. This requires that a project can be defined in all these aspects at an early stage and that it is characterized by low levels of change and uncertainty. However, some projects involve so much uncertainty that defining these aspects early is simply too difficult, which makes traditional project management and predictive life cycles inadequate. This is where APM and adaptive life cycles like sprints become relevant and challenge the traditional project life cycle as defined by the PMBOK® Guide <ref name=" PMBOK " />. New demands on projects thus require new methodologies and frameworks.

Taking its point of departure in agile software development, APM emphasizes two main concepts that help define how project teams can adapt rapidly to these unpredictable and changing requirements. First, risk is minimized when the team focuses on short iterations with clearly defined deliverables. Second, the feedback loop created by these short iterations facilitates direct communication with partners during the development process, which helps avoid unnecessary documentation and bureaucracy <ref name=" Scrum " />.
== Agile Project Management ==
===Scrum Roles and Process===
[[File:APM_overview2.png|thumb|right|451x418px|Figure 1: Agile Project Management Frameworks]]
Adaptive life cycles like sprints are characterized by very rapid iterations of approximately 2-4 weeks. Each sprint is fixed in time and cost at its beginning. At that point, the product backlog list, also known as the project requirements, is reviewed to determine which of the requirements can be delivered within the next iteration <ref name=" PMBOK " />.

Within APM, a large number of different but useful frameworks exist. Three of the most frequently used ones, namely the scrum, lean and kanban frameworks, are illustrated in figure 1. Sprints are part of the scrum framework, which is mainly used within IT projects <ref name=" Sprints " />. Scrum consists of three major components: the roles, the process and the artifacts.
[[File:scrums.png|thumb|right|451x418px|Figure 2: Scrum Activities]]
Starting with the roles, scrum involves a product owner, a scrum master and a cross-functional scrum team. This constellation works full-time on the project in order to get through the product backlog list in due time. The second major scrum component, the process, comprises five overall activities: the kick-off, the sprint planning meeting, the sprint, the daily scrum and the sprint review meeting, all of which are illustrated in figure 2 <ref name=" Scrum " />. The third major scrum component, the artifacts, includes the product backlog, the sprint backlog and the burn-down charts, which are elaborated further in the ''Scrum Artifacts'' section.

At the beginning of a project, the product owner, the scrum master and the scrum team meet to kick it off. This kick-off meeting occurs only once during a project. Here, the high-level backlog for the project, which is the overall project scope and requirements, as well as its major goals are defined. Unlike the kick-off meeting, the sprint planning meeting is held at the beginning of each sprint, with the same participants. Here, they define the product backlog list, i.e. the decomposition of the high-level backlog. Furthermore, they determine the sprint goal, which is the formal outcome of the sprint.

After defining these two main parts, the participants outline the sprint backlog. A sprint planning meeting will usually take up to a day <ref name=" Scrum " />. Having planned the sprint, its backlog and its goals, the team is ready to explore the sprint's research topics.
===Scrum Artifacts===
[[File:burn-down.png|thumb|right|500x500px|Figure 3: Examples of burn-down charts made in Excel (Inspired by <ref name=" fig1 " /> <ref name=" fig2 " /> <ref name=" fig3 " />)]]
As mentioned, the scrum artifacts include the product backlog, the sprint backlog and the burn-down charts. The product and sprint backlogs are defined during the sprint planning meetings. The product backlog is defined together with the product owner, whereas the sprint backlog is defined by the scrum team alone, since this type of backlog is more specific to the individual sprint <ref name=" Scrum " />.

Unlike traditional project management, scrum deliberately makes use of burn-down charts. Burn-down charts are simple two-dimensional charts showing the progress of a given process. Their purpose is to provide relevant information about this progress in an easy-to-comprehend manner. This is especially relevant during scrum and sprints, where time is a limited resource that needs to be managed carefully <ref name=" Scrum " />. Three types of burn-down charts are commonly used. The sprint burn-down chart documents the progress of the sprint, the release burn-down chart documents the progress of the release and the product burn-down chart documents the overall project progress <ref name=" Scrum " />. Examples of these three types of burn-down charts can be seen in figure 3.

As the examples in figure 3 illustrate, time is typically represented on the x-axis and the remaining work on the y-axis. The sprint burn-down chart is on a very operational level, showing the day-to-day progress by depicting the total backlog hours remaining in the sprint per day. These total backlog hours are estimated based on the amount of time left in the sprint. The sprint burn-down chart should thus decrease to zero hours remaining by the end of the sprint. The release burn-down chart works in a similar way but represents the remaining time until the release is done. Finally, the product burn-down chart indicates the overall project progress on a higher level <ref name=" Scrum " />.

These burn-down charts can help the scrum team, and especially the scrum master, keep an easy overview of how the project is progressing and, on a more specific level, get a day-to-day overview of how each sprint is progressing.
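The sprint burn-down idea described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names, the ten-day sprint and the hour figures are assumptions made for the example and are not taken from the cited sources.

```python
from typing import List


def ideal_burn_down(total_hours: float, sprint_days: int) -> List[float]:
    """Ideal remaining hours at the end of each day (index 0 = sprint start)."""
    return [total_hours * (1 - day / sprint_days) for day in range(sprint_days + 1)]


def actual_burn_down(total_hours: float, hours_done_per_day: List[float]) -> List[float]:
    """Remaining backlog hours after each day, given the hours completed per day."""
    remaining = [total_hours]
    for done in hours_done_per_day:
        # Remaining work cannot drop below zero.
        remaining.append(max(remaining[-1] - done, 0.0))
    return remaining


# Hypothetical sprint: 80 backlog hours over 10 working days.
ideal = ideal_burn_down(80, 10)
actual = actual_burn_down(80, [6, 10, 8, 4, 9, 8, 10, 7, 9, 9])

for day, (i, a) in enumerate(zip(ideal, actual)):
    status = "behind" if a > i else "on track"
    print(f"day {day:2d}: ideal {i:5.1f} h, actual {a:5.1f} h ({status})")
```

Plotting the two lists against the day index gives a sprint burn-down chart of the kind shown in figure 3. Comparing the actual line with the ideal line is one simple way for a scrum master to see whether the sprint is behind schedule.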
===Sprint Methodology===
What characterizes sprints, besides being short iterations of at most four weeks, is that no external interference should influence the work of the scrum team. This also means that project requirements cannot be changed during a sprint. In general, it is recommended to hold short daily stand-up meetings of 15 minutes, known as scrum meetings, during a sprint. Here, relevant matters such as what has been done since the last scrum meeting and what will be done before the next one can be discussed.

The purpose of these daily meetings is to keep track of how the sprint, and thereby the project, is progressing. This helps the scrum master keep track of the team and ensure that the sprint is as efficient as possible <ref name=" Scrum " />. The meetings are short and informal, leaving no room for problem solving or thorough explanations, hence the importance of standing up during these meetings. Standing up can help the team stay focused and remove any obstacles to progress <ref name=" Harvey " />.

Within data science projects, five additional meetings are recommended within each sprint. These meetings are discussed further in the Application section. After each sprint is finished, it is reviewed during the sprint review meeting. Here, the sprint product is demonstrated to the product owner.
==Application==

===Data Science Life Cycle Sprints===
Depending on the project type, a number of different adaptive life cycles exist. Data science projects differ from software development and data mining projects as they are of a more experimental and exploratory nature and thus need a less rigid project life cycle. In data science projects, the team learns from data and adjusts its work to the results accordingly. The amount of planning and the interdependencies between the project phases in the software development life cycle (SDLC) and the data mining life cycle (CRISP-DM) are therefore not suited to data science projects. The data science life cycle (DSLC) sprint is better suited to data science projects due to its more lightweight and agile nature. It consists of six phases, based on the scientific method <ref name=" DSLC " />. These six phases are:
* Identify
* Question
* Research
* Results
* Insights
* Learn
[[File:data_science.png|thumb|right|451x418px|Figure 4: Data Science Life Cycle (Inspired by D. Rose (2016) <ref name=" DSLC " />)]]
Unlike in the SDLC, where each phase leads to the next, a team can "cycle" through the questioning, research and results phases in each DSLC sprint. This mechanism is illustrated by arrows in figure 4. This inner cycle gives data science projects flexibility and agility. In the identification phase, the scrum team starts by identifying the key roles and players in a given context. Then, in the questioning phase of the DSLC sprint, the team asks relevant questions about these key players in order to explore the data in the best way possible. To help the team identify the right questions, the use of a question board is recommended. The question board allows the rest of the organization to contribute its own questions. The board should be easily accessible so that anyone can add a post-it with a question when walking by <ref name=" Sprints " />.

During the research phase, the team's data analysts explore the data based on the identified players and questions, also known as research topics. Here, the rest of the team works closely with the data analysts to make sure that the work is in line with the desired outputs. When the data analysts have finished exploring the data within the given research topics, a short report with the results from each research topic is produced during the results phase. These short reports can subsequently be used if a research topic and its results provide a solid basis for further research. When all research topics have been explored, the team evaluates the results and assesses whether they provide any valuable and interesting insights. Based on these insights, the team will in the last phase try to create organizational knowledge, which can eventually provide actual value to the rest of the organization by showing what has been learned from the research in the sprint <ref name=" DSLC " />.
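The six phases and the inner question-research-results cycle described above can be modelled as a small sketch. The names `Phase`, `INNER_CYCLE` and `sprint_path` are hypothetical, and the assumption that the inner cycle always runs in full before the insights phase is a simplification made for the example, not something prescribed by the DSLC.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    IDENTIFY = "Identify"
    QUESTION = "Question"
    RESEARCH = "Research"
    RESULTS = "Results"
    INSIGHTS = "Insights"
    LEARN = "Learn"


# The three phases a team can cycle through within one sprint.
INNER_CYCLE = [Phase.QUESTION, Phase.RESEARCH, Phase.RESULTS]


def sprint_path(inner_cycles: int) -> List[Phase]:
    """Phases visited in one DSLC sprint with the given number of inner cycles."""
    if inner_cycles < 1:
        raise ValueError("a sprint needs at least one question-research-results pass")
    return [Phase.IDENTIFY] + INNER_CYCLE * inner_cycles + [Phase.INSIGHTS, Phase.LEARN]


# Example: the team goes through the inner cycle twice before drawing insights.
path = sprint_path(2)
print(" -> ".join(phase.value for phase in path))
```

With a single pass through the inner cycle, the path reduces to the six phases in their listed order, which mirrors a strictly sequential life cycle. Additional passes are what give the DSLC sprint its flexibility.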
===Data Science Life Cycle Sprint Meetings===
In order to run scrum efficiently, it is necessary to manage sprints carefully. One way to ensure that a sprint is on track is to hold regular meetings. For scrum teams in data science projects, it is recommended to run five different types of meetings during a sprint, and the sprint itself should last no longer than two weeks <ref name=" Sprints " />. These five meetings are:
* Research Planning
* Question Breakdown
* Visualization Design
* Storytelling Session
* Team Improvement
All five meetings are to be held in every sprint to make sure that the scrum team provides interesting insights at the end of the sprint. The meetings have different purposes, and together they ensure that each of the six phases of a DSLC sprint is covered properly, as illustrated in figure 5. It is essential that all five sprint meetings are time-boxed.

A meeting is time-boxed when the team agrees upon its duration before it starts. What is agreed during the meeting has to last for the rest of the sprint, since time-boxed meetings cannot be rescheduled or followed up on. This helps ensure that a sprint is rapid and agile, focusing on exploring and experimenting rather than planning and organizing <ref name=" Sprints " />.
[[File:sprint_meetings.png|thumb|right|451x418px|Figure 5: Data Science Sprint Meetings (Inspired by D. Rose (2016) <ref name=" Sprints " />)]]
====Research Planning====
In the first sprint meeting, the Research Planning meeting, the scrum team evaluates all the questions that have been identified during the identification phase and on the question board. During this meeting, which typically lasts two hours, the team selects the most interesting questions, and the data analysts and scrum master subsequently prepare the sprint agenda. The sprint agenda is thus a compromise between the data analysts and the scrum master, ensuring a minimum viable set of research topics for exploring the data without spending unnecessary time on it <ref name=" Sprints " />.
====Question Breakdown====
During the Question Breakdown meeting, the scrum team gathers to review any new questions from the question board and tries to come up with further ones. This type of meeting also serves to identify potential clusters in the questions and to determine whether any questions can be broken down into more manageable ones. It is recommended to hold at least two one-hour sessions of this meeting during a sprint. These meetings ensure a constant flow of research topics and keep the whole team aligned. They are also a good setting for prioritizing all questions and preparing the following sprint <ref name=" Sprints " />.

Together with the Research Planning meetings, the Question Breakdown meetings are thus ways of managing the first two phases of the DSLC sprint: the identification and questioning phases.
====Visualization Design====
During the third meeting, the Visualization Design meeting, the data analysts and scrum master together develop interesting visualizations based on the results of the data analysts' experiments with the data. The meeting should not last more than an hour, and the visualizations can remain rough drafts <ref name=" Sprints " />. These visualizations are subsequently used in the Storytelling Session.

The Visualization Design meeting thus helps manage the sprint results from the DSLC results phase and create valuable insights during the DSLC insights phase.
====Storytelling Session====
Based on the visualizations made by the data analysts and scrum master, the scrum team meets for a Storytelling Session to present the story of what they have learned during the sprint. The meeting should last an hour, in which data visualizations are shown, questions from the question board are discussed and the stories behind the questions are told <ref name=" Sprints " />.

The Storytelling Session is thus a meeting during the DSLC learning phase that helps pass on the knowledge gained through the sprint.
====Team Improvement====
At the end of the sprint, the scrum team meets for a two-hour improvement meeting where the team's progress is discussed and evaluated. Here, potential changes can be agreed upon before the next sprint, which helps ensure that the team keeps progressing and learns fast from its mistakes and successes <ref name=" Sprints " />.
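As a rough worked example, the meeting durations quoted in the subsections above can be added up to see how much of a sprint the five time-boxed meetings take. The `Meeting` class is a hypothetical helper, the Question Breakdown is counted with its recommended minimum of two sessions, and the figure of ten eight-hour working days for a two-week sprint is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meeting:
    name: str
    hours_per_session: float
    sessions: int = 1

    @property
    def total_hours(self) -> float:
        return self.hours_per_session * self.sessions


# Durations as quoted in the text; upper bounds are treated as fixed time boxes.
MEETINGS = [
    Meeting("Research Planning", 2),
    Meeting("Question Breakdown", 1, sessions=2),
    Meeting("Visualization Design", 1),
    Meeting("Storytelling Session", 1),
    Meeting("Team Improvement", 2),
]

meeting_hours = sum(m.total_hours for m in MEETINGS)

# Assumption: a two-week sprint has 10 working days of 8 hours each.
sprint_hours = 10 * 8
meeting_share = meeting_hours / sprint_hours

print(f"{meeting_hours} h of meetings = {meeting_share:.0%} of the sprint")
```

Under these assumptions the five meetings add up to eight hours, or about a tenth of a two-week sprint per person, not counting the daily scrum meetings. This leaves most of the sprint for exploring and experimenting.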
== Limitations ==
Running a project in sprints is an ideal methodology when projects are of an exploratory nature or simply require fast development and deployment <ref name=" Sprints " />. This fail-fast mindset allows project teams to develop ideas and test them quickly without spending too much time on planning or waiting unnecessarily long for high-level approval. The overall focus is therefore on developing and delivering minimum viable products and working further on them if they are a success, i.e. if they live up to the sprint goals and especially the project goals <ref name=" Scrum " /> <ref name=" Pitfall " />.

However, even though APM methodologies are said to minimize risks, there are some limitations that both the team and the scrum master of DSLC sprint projects should be aware of <ref name=" Scrum " />. Based on the PMBOK® Guide and relevant literature, two main limitations are discussed below.

'''No interfering'''

Since sprints, and especially DSLC sprints <ref name=" DSLC " />, are very limited in time, it is crucial that the team works exclusively on the sprint backlog <ref name=" Scrum " />. This can be a challenge, especially in larger organizations where the risk of being interrupted is very high due to the number of employees. Scrum teams are cross-functional, so their members are not necessarily used to working together. Other employees in the members' departments may therefore not be aware of these sprints and may reach out to their colleagues for help as they are used to. It is therefore important that the scrum master emphasizes the importance and priority of the sprint backlog and helps communicate this to team members' direct managers if necessary.

'''No objectives'''

As DSLC sprints are mainly recommended for exploratory projects like data science projects, the team needs to adjust its usual working habits from working towards specific objectives to developing research topics based on relevant questions during the Research Planning meeting <ref name=" Pitfall " />. These research topics then serve as requirements in the sprint backlog, but as data science projects are exploratory, there is no way of defining the objectives before the data has been explored.

This way of working challenges the usual perception of work, in which clear objectives have to be met before the project can come to an end <ref name=" PMBOK " />. As the PMBOK® Guide describes, a project end is reached when "the project's objectives have been achieved or when the project is terminated because its objectives will not or cannot be met..." <ref name=" PMBOK " />.

This lack of specific objectives makes DSLC sprints unsuitable for other types of projects where clear deliverables need to be produced. Such projects could, for example, be construction projects, which have a very clear start and end as well as clearly defined deliverables and success criteria. These types of projects also require a great deal of planning and funding, which is contrary to the nature of DSLC projects <ref name=" DSLC " />.

DSLC sprints are not explicitly part of well-renowned standards like the PMBOK® Guide or PRINCE 2. However, they are a subcategory of the adaptive and agile life cycles described in the PMBOK® Guide <ref name=" PMBOK " />. These project life cycles all build on the APM methodology, and DSLC sprints are thus an extension of the adaptive life cycles.
== Glossary ==
'''APM''' - Agile Project Management

'''DSLC''' - Data Science Life Cycle

'''SDLC''' - Software Development Life Cycle

'''CRISP-DM''' - Cross Industry Standard Process for Data Mining
== Annotated Bibliography ==
'''Rose D. (2016) Data Science: Create Teams That Ask the Right Questions and Deliver Real Value In: Data Science. Apress, Berkeley, CA:''' This book explains in detail how to run data science projects successfully. This wiki article has primarily focused on the chapters on how to deliver successful data science projects, but the book also covers many other relevant aspects. The first section defines what data science is. The second digs into how to build a well-functioning data science team, and the last part focuses on how to ask the right questions during the Question Breakdown meetings.

'''Project Management Institute (2013) A Guide to the Project Management Body of Knowledge (PMBOK® Guide). In: Project Management Institute:''' This guide provides widely accepted guidelines, rules and characteristics for project, program and portfolio management. In this wiki article, the PMBOK® Guide has primarily been used for its section on project life cycles, but it covers many more relevant subjects within project, program and portfolio management. For example, the PMBOK® Guide also provides extensive guidelines on Project Scope, Time, Cost and Quality Management as well as Project Risk Management.

'''H. Frank Cervone (2011) Understanding agile project management methods using Scrum:''' This article provides a very clear understanding of what scrum is and how it relates to agile project management. Some details are addressed in this wiki article, but the interested reader will find further knowledge in the original. It describes the origin of agile project management in the agile software development movement and provides tangible tools for managing scrum projects.
== References ==
<references>
<ref name="Sprints"> Rose D. (2016) Data Science: Create Teams That Ask the Right Questions and Deliver Real Value - Chp. 13: Working in Sprints. In: Data Science. Apress, Berkeley, CA </ref>
<ref name="Harvey"> Maylor H. (2010) Project Management. In: Financial Times Prentice Hall </ref>
<ref name="PMBOK"> Project Management Institute (2013) A Guide to the Project Management Body of Knowledge (PMBOK® Guide). In: Project Management Institute </ref>
<ref name="PRINCE"> Office Of Government Commerce (2009) Managing Successful Projects with PRINCE2™. In: TSO </ref>
<ref name="Scrum"> H. Frank Cervone (2011) Understanding agile project management methods using Scrum. In: OCLC Systems & Services: International digital library perspectives </ref>
<ref name="DSLC"> Rose D. (2016) Data Science: Create Teams That Ask the Right Questions and Deliver Real Value - Chp. 12: Using a Data Science Life Cycle. In: Data Science. Apress, Berkeley, CA </ref>
<ref name="fig1"> Inspiration for Sprint burn-down chart: http://www.sw-engineering-candies.com/blog-1/howtouseproduct-burndown-chartsandsprint-burndown-chartsnotonlyinscrumprojects </ref>
<ref name="fig2"> Inspiration for Release burn-down chart: http://www.softwaretestingstudio.com/burndown-chart-agile-scrum/ </ref>
<ref name="fig3"> Inspiration for Product burn-down chart: https://www.scrum-institute.org/Burndown_Chart.php </ref>
<ref name="Pitfall"> Rose D. (2016) Data Science: Create Teams That Ask the Right Questions and Deliver Real Value - Chp. 14: Avoiding Pitfalls in Delivering in Data Science Sprints. In: Data Science. Apress, Berkeley, CA </ref>
</references>
Latest revision as of 18:40, 16 November 2018
Developed by Sarah Bourdiaux Terp
Contents |
[edit] Abstract
Working in sprints facilitates a quick and continuous review of results allowing regular feedback and thereby keeping on the right path towards a successful project for all parties involved. This eventually also helps ensuring that the project continues being profitable as well as maintaining the interest and momentum towards stakeholders, which can be critical towards the funding of the project [1]. This rapid iterativeness thus challenges the traditional way of conducting projects, also known as waterfall project management. Traditional project management is characterized by more robustness and formalities involving very thorough preliminary planning as well as protracted requirements definition. The long lasting requirements definition often results in outdated requirements before the project development even has begun. Agile project management (APM) was thus developed to meet the needs of a new type of project where the development phase is more crucial than the planning phase, namely within information systems and technology projects [2].
Sprints are part of the APM framework scrum, which is typically applied within IT projects, software development projects and projects of a more exploratory nature like data science projects. These types of projects are affected by the constant technological development, which makes sprints an appropriate way of managing them, as this methodology helps responding to high levels of change and uncertainty [3] [4].
This article investigates relevant aspects of the sprint methodology within data science projects. It provides concrete and hands-on recommendations to project managers or scrum masters of data science projects who are about to plan and conduct sprints with their team.
[edit] Motivation
According to PRINCE 2, a project is "a temporary organization that is created for the purpose of delivering one or more business products according to an agreed Business Case" [4]. To deliver such business products, a project needs to be managed. PRINCE 2 defines project management as "the planning, delegating, monitoring and control of all aspects of the project, and the motivation of those involved, to achieve the project objectives within the expected performance targets for time, cost, quality, scope, benefits and risks" [4].
Projects go through a series of phases, also known as the project life cycle. These phases can differ from project to project but are typically broken down by deliverables or milestones. According to the PMBOK® Guide, the phases can in general be summed up as starting the project, organizing and preparing, carrying out the project work and closing the project [3]. In the traditional predictive life cycle, the product and the project deliverables are clearly defined at the beginning, and changes to the project scope require careful management. The project scope, cost and time are thus defined as early in the process as possible. This requires that a project can be defined in all these aspects at an early stage and that it is characterized by low levels of change and uncertainty. However, some projects involve so much uncertainty that defining these aspects early is simply too difficult, which makes traditional project management and predictive life cycles inadequate. This is where APM and adaptive life cycles like sprints become relevant and challenge the traditional project life cycle, such as the one defined by the PMBOK® Guide [3]. New demands on projects thus require new methodologies and frameworks.
Taking its starting point in agile software development, APM emphasizes two main concepts that help define how project teams can adapt rapidly to these unpredictable and changing requirements. First, risk is minimized when the team focuses on short iterations with clearly defined deliverables. Second, the feedback loop created by these short iterations facilitates direct communication with partners during the development process, which helps avoid unnecessary documentation and bureaucracy [2].
Agile Project Management
Scrum Roles and Process
Adaptive life cycles like sprints are characterized by very rapid iterations of approximately 2-4 weeks. Each sprint is fixed in time and cost at its beginning. At this point, the product backlog list, also known as the project requirements, is also reviewed to determine which of the requirements can be delivered within the next iteration [3].
Within APM, a large number of different but useful frameworks exist. Three of the most frequently used ones, namely the scrum, lean and kanban frameworks, are illustrated in figure 1. Sprints are part of the scrum framework, which is mainly used within IT projects [1]. Scrum consists of three major components: the roles, the process and the artifacts.
Starting with the roles, scrum involves a product owner, a scrum master and a scrum team. This cross-functional constellation works full-time on the project in order to get through the product backlog list in due time. The second major scrum component, the process, comprises five overall activities: the kick-off, the sprint planning meeting, the sprint, the daily scrum and the sprint review meeting, all of which are illustrated in figure 2 [2]. The third major scrum component, the artifacts, includes the product backlog, the sprint backlog and the burn-down charts, which are elaborated further in the Scrum Artifacts section.
At the beginning of a project, the product owner, the scrum master and the scrum team meet to kick it off. This kick-off meeting occurs only once during a project. Here, the high-level backlog for the project, which is the overall project scope and requirements, is defined together with the project's major goals. Unlike the kick-off meeting, the sprint planning meeting is held at the beginning of each sprint, with the same participants. Here, they define the product backlog list, i.e. the decomposition of the high-level backlog. Furthermore, they determine the sprint goal, which is the formal outcome of the sprint.
After defining these two main parts, the participants outline the sprint backlog. A sprint planning meeting usually takes up to a day [2]. Having planned the sprint, its backlog and its goal, the team is ready to explore the sprint’s research topics.
Scrum Artifacts
As mentioned, the scrum artifacts include the product and the sprint backlog as well as the burn-down charts. The product and sprint backlogs are defined during the sprint planning meetings. The product backlog is defined with the product owner whereas the sprint backlog is defined only by the scrum team, since this type of backlog is more specific to the individual sprint [2].
Unlike traditional project management, scrum deliberately makes use of burn-down charts. Burn-down charts are simple two-dimensional charts showing the progress of a given process. Their purpose is to provide relevant information about this progress in an easy-to-comprehend manner. This is especially relevant during scrum and sprints, where time is a limited resource and needs to be micro-managed carefully [2]. Three types of burn-down charts are commonly used. The sprint burn-down chart documents the progress of the sprint, the release burn-down chart documents the progress of the release, and the product burn-down chart documents the overall project progress [2]. An illustration of what these three types of burn-down charts can look like can be seen in figure 3.
As the examples in figure 3 illustrate, time is typically represented on the x-axis and the remaining work on the y-axis. The sprint burn-down chart is on a very operational level, showing the day-to-day progress by depicting the total backlog hours remaining in the sprint per day. These total backlog hours are estimated based on the amount of time left in the sprint. The sprint burn-down chart would thus decrease to zero hours remaining by the end of the sprint.
The release burn-down chart works in a similar way but represents the time remaining until the release is done. Finally, the product burn-down chart indicates the overall project progress on a higher level [2].
These burn-down charts can thus help the scrum team, and especially the scrum master, keep an easy overview of how the project is progressing and, on a more specific level, get a day-to-day overview of how each sprint is progressing.
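The data behind a sprint burn-down chart can be illustrated with a short calculation. The following Python sketch is only an illustration and is not taken from the cited sources: the function names, the 80-hour backlog, the 10 working days and the hours completed per day are all hypothetical values chosen for the example. It computes an ideal line that falls evenly to zero and the actual remaining backlog hours per day.

```python
def ideal_burn_down(total_hours: float, sprint_days: int) -> list[float]:
    """Ideal remaining hours from day 0 to the last sprint day (even burn rate)."""
    return [total_hours * (sprint_days - day) / sprint_days
            for day in range(sprint_days + 1)]


def actual_burn_down(total_hours: float, hours_done_per_day: list[float]) -> list[float]:
    """Remaining backlog hours after each day, given the hours completed per day."""
    remaining = [total_hours]
    for done in hours_done_per_day:
        # Remaining work never drops below zero.
        remaining.append(max(remaining[-1] - done, 0.0))
    return remaining


# Hypothetical sprint: 80 backlog hours over 10 working days.
ideal = ideal_burn_down(80, 10)
actual = actual_burn_down(80, [6, 10, 8, 4, 9, 8, 7, 10, 9, 9])

for day, (planned, left) in enumerate(zip(ideal, actual)):
    status = "behind" if left > planned else "on track"
    print(f"day {day:2d}: ideal {planned:5.1f} h, remaining {left:5.1f} h ({status})")
```

Plotting the two lists against the day number would give the kind of two-dimensional chart described above. Whenever the actual line lies above the ideal line, the sprint is behind the even burn rate.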
Sprint Methodology
What characterizes sprints, besides being short iterations of at most four weeks, is that no external interference should influence the work of the scrum team. This also means that project requirements cannot be changed during a sprint. In general, it is recommended to hold short daily stand-up meetings of 15 minutes, known as scrum meetings, during a sprint. Here, relevant matters such as what has been done since the last scrum meeting and what will be done before the next scrum meeting can be discussed.
The purpose of these daily meetings is to keep track of how the sprint, and thereby the project, is progressing. This can help the scrum master keep track of the team and ensure as efficient a sprint as possible [2]. The meetings are short and informal, leaving no room for problem solving or thorough explanations, hence the importance of standing up during these meetings. Standing up can help the team stay focused and remove any obstacles to progress [8].
Within data science projects, five additional meetings are recommended within each sprint. These meetings are discussed further in the Application section. After each sprint is finished, it is reviewed during the sprint review meeting. Here, the sprint product is demonstrated to the product owner.
Application
Data Science Life Cycle Sprints
Depending on the project type, a number of different adaptive life cycles exist. Data science projects differ from software development and data mining projects as they are of a more experimental and exploratory nature and thus need a less rigid project life cycle. In data science projects, the team learns from data and adjusts its work to the results accordingly. The amount of planning and the interdependencies between the project phases in the software development life cycle (SDLC) and the data mining life cycle (CRISP-DM) are thus not suited to data science projects. The data science life cycle (DSLC) sprint is better suited to data science projects due to its more lightweight and agile nature. It consists of six phases, based on the scientific method [9]. These six phases are:
- Identify
- Question
- Research
- Results
- Insights
- Learn
Unlike in the SDLC, where each phase leads to the next, a team can “cycle” through the questioning, research and results phases in each DSLC sprint. This mechanism is illustrated by arrows in figure 4. This inner cycle gives data science projects flexibility and agility. In the identification phase, the scrum team starts by identifying the key roles and players in a given context. Then, in the questioning phase, the team asks relevant questions about these key players in order to explore the data in the best way possible. To help the team identify all the right questions, the use of a question board is recommended. The question board allows the rest of the organization to contribute its own questions. The board should be easily accessible so that anyone can add a post-it with a question when walking by [1].
During the research phase, the team’s data analysts explore the data based on the identified players and questions, also known as research topics. Here, the rest of the team works closely with the data analysts to make sure that the work is aligned with the desired outputs. When the data analysts have finished exploring the data within the given research topics, a short report with the results from each research topic is produced during the results phase. These short reports can subsequently be used if a research topic and its results provide a solid basis for further research. When all research topics have been explored, the team evaluates the results and assesses whether they provide any valuable and interesting insights. Based on these insights, the team will in the last phase try to create organizational knowledge, which can eventually provide actual value to the rest of the organization by showing what has been learned from the research in the sprint [9].
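The difference between a strictly sequential life cycle and the DSLC inner cycle can be made concrete with a small sketch. The Python code below is a simplified illustration under stated assumptions, not a model from the cited sources: it assumes that the inner cycle of Question, Research and Results is repeated as a whole a fixed number of times before the team moves on to Insights and Learn. The function name and the `inner_cycles` parameter are hypothetical.

```python
PHASES = ["Identify", "Question", "Research", "Results", "Insights", "Learn"]
INNER_CYCLE = ["Question", "Research", "Results"]


def dslc_phase_sequence(inner_cycles: int) -> list[str]:
    """Order in which one sprint visits the phases when the inner cycle repeats.

    Assumption for illustration: the team always completes the whole inner
    cycle before starting it again or moving on to the Insights phase.
    """
    if inner_cycles < 1:
        raise ValueError("a sprint passes through the inner cycle at least once")
    return ["Identify"] + INNER_CYCLE * inner_cycles + ["Insights", "Learn"]


# One pass reproduces the six phases in their listed order.
print(dslc_phase_sequence(1))
# Two passes revisit the questioning, research and results phases.
print(dslc_phase_sequence(2))
```

With a single pass, the sequence is identical to the six-phase list. Every additional pass adds another round of questioning, research and results within the same sprint.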
Data Science Life Cycle Sprint Meetings
In order to run scrum efficiently, it is necessary to manage sprints carefully. One way to ensure that the sprint is on track is to hold regular meetings. For scrum teams in data science projects, it is recommended to run five different types of meetings during a sprint, which itself should last no longer than two weeks [1]. These five meetings are:
- Research Planning
- Question Breakdown
- Visualization Design
- Storytelling Session
- Team Improvement
All five meetings are held in every sprint to make sure that the scrum team provides interesting insights at the end of the sprint. The five types of meetings have different purposes, and together they ensure that each of the six phases of a DSLC sprint is covered properly, as illustrated in figure 5. It is essential that these five sprint meetings are all time-boxed.
A meeting is time-boxed when the team agrees upon the duration of the meeting prior to initiating it. What is agreed during the meeting will thus have to last for the rest of the sprint, since time-boxed meetings cannot be rescheduled or followed up on. This helps ensure that a sprint is rapid and agile, focusing on exploring and experimenting rather than planning and organizing [1].
Research Planning
Starting with the first sprint meeting, the Research Planning meeting, the scrum team evaluates all the questions that have been identified during the identification phase and on the question board. During this typically two-hour meeting, the team selects the most interesting questions, and the data analysts and scrum master subsequently prepare the sprint agenda. The sprint agenda is thus a compromise between the data analysts and the scrum master, ensuring minimum viable research topics that explore the data without spending unnecessary time on it [1].
Question Breakdown
During the Question Breakdown meeting, the scrum team gathers to review any new questions from the question board and tries to come up with new ones. This type of meeting also serves to identify potential clusters in the questions and to determine whether any questions can be broken down into more manageable ones. It is recommended to have at least two one-hour sessions of such meetings during a sprint. These meetings ensure a constant flow of research topics and that the whole team is aligned. They are also a good setting for prioritizing all questions and preparing the following sprint [1].
Together with the Research Planning meetings, the Question Breakdown meetings are thus ways of managing the first two phases of the DSLC sprint: the identification and questioning phases.
Visualization Design
During the third meeting, the Visualization Design meeting, the data analysts and scrum master together develop interesting visualizations based on the results from the data analysts’ experiments with the data. The meeting should not last more than an hour, and the visualizations can remain rough drafts [1]. These visualizations will subsequently be used in the Storytelling Session.
The Visualization Design meeting thus helps manage the sprint results from the DSLC results phase and create valuable insights during the DSLC insights phase.
Storytelling Session
Based on the visualizations done by the data analysts and scrum master, the scrum team meets at a Storytelling Session to present the story of what they have learned during the sprint. The meeting should last an hour, in which data visualizations are shown, questions from the question board are discussed and the stories behind the questions are told [1].
The Storytelling Session is thus a meeting during the DSLC learning phase that helps transmit the knowledge gained through the sprint.
Team Improvement
At the end of the sprint, the scrum team meets for a two-hour improvement meeting where the team's progress is discussed and evaluated. Here, potential changes can be agreed upon before the next sprint, which helps ensure that the team is constantly progressing and learning fast from its mistakes and successes [1].
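Since all five meetings are time-boxed, the total meeting time per sprint can be estimated up front. The Python sketch below adds up the durations stated above for the five DSLC meetings and the daily 15-minute scrum meetings. It rests on several assumptions that are not from the sources: a two-week sprint is taken to have 10 working days, the Question Breakdown meeting is counted at its recommended minimum of two sessions, and the Visualization Design meeting at its one-hour upper limit. The names `MEETINGS` and `meeting_minutes` are hypothetical.

```python
# (minutes per session, sessions per sprint), based on the durations in the text.
MEETINGS: dict[str, tuple[int, int]] = {
    "Research Planning": (120, 1),
    "Question Breakdown": (60, 2),    # recommended minimum of two sessions
    "Visualization Design": (60, 1),  # upper limit of one hour
    "Storytelling Session": (60, 1),
    "Team Improvement": (120, 1),
}


def meeting_minutes(meetings: dict[str, tuple[int, int]],
                    working_days: int = 10,
                    stand_up_minutes: int = 15) -> int:
    """Total time-boxed meeting minutes in one sprint, including daily scrums.

    working_days=10 is an assumption for a two-week sprint.
    """
    dslc_minutes = sum(minutes * sessions for minutes, sessions in meetings.values())
    return dslc_minutes + working_days * stand_up_minutes


total = meeting_minutes(MEETINGS)
print(f"Time-boxed meetings per sprint: {total} minutes ({total / 60:.1f} hours)")
```

Under these assumptions, the five DSLC meetings add up to eight hours and the daily scrum meetings to two and a half hours. A scrum master could adjust the numbers to the team's own agreed time boxes.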
Limitations
Running a project in sprints is an ideal methodology when projects are of an exploratory nature or simply require fast development and deployment [1]. The notion of failing fast allows project teams to develop ideas and test them quickly without having to spend too much time on planning or waiting unnecessarily long for high-level approval. The overall focus is therefore on developing and delivering minimum viable products and working further on them if they are a success and thus live up to the sprint goals and especially the project goals [2] [10].
However, even though APM methodologies are said to minimize risks, there are some limitations that both the team and the scrum master of DSLC sprint projects should be aware of [2]. Based on the PMBOK® Guide and relevant literature, two main limitations are discussed below.
No interfering
Since sprints, and especially DSLC sprints [9], are very limited in time, it is crucial that the team works on the sprint backlog exclusively [2]. This can be a challenge, especially in larger organizations where the risk of getting interrupted is very high due to the number of employees. Scrum teams are cross-functional and thus not necessarily teams that are used to working together. Other employees in the departments may therefore not be aware of these sprints and may reach out to their colleagues for help as they are used to. It is therefore important that the scrum master emphasizes the importance and priority of the sprint backlog and helps communicate this to team members' direct managers if necessary.
No objectives
As DSLC sprints are mainly recommended for exploratory projects like data science projects, the team needs to adjust its usual working habits from working towards specific objectives to developing research topics based on relevant questions during the Research Planning meeting [10]. These research topics then serve as requirements in the sprint backlog, but as data science projects are exploratory, there is no way of defining the objectives before the data has been explored.
This way of working thus challenges the usual perception of work, where there are clear objectives that have to be met before the project can come to an end [3]. As the PMBOK® Guide describes, a project end is reached when "the project's objectives have been achieved or when the project is terminated because its objectives will not or cannot be met..." [3]. This lack of specific objectives makes DSLC sprints unsuitable for other types of projects where there are clear deliverables that need to be completed. Such projects could, for example, be construction projects, which have a very clear start and end as well as clearly defined deliverables and success criteria. These types of projects also require a great deal of planning and funding, which contradicts the nature of DSLC projects [9].
DSLC sprints are not explicitly part of well-renowned standards like the PMBOK® Guide or PRINCE 2. However, they are a subcategory of the adaptive and agile life cycle projects described in the PMBOK® Guide [3]. These project life cycles are all based on the APM methodology, and DSLC sprints are thus an extension of the adaptive life cycles.
Glossary
APM - Agile Project Management
DSLC - Data Science Life Cycle
SDLC - Software Development Life Cycle
CRISP-DM - Cross Industry Standard Process for Data Mining
Annotated Bibliography
Rose D. (2016) Data Science: Create Teams That Ask the Right Questions and Deliver Real Value In: Data Science. Apress, Berkeley, CA: This book explains in detail how to run data science projects successfully. This wiki article has primarily focused on the chapters on how to deliver successful data science projects, but the book also covers many other relevant aspects. The first section defines what data science is. The second covers how to build a well-functioning data science team, and the last part focuses on how to ask the right questions during the Question Breakdown meetings.
Project Management Institute (2013) A Guide to the Project Management Body of Knowledge (PMBOK® Guide). In: Project Management Institute: This guide provides widely accepted guidelines, rules and characteristics for project, program and portfolio management. In this wiki article, the PMBOK® Guide has primarily been used for its section on project life cycles. However, it covers many more relevant subjects within project, program and portfolio management. For example, the PMBOK® Guide also provides extensive guidelines on Project Scope, Time, Cost and Quality Management as well as Project Risk Management.
H. Frank Cervone (2011) Understanding agile project management methods using Scrum: This article provides a very clear understanding of what scrum is and how it relates to agile project management. Some details are addressed in this wiki article, but the interested reader will find further knowledge in the article itself. It elaborates on the origin of agile project management in the agile software development movement and provides tangible tools for managing scrum projects.
References
- [1] Rose D. (2016) Data Science: Create Teams That Ask the Right Questions and Deliver Real Value - Chp. 13: Working in Sprints. In: Data Science. Apress, Berkeley, CA
- [2] H. Frank Cervone (2011) Understanding agile project management methods using Scrum. In: OCLC Systems & Services: International digital library perspectives
- [3] Project Management Institute (2013) A Guide to the Project Management Body of Knowledge (PMBOK® Guide). In: Project Management Institute
- [4] Office Of Government Commerce (2009) Managing Successful Projects with PRINCE2™. In: TSO
- [5] Inspiration for Sprint burn-down chart: http://www.sw-engineering-candies.com/blog-1/howtouseproduct-burndown-chartsandsprint-burndown-chartsnotonlyinscrumprojects
- [6] Inspiration for Release burn-down chart: http://www.softwaretestingstudio.com/burndown-chart-agile-scrum/
- [7] Inspiration for Product burn-down chart: https://www.scrum-institute.org/Burndown_Chart.php
- [8] Maylor H. (2010) Project Management. In: Financial Times Prentice Hall
- [9] Rose D. (2016) Data Science: Create Teams That Ask the Right Questions and Deliver Real Value - Chp. 12: Using a Data Science Life Cycle. In: Data Science. Apress, Berkeley, CA
- [10] Rose D. (2016) Data Science: Create Teams That Ask the Right Questions and Deliver Real Value - Chp. 14: Avoiding Pitfalls in Delivering in Data Science Sprints. In: Data Science. Apress, Berkeley, CA