The Agile world is rife with misconceptions about SAFe, particularly around estimating with Story Points. That leads to bad practices standing uncorrected and being repeated over and over again. Dive into an exploration that challenges popular beliefs about estimating in SAFe and uncovers the true essence of Story Points. Whether you’re new to SAFe or think you’ve heard it all, this article promises fresh insights, debunking myths and guiding you through the agile maze with clarity and precision. Read this article if you want to know how proper agile estimation has worked for years, and how it still works with SAFe.
TLDR: When using SAFe, we do not change the way we use Story Points and estimates. We do add another level of flow. However, this follows the same principles.
- No, in SAFe we do not measure normalized Story Points based on ideal developer days. Doing so is wrong and causes problems.
- We estimate the size of Product Backlog items relative to other estimated items.
- Before we start estimating with Story Points, we must define what “1” means.
- It helps when teams that work together have an aligned understanding of size, but it is not necessary.
- Once you have started estimating with your definition of “1”, all estimated Product Backlog items become reference points for further relative size estimations.
- We measure Velocity and forecast Capacity.
- Only for a new team do we calculate (guess) the initial Capacity.
- With Story Points and Velocity, we can forecast delivery dates.
- Story Points are not for comparisons or compensation.
- A team can’t stop relating Story Points to fixed effort? Use T-shirt sizes.
- Features can have their own relative size.
No, in SAFe we do not measure normalized Story Points based on ideal developer days.
Many articles wrongly claim that SAFe suggests doing it this way (e.g. Kevin Bendeler). Yes, this is what many people do. But no, this is not what SAFe suggests (see Iteration Planning and WSJF). SAFe is not the problem. So, let’s see how proper estimation works in a multi-team setting.
We estimate the size of Product Backlog items relative to other estimated items.
The number you give to the size of a Product Backlog item (“Story”) is called Story Points. Story Points are a relative estimate. This means that an item with 2 Story Points is twice the size of an item with 1 Story Point, and an item with 8 Story Points is eight times the size of an item with size 1, or four times the size of an item with size 2. Compared to a 1 SP Story, a 2 SP Story should take approximately twice the effort to implement.
Story Points are related to each other, but Story Points have no fixed relation to effort (and that includes things like ideal developer days). The reason for relative estimates is that they save us the time to re-estimate when a team becomes faster. It also allows us to understand throughput and to make forecasts – things we like with an empirical mindset. The size of Product Backlog items is estimated by developers.
An important thing to understand about an estimate is that it is not a single number, but a probability distribution around that number. If we estimate that an item has a size of 2, the actual value (once we implement it and know the actual size) could be anything, but a 2 is more likely than a 13. The uncertainty, and with it the spread of the distribution, increases with larger estimates. This is why the gaps between the Story Point values grow as the values get larger (3, 5, 8, 13 …).
Before we start estimating with Story Points, we must define what “1” means.
As an independent unit of size, Story Points need a definition of what “1” means. Typically, we find a Story to use as a reference for size “1”.
SAFe suggests choosing a Product Backlog item that takes “half a day to develop plus half a day to test and validate”. It cannot be emphasized enough that this is only a guide to find the size “1” Story. The reference is the Story found, not the person-day, and certainly not an “ideal” person-day (there is no such thing in SAFe).
If you prefer a different heuristic for finding the size 1 Story, feel free to use it. Teams can agree on any way to find that “1” as long as it produces a reference Story.
It helps when teams that work together have an aligned understanding of size, but this is not necessary.
If you have multiple teams who work together on the same Product Backlog, it helps if they have a similar definition of “1”. This enables teams to discuss size across teams. E.g., when they work together on a shared Product Backlog item, or if they pass a Story from one team to another.
If each team picks an initial Story that will take ‘half a day to develop plus half a day to test and validate’, we achieve this. I often ask teams to make sure that when they compare their size “1” Stories, they are all indeed the same size.
When the relation of Story Points is not only agreed in one team, but across teams, this is called “aligned” in SAFe. However, aligning Story Points across teams is not necessary. It is an option, and often Story Points align anyway due to cooperation between teams.
Once you have started estimating with your definition of “1”, all estimated Backlog items become reference points for further relative size estimations.
We only need to find the “1” once in the life of a Product Backlog. Later, as we estimate more Product Backlog Items, there is a wealth of Product Backlog entries with size estimates. All of them can be used as a reference. Never again will we think about how we came to choose the size “1” reference Story.
We measure Velocity and forecast Capacity.
At the end of every timebox/cadence (e.g. Sprint/Iteration or Planning Interval/PI), we count the Story Points of all the Stories done in that timebox. This number is called Velocity. It lets us relate the size delivered to the effort spent in the timebox.
Velocity is similar to Throughput, which is often used in Kanban systems to measure work items delivered per time period (see Kanban Glossary or SAFe Measure and Grow Article). In contrast to Throughput, Velocity takes into account the different sizes of items.
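To make the difference concrete, here is a minimal sketch in Python. The `Story` type, the Story names, and the sizes are made up for illustration and are not taken from SAFe. Velocity sums the Story Points of the finished Stories, while Throughput only counts them.

```python
from dataclasses import dataclass


@dataclass
class Story:
    name: str     # hypothetical Story title
    points: int   # relative size estimate in Story Points
    done: bool    # True if the Story was finished in this timebox


def velocity(stories: list[Story]) -> int:
    """Sum of the Story Points of all Stories done in the timebox."""
    return sum(s.points for s in stories if s.done)


def throughput(stories: list[Story]) -> int:
    """Number of Stories done in the timebox, regardless of their size."""
    return sum(1 for s in stories if s.done)


# One hypothetical Sprint with four Stories, one of them unfinished.
sprint = [
    Story("login form", 3, True),
    Story("password reset", 5, True),
    Story("audit log", 8, False),  # not finished, so it does not count
    Story("fix typo", 1, True),
]

print(velocity(sprint))    # 9
print(throughput(sprint))  # 3
```

Both numbers describe the same Sprint. Velocity simply weights each finished item by its estimated size.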
Velocity (or Throughput) will gradually change over time, especially as a team improves.
Velocity is useful for forecasting what the Velocity of upcoming Sprints/Iterations could be. This forecast is called Capacity. It helps to shape a Sprint Goal (aka Iteration Objective) and to select the Stories that the team forecasts to deliver in the next Sprint/Iteration. Of course, a forecast of future Velocity is a statistical estimate with variance. Tina may become ill, the actual size of a Story may differ from its estimated size, and other things may happen.
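One simple way to turn measured Velocities into a Capacity forecast is sketched below. It assumes the plain average of recent Sprints as the estimator and the sample standard deviation as the spread. This is an illustrative choice, not something SAFe prescribes, and the Velocities are hypothetical.

```python
from statistics import mean, stdev


def forecast_capacity(past_velocities: list[float]) -> tuple[float, float]:
    """Return (expected Capacity, spread) from measured Velocities.

    Uses the mean and the sample standard deviation. Other estimators
    (e.g. a median or a weighted average) would work just as well.
    """
    if len(past_velocities) < 2:
        raise ValueError("need at least two measured Velocities")
    return mean(past_velocities), stdev(past_velocities)


# Hypothetical Velocities of the last four Sprints, in Story Points.
expected, spread = forecast_capacity([18, 22, 20, 24])
print(f"Capacity forecast: about {expected:.0f} SP, give or take {spread:.1f}")
```

Reporting the spread next to the expected value keeps the forecast honest: it is a range, not a commitment to a single number.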
Only for a new team do we calculate (guess) the initial Capacity.
When a team is new, we do not have any data. Therefore, we cannot forecast capacity. There are several ways to get around this.
SAFe suggests calculating the Capacity for the first Sprint/Iteration. SAFe uses 8 Story Points per developer for a 2-week Sprint, or more generally 0.8 Story Points per developer per workday. The factor of 0.8 takes into account the time spent on the Sprint events and Refinement.
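The arithmetic behind this starting value can be sketched as follows. The function name and the team size are hypothetical, and only the 0.8 factor comes from the text above.

```python
def initial_capacity(developers: int, workdays: int,
                     points_per_day: float = 0.8) -> float:
    """First-Sprint Capacity guess: 0.8 Story Points per developer per workday."""
    return developers * workdays * points_per_day


# One developer in a 2-week Sprint (10 workdays).
print(initial_capacity(developers=1, workdays=10))  # 8.0

# A hypothetical team of five developers in the same Sprint.
print(initial_capacity(developers=5, workdays=10))  # 40.0
```

This is only a starting guess for a team without data. From the second Sprint on, measured Velocity replaces it.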
Another option to determine the Capacity for the first Sprint/Iteration would be to select Stories for the Sprint and then to count the number of Story Points.
You can choose either way. It does not really matter, because we only do this once in the life of a team. After that initial Sprint/Iteration, we measure Velocity and forecast Capacity (see above). We rarely calculate Capacity again.
With Story Points and Velocity, we can forecast delivery dates.
Stakeholders often want to know when items will be completed. With Story Points and Velocity, we can provide estimates for the “when will it be completed” question. By dividing the Story Points by the measured Velocity, we have an estimate of the time it will take to complete those Story Points.
For example, if there are Product Backlog items with 100 SP, and we have a current Velocity of 10 SP/Sprint, those items will take 10 Sprints. Not exactly, of course, because the time is an estimate with variance and a confidence interval. This technique can be found in many amusement parks, where at certain lengths of the queue, signs indicate the amount of time to wait from that point.
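The calculation in this example is the remaining Story Points divided by the Velocity. A minimal sketch follows. The range helper and its slow and fast Velocities are assumptions added to illustrate the variance, not values from SAFe.

```python
import math


def sprints_needed(remaining_points: float, velocity: float) -> int:
    """Number of whole Sprints needed to complete the remaining Story Points."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(remaining_points / velocity)


def forecast_range(remaining_points: float, slow_velocity: float,
                   fast_velocity: float) -> tuple[int, int]:
    """Return (optimistic, pessimistic) Sprint counts for a Velocity range."""
    return (sprints_needed(remaining_points, fast_velocity),
            sprints_needed(remaining_points, slow_velocity))


print(sprints_needed(100, 10))     # 10
print(forecast_range(100, 8, 12))  # (9, 13)
```

Giving stakeholders a range such as "9 to 13 Sprints" communicates the uncertainty better than a single date, and the range updates automatically as the measured Velocity changes.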
There are two advantages to forecasting delivery dates using Story Points and Velocity. First, the delivery date is based on measurements, not wishes. Second, as the Velocity changes, the forecasts of the delivery dates follow that change.
Story Points are not for Comparisons or Compensation.
Story Points help teams to discuss Stories, work empirically, and make forecasts. There is often a strong desire to measure and compare the delivery of teams and people. This is not what Story Points are for.
Some people use Story Points to pay teams. This is also not what Story Points are for. Paying by Story Points, just like paying bonuses, distracts teams from their goals (see “Large Stakes and Big Mistakes” from the Federal Reserve Bank of Boston). Of course, we all know this from Scrum, so we won’t change it when Scaling.
A team can’t stop relating Story Points to fixed effort? Use T-shirt sizes.
Often teams get caught up in estimating Story Points: they continue to relate them to fixed effort, or they have a hard time wrapping their heads around the fact that each estimate represents a distribution (“something around 13”) rather than a number.
When this happens, I use T-shirt sizes (XXS to XXL). Often you still want to do math with the size estimates, such as velocity measurements, delivery date forecasts, or capacity forecasts. You can do this by establishing a relation between the T-shirt sizes and mapping each T-shirt size to a Story Point number.
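Such a mapping could look like the sketch below. The numbers are hypothetical, and each team would agree on its own relation between the sizes.

```python
# Hypothetical mapping from T-shirt sizes to Story Point numbers.
TSHIRT_TO_POINTS = {
    "XXS": 1,
    "XS": 2,
    "S": 3,
    "M": 5,
    "L": 8,
    "XL": 13,
    "XXL": 21,
}


def total_points(sizes: list[str]) -> int:
    """Translate T-shirt sizes into Story Points and add them up."""
    return sum(TSHIRT_TO_POINTS[size] for size in sizes)


# Stories finished in one hypothetical Sprint, estimated in T-shirt sizes.
done_this_sprint = ["S", "M", "XS", "L"]
print(total_points(done_this_sprint))  # 18
```

The team keeps talking in T-shirt sizes, and the numbers are only used behind the scenes for Velocity and forecasts.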
If a team is not comfortable with estimates (“no estimating”), similarly sized Stories and the use of Throughput also work. This allows forecasts to be made just as well as with Velocity and Story Points.
Features can have their own relative size.
In multi-team settings we often have Product Backlog items on different flight levels. SAFe uses Stories (which fit into a Sprint) and Features (which fit into a larger cadence spanning multiple Sprints, the Planning Interval, or PI).
For Features, all the things we have said before apply in the same way. We estimate the size relatively. We measure Velocity and we forecast Capacity for a Planning Interval (PI).
One option is to use Feature Points as an independent unit of size for Features. Feature Points work the same way as Story Points, just on the higher flight level of Features. Feature Points have no direct relation to Story Points, just as Story Points have no direct relation to time.
Another option is to estimate the size of Features in Story Points, too. In this case there is not only a relationship between Stories, but also between Features and Stories. If you want to use Story Points for Features, the Story Points must be aligned between teams. You will also be using “big” Story Points, such as 100, 200, etc. SAFe talks about both options. After all, it is your choice. I am a fan of Feature Points to keep things loosely coupled.
Size estimates for Features should be done by team representatives. SAFe indeed suggests this. While Product Owners are team representatives from a scaling point of view (they are a member of the Scrum Team), I personally like to have developers in the room, too, when estimating the size of Features.
Do not change anything in the concept of Story Points when Scaling.
Long story short: The concept of Story Points is old, and with SAFe we do not change it. If the concept appears to be different, something is probably wrong.
Thank you, reviewers.
Estimating is a tough topic. I thank my many reviewers who vastly improved this blog article: David Croome, Tina Behers, Alexander Post, and Simon Porro.