Estimation with Normalized Story Points? Really?

I heard a discussion about normalized Story Points in SAFe®. People were saying that normalized Story Points are not agile because they mean that a Story Point equals an ideal person day. But is this true? What is “normalized” anyway? Is it necessary in SAFe? Our research turned up surprising details about what SAFe actually describes, which may differ from common practice.

SAFe® talks about aligned (approximately normalized) Story Points.

SAFe uses the term ‘normalized’ Story Points. The articles that use this term refer to the Iteration Planning article for the definition of ‘normalized’. This article states:

“Story points can be aligned (approximately normalized) by teams in the ART, providing a shared basis for capacity forecasting and economic decision-making.”

In statistics, ‘normalized’ means that values measured on different scales are adjusted to a common scale.

In statistics, normalization refers to the process of adjusting values measured on different scales to a common scale, often to compare or combine them. For example, we can normalize test scores from different exams so that we can compare them directly.
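To make the exam example concrete, here is a minimal sketch in Python. It assumes one particular kind of normalization, the standard score (z-score), and the exam results are made-up numbers for illustration only.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Rescale raw scores to standard scores (mean 0, standard deviation 1)."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation
    return [(s - mu) / sigma for s in scores]


# Hypothetical raw results: exam A is graded out of 100, exam B out of 20.
exam_a = [55, 70, 85]
exam_b = [8, 12, 16]

# After normalization both exams are on the same scale and can be compared.
print(z_scores(exam_a))
print(z_scores(exam_b))
```

The raw numbers look very different, but on the common scale the lowest, middle, and highest result of each exam line up with each other.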

For more information, read the Wikipedia article. It explains ‘Normalization’ in the field of statistics.

In SAFe, ‘normalized’ means that Story Points on different scales from different teams are adjusted to a common understanding of ‘one’ Story Point.

The definition of ‘normalization’ in statistics explains what is meant by ‘normalized Story Points’ in SAFe.

With multiple teams, each team has its own Story Point scale. Normalizing Story Points means that the values measured on different scales (by each team) are adjusted to a common scale. If we make sure that the reference stories of the different teams (which represent the size ‘one’ Story Point for each team) have a comparable size, we get an approximate common scale. In this case a story A of size ‘two’ from one team and a story B of size ‘two’ from another team have comparable size. 

Keep in mind that even with aligned Story Points, each team still has its own Story Point scale. The scales are comparable, but not the same. That’s why SAFe calls it ‘approximately normalized’.

Normalized Story Points do not mean that one Story Point equals one day of effort.

SAFe suggests a way to create a common baseline for story point estimation in the Iteration Planning article:

“Each team finds a small story that would take a day to develop, test, and validate. Call it a ‘one.’”

We establish this baseline only once before we start estimating. Also, this technique is just a suggestion (SAFe: “one approach”). If you prefer a different heuristic for finding the size ‘1’ Story, feel free to use it. Teams can agree on any way to find that ‘1’ as long as it produces a reference Story.

It cannot be emphasized enough that this algorithm is only a guide to find the size ‘1’ Story at the beginning. The reference is the Story found, not a person day.

Story Points have a relationship to effort that changes over time.

Many people have come to believe that ‘normalized Story Points’ means that one Story Point equals an ideal person day of effort. This is wrong, and it is not what SAFe suggests.

Relative estimation with Story Points decouples effort from size. We use forecasted Velocity to forecast effort based on actual, current, and evolving data. A fixed relationship between size and effort defeats the whole concept and benefit of estimating size instead of effort. 

Why we decouple size from effort:

At any given moment, size is related to duration and effort. The factor in this relationship is the current Velocity (or Throughput). The Velocity of a team changes as the team improves or as we introduce techniques like test automation. As the Velocity changes, the duration and effort change. While size tends to be stable, Velocity tends to change. Separating size from duration/effort helps us predict the duration/effort. We do this based on current data, without the need to re-estimate the items. Separating size from duration/effort means that there is no fixed relationship between size and duration/effort. We use a current Velocity to make a current prediction of duration/effort.
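The relationship described above can be sketched in a few lines of Python. This is an illustration under simple assumptions, not a SAFe formula: the function name `iterations_needed`, the backlog size, and the velocities are all hypothetical, and the forecast is simply size divided by current Velocity, rounded up to whole iterations.

```python
import math


def iterations_needed(backlog_points, velocity):
    """Forecast how many iterations a backlog takes at the given velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(backlog_points / velocity)


backlog_points = 120  # hypothetical backlog size, estimated once

# Hypothetical velocities (Story Points per iteration) as the team improves.
for velocity in (20, 24, 30):
    print(velocity, iterations_needed(backlog_points, velocity))
```

The size of the backlog stays at 120 points throughout. Only the Velocity changes, and with it the forecast, so nothing has to be re-estimated.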

Aligning (approximately normalizing) Story Points is optional.

I often use the term ‘aligned’ when explaining the concept of teams having the same understanding of size across teams. I do this because many people think ‘normalized Story Points’ means that one Story Point equals an ideal person day of effort. I want to stay away from this misconception and bad practice.

I also make sure, in our training and SAFe implementations, that people understand that ‘aligned (approximately normalized)’ does not mean a Story Point has fixed effort. Aligned only means the “size ‘one’” reference Stories (from each team) have a similar size.

I also make sure people understand that these concepts are optional:

  • Aligning (approximately normalizing) Story Points is optional.
  • The algorithm “Each team finds a small story that would take a day to develop, test, and validate. Call it a ‘one.’” to find comparable size ‘one’ reference Stories in each team is optional.

An alternative technique for aligning Story Points

You like alignment, but you dislike the technique suggested in SAFe? Here is an alternative:

  1. Each team chooses its own size ‘one’ reference story. 
  2. Representatives of each team meet and make a relative estimation of the reference stories. This leads to some reference stories being a ‘2’ or ‘3’ (or maybe something else) after aligning. 
  3. The team representatives go back to their teams, with some of their reference stories being a ‘2’ or ‘3’ (or maybe something else) after the alignment.

However, this is a more advanced technique that requires some understanding of Story Point estimation by the teams.
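As a rough illustration of the three steps, the following Python sketch shows what the alignment result could look like. The team names and sizes are hypothetical. The conversion of older estimates is an assumption of this sketch, not something the technique prescribes: it treats the scale as linear, which a team using a Fibonacci-like scale might not accept without rounding or re-estimating.

```python
# Hypothetical outcome of step 2: the aligned size that the representatives
# agreed on for each team's former size 'one' reference story.
aligned_reference_size = {"Team A": 1, "Team B": 2, "Team C": 3}


def to_aligned_scale(team, local_points):
    """Convert points estimated against a team's old 'one' to the aligned scale.

    Assumption: the scale is linear, so old estimates can simply be
    multiplied by the aligned size of the team's reference story.
    """
    return local_points * aligned_reference_size[team]


# A story that Team B estimated as 3 against its old reference story.
print(to_aligned_scale("Team B", 3))
```

For new estimates no conversion is needed. The team simply estimates relative to its reference story, which now carries its aligned size.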

Pros and cons of aligning (approximately normalizing) Story Points

If you have many teams working together on the same ART Product Backlog, it can help if they have an aligned definition of ‘1’. This allows the teams to discuss size across teams. For example, when working together on a common Feature, or when passing a Story from one team to another. 

It may also make sense not to align Story Points. One reason may be that almost all Features can be worked on by one team at a time and Stories rarely need to be shared. In this case, alignment does not add much value. Another reason for not aligning may be that experienced Scrum teams that already have their own Story Point scales are coming together.

Read my other articles about SAFe estimation.

Thank you, reviewers.

Estimation is a tough subject. I would like to thank my reviewers. They greatly improved this article. They are Simon Porro, David Croome, Philipp Erdmann, Hannes Jung, and Matthias Faßbender. We also checked the article with SAFe CoPilot.

Want to learn more? Come to one of our training courses.

We cover the above topics in our Leading SAFe training or in our Implementing SAFe training. Experience a training with a lot of hands-on activities, good discussions and experienced experts. Meet the people with the most practical experience in scaling agile and establishing business agility.

All quotes from the SAFe website are © Scaled Agile, Inc.

Your contact:

Malte Foegen

wibas GmbH

Otto-Hesse-Str. 19B

64293 Darmstadt

+49 6151 5033490