An Overview of Estimates

Oh boy, here we go again. If there’s one topic in agile development that has been talked to death, it is estimation. That is, if one has been in the “agile space” long enough and done enough research and collaboration. To everyone else, the various methods of estimation and what they yield are frequently news, to this day. Until that stops being the case, I am happy to help people new to agile development figure out what tools they have at their disposal.

As with most of my ideas, they are only current with respect to when I put them on paper. Tomorrow, with new information or perhaps some more sleep, they may change.

Back on topic, there are so many things we can estimate. The amount of time we think we will spend on a deliverable (in hours, or in “ideal days”, or in “moons”, or whatever time measurement you fancy). The relative complexity of that deliverable. The business value of that deliverable. The effort of testing that deliverable. And so on.

Ultimately, these estimates are tools, and the focus of this discussion will be on what these tools are for and how to use them properly. I will start by saying this: using them improperly is likely worse than not using them at all, because using them invariably comes at a cost.

So let’s look more closely at our estimation toolbox. Once we have an idea of what kind of data different estimates can provide, we’ll look at situations in which we should consider applying them to gather that data.

Hours to completion
A common estimate is how long a task or set of tasks (such as a User Story) will take in hours. Estimating whether a task will take one hour, or 10 hours, can help with capacity planning.

Pros

  • simple concept to understand
  • all kinds of work, from deliverables to technical discovery efforts to training, can be estimated in the same units (time)
  • easy to plot in a chart since hours scale linearly

Cons

  • very large margins of error
  • precision of estimate often at odds with error margin
  • cannot yield per-time data such as velocity, since the units cancel (hours completed per hour worked is just a ratio)
  • does not take into account context-switching
  • depends on who does the work
  • difficult to track due to meetings and other context switches
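
To make the capacity-planning use concrete, here is a rough sketch in Python. Every number in it is made up for illustration; a real team would substitute its own.

    # A rough sketch of hour-based capacity planning; all numbers are hypothetical.
    SPRINT_DAYS = 10           # a two-week sprint
    TEAM_SIZE = 3
    FOCUS_HOURS_PER_DAY = 6    # 8 hours minus meetings and other interruptions

    available_hours = TEAM_SIZE * SPRINT_DAYS * FOCUS_HOURS_PER_DAY

    task_estimates_hours = [4, 10, 6, 16, 8, 12]   # hypothetical task estimates
    committed = sum(task_estimates_hours)

    print(f"Capacity: {available_hours}h, committed: {committed}h")
    print("Fits" if committed <= available_hours else "Over capacity")

Keep the cons above in mind: the margins of error on those task estimates make this comparison far fuzzier than the arithmetic suggests.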

Ideal days to completion
Some find it simpler to estimate using “ideal days,” each representing a full workday without any interruptions. The point is to acknowledge that such days rarely exist and that interruptions and context switches inevitably occur. If something is estimated to take two ideal days, it will likely take longer than two days, and may even take several, depending on how much availability someone actually has. This also helps when thinking about capacity while planning a sprint. Note that converting to hours using something like 6 or 8 hours per day, and then planning with the resulting hours, largely misses the point. Consider: going from two ideal days to 16 hours introduces another significant digit of precision to the estimate. It also creates a burden of accounting for interruptions and context switching rather than just acknowledging that they exist.

Pros

  • less time spent estimating than with hours
  • precision is kept reasonably low
  • implicitly acknowledges that days are in fact not “ideal” and have interruptions
  • simplifies capacity planning

Cons

  • cannot yield per-time data such as velocity, since the units cancel (as with hours)
  • depends on who does the work
  • may not improve predictability over hour estimation
  • management may unnecessarily question discrepancy between ideal and actual days
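
A similar sketch for ideal days. Capacity comes from history (“how many ideal days of work did we actually complete recently?”) rather than from converting to hours; the numbers are again made up.

    # Ideal-day planning without converting to hours: capacity comes from
    # recent history, not calendar arithmetic. All numbers are hypothetical.
    ideal_days_completed = [6, 7, 5]    # last three sprints
    capacity = sum(ideal_days_completed) / len(ideal_days_completed)

    story_estimates = [2, 1, 3, 1]      # ideal days per story
    print(f"Historical capacity: ~{capacity:.1f} ideal days; planned: {sum(story_estimates)}")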

Relative complexity
The most frequently suggested form of estimation is relative complexity, sometimes referred to as “story points.” Deliverables, or user stories, are assigned a measure of complexity relative to each other, using a “simple” story as a baseline. That story gets one point, and the rest are compared to it and assigned two, three, or more points. Points of complexity follow the Fibonacci sequence (1, 2, 3, 5, 8, 13, and so forth) in order to reflect the growing margin of error as complexity increases. A common practice is to take any story that is, say, eight or more points and break it down into smaller deliverables. The more small, lower-complexity deliverables there are, the easier it is to track work and make predictions. Complexity is also not tied to any person’s experience, skill level, or availability.
As a general practice, if points of complexity are used to measure velocity (points delivered per unit of time), then points are often only assigned to deliverables that provide business value. The idea is that other tasks, while important to call out and track, exist in order to improve cycle time and quality of business deliverables. Tasks without points, such as technical discovery efforts, can be time-boxed if appropriate, and in any case should be factored into capacity planning.
Metrics such as velocity can be used to determine how well a team manages who does what work. Complexity is independent of a person’s ability or experience, so as velocity improves it may mean that a team has reduced expertise-related bottlenecks and is sharing knowledge. Velocity can also measure how much faster or slower a team delivers software in general as it evolves its process.
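
As a quick sketch of the velocity calculation itself (hypothetical data, counting only stories accepted as done):

    # Velocity: points of business value delivered per sprint.
    # Unpointed tasks contribute nothing to this number. Hypothetical data.
    points_per_sprint = [13, 18, 16, 21]
    velocity = sum(points_per_sprint) / len(points_per_sprint)
    print(f"Average velocity: {velocity:.1f} points/sprint")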

Pros

  • less time spent estimating than when using time-based values
  • can yield per-time data since it is not itself a measure of time
  • built-in acknowledgment of error margins by using Fibonacci sequence
  • does not depend on who does the work
  • can be used to predict how long a project will take based on velocity
  • immune as a measurement to distractions and context-switches
  • will not be factored into utilization planning or metrics
  • less likely to have a large discrepancy in estimated vs actual

Cons

  • require some experience to use appropriately in planning
  • impossible to convert to hours but people often try
  • team-specific; one point means something different to each team

Relative size or effort
Another way to estimate relative values is with something like t-shirt sizes rather than story points: deliverables are assigned values like “small,” “medium,” “large,” or “extra large.” This is a fast way of estimating but has some disadvantages compared to story points. One is that to get a metric such as velocity, the relative sizes need to be mapped to numbers in the first place, which basically means switching over to a point-based system anyway to take advantage of any predictability in the data.
Another concern with relative sizes is that some people think in terms of complexity while others think in terms of how long the work will take them, which can lead to misleading data.
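
To illustrate the first concern, the size-to-number mapping might look like the following sketch. The point values are arbitrary, invented for this example, which is exactly the issue: you end up with a point system in disguise.

    # Mapping t-shirt sizes to numbers just to compute a velocity.
    # The chosen values are arbitrary and hypothetical.
    SIZE_POINTS = {"S": 1, "M": 3, "L": 5, "XL": 8}

    completed_sizes = ["S", "M", "M", "L"]   # one sprint, hypothetical
    print(sum(SIZE_POINTS[s] for s in completed_sizes), "points delivered")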

Pros

  • little time spent estimating
  • if representing complexity
    • can yield per-time data once converted to points
    • does not depend on who is doing the work
  • will not be factored into utilization planning or metrics

Cons

  • if representing time to completion (avoid this)
    • cannot yield per-time data since units cancel
    • depends on who does the work
  • team-specific; sizes mean something different for each team

Value Points
This is a very different measure that looks at the business value of work rather than its complexity or the time it will take to complete. It is something orthogonal to time and complexity — it has nothing to do with them and can be mapped on a perpendicular axis. It is in fact this mapping that can help with product planning.
Time and complexity are ultimately measures that drive cost. Business value is the opposite — it provides a return. And so, the higher business value a certain deliverable has, the higher it can be prioritized by the product owner. More precisely, the higher its value relative to the cost of producing it, the higher the deliverable can be prioritized. High-value, low-cost deliverables may be selected over high-value, high-cost ones. Low-value, low-cost are ones typically left for last and low-value, high-cost deliverables may not be completed at all.
What are value points, ultimately? Are they abstractions for dollar amounts, similar to “ideal days” being abstractions for time? They can be, since using actual dollar amounts is difficult and similar to using hours to estimate time: the precision is too high given the number of unknowns, and there is no baseline to compare to. But deciding that a “value point” is, say, $10,000 is also too much of a guessing game, because returns on investment are quite difficult to predict in terms of absolute numbers. So comparing relative value through points may make more sense. Using the Fibonacci sequence, one can say that a three-value-point story feels like it is three times more valuable than a one-value-point story, and that is usually good enough to inform how work should be prioritized.
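
Here is a small sketch of that prioritization logic, sorting a backlog by value relative to cost. The stories and numbers are invented for illustration.

    # Prioritize by value relative to cost (value points / complexity points).
    # Stories and numbers are hypothetical.
    backlog = [
        {"story": "streamlined checkout", "value": 13, "cost": 5},
        {"story": "wishlist sharing",     "value": 3,  "cost": 8},
        {"story": "product search",       "value": 8,  "cost": 3},
    ]

    for item in sorted(backlog, key=lambda s: s["value"] / s["cost"], reverse=True):
        print(f'{item["story"]}: {item["value"] / item["cost"]:.2f}')

High-value, low-cost stories float to the top and low-value, high-cost stories sink to the bottom, just as described above.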

To understand what various estimates can be used for, we have to take a step back and look at our end goals as a software development shop (or team).

  1. be successful in an engagement with a client (i.e. a software project)
  2. generally improve as a software development team

When working on a software project, you will invariably have constraints. A nearly unavoidable one is money — the client only has a certain amount that will be spent on your team, whether it is per a given time period or overall. Another is product quality — whether it is stated outright or not, it is unlikely that a client will accept work that falls below some standard of quality and leave it at that. Next, we have constraints such as time — there may be a hard deadline for the project — and scope, where there is a minimal viable feature set that must be implemented. Together these constraints may be familiar to some as the “iron triangle” (https://en.wikipedia.org/wiki/Project_management_triangle), which maps cost, schedule, and scope as constraints tied to one another. Quality is included as a fourth constraint, often depicted in the middle of the triangle. It is also the one that is least likely to be actually negotiable despite being frequently compromised through poor processes and planning.

The estimate that is the most difficult to avoid is an early one: “how much will this project cost?”. It pertains to the first constraint mentioned, money. If your engagement happens to be a fixed-bid contract where you state that you will do a certain amount of work for a particular price, you have to have arrived at that number somehow. In this case, your best friend is historical data. If you’ve done similar work before, use the actual time you spent on that work to inform the estimate for this new project. If this work is completely new, then you have to do some investigation and come up with a ballpark time estimate that you then convert to a dollar amount based on cost per unit of time. The only suggestions I have for this estimate are to use low precision, put a margin of error around the final number, and use the high end for your negotiations. This estimate’s main purpose is to close a deal and set some basic expectations.
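
The arithmetic might look something like this sketch, with entirely made-up figures:

    # Ballpark for a fixed bid: low-precision estimate, wide error margin,
    # negotiate from the high end. All figures are hypothetical.
    estimated_months = 6            # ballpark from historical data or investigation
    margin = 0.30                   # plus or minus 30%
    team_cost_per_month = 50_000    # dollars

    low = estimated_months * (1 - margin) * team_cost_per_month
    high = estimated_months * (1 + margin) * team_cost_per_month
    print(f"Quote range: ${low:,.0f} to ${high:,.0f}; open negotiations at ${high:,.0f}")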

When you are constrained by time because there is a fixed deadline for your project, then it is useful to know whether you are on track to meet that deadline given current scope.

When you are constrained by scope because there is a minimum viable product, then it is useful to know whether you are on track to deliver that scope given the current deadline.

When you are constrained by scope and by time, then it is useful to know whether you are on track to deliver that scope within that time given the current resources.

If you want to know that information — that given your current scope, timeline, and resources, you are on track for success — then you probably have to come up with some estimates.

The estimation methods described above are your options here: hours, ideal days, points, or t-shirt sizes. Depending on your team makeup, how many priorities you have to juggle (especially different projects), whether you care about measuring utilization (you probably shouldn’t), how much time you are spending on estimation, how well-groomed your product backlog is, and so forth, choose the option that works best for you.

Another way to look at it is this: which of these options you choose to pursue should depend on what actionable knowledge you expect these estimates to provide, and what actions you would be capable of taking in turn. For example, if you want a measurement of work completed per unit of time, then you need a velocity, and points are your best bet.

The other side of the equation is, what kind of information do you need in order to be successful in your engagement with a client? For example, do you need to predict how long it will take to complete the current product backlog?
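
If so, a velocity-based forecast is one answer. A hypothetical sketch, using a range of recent velocities rather than a single number:

    import math

    # Forecast sprints remaining using a velocity range instead of a single
    # number. The backlog size and velocities are hypothetical.
    remaining_points = 120
    recent_velocities = [13, 18, 16, 21]

    best = math.ceil(remaining_points / max(recent_velocities))
    worst = math.ceil(remaining_points / min(recent_velocities))
    print(f"Roughly {best} to {worst} sprints remaining")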

Something to consider is, if you can’t think of how you could use the knowledge that gathering estimates yields, then it’s not valuable knowledge and there is no point in gathering it. Having it for its own sake is a great example of generating waste.

If you choose to estimate the cost of work, then I would recommend points of complexity because the pros outweigh the cons by so much compared to the other available options. But if you use points, you have to make sure there is discipline around when they are assigned and when they are not. If capacity planning is important to you, then cards with points must live alongside cards without points that nevertheless take away from capacity (administrative tasks, training, technical discoveries, defect fixes, and so forth). Other things such as meetings will take away from capacity as well, as will team size fluctuations, naturally.
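
One rough way to fold unpointed cards into planning is to discount expected velocity by the capacity they consume. The sketch below is an illustrative heuristic with made-up numbers, not a standard formula:

    # Illustrative heuristic only: discount expected velocity by the share of
    # capacity that unpointed, time-boxed work will consume. Numbers hypothetical.
    average_velocity = 17.0     # points per sprint, from history
    team_days = 30              # 3 people x 10 days
    unpointed_days = 6          # time-boxed discovery, training, admin tasks

    expected_points = average_velocity * (1 - unpointed_days / team_days)
    print(f"Plan for roughly {expected_points:.0f} points this sprint")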

If you choose to estimate the value of work provided, which is actually not a common practice as far as I know, then value points are a good option compared to dollar amounts. But few businesses use them, because the initial several sprints are typically devoted to creating the “minimum viable product,” which implies that all deliverables within it are equally and maximally valuable: no matter what, they must get done. Once a product gets into straight “maximize ROI” mode, though, value points can help prioritization and yield a velocity that may even be more meaningful (business value per unit of time) than one that points of complexity can provide.

Effective planning is something I might look at in an upcoming writing.

I want to conclude by mentioning that there ARE alternatives to estimating altogether, depending on circumstances. If you have a fixed bid project that has a frozen scope and a hard deadline, then further estimation is a waste of what precious time you have to actually complete the work at high quality.

There is also the case, albeit an uncommon one, where a team is capable of breaking the backlog out into deliverables of roughly equal size, all “small enough.” In that case the team can just track how many deliverables are completed per iteration and how many are left in the backlog.
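
The tracking in that case is about as simple as it gets. A tiny sketch, with made-up counts:

    import math

    # Same-size deliverables: just count throughput. Counts are hypothetical.
    completed_per_iteration = [7, 9, 8]
    remaining = 40

    throughput = sum(completed_per_iteration) / len(completed_per_iteration)
    print(f"About {math.ceil(remaining / throughput)} iterations left")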

Good User Stories and the Definition of Done

On occasion, I see dev shops complaining of work taking too long to finish, of deliverables being carried over from Sprint to Sprint, and the root cause turns out to be that the developers don’t know what it means for their deliverables to be “done” until they are done. They start work, having a “fairly good idea” of what the customer is asking for, hoping that they’ll finish the work in time to demo it at the end of the iteration. And they find out the hard way that “fairly good” often just doesn’t cut it.

So how do you get from “I think I have an idea of what they want” to “we are on the same page about exactly what needs to be done”? It’s a question with a potentially very long answer, but it ultimately comes down to creating quality requirements.

The quality of software requirements has a direct influence on the quality of deliverables. You need a sufficiently clear understanding of the business domain, of the users involved and their needs, of the scope of the work to be done, and of any edge cases in the logic that needs implementing.

Eliciting all of that requires a certain baseline of understanding and commitment on the part of the product owner (or customer) and the team gathering and implementing these requirements. Further, when it comes to building the actual requirements, there are many kinds of documents to consider, from a specification of required product features, to user personas and how they work day-to-day, to a list of business challenges that currently have no solution. Some of these can be captured in free-form documents, others in bulleted lists, and others in specialized formats. Product features, broken down into small deliverables, can be recorded as User Stories. Today I will take a very narrow focus and look at User Stories in particular.

A key idea behind User Stories is that they are “atomic”: a single User Story cannot be broken down further into requirements that are themselves good User Stories. (If it can, then that should be done, and the original user story may become an Epic or some other metadata that can logically group the resulting, smaller User Stories.)

A good User Story should be valuable (for the business), independent (provides that value on its own), negotiable (a starting point for conversation rather than a fixed contract), testable, estimable (understood well enough that its complexity can be determined within an acceptable range), and small (ideally, so small that it cannot be broken down further into independent, valuable, testable user stories).

The I.N.V.E.S.T. mnemonic is sometimes used to consider how “good” a User Story is.
https://en.wikipedia.org/wiki/INVEST_(mnemonic)

The typical format for the opening statement of a User Story can capture the “valuable” criterion.

As a ~
I want ~
So that ~

For example: As the site owner, I want users to authenticate prior to browsing, so that I can create a small barrier to entry, personalize user experience, and track user behavior.
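
For teams that keep stories in their own tooling or scripts, that example could be represented something like the following minimal sketch. The structure and field names are my own invention, not any particular tool’s format.

    from dataclasses import dataclass, field

    # A minimal representation of a User Story; the field names are invented
    # for illustration and do not reflect any particular tool.
    @dataclass
    class UserStory:
        as_a: str
        i_want: str
        so_that: str
        acceptance_criteria: list = field(default_factory=list)

    story = UserStory(
        as_a="the site owner",
        i_want="users to authenticate prior to browsing",
        so_that="I can create a small barrier to entry, personalize user "
                "experience, and track user behavior",
    )
    # An empty "so that" is a signal that the work may not need doing at all.
    print("needs conversation" if not story.so_that else "value articulated")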

Each part of the 3-phrase story contains useful information. The “As a ~” will describe the role from whose perspective the story is written. This can imply all sorts of things. “As an administrator” may imply that only those with administrator-level privileges are able to experience the described behavior and others should not. The “I want ~” briefly describes the desired behavior, and “So that ~” describes the value that this behavior provides. If a product owner cannot articulate the “So that ~” phrase — that is, if he or she cannot think of a good reason to desire the behavior — then it is likely that the work does not need to be done or in any case requires further conversation.

The rest is down to the details. Is the story independent or does it not actually contribute value on its own? Is it small or can it be further broken up into two or more user stories? Is it testable?

Most of the time, the “As a ~, I want ~, so that ~” statement is not enough to answer those questions. And, importantly, it is not enough to tell you the scope of the work. It is not clear how much work you have to do in order for the story to be considered done.

No User Story is complete without a clear definition of “done.”

One way to think about what “done” means is that if the product owner has accepted the work as complete, then there is no more work to be done. In other words, the User Story has a set of “Acceptance Criteria” that must be met. These should be written out so that they are clear to everyone, including the product owner, those implementing the Story, and those testing the Story.

Acceptance Criteria should certainly include everything relevant from the user’s perspective. They may also call out other requirements that concern the development team more than the product owner.

An example with our authentication story:

As the site owner
I want users to authenticate prior to browsing
So that I can create a small barrier to entry, personalize user experience, and track user behavior

Acceptance Criteria

  • The user should be shown a login form upon visiting the site
  • The login form should appear over site content and disappear upon authentication (that is, the login form does not live on a separate page)
  • The user should not be directed to a different URL upon successful authentication
  • The user should be shown an error message if incorrect credentials are entered

If there are any unknowns in this story that prevent it from being estimable, those should be brought up as soon as possible so that the User Story is ready to be implemented once it is high-enough priority. One crucial detail is missing in the story above — the credentials themselves. What are the accepted user credentials? Is it a username and password? Is it an email and password?

In this case the missing criterion for this story is: User credentials are an email and password. This story now has enough clarity that the work described can be estimated in terms of complexity.
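
Clear Acceptance Criteria also translate naturally into tests, which is part of what makes a story “testable.” Below is a self-contained sketch in which the site is faked in memory; every name and credential is invented for illustration, and a real test would drive the actual application instead.

    # Acceptance criteria expressed as runnable checks against a fake,
    # in-memory site. FakeSite and the credentials are hypothetical.
    class FakeSite:
        def __init__(self):
            self.users = {"alice@example.com": "s3cret"}
            self.authenticated = False
            self.error = None

        def login(self, email, password):
            if self.users.get(email) == password:
                self.authenticated, self.error = True, None
            else:
                self.error = "Incorrect email or password."
            return self.authenticated

    def test_correct_credentials_authenticate():
        site = FakeSite()
        assert site.login("alice@example.com", "s3cret") and site.error is None

    def test_incorrect_credentials_show_error():
        site = FakeSite()
        assert not site.login("alice@example.com", "wrong")
        assert site.error == "Incorrect email or password."

    test_correct_credentials_authenticate()
    test_incorrect_credentials_show_error()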

There are other behaviors related to user authentication that are not touched on in this story. They may be worth asking about as part of requirement elicitation: “can the user link his or her Facebook or Google account instead of using a site-specific one?”, “what happens when a user tries to log in with the wrong password several times in a row? should the site lock that user out to prevent brute-force account hacks?”, or “what if the user has forgotten his or her password? should there be a ‘forgot password’ flow?” Mind you, even if the answer to these questions is yes, that does not mean additional acceptance criteria for the story described above. These behaviors are independent enough that the story above, done as-is, will provide business value on its own. Features such as password reset, account locking, and so forth would then be described in separate User Stories that follow up on this one.

Perhaps this is the first time that the notion of user account management has even come up, which can spawn a whole separate discussion around security and how credentials are collected and stored. And shouldn’t there be user registration since there is user authentication?

Note that some things are not mentioned in the Acceptance Criteria that nevertheless become a decision point for the implementer and tester. For example, does the user submit credentials by clicking a “Log in” button? Or does the user press the Enter key on the keyboard? Or is it both? These details are specific to the solution rather than the problem and therefore the product owner may not care which direction is taken. Nevertheless, it is usually worth a quick conversation as work progresses so that there is less need for rework down the road.

I attended a talk by Ken Schwaber (a co-creator of Scrum) at a conference, where he succinctly explained that a story is done when “there is no more work to be done.” In other words, nothing is hidden, such as database migration scripts, or creation of user accounts, or whatever else may not be called out in a user story and is therefore left until later, but nevertheless must happen before that work can be live in production.

So Acceptance Criteria may not fully capture the definition of “done” for a User Story. In fact, many “nonfunctional requirements,” such as stability and scalability, are typically called out somewhere else. Does the fact that the site must support 10,000 concurrent users logging in mean that it should be called out in the Acceptance Criteria for user authentication? It almost certainly won’t be. Things such as performance, usability, accessibility, running on multiple browsers (Chrome, Edge, Safari, etc.), and architectural conformance are probably considerations for every single bit of work done, and therefore are universally implied, provided they are called out somewhere and with sufficient precision (e.g. “must support 100 transactions per second for a sustained (1 hour +) period of time without experiencing statistically significant performance degradation”).

As you collaborate with the product owner and elicit requirements that are broken down and organized into User Stories with accompanying Acceptance Criteria, always keep in mind what “done” means and seek to clarify that definition before you commit to implementation.