Velocity, Estimates, and Cost

I recently read a blog post by Gojko Adzic (see here) about velocity and how its measurement gets used. It’s a good post; it warns against relying on velocity as an indicator of success. In short, he argues that while low velocity can point to problems, a high-enough velocity doesn’t imply long-term success.

I agree. I also think velocity can be used to revisit estimated completion dates and costs, much as developers dislike those notions (I know I do).

One of the inescapable issues of software development, at least for me (if you have found a way to escape it, please share it), is the need for upper management to always have an “end date” for the work in sight. For better or worse, estimated completion dates are used to inform high-level budgets, resource allocation, and so forth.

Naturally, regardless of what estimation methods developers use, “the business” wants hours. The idea is to convert those hours to dollars.

But estimating in hours sucks because it’s not reliable (let’s be honest) and it doesn’t really yield a velocity. So some people end up with an arbitrary conversion of hours to “story points” (maybe 1 point is 1 day, which is 6 hours, or something like that), which makes the word “point” mean “X hours” and nothing more.

Well, that’s a horrible thing to do. For one, what becomes of velocity? If all you have to measure the rate of delivery with is hours, then isn’t velocity just the number of hours of work a team does? Why even measure that? It will stay constant from iteration to iteration unless you add or remove team members.

If velocity were to actually reflect the pace at which a team is delivering, then it could be used as a predictor. But I submit that then it can’t be tied to hours!

I accept that some things cannot be [easily] changed. Budget approvals have to happen very early in a project’s lifecycle and so you need a relatively high cost and time estimate because the earlier in the lifecycle you are, the more unknowns you have to accept. And no matter what you use to estimate that early in the game, your margin of error will be very large.

But the cool thing about having a properly measured velocity is that revisiting those estimates becomes a trivial exercise. And there’s some value to that, since the sooner it is known that work is taking longer than initially promised, the more time there is to make appropriate adjustments. Conversely, the sooner it is known that work is taking less time than initially expected, the sooner any benefits of this realization can be leveraged.

How to do it? Well, like I already said, hours suck, so stop using hours altogether. They are too precise, and they mean different things for different people in terms of effort. Points of complexity are less precise (particularly if you go with the usual sequence of 1, 2, 3, 5, 8) and more universal. Whether a 2-point deliverable takes me one day or takes another developer half a day, it’s the same amount of complexity delivered, and that’s cool, because now you are able to look at delivering more or fewer points per unit of time (proper velocity!).

I am serious about not using hours. If someone asks you “how many hours is a point?” you say “NO.” If someone asks you “how long will this take?” you say “NO.” Points are a swag at relative complexity. That’s it. There is no conversion and there are no hours. Deliverables have points as metadata and those points are used by the team to derive velocity — points delivered per iteration. And from velocity, a few other things can be derived.

One is an approximate amount of work for the team to pull into the next iteration. Historical data, meaning the velocity and its trend over the last few iterations, can give the team an indicator of about how much work it can reasonably commit to right now.

The other thing is more relevant to this blog entry, and that is a prognosis for the completion date assuming the current backlog. The backlog is continuously groomed, with work broken down for easier estimation and work added or removed as business needs dictate. A team can calculate their velocity, weighted by recent trends, and use that to get a rough count of how many iterations would be needed to clear out the current backlog.

How do you arrive at a weighted velocity? However you want, as long as it makes sense. I’ve used a formula I basically made up on the spot, and it has worked “okay.” Given that n is the latest completed iteration and n-1 is the one before it:

Weighted Velocity = V(n) * 0.5 + V(n-1) * 0.3 + V(n-2) * 0.2

You can fiddle with the weighting factors; I don’t know how good they are. I just want to try and capture a trend if there is one and also help diminish the effect of any outliers (e.g. everyone having the flu during some iteration).
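For the curious, here’s that weighting as a quick Python sketch (the function name is mine, and the default factors are just the made-up ones above):

```python
def weighted_velocity(velocities, weights=(0.5, 0.3, 0.2)):
    """Blend the last few iterations' velocities, newest first.

    velocities is ordered newest first: [V(n), V(n-1), V(n-2), ...].
    The default weights are the made-up factors from above; fiddle at will.
    """
    recent = velocities[:len(weights)]
    used = weights[:len(recent)]
    # Renormalize so the result still makes sense when fewer than
    # three iterations have completed.
    return sum(v * w for v, w in zip(recent, used)) / sum(used)

# e.g. velocities of 22, 18, and 20 points (newest first) give 20.4
print(weighted_velocity([22, 18, 20]))
```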

Anyway, so you get your velocity, you look at how many points of work currently remain in your product backlog, and you do simple division and round up. With a velocity of 20 and 175 points remaining, you have 9 more iterations to go. If each iteration is 2 weeks and your team costs you about $400 per hour altogether, then you are looking at $400 * 80 * 9 (80 being the team-hours in a 2-week iteration), which yields $288,000, so you round to $300,000 to preserve the appropriate number of significant digits.
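The same arithmetic as a Python sketch (the default hours and hourly rate are just the example figures above, not anything universal):

```python
import math

def remaining_forecast(backlog_points, velocity,
                       weeks_per_iteration=2,
                       team_hours_per_iteration=80,  # 2 weeks * 40 hours
                       team_rate_per_hour=400):
    """Rough remaining time and cost, given backlog size and velocity."""
    iterations = math.ceil(backlog_points / velocity)
    weeks = iterations * weeks_per_iteration
    cost = iterations * team_hours_per_iteration * team_rate_per_hour
    return iterations, weeks, cost

# The example above: 175 points left at a velocity of 20
# -> (9, 18, 288000), i.e. about $300k.
print(remaining_forecast(175, 20))
```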

Bam. 18 weeks and about $300k. And that’s as of right now. Look at this at the end of every iteration and adjust. You still should not think of these numbers as any sort of promise, but at least the numbers themselves will be better and better informed, and hopefully closer and closer to accurate as work goes on.

Is there a whole lot of value to doing this exercise with your project? Well, if you don’t have a need to predict completion date and remaining cost, then of course don’t waste your time with this stuff at all. But if you do have that need, and many organizations continue to, then I think this is a reasonable approach.

Though, if you have ideas about how it can be changed for the better, I am very interested in hearing them.

On Individual Performance Metrics

My previous entry was all about how individual performance metrics in a collaborative, agile environment are misguided. But to keep things focused on reality, where demands for such metrics are sometimes unavoidable, I ended with a challenge for myself: come up with some individual, quantitative performance metrics that do not conflict with notions of self-organization, trust, collaboration, teamwork, and all that stuff that makes agile development teams happy and productive.

While driving home from work I came up with a couple of ideas.

To begin, I’ll jot down some attributes of what I would consider an acceptable individual, quantitatively measurable performance metric.

  1. [ ]  it is not adversely affected by taking time to help others
  2. [ ]  it is not adversely affected by doing high priority work over high complexity work
  3. [ ]  it does not discourage doing high complexity work for fear of failure
  4. [ ]  it makes individual praise sensible and individual punishment nonsensical
  5. [ ]  the closer to an ideal value it is, the more successful the team at delivering valuable work
  6. [ ]  it does not loom over people’s heads and lower morale

Let’s say that if an idea ticks all six of the boxen, then it’s fully acceptable. If it doesn’t, then it’s to be avoided. With that in mind, let’s look at a handful of “candidate” metrics.

Individual velocity
I ranted about this at length in my previous entry. This metric fails points 1, 2, 3, and 6, and makes a weak case for passing 5. It incentivizes selfishness and compromises team collaboration.

Defect count per person
Whether used to evaluate developers based on how few defects are discovered in their work or to evaluate testers based on how many defects they discover in others’ work, this metric fails points 3 and 6. In the case of evaluating testers for, effectively, reporting how crappy the developers’ code is, it also fails points 1 and 5.

Number of lines of code
LOL.

OK, on to the ideas that I think might at least be conversation starters…

Contributions to an “agility index”
This one came to me while reflecting on a conversation featuring Ken Schwaber and the notion of an Agility Index that would indicate how “agile” a company is. From what I could tell, it amounted to a checklist of agility-enabling practices that in turn yielded a score between 0 and 100. While I can’t seem to find this particular checklist on Scrum.org (presumably they’d want to sell its usage), I think a comparable checklist can be formulated by anyone with sufficient experience. The more items checked off, the higher the score on the “agility index.”

Making progress toward a higher index benefits the entire team and should not compromise any other good practices. In other words, all six requirements mentioned above would be met. What’s left is figuring out how to make it a quantifiable individual measurement. Well, when it comes time for a performance evaluation, simply ask each developer to mark the “agility index” checklist items that he or she contributed to, with a short blurb specifying the nature of the contribution. The more items contributed to, the better. And the nature of the items should ensure that nothing beneficial gets compromised.
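As a toy sketch of the bookkeeping in Python (the checklist items here are placeholders I invented, not Scrum.org’s actual list):

```python
# Hypothetical checklist items; a real list would come from experience.
AGILITY_CHECKLIST = {
    "continuous integration",
    "automated acceptance tests",
    "regular retrospectives",
    "definition of done",
}

def agility_index(items_in_place):
    """Team-level score, 0-100: the share of checklist items in place."""
    return round(100 * len(set(items_in_place) & AGILITY_CHECKLIST)
                 / len(AGILITY_CHECKLIST))

def contribution_counts(claims):
    """claims maps each person to {checklist item: blurb about the help}.
    Returns how many recognized checklist items each person contributed to."""
    return {person: len(AGILITY_CHECKLIST & set(items))
            for person, items in claims.items()}

# e.g. the team has two practices in place, and Mary helped with both
print(agility_index({"continuous integration", "definition of done"}))  # 50
print(contribution_counts({"Mary": {
    "continuous integration": "set up the CI server",
    "definition of done": "drafted the checklist",
}}))  # {'Mary': 2}
```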

Kudos received through team retrospectives
This one is a bit wacky, but I think it’s worth a go. Some variations on team retrospectives include “giving kudos” to fellow team members. These could be for help offered, or ideas presented, or the quality of some deliverable, or really anything at all that was appreciated by others on the team. By being all about collaboration and contribution to team success, this approach ticks all six of the above boxes. And if a process or project manager keeps track of the number of “kudos” each team member receives, that can later be turned into a performance indicator.
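A minimal Python sketch of that tallying, assuming kudos get recorded per retrospective as (giver, receiver, reason) entries (the shape of the data is my assumption, not any tool’s format):

```python
from collections import Counter

def kudos_totals(retrospectives):
    """retrospectives: a list of retros, each a list of
    (giver, receiver, reason) tuples. Returns kudos received per person."""
    totals = Counter()
    for retro in retrospectives:
        for giver, receiver, reason in retro:
            totals[receiver] += 1
    return totals

# e.g. two retros' worth of kudos
print(kudos_totals([
    [("Mary", "Bob", "help with test automation")],
    [("Vick", "Bob", "great demo"), ("Bob", "Mary", "clean refactoring")],
]))  # Counter({'Bob': 2, 'Mary': 1})
```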

So, that’s what I came up with on my drive home. There may be other things of this nature that focus much more on success than failure, and team success at that, all the while allowing for an individual perspective. And I think these are the kinds of metrics we should focus on if we absolutely have to. If it were up to me, I’d just focus on enabling team success.

Don’t Measure Individuals

I recently started looking at some project management software called AtTask, evaluating whether it is appropriate for agile development. While it seemed to be quite capable as a “waterfall” PM tool, I wasn’t thrilled by its take on “Agile” (yes, the AtTask people use that word as a proper noun, which annoys me, but I’ll write on that another time). I brought up AtTask’s inadequacies to the client, briefly mentioning my recommendation to go with Greenhopper… oh wait, Atlassian renamed it… to JIRA Agile… (dang, they’re also using it as a proper noun).

Anyway, what project management tool ends up being used remains to be seen. But what the client impressed on me is that the software used should allow for measuring individual velocity. (By velocity I am referring to “points of complexity” delivered per iteration.)

HOLD ON NOW!

Individual velocity?

Folks, if someone says that they want to quantitatively measure individual performance in a team-based environment, just say no. That idea is bad in so many ways that I will actually enumerate some of them.

1. It is a conceptual non-starter
What does individual velocity mean when you are talking about a software development team? In the environment in question, each deliverable will be handled by at least three people — a primary developer, a developer who provides peer review, and a dedicated tester. In many cases there will be even more team members involved. So whose “velocity” is at stake when a deliverable isn’t done at the end of an iteration? If a developer hands off a bunch of stuff to testers, does that developer have a high velocity? If those testers find a million defects, is that a low velocity for the developer and a high one for the testers? If a technical lead who is needed for peer review is at a conference for a few days and some deliverables don’t make progress, does the primary developer’s velocity take a hit? Nothing about this concept makes sense to me.

Software is a team effort. Isolating each team member’s individual effort in delivering points of complexity is a suspect task. Just as an example, something like sending an email can be easy enough to implement but a pain in the arse to write automated tests for. Does Bob the tester get more credit than Mary the developer? Perhaps Bob got a whole bunch of assistance from Vick the technical lead. Should Vick get in on some of that sweet velocity loot? There is a team velocity because the team as a unit delivers software.

2. It is impossible to actually do
Besides the troubles listed above, velocity, like all such metrics, can be gamed. And I want to be very clear when I say that it will be gamed, at the expense of the team and the product.

Suppose that not all work has “points of complexity” (sometimes only business deliverables get this attribute, while other technical and non-technical tasks do not; the latter is then seen as helping get the former “done”). Presumably people that work on tasks that don’t have points won’t be looked at negatively when they don’t “take points across the wall.” So, let’s say Sam is a below-average developer and is having trouble writing as much production-ready code as some of his team members. Sam could just take on tasks that don’t have any points, so as he finishes them at his comparatively slow pace, he doesn’t have to worry about being compared to his colleagues. Sam makes himself difficult to measure.

Alternately, Sam may avoid doing any work that doesn’t have story points (e.g. setting up continuous integration) so that he can maximize how many points he delivers, leaving the “non-measured” work to the rest of the team.

Or, even worse, Sam just decides to forego any notion of maintainability and cuts corners like he’s got a Super Star in Mario Kart. His points delivered for the iteration go up. But the technical debt incurred from his shenanigans raises the complexity of future work. Team velocity suffers down the road.

But come on, Sam wouldn’t do that. Sam is a good team member who knows what it means to write quality code. He’s just inexperienced and needs a bit more guidance than his peers. But when he asks for help, Jack and Satou tell him “ain’t nobody got time for that” because they’ve got their own points to worry about!

Satou would never say something like that, though. He’s from Rhode Island. Also, he would help Sam out, and in turn his own work would stall while Sam’s moved along. This might be good for the team (and the business), but it is at Satou’s expense. That is, unless Satou also “took some credit,” which would only be fair, right?

At this point, we’ve really lost all sense of a metric for individual performance. Who did “more” work, Satou or Sam? Who did more important work? What about Jack — if part of Jack’s value to the team is the knowledge he is able to share, then does his refusal to help Sam reflect on his performance and how does it weigh against the “points” he was able to deliver?

“Velocity” on an individual level is a metric that’s vulnerable to so many deliberate and non-deliberate breakages that it’s effectively void. And hopefully it’s clear that the actual numbers you’d come up with when trying to measure “individual velocity” in a real situation would be very hard to distinguish from some you’d get by rolling dice. There is nothing to ensure that they reflect anyone’s skill level or actual productivity. And there are still other reasons not to go down this dark path.

3. It shifts focus to all the wrong things
When a business asks for software, what is ultimately promised by the team tasked to deliver the software? That the right functionality and quality will be implemented for a reasonable cost? Or that each team member will perform adequately in accordance with some performance metric? I’m guessing the first one.

When a team is focused solely on writing valuable, high-quality software, they will assist each other as needed and avoid compromising their goal for no good reason.

But I submit that reward for visible high contribution and/or punishment for visible low contribution can be quite compelling reasons indeed. When one is incentivized to look better than one’s peers (or to at least not look worse), then a conflict of interest arises where the actual quality of the software competes with the perceived quality of one’s individual contribution.

4. It lowers morale
Individual performance metrics are stressors as well as a potential source of tension and discord among team members. Rather than emphasizing success and movement in a positive direction; rather than encouraging collaboration and teamwork; rather than fostering a feeling of joint ownership, they introduce the fear of punishment for failure; they discourage altruism, knowledge sharing, and generally working together; they incentivize people to mask their inexperience. They can single-handedly make an otherwise positive experience into a negative one. Developers can become less happy. And when morale is low, so is productivity.

5. It is a net value loss
I suppose I should address the elephant in the room at this point, so here we go…

Why would anyone want individual performance metrics? Is it to give everyone cookies and donuts and Clif bars relative to how awesome they are? Probably not; it’s much more likely an attempt to target the underperformers. It’s a gathering of “objective evidence” that people that you already perceive to totally suck in fact do.

I have yet to see any other reason put forth that makes sense. If you want to reward people for good performance, nobody is going to challenge you for “proof.” If you want to manage resources in such a way that teams are balanced in skill and capability then you can do better than rely on fuzzy math to do it.

So this endeavor adds rather little value and carries a rather high cost. As mentioned, people will game the system, focusing on perception at the expense of ultimate quality. There’s the problem of lower morale and in turn lower productivity. These result in higher costs for the business to get what it needs. And the supposed value-add? The “addition by subtraction” of removing an underperforming team member? It’s far from a guarantee, not least because the system is vulnerable to gaming from all angles.

So what happens is the person who you think totally sucks merely continues to totally suck, except now you’ve introduced a whole bunch more problems to worry about in terms of damaged team dynamics.

6. It goes against the principle of empowered, self-organizing teams
If a team is entrusted with delivering software then why should that team be burdened with a handicap like “individual velocity” just for the sake of gathering evidence against “bad” developers? Let the team figure out how to deliver the best software it can, let people collaborate as they see fit, and if the team decides a member is having a negative effect then trust the team to make that decision. (Naturally, asking for proof in the form of some numbers can take you down a very ugly path of infighting, subterfuge, and sabotage as people try to game a flawed system in conflicting ways. So don’t do it.) Find someone empowered to manage team personnel and remove the problem member if the team deems it necessary.

To conclude, individual performance metrics look to be a terribly unproductive endeavor at best and a highly damaging one at worst. Development teams, especially ones that have a good level of transparency built into their approach, already have no secrets about who’s good and who sucks. Efforts can be made to let team members help and improve each other and remove negative members if necessary, or efforts can be made to undermine what a productive dev team should be all about. Don’t fall into a trap of going for the latter.


As an afterword, suppose you absolutely have to obtain some “quantitative” measurement for individual performance reviews due to some stinky contract that was signed eons ago when software was written by fish. The challenge is to come up with metrics that do not compromise the principles of agility, trust, and self-organization that are worth so much to a dev shop — metrics that don’t introduce a conflict of interest. This is actually a bit of a puzzle and I will think on it some. I’ll post my thoughts in my next entry.

Greetings

This is the beginning of my blog. I plan on writing primarily about the following topics:

  • software (various topics)
  • travel, photography
  • fitness
  • movies, music, and other media entertainment
  • cars
  • the NBA (possibly through another blog of mine)
  • whatever else I please. this is my blog I do what I want!

Everyone is welcome to post comments.

Also, I plan on playing with the color scheme of this blog for a little while so if things are hard to read it’s probably because I haven’t bothered to style them intelligently yet. Give it some time.

Cheers
–Sciros (I also go by “Sci” or sometimes even my real name o_O)