Monday 8 September 2003 — This is more than 21 years old. Be careful.
A friend (and former boss) wrote to me asking for an estimate of my average productivity per month in terms of raw lines of source code. I complied (source code control systems are great for mining this sort of information), but also started thinking about this metric.
We all know it’s a blunt instrument, full of caveats and gotchas:
- How do you compare different languages? 100 lines of COBOL equals 5 lines of Python.
- How do you compare different programming tasks? Implementing multithreaded database access with transaction support takes more care than writing diagnostic logging utilities.
- How do you compare different phases of the project? The end game requires slowing down the coding and taking far more care with every change.
- Do you only count lines created, or do you include lines changed? The same line can be edited many times in a month. Which programmer gets the credit?
- How do you deal with someone who spends a good chunk of time mentoring?
- Do you count comment lines? (Of course you do: they are an important part of writing software.) Do you count blank lines?
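To make the last question concrete, here is a minimal sketch of how much a "line count" depends on what you decide to count. The function name `classify_lines` and the single-prefix comment rule are assumptions made for illustration; real counting tools handle block comments, strings, and many languages.

```python
from collections import Counter


def classify_lines(source: str, comment_prefix: str = "#") -> Counter:
    """Tally blank, comment, and code lines in a source string.

    Assumption: a comment is any line whose first non-blank characters
    are comment_prefix. Trailing comments on a code line count as code.
    """
    counts = Counter(blank=0, comment=0, code=0)
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(comment_prefix):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    return counts


sample = """# Add two numbers.
def add(a, b):

    return a + b  # trailing comments count as code here
"""

counts = classify_lines(sample)
print(dict(counts))  # {'blank': 1, 'comment': 1, 'code': 2}
```

The same four-line file is "4 lines", "3 lines", or "2 lines" depending on the policy, before any of the harder questions about languages or tasks even come up.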
Even once you get past all these details, do you even want higher numbers? I once worked with an engineer who was clearly the most productive guy on the team in terms of lines per month. He also had a reputation for writing buggy, poorly designed code, and was difficult to communicate with. The whole team would have been better off if he had slowed down, thought about each line a little bit more, written fewer lines each month, and talked to everyone more often.
You have to be careful what you wish for. We all know that if you impose strict metrics on people, they will organize their work to maximize the metric. An extreme example: I heard a horror story years ago about a disk drive manufacturer that was having difficulty meeting its shipment quotas. The manager made it clear that he wanted the quotas met, and didn’t want to hear any bad news. In desperation, the staff started shipping boxes with bricks in them instead of drives, or so I was told. Maybe it’s an apocryphal story. But it has the ring of truth to it, and all of us have been in situations that smelled a little bit like that.
I don’t think we’ll ever get away from the lines per month metric. It’s just too easy to compute. The best we can do is take it all with a huge grain of salt, and use it as one piece of information alongside many others.
Comments
I've done the cvs annotate | awk trick for project line counts, mostly for amusement - on one project, there were 15kloc for me, 15kloc for the other primary developer, and maybe 100 lines total for the other three (who were really customer support, they had just contributed bugfixes at times.) The surprising bit was that the project had split so evenly, even though there was nothing inherently symmetric about it...
Generally, though, line counts have been used either for (1) external reporting (which generally didn't lead to any decisions based on the numbers) or (2) a sanity check on perceptions of level of involvement [last-change implying most-familiar, and a check on "truck factor": if one guy has made 75% of the changes, there's not enough broad knowledge of the code among the team]. There was one exception where they attempted to use line counts to predict bug counts, but it wasn't especially successful (an attempt to apply big-iron development methodologies to a small "moderately agile" team, one of the downsides of having a startup get bought :)
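A rough sketch of that kind of tally, for anyone who wants to try it without awk. The input format is an assumption modeled on typical annotate output (revision, then author and date in parentheses, then a colon); the authors `alice` and `bob` and the helper names are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "<revision>   (<author>   <date>): <source text>"
ANNOTATE_LINE = re.compile(r"^\S+\s+\((\S+)\s+[^)]+\):")


def lines_per_author(annotate_output: str) -> Counter:
    """Count how many lines were last changed by each author."""
    counts = Counter()
    for line in annotate_output.splitlines():
        match = ANNOTATE_LINE.match(line)
        if match:  # skip anything that does not look like an annotated line
            counts[match.group(1)] += 1
    return counts


def top_share(counts: Counter) -> tuple:
    """Return (author, fraction) for whoever last touched the most lines."""
    total = sum(counts.values())
    if total == 0:
        return ("", 0.0)
    author, n = counts.most_common(1)[0]
    return (author, n / total)


sample = """\
1.1          (alice    08-Sep-03): int main(void) {
1.3          (bob      10-Sep-03):     return run();
1.1          (alice    08-Sep-03): }
1.2          (alice    09-Sep-03): /* end */
"""

counts = lines_per_author(sample)
print(top_share(counts))  # ('alice', 0.75)
```

This only measures who touched each line last, so it is a hint about familiarity and not a measure of who wrote or designed the code.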
I too use this word the way that you do, but I thought that I would be pedantic and boring today.
All in all, I think I've done a good day's work!
It's like telephone support, where productivity is often measured by the number of calls taken. You can take a lot of calls if you hang up on everyone. Does that make you the best telephone support person?
It's a false measurement, and often put in place by people that don't understand the problems being solved.
But each fix adds more code, creating more bugs. When does it end... ;-)
This is really an extended study in the law of unintended consequences in human systems.
-2000 Lines Of Code
"...These problems became so severe that a controlled study in 1994 that used both LOC metrics and function point metrics for 10 versions of the same application coded in 10 languages reached an alarming conclusion: LOC metrics violated the standard assumptions of economic productivity so severely that using LOC metrics for studies involving more than one programming language comprised professional malpractice."
Src: Capers Jones, "A SHORT HISTORY OF LINES OF CODE (LOC) METRICS", Version 2.0, May 10, 2008