Lines of code per month

Monday 8 September 2003

This is more than 21 years old. Be careful.

A friend (and former boss) wrote to me asking for an estimate of my average productivity per month in terms of raw lines of source code. I complied (source code control systems are great for mining this sort of information), but also started thinking about this metric.

We all know it’s a blunt instrument, full of caveats and gotchas:

  • How do you compare different languages? 100 lines of COBOL equals 5 lines of Python.
  • How do you compare different programming tasks? Implementing multithreaded database access with transaction support takes more care than writing diagnostic logging utilities.
  • How do you compare different phases of the project? The end game requires slowing down the coding and taking far more care with every change.
  • Do you only count lines created, or do you include lines changed? The same line can be edited many times in a month. Which programmer gets the credit?
  • How do you deal with someone who spends a good chunk of time mentoring?
  • Do you count comment lines? (Of course you do: they are an important part of writing software.) Do you count blank lines?
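
To make the last caveat concrete, here is a minimal sketch of how much the number moves depending on what you decide to count. The function name `count_lines`, the `comment_prefix` parameter, and the sample snippet are all made up for illustration, and the sketch assumes a language with single-line comments that start with one prefix. It is not the method I used to mine my own numbers.

```python
def count_lines(source: str, comment_prefix: str = "#") -> dict:
    """Tally the lines of `source` under several counting rules.

    Assumption: a comment is any line whose first non-blank
    characters are `comment_prefix`. Block comments and trailing
    comments on code lines are not handled.
    """
    total = blank = comment = 0
    for line in source.splitlines():
        total += 1
        stripped = line.strip()
        if not stripped:
            blank += 1
        elif stripped.startswith(comment_prefix):
            comment += 1
    return {
        "total": total,
        "blank": blank,
        "comment": comment,
        "code": total - blank - comment,
    }


# A hypothetical five-line file: two comments, one blank, two code lines.
sample = """\
# Compute a total.

def total(xs):
    # Sum the items.
    return sum(xs)
"""

counts = count_lines(sample)
print(counts)
```

The same file is either five lines or two, depending on the rule you pick, and that is before anyone asks about lines changed versus lines created.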

Even once you get past all these details, do you even want higher numbers? I once worked with an engineer who was clearly the most productive guy on the team in terms of lines per month. He also had a reputation for writing buggy, poorly designed code, and was difficult to communicate with. The whole team would have been better off if he had slowed down, thought about each line a little bit more, written fewer lines each month, and talked to everyone more often.

You have to be careful what you wish for. We all know that if you impose strict metrics on people, they will organize their work to maximize the metric. An extreme example: I heard a horror story years ago about a disk drive manufacturer that was having difficulty meeting their shipment quotas. The manager made it clear that he wanted the quotas met, and didn’t want to hear any bad news. In desperation, they started shipping boxes with bricks in them instead of drives, or so I was told. Maybe it’s an apocryphal story. But it has the ring of truth to it, and all of us have been in situations that smelled a little bit like that.

I don’t think we’ll ever get away from the lines per month metric. It’s just too easy to compute. The best we can do is take it all with a huge grain of salt, and use it as one piece of information alongside many others.

Comments

[gravatar]
I got the same email and have (so far) declined to answer. Why? Because we have been in a long-term planning cycle and my "productivity" (as bluntly measured in "lines-per-month") was almost zero. For my own sanity (I would much rather be coding than going to interminable meetings and writing fucking specs) and for the sake of his boneheaded "analysis", I won't send him my numbers. (L: if you are listening, it is FUCKING ZERO. Are you happy?!)
[gravatar]
Certainly many of my most productive coding days have involved a net deletion of code. Simplify! Simplify!

I've done the cvs annotate | awk trick for project line counts, mostly for amusement - on one project, there were 15kloc for me, 15kloc for the other primary developer, and maybe 100 lines total for the other three (who were really customer support, they had just contributed bugfixes at times.) The surprising bit was that the project had split so evenly, even though there was nothing inherently symmetric about it...

Generally, though, line counts have been used for either (1) external reporting (which generally didn't lead to any decisions based on the numbers) or (2) a sanity check on perceived levels of involvement [last-change implying most-familiar, and a check on "truck factor": if one guy has made 75% of the changes, there's not enough broad knowledge of the code among the team.] There was one exception where they attempted to use line counts to predict bug counts, but it wasn't especially successful (an attempt to apply big-iron development methodologies to a small "moderately agile" team, one of the downsides of having a startup get bought :)
[gravatar]
BTW, I am not sure that you used the word "apocryphal" correctly. A common misuse is to think that it has something to do with "apocalypse". It derives from "apocrypha", the candidate chapters of the modern Bible that did NOT make the cut into the final product, as defined by the Catholic Church, I believe.

I too use this word the way that you do, but I thought that I would be pedantic and boring today.
[gravatar]
I have written three lines of code today. I've deleted several more, and I've also spent a couple of hours sitting here with a pencil and paper trying to figure out how my code should be designed in a new(ish) project.

All in all, I think I've done a good day's work!
[gravatar]
I meant apocryphal as in, "it might not be true": apocryphal.
[gravatar]
Many years ago IBM was all over this: they used to ask developers how many KLOCs (thousands of lines of code) they had written.

It's like telephone support, where productivity is often measured by the number of calls taken. You can take a lot of calls if you hang up on everyone. Does that make you the best telephone support person??
It's a false measurement, and often put in place by people that don't understand the problems being solved.
[gravatar]
One of the most productive people I know spent weeks deleting tens of thousands of lines of (someone else's) badly written code and replacing them with his own few thousand lines of elegantly written code. Is this negative KLOC productivity? According to some worthless metric, I suppose it is.
[gravatar]
it's like measuring the quality of a book by the number of words in it. somehow that metric (i always say metric now whenever i mean a numeric quantity, it makes me sound really smart) seems quite apart from the point of the exercise.
[gravatar]
A related issue with code metrics is using the KLOC figure to compute other metrics. For example, Windows XP contains roughly 45 million lines of code. Some software metric mavens claim that bugs are agnostic: no matter the type of application, you can count on finding 30 bugs per 1000 lines of code on average. So Windows XP contained roughly 1.35 million bugs. Think they found them all? I've also read that the XP team had about 2700 developers. That works out to only 500 bugs per developer to fix.

But each fix adds more code, creating more bugs. When does it end... ;-)
[gravatar]
The most compelling reference on measurement systems gone awry, with a bit of advice on how to create measurement systems that work, is Robert D. Austin's "Measuring and Managing Performance in Organizations".

This is really an extended study in the law of unintended consequences in human systems.
[gravatar]
As a largely self-taught late-starter, I've never been much with "currently accepted methodologies", so there's no way to gauge my own productivity in a wider sense. However, I am reminded of an occasion when I replaced 4300 non-blank, non-comment lines of JavaScript on a single web page with 28 lines (including white space and comments) during a rewrite. On a lines-produced basis, the "coder" whose work I revised would have rated a little higher than me, I'd guess....
[gravatar]
I found this quote from http://quote.wikipedia.org/wiki/Bill_Gates: "Measuring programming progress by lines of code is like measuring aircraft building progress by weight." - Bill Gates
[gravatar]
Reading a tale on folklore.org, I was reminded of this weblog post (how's that for memory ;o) ). Here's the relevant folklore story:

-2000 Lines Of Code
[gravatar]
Check out this post (http://skysigal.xact-solutions.com/Blog/tabid/427/Default.aspx) where I was able to quote and reference Capers Jones stating (in 2008) that LOC is dead, dead, dead and to be avoided:

"...These problems became so severe that a controlled study in 1994 that used both LOC metrics and function point metrics for 10 versions of the same application coded in 10 languages reached an alarming conclusion: LOC metrics violated the standard assumptions of economic productivity so severely that using LOC metrics for studies involving more than one programming language comprised professional malpractice."

Src: Capers Jones, "A SHORT HISTORY OF LINES OF CODE (LOC) METRICS", Version 2.0, May 10, 2008
