The Commenting Practice of Open Source (Completed, for Now)

This is, for now, the final paper in this sequence of short publications on how open source software projects document their code. The paper is basically a more comprehensive summary of the prior articles, with a bit more data. Here are the abstract and reference:

Abstract: The development processes of open source software are different from traditional closed source development processes. Still, open source software is frequently of high quality. This raises the question of how and why open source software creates high quality and whether it can maintain this quality for ever larger project sizes. In this paper, we look at one particular quality indicator, the density of comments in open source software code. We find that successful open source projects follow a consistent practice of documenting their source code, and we find that the comment density is independent of team and project size.

Reference: Oliver Arafat, Dirk Riehle. “The Commenting Practice of Open Source.” In Companion to the Proceedings of the 22nd Conference on Object Oriented Programming Systems, Languages, and Applications (OOPSLA Onward! 2009). ACM Press, 2009. Pages 857-864.

We got good feedback on prior articles and blog posts, see here:

I expect Oliver and me to summarize and extend this work in a journal article (and, upon Jacob’s special request, we’ll try to get boilerplate comments and headers removed from the comment line counting :-)).

The paper is available as a PDF file.
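For readers who want a concrete picture of the metric, here is a minimal sketch of a comment density calculation. It assumes density means comment lines divided by comment lines plus code lines, and it only recognizes whole-line `#` comments; the function name `comment_density` and the `skip_header` option are hypothetical and are not taken from the paper, whose actual counting method may differ.

```python
def comment_density(source: str, skip_header: bool = False) -> float:
    """Return comment lines / (comment lines + code lines).

    Assumption: only whole-line '#' comments count as comments;
    a trailing comment on a code line is counted as code.
    """
    comment_lines = 0
    code_lines = 0
    in_header = skip_header  # True while still inside a leading comment block

    for raw in source.splitlines():
        line = raw.strip()
        is_comment = line.startswith("#")

        if in_header:
            if is_comment:
                continue  # ignore leading boilerplate (e.g. a license header)
            in_header = False  # the first blank or code line ends the header

        if not line:
            continue  # blank lines count as neither comment nor code

        if is_comment:
            comment_lines += 1
        else:
            code_lines += 1

    total = comment_lines + code_lines
    return comment_lines / total if total else 0.0


sample = "# License header\n# Copyright\n\n# explain why\nx = 1\ny = 2\n"
print(comment_density(sample))                    # header counted
print(comment_density(sample, skip_header=True))  # header ignored
```

Skipping the leading block is only one crude way to drop boilerplate headers; it would not catch generated comments further down in a file.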

5 thoughts on “The Commenting Practice of Open Source (Completed, for Now)”

  1. Hi Caleb, that’s quite possible. We measured the comment density of a broad and large number of open source projects, not any one in particular.

    I have wondered about this: Agile folks claim that if you have to comment, your code is poor… always thought that’s a bit of a stretch…

    As you look at the paper and data, you’ll see that the majority of comment contributions are small changes: one-line changes, two-line changes… So folks are evolving their comments as much as their code. Why wouldn’t that be the case with git or the kernel?

  2. @Caleb —

    Linus and his ilk may not believe in comments but they should.

    They should comment for many reasons:
    1) Document approaches to a problem that were rejected (“Don’t try to do X here because this will cause Y”).

    2) Explain WHY the code is written a certain way.

    3) Reduce the likelihood of misunderstanding the intent of the code.

    Over a long period of time, a block of code will change HOW it is solving a problem, but rarely does it change which problem it is solving.

    Arguments about “comments being out of date quickly” are always referring to low-value/low-content comments.

    Some posts I wrote on the subject:

    http://www.sworddance.com/blog/2009/03/04/not-commenting-code-is-dangerous-to-your-career/

    http://www.sworddance.com/blog/2009/03/16/when-to-comment/

    http://www.sworddance.com/blog/2009/03/04/code-review-7-comment-the-why-not-the-what/

    @Dirk —

    IDEs generate comments that follow a very definite pattern. You should do a Bayesian analysis on the comments so you can have a conversation about the number of lines of unique comments.

    Analysis over time would also be quite interesting and, I think, more insightful. In particular, correlating the age of the project with the comment density, as well as with its first and second derivatives, to see the velocity and acceleration of change.
