Six Easy Pieces of Quantitatively Analyzing Open Source Projects

I’ll be giving a talk at the Open Source Business Conference 2009 in San Francisco on March 24, 2009. The talk will present an easily accessible summary of our data-driven analytical work on how open source software development works. Here is the abstract:

For the first time in the history of software engineering, we can both broadly and deeply analyze the behavior and dynamics of software development projects. This has become possible because of open source, which is publicly developed software. In this presentation, I will discuss our recent findings about open source software, its development process, and programmer behavior. I will also discuss the challenges we encountered when quantitatively mining software repositories for such insights.

Reference: Talk at OSBC 2009. San Francisco, CA: 2009.

Available as a PDF file.

2 Replies to “Six Easy Pieces of Quantitatively Analyzing Open Source Projects”

  1. Really loved this overview, Dirk; I hope that lots of people see it. One thing I’d like to see is a comparison to non-open source project metrics. For example, you suggest that open source projects have hit the sweet spot when it comes to commenting. What’s this in comparison to?
    I realize that one of the advantages of studying open source is that you have a large swath of data you can study in the first place. Still, some comparison points would be useful.

  2. Hey Eugene, thanks for the comments (and the retweet)! Your question has two aspects, so here is an answer to each.
    First, I agree we absolutely need to compare open source with corporate software development. Right now, we are crawling SAP’s own internal source code base to be able to make that comparison. I hope we’ll have that data soon!
    As to hitting the sweet spot, I’m arguing that volunteer communities of well-working open source projects neither over-comment nor under-comment. They are unlikely to over-comment because it is a waste of resources and nobody tells them to do it, and they are unlikely to under-comment because then the project might not be around any longer. (Of course these are still averages.)
    I discussed the details of the commenting work elsewhere on this blog too.
