Open Collaboration within Corporations Using Software Forges (Abstract)

Abstract: Over the past 10 years, open source software has become an important cornerstone of the software industry. Commercial users have adopted it in standalone applications, and software vendors are embedding it in products. Surprisingly then, from a commercial perspective, open source software is developed differently from how corporations typically develop software. Research into how open source works has been growing steadily. One driver of such research is the desire to understand how commercial software development could benefit from open source best practices. Do some of these practices also work within corporations? If so, what are they, and how can we transfer them?

Keywords: Inner source, firm-internal open source, corporate source, software forge, open collaboration, open source.

Reference: Dirk Riehle, John Ellenberger, Tamir Menahem, Boris Mikhailovski, Yuri Natchetoi, Barak Naveh, Thomas Odenwald. “Open Collaboration within Corporations Using Software Forges.” IEEE Software, vol. 26, no. 2 (March/April 2009). Pages 52-58.

Available as HTML or as a PDF file.

Estimating Commit Sizes Efficiently

Authors: Philipp Hofmann, Dirk Riehle

Abstract: The quantitative analysis of software projects can provide insights that let us better understand open source and other software development projects. An important variable used in the analysis of software projects is the amount of work being contributed, the commit size. Unfortunately, post-facto, the commit size can only be estimated, not measured. This paper presents several algorithms for estimating the commit size. Our performance evaluation shows that simple, straightforward heuristics are superior to the more complex text-analysis-based algorithms. Not only are the heuristics significantly faster to compute, they also deliver more accurate results when estimating commit sizes. Based on this experience, we design and present an algorithm that improves on the heuristics, can be computed equally fast, and is more accurate than any of the prior approaches.
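The abstract does not spell out the algorithms themselves. As a rough illustration of what a simple heuristic for commit size can look like, here is a minimal Python sketch. It is not the paper's algorithm. It assumes the commit is available as a unified diff (as produced by `git diff` or `diff -u`), and it assumes, hypothetically, that a run of removed lines directly followed by added lines represents modified lines. Under that assumption, each contiguous block of changes counts as max(added, removed) lines instead of added + removed. The function name `estimate_commit_size` is hypothetical.

```python
def estimate_commit_size(diff_text: str) -> int:
    """Hypothetical heuristic: estimate the number of changed lines in a unified diff.

    Each contiguous block of '+'/'-' lines contributes max(added, removed),
    treating paired removals and additions as modifications of the same lines.
    Sketch only: a removed line whose content itself starts with '-- ' is
    mistaken for a '--- ' file header and ends the current block.
    """
    total = 0
    added = removed = 0
    # The trailing "" is a sentinel that flushes the final block.
    for line in diff_text.splitlines() + [""]:
        is_header = line.startswith(("+++ ", "--- "))
        if line.startswith("+") and not is_header:
            added += 1
        elif line.startswith("-") and not is_header:
            removed += 1
        else:
            # Context line, hunk header, file header, or sentinel: close the block.
            total += max(added, removed)
            added = removed = 0
    return total


sample_diff = "\n".join([
    "diff --git a/hello.py b/hello.py",
    "--- a/hello.py",
    "+++ b/hello.py",
    "@@ -1,3 +1,4 @@",
    " import sys",
    '-print("hello")',
    '+print("hello, world")',
    "+sys.exit(0)",
    " # end",
])

print(estimate_commit_size(sample_diff))  # 2
```

Here one line was modified and one was added, so the estimate is 2, whereas naively summing added and removed lines would report 3.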

Reference: In Proceedings of the 5th International Conference on Open Source Systems (OSS 2009). Springer Verlag, 2009. Pages 105-115.

Available as a PDF file.

The Sweet Spot of Code Commenting in Open Source

In a large-scale study of active open source projects, we found an average comment density of about 20% (that is, one comment line in every five lines of code). Given that much of open source remains volunteer work, we believe that a comment density of 20% represents the sweet spot of code commenting in open source projects: Neither are you over-documenting your code and hence wasting resources, nor are you under-documenting and thereby endangering your project.
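To make the 20% figure concrete, here is a minimal Python sketch of computing the comment density of a single source file. It assumes comment density is defined as comment lines divided by the sum of comment lines and code lines, with blank lines ignored, and it only recognizes full-line `#` comments (trailing comments and docstrings count as code). The function name `comment_density` is hypothetical; this is not the tooling used in the study.

```python
def comment_density(source: str) -> float:
    """Return comment lines / (comment lines + code lines) for '#'-commented source.

    Assumptions: blank lines are ignored; a line counts as a comment line
    only if its first non-whitespace character is '#'; every other
    non-blank line counts as a code line.
    """
    comment_lines = 0
    code_lines = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are neither comment nor code
        if stripped.startswith("#"):
            comment_lines += 1
        else:
            code_lines += 1
    total = comment_lines + code_lines
    return comment_lines / total if total else 0.0


sample = """
# compute the sum of two numbers
def add(a, b):
    return a + b

x = add(1, 2)
y = add(x, 3)
"""

print(comment_density(sample))  # 0.2
```

The sample has one comment line and four code lines, so its density is 1/5 = 20%, the average reported above. Under the alternative definition of comment lines divided by code lines only, the same file would score 25%, so the definition matters when comparing figures.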


The Comment Density of Open Source Software Code

Authors: Oliver Arafat, Dirk Riehle

Abstract: The development processes of open source software are different from traditional closed source development processes. Still, open source software is frequently of high quality. Thus, we are investigating how open source software creates high quality and whether it can maintain this quality for ever larger project sizes. In this paper, we look at one particular quality indicator, the density of comments in open source software code. In a large-scale study of more than 5,000 projects, we find that active open source projects document their source code, and we find that the comment density is independent of team and project size, but not of project age. In future work, we intend to correlate comment density with project success or failure.

Reference: In Companion to Proceedings of the 31st International Conference on Software Engineering (ICSE 2009). IEEE Press, 2009. Pages 195-198.

Available as a PDF file.

My Position on Privacy (Seven Things About Me)

Stormy Peters recently tagged me to post seven items about my life. This is a “viral” pyramid scheme; you are supposed to write these seven items and then tag seven other people to do the same. It is not the first time I have received such a request; I also got tagged on Facebook to post 25 items about my life, and in general it is quite tempting to let your personal thoughts hang out on a blog like this.

I usually ignore such requests for reasons of privacy. Everything you do or say on the Internet can be used at some future point in time. The saying “on the Internet, nobody knows you are a dog” is completely wrong; on the Internet, anyone with enough resources can not only know you are a dog but can also know everything about you down to hereditary diseases, even things you may not know yourself. Or, as Scott McNealy is famous for saying: “You have no privacy. Get over it.”

Here, then, are seven things about my take on privacy in the Internet age.


Call for Papers: Fourth Workshop on Wikis for Software Engineering

For your information, here is the call for papers for the fourth workshop on wikis for (and in) software engineering. I’m on the program committee.


Fourth Workshop on “Wikis for Software Engineering”, May 16, 2009, at ICSE 2009, Vancouver, Canada, May 16-24, 2009

Submissions are due on January 26, 2009 (abstracts) and February 2, 2009 (papers)


Six Easy Pieces of Quantitatively Analyzing Open Source Projects

I’ll be giving a talk at the Open Source Business Conference 2009 in San Francisco on March 24, 2009. The talk will present an easily accessible summary of our data-driven analytical work on how open source software development works. Here is the abstract:

For the first time in the history of software engineering, we can both broadly and deeply analyze the behavior and dynamics of software development projects. This has become possible because of open source, which is publicly developed software. In this presentation, I will discuss our recent findings about open source software, its development process, and programmer behavior. I will also discuss the challenges we encountered when quantitatively mining software repositories for such insights.

Reference: Talk at OSBC 2009. San Francisco, CA: 2009.

Available as a PDF file.

Organizational Design and Engineering

Most readers of this blog are probably familiar with Conway’s Law. It was so named by Fred Brooks in “The Mythical Man-Month” and popularized by the saying “if you have four teams working on a compiler, you will get a four-pass compiler.” This sociological observation states that the social architecture of a corporation, i.e., its organizational hierarchy, determines the technical architecture of its products. My industry experience supports this observation, and I made fun of it as early (for me) as 1996.

Now Rodrigo Magalhaes and Antonio Rito Silva of the Technical University of Lisbon are expanding this and related observations into a full-blown research area called “organizational design and engineering”. You are invited to submit to and participate in IWODE ’09 and IJODE.

I’ll be helping as a member of the IWODE ’09 program committee and as a member of the IJODE editorial board. Please find attached the Call for Papers for IWODE ’09 as a PDF file.

WikiSym 2009 Call for Papers (Submissions)

WikiSym 2009 Call for Papers

The International Symposium on Wikis and Open Collaboration

October 25-27, 2009, in Orlando, Florida, USA

In cooperation with ACM SIGPLAN and ACM SIGWEB, co-located with ACM OOPSLA 2009, peer-reviewed and archived in the ACM Digital Library


The International Symposium on Wikis (WikiSym) is the premier conference dedicated to wikis and related open collaboration systems and processes.
