The Commenting Practice of Open Source (Completed, for Now)

This is, for now, the final paper in this sequence of short publications on how open source software projects document their code. The paper is basically a more comprehensive summary of the prior articles, with a bit more data. Here are the abstract and reference:

Abstract: The development processes of open source software are different from traditional closed source development processes. Still, open source software is frequently of high quality. This raises the question of how and why open source software creates high quality and whether it can maintain this quality for ever larger project sizes. In this paper, we look at one particular quality indicator, the density of comments in open source software code. We find that successful open source projects follow a consistent practice of documenting their source code, and we find that the comment density is independent of team and project size.

Reference: Oliver Arafat, Dirk Riehle. “The Commenting Practice of Open Source.” In Companion to the Proceedings of the 22nd Conference on Object Oriented Programming Systems, Languages, and Applications (OOPSLA Onward! 2009). ACM Press, 2009. Page 857-864.

Design Pattern Density Defined

Abstract: Design pattern density is a metric that measures how much of an object-oriented design can be understood and represented as instances of design patterns. Expert developers have long believed that a high design pattern density implies a high maturity of the design under inspection. This paper presents a quantifiable and observable definition of this metric. The metric is illustrated and qualitatively validated using four real-world case studies. We present several hypotheses of the metric’s meaning and their implications, including the one about design maturity. We propose that the design pattern density of a maturing framework has a fixed point and we show that if software design patterns make learning frameworks easier, a framework’s design pattern density is a measure of how much easier it will become.

Reference: Dirk Riehle. “Design Pattern Density Defined.” In Proceedings of the 2009 Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA Onward! ’09). ACM Press, 2009. Page 469-480.

Available as a PDF file.

Commercial Open Source Paper Appears in LNBIP 36

My AMCIS 2009 paper on the Commercial Open Source Business Model will be republished in an LNBIP (Lecture Notes in Business Information Processing) issue by Springer-Verlag. The reference is:

Dirk Riehle. “The Commercial Open Source Business Model.” In Value Creation in e-Business Management, LNBIP 36. Edited by M.L. Nelson et al. Springer-Verlag, 2009. Page 18–30.

If you feel like it, you can acquire a commercial license by purchasing the paper download from Springer. Alternatively, you can use the community edition linked to above.

The Commercial Open Source Business Model

Abstract: Commercial open source software projects are open source software projects that are owned by a single firm that derives a direct and significant revenue stream from the software. Commercial open source at first glance represents an economic paradox: How can a firm earn money if it is making its product available for free as open source? This paper presents the core properties of commercial open source business models and discusses how they work. Using a commercial open source approach, firms can get to market faster with a superior product at lower cost than possible for traditional competitors. The paper shows how these benefits accrue from an engaged and self-supporting user community. Lacking any prior comprehensive reference, this paper is based on an analysis of public statements by practitioners of commercial open source. It forges the various anecdotes into a coherent description of revenue generation strategies and relevant business functions.

Reference: Dirk Riehle. “The Commercial Open Source Business Model.” In Proceedings of the Fifteenth Americas Conference on Information Systems (AMCIS 2009). AIS Electronic Library, 2009. Paper 104.

Available as HTML or PDF file.

Modeling Micro-Blogging Adoption in the Enterprise

Abstract: Despite a broad range of collaboration tools already available, enterprises continue to look for ways to improve internal and external communication. Micro-blogging is one such new communication channel, with considerable potential to improve intra-firm transparency and knowledge sharing. However, the adoption of such social software presents certain challenges to enterprises. Based on the results of four focus group sessions, we identified several new constructs that play an important role in the micro-blogging adoption decision. Examples include privacy concerns, communication benefits, perceptions regarding signal-to-noise ratio, as well as codification effort. Integrating these findings with common views on technology acceptance, we formulate a model to predict the adoption of a micro-blogging system in the workspace. Our findings serve as an important guideline for managers seeking to realize the potential of micro-blogging in their company.

Reference: Oliver Günther, Hanna Krasnova, Dirk Riehle, Valentin Schöndienst. “Modeling Micro-Blogging Adoption in the Enterprise.” In Proceedings of the Fifteenth Americas Conference on Information Systems (AMCIS 2009). AIS Electronic Library, 2009. Paper 544.

Available as a PDF file.

Open Collaboration within Corporations Using Software Forges

Abstract: Over the past 10 years, open source software has become an important cornerstone of the software industry. Commercial users have adopted it in standalone applications, and software vendors are embedding it in products. Surprisingly then, from a commercial perspective, open source software is developed differently from how corporations typically develop software. Research into how open source works has been growing steadily. One driver of such research is the desire to understand how commercial software development could benefit from open source best practices. Do some of these practices also work within corporations? If so, what are they, and how can we transfer them?

Keywords: Inner source, firm-internal open source, corporate source, software forge, open collaboration, open source.

Reference: Dirk Riehle, John Ellenberger, Tamir Menahem, Boris Mikhailovski, Yuri Natchetoi, Barak Naveh, Thomas Odenwald. “Open Collaboration within Corporations Using Software Forges.” IEEE Software, vol. 26, no. 2 (March/April 2009). Page 52-58.

Available as HTML or as a PDF file.

Estimating Commit Sizes Efficiently

Authors: Philipp Hofmann, Dirk Riehle

Abstract: The quantitative analysis of software projects can provide insights that let us better understand open source and other software development projects. An important variable used in the analysis of software projects is the amount of work being contributed, the commit size. Unfortunately, post-facto, the commit size can only be estimated, not measured. This paper presents several algorithms for estimating the commit size. Our performance evaluation shows that simple, straightforward heuristics are superior to the more complex text-analysis-based algorithms. Not only are the heuristics significantly faster to compute, they also deliver more accurate results when estimating commit sizes. Based on this experience, we design and present an algorithm that improves on the heuristics, can be computed equally fast, and is more accurate than any of the prior approaches.

Reference: In Proceedings of the 5th International Conference on Open Source Systems (OSS 2009). Springer-Verlag, 2009. Page 105-115.

Available as a PDF file.
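
To make the estimation problem from the abstract concrete, here is a minimal sketch and not one of the paper's algorithms. It assumes the commit is available as a unified diff, in which a changed line shows up as one removed line plus one added line. Under that assumption, the number of lines actually touched lies somewhere between max(added, removed) and added + removed. The function names `count_diff_lines` and `estimate_commit_size`, and the choice of the midpoint as the estimate, are hypothetical and serve only as illustration.

```python
from typing import Tuple


def count_diff_lines(unified_diff: str) -> Tuple[int, int]:
    """Count added and removed lines in a unified diff (illustrative only)."""
    added = 0
    removed = 0
    for line in unified_diff.splitlines():
        # Skip the file header lines ("--- a/file", "+++ b/file").
        # Simplification: a removed line whose content starts with "--"
        # would also be skipped here.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


def estimate_commit_size(added: int, removed: int) -> Tuple[int, int, float]:
    """Return (lower bound, upper bound, midpoint estimate) of lines touched.

    Assumption: every changed line appears as one removal plus one addition,
    so the diff alone cannot tell a change from an unrelated add/remove pair.
    """
    lower = max(added, removed)   # every removal paired with an addition
    upper = added + removed       # no removal paired with any addition
    return lower, upper, (lower + upper) / 2
```

For example, a diff with two added lines and one removed line gives bounds of 2 and 3 lines. The midpoint is just one simple way to pick a value in that range.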

The Comment Density of Open Source Software Code

Authors: Oliver Arafat, Dirk Riehle

Abstract: The development processes of open source software are different from traditional closed source development processes. Still, open source software is frequently of high quality. Thus, we are investigating how open source software creates high quality and whether it can maintain this quality for ever larger project sizes. In this paper, we look at one particular quality indicator, the density of comments in open source software code. In a large-scale study of more than 5,000 projects, we find that active open source projects document their source code, and we find that the comment density is independent of team and project size, but not of project age. In future work, we intend to correlate comment density with project success or failure.

Reference: In Companion to Proceedings of the 31st International Conference on Software Engineering (ICSE 2009). IEEE Press, 2009. Page 195-198.

Available as a PDF file.
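
As a rough illustration of the metric the abstract refers to, the following sketch computes a comment density for a single piece of source text. It assumes that comment density means the number of comment lines divided by the sum of comment lines and code lines, and it only recognizes full-line comments that start with a given prefix. Both this definition and the function name `comment_density` are assumptions made for illustration and are not taken from the paper.

```python
def comment_density(source: str, comment_prefix: str = "#") -> float:
    """Share of non-blank lines that are full-line comments (illustrative)."""
    comment_lines = 0
    code_lines = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines count as neither comment nor code
        if stripped.startswith(comment_prefix):
            comment_lines += 1
        else:
            code_lines += 1
    total = comment_lines + code_lines
    # Define the density of an empty file as 0.0 to avoid dividing by zero.
    return comment_lines / total if total else 0.0
```

A real analysis would also need to handle block comments and trailing comments for each programming language; this sketch deliberately leaves those out.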

PLoP Proceedings now in ACM Digital Library

Thanks to the efforts of Joe Yoder and Ralph Johnson, the proceedings of the 2006 conference on Pattern Languages of Programming have been archived in the ACM Digital Library. I expect the 2007 and future proceedings to be made available through the ACM DL as well. Whether the proceedings of earlier years will be archived too is unclear.

The Commit Size Distribution of Open Source Software

Authors: Oliver Arafat, Dirk Riehle

Abstract: With the growing economic importance of open source, we need to improve our understanding of how open source software development processes work. The analysis of code contributions to open source projects is an important part of such research. In this paper we analyze the size of code contributions to more than 9,000 open source projects. We review the total distribution and distinguish three categories of code contributions using a size-based heuristic: single focused commits, aggregate team contributions, and repository refactorings. We find that both the overall distribution and the individual categories follow a power law. We also suggest that distinguishing these commit categories by size will benefit future analyses.

Reference: In Proceedings of the 42nd Hawaii International Conference on System Sciences (HICSS-42). IEEE Press, 2009. Page 1-8.

Available as a PDF file.