The Comment Density of Open Source Software Code

Authors: Oliver Arafat, Dirk Riehle

Abstract: The development processes of open source software are different from traditional closed source development processes. Still, open source software is frequently of high quality. Thus, we are investigating how open source software creates high quality and whether it can maintain this quality for ever larger project sizes. In this paper, we look at one particular quality indicator, the density of comments in open source software code. In a large-scale study of more than 5,000 projects, we find that active open source projects document their source code, and we find that the comment density is independent of team and project size, but not of project age. In future work, we intend to correlate comment density with project success or failure.

Reference: In Companion to Proceedings of the 31st International Conference on Software Engineering (ICSE 2009). IEEE Press, 2009. Pages 195-198.

Available as a PDF file.

The Commit Size Distribution of Open Source Software

Authors: Oliver Arafat, Dirk Riehle

Abstract: With the growing economic importance of open source, we need to improve our understanding of how open source software development processes work. The analysis of code contributions to open source projects is an important part of such research. In this paper we analyze the size of code contributions to more than 9,000 open source projects. We review the total distribution and distinguish three categories of code contributions using a size-based heuristic: single focused commits, aggregate team contributions, and repository refactorings. We find that both the overall distribution and the individual categories follow a power law. We also suggest that distinguishing these commit categories by size will benefit future analyses.

Reference: In Proceedings of the 42nd Hawaii International Conference on System Sciences (HICSS-42). IEEE Press, 2009. Pages 1-8.

Available as a PDF file.

The Economic Motivation of Open Source Software: Stakeholder Perspectives

Author: Dirk Riehle

Abstract: Open source software has changed the rules of the game, significantly impacting the economic behavior of stakeholders in the software ecosystem. In this new environment, developers strive to be committers, vendors feel pressure to produce open source products, and system integrators anticipate boosting profits.

Reference: IEEE Computer, vol. 40, no. 4 (April 2007). Pages 25-32.

Available as a PDF file or in HTML.

JUnit 3.8 Documented Using Collaborations

Author: Dirk Riehle

Abstract: This paper describes the design of the unit testing framework JUnit v3.8. The documentation technique employed is an enhanced version of collaboration-based design, also known as role modeling. In collaboration-based design, objects are viewed as playing multiple roles in different contexts, and different contexts are viewed as task-specific collaborations. The documentation accounts for every method in the JUnit 3.8 framework by assigning it to a role. It thereby investigates whether roles and collaborations can serve as basic units of functionality provided by a design like a framework. Such a measure of functionality can serve multiple purposes, for example estimating implementation efforts or measuring complexity.

Keywords: JUnit 3.8 Documentation

Reference: In Software Engineering Notes, Volume 33, Issue 2 (March 2008), Article No. 5. ACM Press, 2008.

Available as a PDF file.

The Total Growth of Open Source (Abstract)

Authors: Amit Deshpande, Dirk Riehle

Abstract: Software development is undergoing a major change away from a fully closed software process towards a process that incorporates open source software in products and services. Just how significant is that change? To answer this question we need to look at the overall growth of open source as well as its growth rate. In this paper, we quantitatively analyze the growth of more than 5,000 active and popular open source software projects. We show that the total amount of source code as well as the total number of open source projects is growing at an exponential rate. Previous research showed linear and quadratic growth in lines of source code of individual open source projects. Our work shows that open source is expanding into new domains and applications at an exponential rate.

Reference: In Proceedings of the Fourth Conference on Open Source Systems (OSS 2008). Springer Verlag, 2008. Pages 197-209.

Available as a PDF file or in HTML; also see the Addendum.

Continuous Integration in Open Source Software Development

Authors: Amit Deshpande, Dirk Riehle

Abstract: Commercial software firms are increasingly using and contributing to open source software. Thus, they need to understand and work with open source software development processes. This paper investigates whether the practice of continuous integration of agile software development methods has had an impact on open source software projects. Using fine-grained data from more than 5,000 active open source software projects, we analyze the size of code contributions over a project’s life-span. Code contribution size has stayed flat. We interpret this to mean that open source software development has not changed its code integration practices. In particular, within the limits of this study, we claim that the practice of continuous integration has not yet significantly influenced the behavior of open source software developers.

Reference: In Proceedings of the Fourth Conference on Open Source Systems (OSS 2008). Springer Verlag, 2008. Pages 273-280.

Available as a PDF file.

Towards End-User Programming With Wikis

Abstract: When business software fails to provide the desired functionality, users typically turn to spreadsheets to perform simple but general computational tasks. However, spreadsheets enforce a view of the world that consists mostly of tables and numbers rather than the domain concepts users have in mind. We are using wikis as a platform for empowering end-users to perform computational tasks of their choice. This paper discusses how core properties of wikis can support end-user programming. We illustrate our approach using wiki prototype software for working with business objects as made available by SAP’s business application suite.

Reference: Craig Anslow, Dirk Riehle. In Proceedings of the Fourth Workshop on End-User Software Engineering (WEUSE IV). IEEE Press, 2008. Pages 61-65.

Available as a PDF file.

An XML Interchange Format for Wiki Creole 1.0

Abstract: Wikis have become an important application on the web and in the enterprise, yet there are no interoperability standards between different wiki engines. We present the first complete XML representation format of Wiki Creole 1.0. Wiki Creole is a community standard for wiki markup, the language used to write wiki pages. This report presents the complete XML representation format using a validating XML schema. In addition, we present XSLT definitions for transforming the XML representations to XHTML on the one hand and for transforming the XML representations to Wiki Creole markup on the other hand. Our work shows how, using XML technologies, we can make wiki interchange, wiki upgrading, and wiki conversion independent of a specific wiki engine implementation.

Reference: Martin Junghans, Dirk Riehle, Umit Yalcinalp. In ACM SIGWEB Newsletter, Volume 2007, Issue Winter (Winter 2007), Article No. 5. ACM Press, 2007.

Available as a PDF file.

An EBNF Grammar for Wiki Creole 1.0

Abstract: Today’s wiki engines are not interoperable. This is an unfortunate consequence of the lack of rigorously specified standards. This technical report presents a complete and validated EBNF-based grammar for Wiki Creole, a community standard for wiki markup. Wiki Creole is also the only standard currently available. Wiki Creole is being specified using prose, leading to inconsistencies and ambiguities. Our grammar uncovered those ambiguities, which we fed back into the specification process. The Wiki Creole grammar presented in this report makes the creation of Wiki Creole parsers simple using parser generators, ANTLR in our case. Using a precise specification of wiki markup lets us decouple wiki editors from wiki storage and from further wiki processing tools. Based on this decoupling layer, we expect innovation on these different parts to proceed independently and at a faster pace than before.

Reference: Martin Junghans, Dirk Riehle, Rama Gurram, Matthias Kaiser, Mario Lopes, Umit Yalcinalp. In ACM SIGWEB Newsletter, Volume 2007, Issue Winter (Winter 2007), Article No. 4. ACM Press, 2007.

Available as a PDF file.