Developer Belief vs. Reality: The Case of the Commit Size Distribution

Abstract: The design of software development tools follows from what the developers of such tools believe is true about software development. A key aspect of such beliefs is the size of code contributions (commits) to a software project. In this paper, we show that what tool developers believe about the size of code contributions differs from reality by more than an order of magnitude. We present this reality, called the commit size distribution, for a large sample of open source and selected closed source projects. We suggest that these new empirical insights will help improve software development tools by aligning underlying design assumptions closer with reality.

Reference: Dirk Riehle, Carsten Kolassa, Michel A. Salim. “Developer Belief vs. Reality: The Case of the Commit Size Distribution.” In Proceedings of Software Engineering 2012 (SE ’12). Springer Verlag, 2012.

The paper is available as a PDF file. The survey used in the paper is also available as a PDF file.
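To make the term "commit size distribution" more concrete, here is a minimal sketch of how such a distribution could be tabulated. It is not the method used in the paper: the function name, the choice of order-of-magnitude bins, and the sample sizes are assumptions made purely for illustration.

```python
from collections import Counter


def commit_size_distribution(sizes):
    """Bucket commit sizes (e.g. lines changed) by order of magnitude.

    Hypothetical helper: returns a dict mapping a bin label such as
    "10-99" to the fraction of commits that fall into that bin.
    """
    if not sizes:
        return {}
    counts = Counter()
    for size in sizes:
        if size < 1:
            raise ValueError("commit sizes must be positive integers")
        # Number of decimal digits minus one gives the power-of-ten bin.
        exponent = len(str(int(size))) - 1
        counts[exponent] += 1
    total = len(sizes)
    return {
        f"{10 ** e}-{10 ** (e + 1) - 1}": counts[e] / total
        for e in sorted(counts)
    }


# Made-up sample data, not taken from the paper.
sample_sizes = [1, 5, 12, 40, 99, 250, 3, 7]
distribution = commit_size_distribution(sample_sizes)
for label, share in distribution.items():
    print(f"{label:>8}: {share:.1%}")
```

Grouping by powers of ten is only one possible choice; it fits a claim stated in orders of magnitude, but any other binning would work the same way.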

Business Risks and Governance of Open Source in Software Products (in German)

Title: Geschäftsrisiken und Governance von Open-Source in Softwareprodukten (Business Risks and Governance of Open Source in Software Products)

Abstract: Today, almost every software product, including large standard software, contains open source components. The vendors of this software must understand and sensibly manage the business risks associated with integrating open source software into commercial products. This article presents a model of the legal, technical, and social risks that arise from the uncontrolled use of open source software, and explains selected best practices of open source governance applied by leading firms. The model is the result of analyzing five interviews conducted with large German software vendors, together with additional literature research.

Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia

Abstract: The heart of each wiki, including Wikipedia, is its content. Most machine processing starts and ends with this content. At present, such processing is limited, because most wiki engines today cannot provide a complete and precise representation of the wiki’s content. They can only generate HTML. The main reason is the lack of well-defined parsers that can handle the complexity of modern wiki markup. This applies to MediaWiki, the software running Wikipedia, and most other wiki engines. This paper shows why it has been so difficult to develop comprehensive parsers for wiki markup. It presents the design and implementation of a parser for Wikitext, the wiki markup language of MediaWiki. We use parsing expression grammars where most parsers used no grammars or grammars poorly suited to the task. Using this parser it is possible to directly and precisely query the structured data within wikis, including Wikipedia. The parser is available as open source from http://sweble.org.

Keywords: Wiki, Wikipedia, Wiki Parser, Wikitext Parser, Parsing Expression Grammar, PEG, Abstract Syntax Tree, AST, WYSIWYG, Sweble.

Reference: Hannes Dohrn and Dirk Riehle. “Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia.” In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym 2011). ACM Press, 2011.

The paper is available as a PDF file (preprint).
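For readers unfamiliar with parsing expression grammars, the following is a minimal hand-written sketch of the PEG idea of ordered choice: alternatives are tried in a fixed order and the first one that matches wins. It is not the Sweble grammar or its API. The function name, the node format, and the restriction to bold and italic quote markup are assumptions for illustration only.

```python
def parse_inline(text):
    """Parse bold ('''...''') and italic (''...'') spans into (kind, content) nodes.

    Hypothetical toy parser: it mimics PEG-style ordered choice by trying
    the bold rule before the italic rule and falling back to plain text.
    """
    nodes = []
    pos = 0
    while pos < len(text):
        # Ordered choice: bold is tried first because its marker
        # begins with the italic marker.
        for kind, marker in (("bold", "'''"), ("italic", "''")):
            if text.startswith(marker, pos):
                end = text.find(marker, pos + len(marker))
                if end != -1:
                    nodes.append((kind, text[pos + len(marker):end]))
                    pos = end + len(marker)
                    break
        else:
            # No markup rule matched: consume one character as plain text,
            # merging it into a preceding text node if there is one.
            if nodes and nodes[-1][0] == "text":
                nodes[-1] = ("text", nodes[-1][1] + text[pos])
            else:
                nodes.append(("text", text[pos]))
            pos += 1
    return nodes


print(parse_inline("a '''b''' ''c''"))
```

Real Wikitext is far more involved (nesting, templates, tables, error recovery), which is part of why a comprehensive parser is hard to write. The sketch only shows why the order of alternatives matters.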

Technical Report on WOM: An Object Model for Wikitext

Abstract: Wikipedia is a rich encyclopedia that is not only of great use to its contributors and readers but also to researchers and providers of third party software around Wikipedia. However, Wikipedia’s content is only available as Wikitext, the markup language in which articles on Wikipedia are written, and whoever needs to access the content of an article has to implement their own parser or has to use one of the available parser solutions. Unfortunately, those parsers which convert Wikitext into a high-level representation like an abstract syntax tree (AST) define their own format for storing and providing access to this data structure. Further, the semantics of Wikitext are only defined implicitly in the MediaWiki software itself. This situation makes it difficult to reason about the semantic content of an article or exchange and modify articles in a standardized and machine-accessible way. To remedy this situation we propose a markup language, called XWML, in which articles can be stored and an object model, called WOM, that defines how the contents of an article can be read and modified.

Keywords: Wiki, Wikipedia, Wikitext, Wikitext Parser, Open Source, Sweble, MediaWiki, MediaWiki Parser, XWML, HTML, WOM

Reference: Hannes Dohrn and Dirk Riehle. WOM: An Object Model for Wikitext. University of Erlangen, Technical Report CS-2011-05 (July 2011).

The technical report is available as a PDF file.

Controlling and Steering Open Source Projects

The IEEE just published a short version of the “control points and steering mechanisms” article. Here is the abstract. Please see the original for more details.

Abstract: Open source software has become an important part of the software business. In a 2009 survey, Forrester Research found that 46 percent of all responding enterprises were using or implementing open source software. Moreover, in 2009, the Gartner Group estimated that by 2012, at least 80 percent of all software product firms will use open source software. Thus, it’s important to understand how software product firms depend on open source and how they manage that dependency to meet their business goals. There are three main types of software product firms. [...]

Lessons Learned from Using Design Patterns in Industry Projects

Abstract: Design patterns help in the creative act of designing, implementing, and documenting software systems. They have become an important part of the vocabulary of experienced software developers. This article reports on the author’s experiences with, and lessons learned from, applying design patterns in industry projects. The article discusses not only how using patterns benefits the design of software systems, but also how firms can benefit further from developing a firm-specific design language and how firms can motivate and educate developers to learn and develop this shared language.

Keywords: Design pattern, pattern language, design language, design communication, design collaboration, design implementation, design documentation.

Reference: Dirk Riehle. “Lessons Learned from Using Design Patterns in Industry Projects.” In Transactions on Pattern Languages of Programming II, LNCS 6510. Springer-Verlag, 2011. Pages 1-15.

The paper is available as a PDF file.

Micro-Blogging Adoption in the Enterprise: An Empirical Analysis

Abstract: Given the increasing interest in using social software for company-internal communication and collaboration, this paper examines drivers and inhibitors of micro-blogging adoption at the workplace. While nearly one in two companies is currently planning to introduce social software, there is no empirically validated research on employees’ adoption. In this paper, we build on previous focus group results and test our research model in an empirical study using Structural Equation Modeling. Based on our findings, we derive recommendations on how to foster adoption. We suggest that micro-blogging should be presented to employees as an efficient means of communication, personal brand building, and knowledge management. In order to particularly promote content contribution, privacy concerns should be eased by setting clear rules on who has access to postings and for how long they will be archived.

Reference: Valentin Schöndienst, Hanna Krasnova, Oliver Günther, and Dirk Riehle. “Micro-Blogging Adoption in the Enterprise: An Empirical Analysis.” In Proceedings of the 10th International Conference on Wirtschaftsinformatik (WI 2011). Pages 931-940.

The paper is available in PDF form. You may also like the prior paper “Modeling Micro-Blogging Adoption in the Enterprise” as well as my “patterns of effective tweeting”.

The Single-Vendor Commercial Open Source Business Model

Update 2012-01-28: Springer changed the citation. The reference below reflects this.


Springer just republished our 2009 article on how vendor-owned open source works. Here is the abstract:

Abstract: Single-vendor commercial open source software projects are open source software projects that are owned by a single firm that derives a direct and significant revenue stream from the software. Single-vendor commercial open source at first glance represents an economic paradox: How can a firm earn money if it is making its product available for free as open source? This paper presents the core properties of single-vendor open source business models and discusses how they work. Using a single-vendor open source approach, firms can get to market faster with a superior product at lower cost than possible for traditional competitors. The paper shows how these benefits accrue from an engaged and self-supporting user community. Lacking any prior comprehensive reference, this paper is based on an analysis of public statements by practitioners of single-vendor open source. It forges the various anecdotes into a coherent description of revenue generation strategies and relevant business functions.

Reference: Dirk Riehle. “The Single-Vendor Commercial Open Source Business Model.” Information Systems and e-Business Management vol. 10, no. 1. Springer Verlag, 2012. Pages 5-17.

You can read it online, download a PDF, or use the Springer site.

The Economic Case for Open Source Foundations

Abstract: An open source foundation is a group of people and companies that has come together to jointly develop community open source software. Examples include the Apache Software Foundation, the Eclipse Foundation, and the Gnome Foundation. There are many reasons why software development firms join and support a foundation. One common economic motivation is to save costs in the development of the software by spreading them over the participating parties. However, this is just the beginning. Beyond sharing costs, participating firms can increase their revenue through the provision and increased sale of complementary products. Also, by establishing a successful open source platform, software firms can compete more effectively across technology stacks and thereby increase their addressable market. Not to be neglected, community open source software is a common good, creating increased general welfare and hence goodwill for the involved companies.

Reference: Dirk Riehle. “The Economic Case for Open Source Foundations.” IEEE Computer, vol. 43, no. 1 (January 2010). Pages 86-90.

Available as HTML or as a PDF file.

Talk Slides: Design Pattern Density Defined

Here are the slides for my OOPSLA Onward! 2009 talk on “Design Pattern Density Defined.” First the abstract:

Design pattern density is a metric that measures how much of an object-oriented design can be understood and represented as instances of design patterns. Expert developers have long believed that a high design pattern density implies a high maturity of the design under inspection. This paper presents a quantifiable and observable definition of this metric. The metric is illustrated and qualitatively validated using four real-world case studies. We present several hypotheses of the metric’s meaning and their implications, including the one about design maturity. We propose that the design pattern density of a maturing framework has a fixed point and we show that if software design patterns make learning frameworks easier, a framework’s design pattern density is a measure of how much easier it will become.

The talk slides are available as a PDF file and are licensed under the Creative Commons BY-SA 3.0 license.

For a discussion of the talk’s contents I recommend reading the original article.