Dirk Riehle's Industry and Research Publications

Estimating Commit Sizes Efficiently [OSS 2009]

Authors: Philipp Hofmann, Dirk Riehle

Abstract: The quantitative analysis of software projects can provide insights that let us better understand open source and other software development projects. An important variable used in the analysis of software projects is the amount of work being contributed, the commit size. Unfortunately, post-facto, the commit size can only be estimated, not measured. This paper presents several algorithms for estimating the commit size. Our performance evaluation shows that simple, straightforward heuristics are superior to the more complex text-analysis-based algorithms. Not only are the heuristics significantly faster to compute, they also deliver more accurate results when estimating commit sizes. Based on this experience, we design and present an algorithm that improves on the heuristics, can be computed equally fast, and is more accurate than any of the prior approaches.
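To illustrate the kind of simple heuristic the abstract refers to, here is a minimal sketch. It is not the paper's algorithm: the abstract does not spell out the heuristics, so the sketch assumes that a line-based diff reports only lines added and lines removed, and that a modified line shows up as one removal plus one addition. Under that assumption, the true number of touched lines lies between max(added, removed) and added + removed. The function names `count_added_removed` and `estimate_commit_size` are hypothetical.

```python
import difflib


def count_added_removed(old_text, new_text):
    """Count lines added and removed between two file versions,
    the way a line-based diff would report them."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


def estimate_commit_size(added, removed):
    """Return (lower bound, upper bound, midpoint) for the commit size.

    Assumption (not taken from the paper): a changed line appears as one
    removed plus one added line, so pairing as many lines as possible
    gives the lower bound and pairing none gives the upper bound.
    """
    lower = max(added, removed)
    upper = added + removed
    return lower, upper, (lower + upper) / 2


if __name__ == "__main__":
    a, r = count_added_removed("a\nb\nc\n", "a\nB\nc\nd\n")
    print(a, r, estimate_commit_size(a, r))
```

Such estimates need only the two counts per commit, which is why heuristics of this kind are cheap to compute compared with text analysis of the changed lines.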

Reference: In Proceedings of the 5th International Conference on Open Source Systems (OSS 2009). Springer Verlag, 2009. Pages 105-115.

Available as a PDF file.


  1. Dirk Riehle

    Hi Michel: This is indeed a broad-brush analysis. Any application to an individual project should be done very cautiously. We really only use it on the scale of “all open source projects”, which for us right now means data for about 10,000 projects.

  2. Michel S.

    Very interesting paper! I wonder whether the regression coefficients would vary, within one project, over time (unfortunately the dataset used in the paper does not appear to allow this analysis). The reasoning being twofold: developer turnover, and changing coding practices.
    With smaller projects, there is always the problem of enforcing indentation and coding styles too (and, most annoyingly for cross-platform projects, erroneously large diffs due to line ending differences).
