Dirk Riehle's Industry and Research Publications

The Open Source Big Bang

Open source is not only software, but also an approach to software development. The public nature of open source projects lets us show how open source software development scales to the largest project sizes. The following figure illustrates the scalability of open source software development. I call it the big bang of open source.

The figure shows the growth of active, well-working open source projects of all sizes over time, as captured in our database [1]. Each line represents a particular year, from 1995 to 2008, and shows how many projects of a particular size existed in that year. The x-axis shows the size of projects, measured in committers, and the y-axis shows the number of projects of that size. Both scales are logarithmic [2]. Using the number of registered committers as a proxy for a project’s size is most certainly a conservative assumption. So, for 1995, we can see that there were 10 projects of size 1 committer and 4 projects of size 10 committers. In 1996, there were already more smaller projects and also more larger projects.
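To make the reading of a single year line concrete, here is a minimal sketch of how such a size distribution could be tallied. The record layout, the project names, and the helper `size_distribution` are all hypothetical; they are not taken from the database behind the figure.

```python
from collections import Counter

# Hypothetical sample records: (project name, year, registered committers).
# These values are made up for illustration only.
records = [
    ("alpha", 1995, 1),
    ("beta", 1995, 1),
    ("gamma", 1995, 10),
    ("alpha", 1996, 2),
    ("beta", 1996, 1),
    ("gamma", 1996, 12),
    ("delta", 1996, 1),
]


def size_distribution(records, year):
    """Map project size (in committers) to the number of projects of that size."""
    return Counter(size for _, rec_year, size in records if rec_year == year)


dist_1995 = size_distribution(records, 1995)
dist_1996 = size_distribution(records, 1996)
```

Each year's distribution corresponds to one line in the figure: the keys are positions on the x-axis and the values are positions on the y-axis, before taking logarithms.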

As you can see, the number of smallest projects (one committer) kept growing over time and reached about 3,200 in 2008 in our sample. At the same time, some of these smallest projects kept growing, migrating to the right in the figure. In 2008, there were 10 projects of size 1,000 committers, while in 1995 there were none. I find this continued growth of open source intriguing. Speculating from the expansion of the year lines, there appears to be a constant supply of new projects, and each project grows to the size that is right for it, including some very large project sizes.

Mathematically, what is of interest is the gradient across the year lines, that is, the formula that captures the year-over-year growth. I call the figure an illustration of the open source big bang because the gradient captures the expansion speed of the growing open source universe. We have not yet been able to develop an appropriate mathematical model for this apparent growth. However, the figure illustrates how open source projects consistently scale to the largest project sizes. We may not yet know exactly why, but we are measuring that they do.

If you liked this blog post, you might also like reading about

  1. Open source practices for internal software development (a.k.a. inner source)
  2. How to go to market with an open source strategy
  3. The economic case for open source foundations
  4. My current presentations on open source

Footnotes and References

[1] The data used to generate the figure was taken from an Ohloh.net database snapshot from March 2008. That snapshot contains about 30% of all active open source projects at that time, using Carlo Daffara’s estimate of total population as well as activity. The year lines in the figure are not the result of precise mathematical modeling; rather, each is a linear regression fitted to the data on the log-log scale. Thus, this figure serves eyeballing purposes only. The figure itself was created by my Ph.D. student Carsten Kolassa.
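As an illustration of what a linear regression on log-log data involves, here is a minimal sketch using ordinary least squares on the base-10 logarithms of both axes. The function name `fit_log_log` and the sample points are hypothetical, and this is not the code that produced the figure. A straight line with slope b and intercept a on the log-log scale corresponds to count = 10^a * size^b.

```python
import math


def fit_log_log(sizes, counts):
    """Least-squares line through (log10 size, log10 count).

    Returns (slope, intercept) of the fitted line on the log-log scale.
    """
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on count = 1000 * size**-1.5 (not real data).
sizes = [1, 10, 100]
counts = [1000 * s ** -1.5 for s in sizes]
slope, intercept = fit_log_log(sizes, counts)
```

On real data the points scatter around the line, which is why the fitted year lines are suitable for eyeballing only.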

[2] A short reminder on logarithms, in case yours have gotten rusty: 10^0 = 1, 10^0.6 ≈ 4, 10^1 = 10, etc. The exponents are to be found on the x- and y-axes.


  1. Dirk Riehle

    Hi Marc! Simple answer: We got our database snapshot in March 2008 and none thereafter. (Nor did anyone else.) We are all lined up outside the doors of BDS…

  2. Marc Laporte

    Hi Dirk!
    What about after 2008?
    According to this Ohloh.net forum post: https://www.ohloh.net/forums/11/topics/6283, there seems to be something changing in the trend.
    Best regards,

  3. simi

    The Big Bang in OSS… that’s quite a cool idea 🙂

  4. […] More demanding would be, for example, modeling open source growth, as illustrated here: https://dirkriehle.com/2011/06/21/the-open-source-big-bang/ […]

  5. […] Dirk Riehle explained the open source big […]

  6. Dirk Riehle

    @grismar As the footnote says, they are a simple linear regression on the log-log scale data set (point cloud you mention). Thus, this is not mathematically precise; I consider it sufficiently good for eyeballing purposes only. The lines themselves follow a power law nicely, but we have not yet been able to come up with a proper mathematical model of it.

  7. Grismar

    I like the visualization, but I do feel the straight lines are a bit misleading. Are they the central line through some point cloud? Or do they just connect the upper and lower extremes of the dataset for each year? It would be nice to see a similar colored plot of points beside this one, just to get a sense of the kind of data it was based on.
