The Open Source Big Bang

Open source is not only software, but also an approach to software development. The public nature of open source projects lets us show how open source software development scales to the largest project sizes. The following figure illustrates the scalability of open source software development. I call it the big bang of open source.

The figure shows the growth of active, well-functioning open source projects of all sizes over time, as captured in our database [1]. Each line represents a particular year, from 1995 to 2008, and shows how many projects of a particular size existed in that year. The x-axis shows project size, measured as the number of registered committers, and the y-axis shows the number of projects of that size. Both scales are logarithmic [2]. So, for 1995, we can see that there were 10 projects of size 1 committer and 4 projects of size 10 committers. In 1996, there were already more smaller projects and also more larger projects. Using the number of registered committers as a proxy for a project's size is most certainly a conservative assumption, because contributors without commit rights are not counted.
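Each year line is fitted to a cloud of points, one point per project size. A minimal sketch of how one year's points could be derived, assuming a list with one registered-committer count per active project; the function name and the input list are hypothetical, and the sample only reuses the two 1995 values quoted above, not data from the database snapshot.

```python
import math
from collections import Counter

def year_point_cloud(committer_counts):
    """Return sorted (log10 size, log10 number of projects) pairs for one year.

    committer_counts: hypothetical input, one entry per active project,
    giving its number of registered committers (the size proxy).
    """
    projects_per_size = Counter(committer_counts)
    return sorted(
        (math.log10(size), math.log10(n))
        for size, n in projects_per_size.items()
        if size > 0  # log10 is undefined for projects without committers
    )

# Illustrative input only: 10 projects with 1 committer, 4 with 10 committers.
sizes_1995 = [1] * 10 + [10] * 4
for x, y in year_point_cloud(sizes_1995):
    print(f"log10(size)={x:.2f}  log10(projects)={y:.2f}")
```

Taking the logarithm of both coordinates is what turns the plot into a log-log chart; the 4 projects of size 10 land at about 0.6 on the y-axis.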

As you can see, the number of the smallest projects (one committer) kept growing over time and reached about 3,200 in 2008 in our sample. At the same time, some of these smallest projects kept growing, migrating to the right in the figure. In 2008, there were 10 projects of size 1,000 committers! (In 1995 there were none.) I find this continued growth of open source intriguing. Judging from the expansion of the year lines, there appears to be a constant supply of new projects, and each project grows to the size that is right for it, including some very large sizes.

Mathematically, what is of interest is the gradient across the year lines, that is, the formula that captures the year-over-year growth. I call the figure an illustration of the open source big bang because this gradient captures the expansion speed of the growing open source universe. We have not yet been able to develop an appropriate mathematical model for this apparent growth. However, the figure illustrates how open source projects consistently scale to the largest project sizes. We may not yet know exactly why, but we are measuring that they do.
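Per footnote [1], each year line is a simple linear regression on the log-log data, and a straight line in log-log space corresponds to a power law. The following is a minimal sketch of such a fit, assuming ordinary least squares (the footnote does not name the method) and using hypothetical function names; it is not the code behind the figure, and it fits a single year line rather than the missing model of the gradient across years.

```python
def fit_log_log(points):
    """Ordinary least-squares line y = a + b*x through (x, y) points,
    where x = log10(size) and y = log10(number of projects)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points share the same x value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def as_power_law(intercept, slope):
    """The log-log line y = a + b*x is the power law
    projects = 10**a * size**b in the original units."""
    return lambda size: 10 ** intercept * size ** slope

# Hypothetical points lying exactly on log10(projects) = 3 - 1.5 * log10(size).
intercept, slope = fit_log_log([(0.0, 3.0), (1.0, 1.5), (2.0, 0.0)])
projects_at = as_power_law(intercept, slope)
print(intercept, slope, projects_at(10))
```

Fitting one such line per year and comparing the intercepts and slopes from year to year would be one rough way to eyeball the expansion. Note that a fit in log space gives every size bucket equal weight, so the few very large projects pull on the line as much as the many small ones, which is one more reason to treat the lines as eyeballing aids.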

If you liked this blog post, you might also like reading about

  1. Open source practices for internal software development (a.k.a. inner source)
  2. How to go to market with an open source strategy
  3. The economic case for open source foundations
  4. My current presentations on open source

Footnotes and References 

[1] The data used to generate the figure was taken from a database snapshot from March 2008. Based on Carlo Daffara’s estimates of the total population and activity of open source projects, that snapshot contains about 30% of all active open source projects at that time. The year lines in the figure are not the result of precise mathematical modeling; rather, they are a linear regression fitted to the logarithmic data. Thus, the figure serves eyeballing purposes only. The figure itself was created by my Ph.D. student Carsten Kolassa.

[2] A short reminder on logarithms, in case yours are rusty: 10^0 = 1, 10^0.6 ≈ 4, 10^1 = 10, and so on. The x- and y-axes show these exponents.
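The same conversion as a small sketch; the helper names are hypothetical. A tick at exponent e on either axis stands for 10^e, so the tick at 3 on the x-axis marks projects with 1,000 committers.

```python
import math

def axis_value(exponent):
    """Value represented by a tick at `exponent` on a base-10 log axis."""
    return 10 ** exponent

def axis_position(value):
    """Exponent (tick position) at which a positive value is plotted."""
    return math.log10(value)

for e in (0, 0.6, 1, 3):
    print(f"10^{e} = {axis_value(e):.2f}")
print(f"4 projects sit at {axis_position(4):.2f} on the y-axis")
```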

7 thoughts on “The Open Source Big Bang”

  1. Grismar

    I like the visualization, but I do feel the straight lines are a bit misleading. Are they the central line through some point cloud? Or do they just connect the upper and lower extremes of the dataset for each year? It would be nice to see a similarly colored plot of the points next to this one, just to get a sense of the kind of data it was based on.

  2. Dirk Riehle Post author

    @grismar As the footnote says, they are a simple linear regression on the log-log scale data set (the point cloud you mention). Thus, this is not mathematically precise; I consider it good enough for eyeballing purposes only. The lines themselves follow a power law nicely, but we have not yet been able to come up with a proper mathematical model of it.

  3. Dirk Riehle Post author

    Hi Marc! Simple answer: We got our database snapshot in March 2008 and none thereafter. (Nor did anyone else.) We are all lined up outside the doors of BDS.