The Total Growth of Open Source

Amit Desh­pan­de and Dirk Riehle
SAP Research, SAP Labs LLC


Soft­ware devel­op­ment is under­go­ing a major change away from a ful­ly closed soft­ware process towards a process that incor­po­rates open source soft­ware in prod­ucts and ser­vices. Just how sig­nif­i­cant is that change? To answer this ques­tion we need to look at the over­all growth of open source as well as its growth rate. In this paper, we quan­ti­ta­tive­ly ana­lyze the growth of more than 5000 active and pop­u­lar open source soft­ware projects. We show that the total amount of source code as well as the total num­ber of open source projects is grow­ing at an expo­nen­tial rate. Pre­vi­ous research showed lin­ear and qua­drat­ic growth in lines of source code of indi­vid­u­al open source projects. Our work shows that open source is expand­ing into new domains and appli­ca­tions at an expo­nen­tial rate.

1. Introduction

Soft­ware devel­op­ment is under­go­ing a major change from being a ful­ly closed soft­ware devel­op­ment process towards a more com­mu­ni­ty dri­ven open source soft­ware devel­op­ment process. Suc­cess­ful open source projects like Lin­ux, Apache, Post­greSQL and many oth­ers are grow­ing super-linearly. Pre­vi­ous research showed that lin­ear and qua­drat­ic growth is the dom­i­nant growth pat­tern of open source soft­ware projects [5] [8] [15] [16] [18] [22].

In this paper, we analyze the combined growth of open source software in terms of lines of source code as well as number of projects. Our database contains more than 5000 active and popular open source projects. The database provides fine-grained data on developer actions over the 17 years from 1990 to 2006. We analyze the average amount of source code added per month for the time frame of January 1995 to December 2006 as well as the number of projects added over time.

We find that both the growth rate and the absolute amount of source code are best explained by an exponential model. Given that previous research showed that most open source projects grow at a polynomial rate, we suggest and then verify that the number of open source projects is growing at an exponential rate.

This paper is orga­nized as fol­lows. Sec­tion 2 dis­cuss­es our moti­va­tion, the hypoth­e­sis, and its impli­ca­tions. Sec­tion 3 dis­cuss­es our data­base and approach. Sec­tion 4 presents the results of the analy­sis. Sec­tion 5 dis­cuss­es some lim­i­ta­tions of the analy­sis and Sec­tion 6 dis­cuss­es relat­ed work. Sec­tion 7 con­cludes the paper.

2. The Growth of Open Source

Open source soft­ware is hav­ing a major impact on the soft­ware indus­try and its pro­duc­tion process­es. Many soft­ware prod­ucts today con­tain at least some open source soft­ware com­po­nents. Some com­mer­cial prod­ucts are com­plete­ly open source soft­ware [9]. In some mar­kets, for exam­ple, web servers, open source soft­ware holds a dom­i­nant mar­ket share [10].

Open source soft­ware today has a strong pres­ence in indus­try and gov­ern­ment. Wal­li et al. observe [19]: “Orga­ni­za­tions are sav­ing mil­lions of dol­lars on IT by using open source soft­ware. In 2004, open source soft­ware saved large com­pa­nies (with annu­al rev­enue of over $1 bil­lion) an aver­age of $3.3 mil­lion. Medium-sized com­pa­nies (between $50 mil­lion and $1 bil­lion in annu­al rev­enue) saved an aver­age $1.1 mil­lion. Firms with rev­enues under $50 mil­lion saved an aver­age $520,000.”

Com­mer­cial­ly, the sig­nif­i­cance and growth of open source is mea­sured in terms of rev­enue gen­er­at­ed from it. Law­ton and Notar­fon­zo state that pack­aged open source appli­ca­tions gen­er­at­ed rev­enues of $1.8 bil­lion in 2006 [9]. The soft­ware divi­sion of the Soft­ware & Infor­ma­tion Indus­try Asso­ci­a­tion esti­mates that total pack­aged soft­ware rev­enues were $235 bil­lion in 2006 [4]. Thus, open source rev­enue, while still small com­pared to the over­all mar­ket (~0.7%) is not triv­ial any longer.

How­ev­er, open source soft­ware today is part of many pro­pri­etary (closed) source prod­ucts, and mea­sur­ing its growth sole­ly by pack­aged soft­ware rev­enue is like­ly to under­es­ti­mate its size and growth by a wide mar­gin. To mea­sure the growth of open source we need to look at the total growth of open source projects and their source code.

Several studies have been undertaken to measure the growth and evolution of individual open source software projects [5] [15] [16] [18]. Most of these are case studies that focus on only a few selected projects. The exception is Koch’s work, which uses a large sample (>4000 projects) to determine overall growth patterns in open source projects, concluding that polynomial growth patterns provide good models for these projects [8] [20]. Such work is mostly motivated by trying to understand how individual open source projects grow and evolve.

The work pre­sent­ed in this paper, in con­trast, ana­lyzes the over­all growth of open source, aggre­gat­ing data from more than 5000 active and pop­u­lar open source projects to deter­mine the total growth of source code and num­ber of projects. Assum­ing a pos­i­tive cor­re­la­tion between work spent on open source, its total growth in terms of code and num­ber of projects, and the rev­enue gen­er­at­ed from it, under­stand­ing the over­all growth of open source will give us a bet­ter indi­ca­tion of how sig­nif­i­cant a role open source will play in the future.

Under­stand­ing over­all open source growth helps more eas­i­ly answer ques­tions about, for exam­ple, future pro­duct struc­tures (how much code of an appli­ca­tion is like­ly to be open source code?), labor eco­nom­ics (how much and which open source skills does a com­pa­ny need?), and rev­enue (what per­cent­age of the soft­ware market’s rev­enue will come from open source?).

The work presented in this paper shows that the total amount of open source code and the total number of projects are growing exponentially. Assuming a base of 0.7% of the market’s revenue, exponential growth is a strong indicator that open source will be of significantly increasing commercial importance. The remainder of this paper discusses our study and validates the hypothesis of exponential growth of open source.

3. Data Source and Approach

On Source­Forge, the dom­i­nant open source project host­ing ser­vice, there are more than 150,000 projects reg­is­tered, most of which are con­sid­ered inac­tive [1] [17]. Daf­fara esti­mates that as of today there are only about 18,000 active open source projects in the world [3].

For our analysis, we use the database of the open source analytics firm Ohloh, which has been crawling open source software code repositories since 2005 [11]. Our database snapshot contains 5122 active and popular open source projects written in 30 different programming languages covering 103 open source licenses. All data is updated on at least a weekly basis.

The database contains the most popular open source projects as measured by the number of in-links to their website. The in-links are provided by the Yahoo! search engine. The database contains data from January 1990 until May 2007. Of this time horizon, we analyze the time frame from January 1995 to December 2006. We omit data before 1995 because it is too sparse to be useful. Ohloh provides high-level data like project structures and developer information, but also data that goes down to the level of individual developer actions. Specifically, Ohloh provides each individual commit action of all projects over their entire history to the extent that they are publicly available.

A com­mit is the action with which a devel­op­er con­tributes a piece of code to the project’s repos­i­to­ry. A developer’s work­week typ­i­cal­ly con­sists of a stream of com­mit actions by which he or she shares the results of their work with the team, con­tribut­ing to the pro­duct or project under way.

We use the amount of source code added to a project (or removed) as an approx­i­ma­tion of the work con­tribut­ed. We count code in source lines of code (SLoC), omit­ting emp­ty or com­ment­ed lines of code. Each com­mit action stored in the data­base lists the num­ber of lines of code added and removed in the com­mit. The num­ber of lines added or removed is cal­cu­lat­ed using the Unix diff com­mand applied to two con­sec­u­tive ver­sions. Emp­ty or com­ment­ed lines of code are ignored. Using this data, we cal­cu­late the change in the size of a source code file by adding or sub­tract­ing the num­ber of lines of code added to or removed from its exist­ing size.

This data col­lec­tion method grace­ful­ly han­dles file and direc­to­ry renam­ing. Such renam­ing is mod­eled as if the file or direc­to­ry was removed and then re-added under a new name. Both code added and code removed will have equal (large) val­ues, so the net change is zero. This avoids any undue bias in the analy­sis.
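The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Commit` record, its field names, and the `YYYY-MM` month key are hypothetical and are not the Ohloh schema. It shows how lines added and net size change could be aggregated per month, and why a rename (equal lines removed and re-added) leaves the net size unchanged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record; field names are assumptions."""
    month: str          # "YYYY-MM", sorts chronologically as a string
    lines_added: int    # SLoC added, empty and comment lines excluded
    lines_removed: int  # SLoC removed, empty and comment lines excluded


def monthly_totals(commits):
    """Return (added_by_month, net_by_month) as plain dicts."""
    added = defaultdict(int)
    net = defaultdict(int)
    for c in commits:
        added[c.month] += c.lines_added
        net[c.month] += c.lines_added - c.lines_removed
    return dict(added), dict(net)


def cumulative_size(net_by_month):
    """Running total of net changes, i.e. total SLoC at the end of each month."""
    total = 0
    sizes = {}
    for month in sorted(net_by_month):
        total += net_by_month[month]
        sizes[month] = total
    return sizes


commits = [
    Commit("1995-01", 120, 20),
    Commit("1995-01", 500, 500),  # a rename: removed and re-added, net zero
    Commit("1995-02", 80, 10),
]
added, net = monthly_totals(commits)
sizes = cumulative_size(net)
```

Note that the rename still inflates the "lines added" series for its month, which is one reason large commits matter for the upper-bound curve.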

Libraries are typ­i­cal­ly used across many projects. For instance, the GIMP project and the GNOME project have many libraries in com­mon. If the lines of code for both projects were added up inde­pen­dent­ly we would be double-counting the libraries, lead­ing to skewed results. We make sure that we are not double-counting code by con­sid­er­ing each change to the orig­i­nal library.

How­ev­er, we can­not unam­bigu­ous­ly iden­ti­fy sit­u­a­tions where a devel­op­er adds redun­dant source code to the code base. Copy and paste is a com­mon prac­tice in soft­ware devel­op­ment, inde­pen­dent­ly of whether it is inter­nal, exter­nal, planned or oppor­tunis­tic. To deal with this issue, we adopt two approach­es.

  1. In the first approach we ignore the copy and paste prob­lem and ana­lyze the source lines of code added. The argu­ment is that copy and paste is a real­i­ty of soft­ware devel­op­ment and that the copied code is part of the project. Hence, copy and paste sim­ply needs to be accept­ed.
  2. In the sec­ond approach we find the aver­age and the stan­dard devi­a­tion for the code added over time. We ignore all com­mits where lines of code added is greater than aver­age code added per com­mit plus three times the stan­dard devi­a­tion. The heuristic’s assump­tion is that by not con­sid­er­ing such large com­mits we ignore all com­mits based on copy and paste.

An analy­sis of aver­age code con­tri­bu­tion size in com­mits pro­vides a cut-off val­ue of 3060 SLoC that we use for the heuris­tic. This sec­ond approach is con­ser­v­a­tive in that we ignore not only copy and paste but also com­mits con­tain­ing new code added. So we err on the low­er side of total open source con­tri­bu­tions.
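The heuristic of the second approach can be sketched as follows. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def large_commit_cutoff(lines_added):
    """Cut-off = average lines added per commit + 3 standard deviations."""
    return mean(lines_added) + 3 * pstdev(lines_added)


def filter_large_commits(lines_added):
    """Keep only commits whose added lines do not exceed the cut-off."""
    cutoff = large_commit_cutoff(lines_added)
    return [n for n in lines_added if n <= cutoff]


# Toy data: nineteen small commits and one very large one.
sizes = [10] * 19 + [5000]
kept = filter_large_commits(sizes)
```

In this toy sample the single 5000-line commit lies above the cut-off and is dropped, while all small commits are kept. On real data the same filter also removes large commits of genuinely new code, which is why the approach yields a lower bound.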

We employ the­se two approach­es to get an upper and a low­er bound for the growth in source lines of code and num­ber of projects. We can there­fore say that prop­er­ties like the expo­nen­tial growth observed in both the upper and low­er bound curve apply to the real curve as well.

4. Analysis and Results

We first ana­lyze growth rate and total growth in open source soft­ware code and then ana­lyze growth rate and total growth in open source soft­ware projects.

4.1 Growth in source code

Fig­ures 1 and 2 show plots that rep­re­sent the growth in source lines of code added using Approach 1 and 2 respec­tive­ly. The Y-axis shows the num­ber of lines of code added each mon­th and the X-axis shows the time. Each data point on the plot rep­re­sents the total num­ber of lines of code added dur­ing that mon­th. The time frame is 1995 through 2006 for all projects. We can see an upward trend in the amount of code added over time. Both Approach 1 and 2 show a sim­i­lar pat­tern of growth.

Fig­ure 1: Graph of source lines of code added [mil­lions] (Approach 1)

Fig­ure 2: Graph of source lines of code added [mil­lions] (Approach 2)

Table 1 shows mod­els for the two plots. In both cas­es, the best fit­ting mod­el is an expo­nen­tial curve with an R-square val­ue of about 0.9, giv­ing us con­fi­dence in the valid­i­ty of the claim that the amount of code added is grow­ing expo­nen­tial­ly.
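One common way to fit and score an exponential model is ordinary least squares on the logarithm of the data. The sketch below assumes that approach; the paper does not state which fitting procedure or which definition of R-square was used, so the function and its log-space R-square are illustrative only.

```python
import math


def fit_exponential(ts, ys):
    """Fit y = a * exp(b * t) by least squares on ln(y).

    Returns (a, b, r2), where r2 is computed in log space.
    All ys must be positive.
    """
    logs = [math.log(y) for y in ys]
    n = len(ts)
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    b = sxy / sxx
    ln_a = l_mean - b * t_mean
    ss_res = sum((l - (ln_a + b * t)) ** 2 for t, l in zip(ts, logs))
    ss_tot = sum((l - l_mean) ** 2 for l in logs)
    return math.exp(ln_a), b, 1 - ss_res / ss_tot


# Synthetic monthly series that is exactly exponential.
months = list(range(24))
values = [2.0 * math.exp(0.05 * t) for t in months]
a, b, r2 = fit_exponential(months, values)
```

Comparing models then amounts to fitting each candidate (linear, quadratic, exponential) and comparing goodness of fit, keeping in mind that an R-square computed in log space is not directly comparable to one computed on the raw values.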

Table 1: Mod­el of source lines of code added

Fig­ure 3 shows the total num­ber of lines of open source code over time. Table 2 shows the sta­tis­ti­cal mod­els for the two approach­es. The dou­bling time for Approach 1 is 12.5 months, and the dou­bling time for Approach 2 is 14.9 months. We observe that the total code in Approach 2 is low­er than in Approach 1 but fol­lows a sim­i­lar trend. This behav­ior is expect­ed as we elim­i­nat­ed all large com­mits in the sec­ond approach to exclude copy and paste con­tri­bu­tions.
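The relation between a doubling time and the exponent of the fitted model can be made explicit. The symbols a, b and T_d are introduced here for illustration, and the rates below are implied by the doubling times reported above rather than copied from Table 2.

```latex
\[
  y(t) = a\,e^{bt}, \qquad
  y(t + T_d) = 2\,y(t)
  \;\Longrightarrow\;
  T_d = \frac{\ln 2}{b}
\]
% With t measured in months:
\[
  b_1 = \frac{\ln 2}{12.5} \approx 0.0555, \qquad
  b_2 = \frac{\ln 2}{14.9} \approx 0.0465
\]
% Corresponding growth factors per year:
\[
  e^{12 b_1} \approx 1.95, \qquad
  e^{12 b_2} \approx 1.75
\]
```

Under these assumptions, the total code base grows by a factor of roughly 1.95 per year in Approach 1 and roughly 1.75 per year in Approach 2.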

Fig­ure 3: Graph of total source lines of code [mil­lions] (both approach­es)

Table 2: Mod­el of total source lines of code

4.2 Growth in projects

Fig­ure 4 shows the num­ber of projects added over time and Table 3 shows the mod­el and its fit with the data. For each project, there is a first occur­rence of a project action (for exam­ple, the ini­tial com­mit action), and that point of time is con­sid­ered the birth date of the project. This is the point of time when the project is count­ed as added to the over­all set of projects.
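The birth-date rule can be sketched as below. The input format, a sequence of `(project, month)` pairs with months as `YYYY-MM` strings, is a hypothetical simplification of the underlying action data.

```python
from collections import Counter
from itertools import accumulate


def projects_added_per_month(actions):
    """Count projects by the month of their earliest recorded action."""
    birth = {}
    for project, month in actions:
        if project not in birth or month < birth[project]:
            birth[project] = month
    return Counter(birth.values())


def cumulative_projects(added):
    """Running total of projects, ordered by month."""
    months = sorted(added)
    totals = accumulate(added[m] for m in months)
    return dict(zip(months, totals))


actions = [
    ("a", "1995-03"),
    ("a", "1995-01"),  # earliest action of project "a"
    ("b", "1995-01"),
    ("c", "1995-02"),
    ("b", "1996-01"),
]
added = projects_added_per_month(actions)
total = cumulative_projects(added)
```

The first function corresponds to the "projects added" series and the second to the "total number of projects" series.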

Fig­ure 4: Graph of num­ber of open source projects added

Table 3: Mod­el of num­ber of open source projects added

Large distributions like Debian are counted as one project. Popular projects such as GNU Emacs are counted as projects of their own, while little-known or obsolete packages such as the Zoo archive utility are ignored. Many of the projects that were included in a Debian distribution around 1998 are not popular enough today (as stand-alone projects) to be included in our copy of the Ohloh database.

And again, we get the best fit for the result­ing curve for an expo­nen­tial mod­el with an R-square val­ue of 0.88.

Fig­ure 5 then shows the total num­ber of projects and Table 4 shows the cor­re­spond­ing mod­el and its fit with the data. Again, we get the best fit for an expo­nen­tial mod­el with an R-square val­ue of 0.96. The dou­bling time is 13.9 months.

Fig­ure 5: Graph of total num­ber of open source projects

Table 4: Mod­el of total num­ber of open source projects

4.3 Review of find­ings

This section shows the growth of source code in open source projects as well as the growth in the number of open source projects themselves. We consistently get the best fit for the data using exponential models. The doubling time based on the exponential models is about 14 months for both the total amount of source code and the total number of projects. It should be noted that if we were to break up the data sets into separate time periods, we might find better fits for other models than the exponential model. In future work we will analyze the overall growth in distinct phases, each of which is best explained by a separate growth model.

In [13] we discuss the size and frequency of code contributions to open source projects. We can use those results to further increase our confidence in the results presented above. Specifically, the lines of code added can be assumed equal to the product of the average size of a commit in terms of source lines of code and the commit frequency. Our analysis shows that the average commit size is almost constant while the commit frequency (number of commits per week) increases exponentially from January 1995 to December 2006. This is consistent with our findings about the exponential growth in open source.
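The argument can be written out as a short derivation. The symbols A, s, f, s_0, f_0, b and S are introduced here for illustration, and an exactly constant commit size is a simplifying assumption.

```latex
% A(t): lines added per unit time, s(t): average commit size,
% f(t): commit frequency
\[
  A(t) = s(t)\,f(t), \qquad
  s(t) \approx s_0, \qquad
  f(t) = f_0\,e^{bt}
  \;\Longrightarrow\;
  A(t) \approx s_0 f_0\,e^{bt}
\]
% Total size S(t) accumulates the additions (ignoring removals):
\[
  S(t) = S(0) + \int_0^t A(\tau)\,d\tau
       = S(0) + \frac{s_0 f_0}{b}\left(e^{bt} - 1\right)
\]
```

So if commit frequency grows exponentially at rate b while commit size stays roughly constant, both the code added per period and the accumulated total grow exponentially at the same rate b.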

5. Limitations of Analysis

The quan­ti­ta­tive analy­sis and the con­clu­sions we draw have the fol­low­ing short­com­ings and lim­i­ta­tions.

  • Sam­ple size. We con­sid­ered 5122 active and pop­u­lar open source projects. The total num­ber of open source projects in the world is much larg­er. How­ev­er, Daf­fara esti­mates that of the total num­ber only 18,000 projects (low­er bound) are actu­al­ly active [3]. So we believe that the sam­ple we are using is rel­e­vant for ana­lyz­ing trends and pat­terns in open source growth.
  • Data incom­plete­ness. Some amount of revi­sion con­trol infor­ma­tion in open source projects has already been lost forever, as projects have moved on from no con­fig­u­ra­tion man­age­ment (CM) to CM with CVS and on to oth­er CM tools, fre­quent­ly drop­ping the his­to­ry with each move. Thus, the project his­to­ry for each project is not always com­plete. How­ev­er, for a cur­rent project, we have the most recent his­to­ry, which is what is most rel­e­vant for our analy­sis. Thus, the lack of some of the ear­ly his­to­ries of some of the open source projects has lit­tle effect on the valid­i­ty of our con­clu­sions.
  • Project source. A cur­rent lim­i­ta­tion of Ohloh is that it only con­nects to CVS, Sub­ver­sion and Git source code repos­i­to­ries. We believe that this lim­i­ta­tion is not a big issue for our pur­pos­es because almost all open source projects are main­tained in one of the­se repos­i­to­ries and our sam­ple size can be con­sid­ered rep­re­sen­ta­tive.
  • Copy and paste. Our approach to elim­i­nat­ing copy and paste issues (Approach 2) is lim­it­ed in its effec­tive­ness: The fil­ter excludes a lot of good val­ues while still allow­ing minor copy and paste to pass. For the pur­pos­es of our analy­sis, how­ev­er, it is not a major issue, because we are inter­est­ed in the over­all trend, and even the con­ser­v­a­tive Approach 2 still val­i­dates our hypoth­e­sis of expo­nen­tial growth.

We are con­tin­u­ing our work to iron out pos­si­ble pit­falls based on the­se lim­i­ta­tions. How­ev­er, we believe that while the respec­tive cri­tiques can be made, the effects are rather lim­it­ed, as argued above in each case.

6. Related Work

Sev­er­al stud­ies of the evo­lu­tion of open source projects have been under­tak­en.

  • González-Barahona et al. estimated the lines of code in the Debian 2.2 release and concluded that the system represents an effort of more than 14,000 person-years, which translates to about 2 billion USD [6].
  • Suc­ci et al. showed a lin­ear growth rate for the GCC and Apache projects. They also showed that Lin­ux has super lin­ear growth [18]. They found that Lin­ux (in 2000) vio­lates Lehman’s fourth law of soft­ware evo­lu­tion.
  • In con­trast to this, Roy and Cordy exam­ined the evo­lu­tion of the Bar­code Library and the zlib project and showed that the­se two small­er projects fol­low Lehman’s laws of soft­ware evo­lu­tion [16].
  • God­frey and Tu showed a super-linear increase in source lines of code over time in the Lin­ux ker­nel and the VIM text edi­tor [5].
  • Rob­les et al. con­firmed that the Lin­ux ker­nel is grow­ing super-linearly [15]. The NetB­SD, FreeB­SD, OpenB­SD (until 2001) and 18 oth­er projects showed an almost lin­ear growth pat­tern.
  • Koch’s study of 4047 open source projects on Source­Forge indi­cates that a qua­drat­ic growth mod­el fits the growth of an indi­vid­u­al project bet­ter than a lin­ear growth mod­el [8] [20].
  • Scac­chi reviews pri­or results on open source evo­lu­tion, sug­gest­ing that the growth pat­terns for large open source projects are not rep­re­sen­ta­tive for all of open source [22]. His dis­cus­sion of the evo­lu­tion of open source soft­ware sug­gests that Lehman’s laws of soft­ware evo­lu­tion based on closed-source sys­tems do not apply to open source, and that fur­ther study is need­ed.

Most of the research list­ed above explores the evo­lu­tion of indi­vid­u­al projects. The growth mod­els of projects are typ­i­cal­ly lin­ear or qua­drat­ic. None of the relat­ed work quan­ti­ta­tive­ly ana­lyzes the total growth of open source soft­ware.

Our analy­sis does not focus on any par­tic­u­lar project but on the gen­er­al trend in open source soft­ware. The projects con­sid­ered are inde­pen­dent of any par­tic­u­lar license, lan­guage, top­ic or size.

7. Conclusion

The sig­nif­i­cance of open source has been con­tin­u­ous­ly increas­ing over time. Our research val­i­dates this claim by look­ing at the total growth of open source. Our work shows that the addi­tions to open source projects, the total project size (mea­sured in source lines of code), the num­ber of new open source projects, and the total num­ber of open source projects are grow­ing at an expo­nen­tial rate. The total amount of source code and the total num­ber of projects dou­ble about every 14 months.

Our results open gates for fur­ther research around the growth of open source and the accep­tance of open source in indus­try and gov­ern­ment. Future research should explore ques­tions like what fac­tors are influ­enc­ing this expo­nen­tial growth, how source code growth relates to the num­ber of engaged soft­ware devel­op­ers, and whether or how long open source can sus­tain this expo­nen­tial growth.


Acknowledgments

We would like to thank Prem Devanbu and Gregorio Robles for their feedback on earlier versions of the paper as well as their encouragement for the work presented. We also would like to thank Oliver Arafat and Mario Fernandez for proofreading the paper.


References

[1] Comino, S., Manenti, F. M., Parisi, M. L. From Planning to Mature: On the Determinants of Open Source Take Off. Department of Economics Working Papers 0517, Department of Economics, University of Trento, Italia. 2005.

[2] Crow­ston, K. and Scozzi, B. Open Source Soft­ware Projects as Vir­tu­al Orga­ni­za­tions: Com­pe­ten­cy Ral­ly­ing for Soft­ware Devel­op­ment. IEE Proceedings—Software Engi­neer­ing, vol. 149, no. 1, 2002: 3–17.

[3] Daffara, C. How Many Stable and Active Libre Software Projects? Retrieved on Sept 13, 2007.

[4] Software & Information Industry Association. Packaged Software Industry Revenue and Growth, 2006.

[5] Godfrey, M., Tu, Q. Growth, Evolution, and Structural Change in Open Source Software. In Proceedings of the 4th International Workshop on Principles of Software Evolution. ACM Press, 2001: 103–106.

[6] González-Barahona, J., Ortuño Pérez, M., de las Heras Quirós, P., Centeno González, J., Matellán Olivera, V. Counting potatoes: The Size of Debian 2.2. Retrieved on Sept 13, 2007.

[7] Haru­vy, E., Wu F. and Chakravar­ty S. Incen­tives for Devel­op­ers’ Con­tri­bu­tions and Pro­duct Per­for­mance Met­ric in Open Source Devel­op­ment: An Empir­i­cal Explo­ration. Uni­ver­si­ty of Tex­as Work­ing Paper.

[8] Koch, S. Evo­lu­tion of Open Source Soft­ware Systems—A Large-Scale Inves­ti­ga­tion. In Pro­ceed­ings of the 1st Inter­na­tion­al Con­fer­ence on Open Source Sys­tems (OSS 2005).

[9] Law­ton, M., Notar­fon­zo, R. World­wide Open Source Soft­ware Busi­ness Mod­els 2007–2011 Fore­cast: A Pre­lim­i­nary View. IDC Inc.

[10] Netcraft. Netcraft Web Server Survey. Netcraft, 2007. Retrieved on Sept 13, 2007.

[11] Ohloh Corporation.

[12] Ray­mond, E. S. The Cathe­dral and the Bazaar. O’Reilly & Asso­ciates, 1999.

[13] Desh­pan­de, A. Riehle, D. Con­tin­u­ous Inte­gra­tion in Open Source Soft­ware Projects. Sub­mit­ted to the 4th Inter­na­tion­al Con­fer­ence on Open Source Sys­tems (OSS 2008).

[14] Rob­les, G., Gonzalez-Barahona, J. M., Michlmayr, M., and Amor, J. J. Min­ing Large Soft­ware Com­pi­la­tions Over Time: Anoth­er Per­spec­tive of Soft­ware Evo­lu­tion. In Pro­ceed­ings of the 2006 Inter­na­tion­al Work­shop on Min­ing Soft­ware Repos­i­to­ries (MSR 2006). ACM Press, 2006: 3–9.

[15] Rob­les, G., Amor, J. J., Gonzalez-Barahona, J. M., and Her­raiz, I. Evo­lu­tion and Growth in Large Libre Soft­ware Projects. In Pro­ceed­ings of the Eighth Inter­na­tion­al Work­shop on Prin­ci­ples of Soft­ware Evo­lu­tion (IWPSE 2005). IEEE Com­put­er Soci­ety, 2005: 165–174.

[16] Roy, C. K. and Cordy, J. R. Evaluating the Evolution of Small Scale Open Source Software Systems.

[17] SourceForge.

[18] Suc­ci, G., Paulson, J., Eber­lein, A. Pre­lim­i­nary Results From an Empir­i­cal Study on the Growth of Open Source and Com­mer­cial Soft­ware Prod­ucts. In EDSER-3 Work­shop (2001): 14–15.

[19] Walli, S., Gynn, D., Rotz, B. V. The Growth of Open Source Software in Organizations: A Report. Retrieved on Sept 13, 2007.

[20] Koch, S. Soft­ware Evo­lu­tion in Open Source Projects—A Large-Scale Inves­ti­ga­tion. In Jour­nal of Soft­ware Main­te­nance and Evo­lu­tion: Research and Prac­tice 2007; 19: 361–382.

[21] Lakhani, K. R., Wolf, R. G. Why Hackers Do What They Do: Understanding Motivation and Effort in Free/Open Source Software Projects. In Perspectives on Free and Open Source Software. MIT Press, 2005: 3–22.

[22] Walt Scac­chi. Under­stand­ing Open Source Soft­ware Evo­lu­tion. In Soft­ware Evo­lu­tion and Feed­back. John Wiley & Sons, 2006.


Adden­dum to Total Growth of Open Source paper.

53 thoughts on “The Total Growth of Open Source”

  1. Stephen

    You state with­out jus­ti­fi­ca­tion that “the best fit­ting mod­el is an expo­nen­tial curve”. How­ev­er the line you’ve drawn in Fig­ures 1 and 2 looks like a very poor fit. Hon­est­ly even a sim­ple line cross­ing the x axis around 1998 looks like a bet­ter fit.

  2. Peter Judge

    Do you have any esti­mates of the growth in pro­pri­etary code over the same peri­od? Could it be the case that just “every­thing” in this busi­ness is expo­nen­tial?

  3. Dirk Riehle Post author

    @Stephen: It’s math that is telling us that an expo­nen­tial mod­el is the best fit. We tried dif­fer­ent mod­els, and using the r-square val­ues as the indi­ca­tor, we got the best fit with expo­nen­tial mod­els.

    One thing we would like to inves­ti­gate next is to break up the his­to­ry of open source into dis­tinct phas­es. It is not clear that a sin­gle func­tion is best in explain­ing total growth. Rather, open source may have gone through the­se dif­fer­ent phas­es, each of which is best described with a dif­fer­ent func­tion.

  4. Dirk Riehle Post author

    @Peter Judge: This is a very good ques­tion, and we have been strug­gling to get exact­ly that data so we can make a com­par­ison. This paper shows the total growth, but the ques­tion is of course, growth on what base? Right now, the best we could reli­ably get were the rev­enue num­bers. Get­ting total SLoC for all of open + closed source would give us a bet­ter indi­ca­tor of how much of soft­ware devel­op­ment has already shift­ed to open source.

  5. Dirk Riehle Post author

    @Peter Judge: One more thing. I don’t think every­thing is expo­nen­tial. Pro­gram­mers can only do so much work in a given week, and the num­ber of pro­gram­mers is def­i­nite­ly not grow­ing exponentially—if it is grow­ing at all! This means total growth of open + closed source is poly­no­mi­al, per­haps not even bet­ter than lin­ear.

  6. Linas Vepstas

    Hi, could you redo the graphs as semi-log graphs? A semi-log graph would show expo­nen­tial growth as a straight line, mak­ing it much eas­ier to eye-ball what is going on.

    Also, for the conclusion, please restate the exponential as a half-life, i.e. power of 2 instead of power-of-e. From your data, I get that the number of lines of code is doubling every 15 months or so. That’s a much punchier conclusion than “exp 0.-46x” which may as well be Greek to most readers.


  7. Tim Bunce

    I believe a fac­tor in the cur­rent rapid growth is a move­ment of code from pri­vate (often per­son­al) code repos­i­to­ries to pub­lic ones. Mov­ing from the old “email a patch to the author” devel­op­ment mod­el, to a “com­mit a change to a branch” mod­el. Some­times the code moved doesn’t include all the old his­to­ry so the project appears new­er than it real­ly is.

    Of course this is hard to quan­ti­fy.

  8. Dirk Riehle Post author

    @Tim Bunce: You may be right, but I don’t think these are the projects covered by Ohloh in our database. Since these are by and large the top 5000 projects, they tend to be more mature with a real community.

  11. Rob

    Semi­log or log-lin graph­ing (log quan­ti­ty vs lin­ear time) show expo­nen­tial rates as straight lines. Pret­ty handy- give it a try some­time.

  17. Dirk Riehle Post author

    Hi Harki­rat, noth­ing new at hand. We hope to run the num­bers again and will also extend it, but that’s a cou­ple of months into the future. –Dirk

  19. Collin Tewalt


    I’m Collin and a fresh­man com­put­er sci­ence stu­dent at Col­orado State Uni­ver­si­ty. I’m doing some research on the pro­duc­tiv­i­ty and qual­i­ty of open source projects ver­sus closed source. Infor­mal­ly, would you say that open source projects like Lin­ux and oth­er pro­gram­ming IDEs are vital and nec­es­sary to the field of com­put­er sci­ence? With­out open source projects, do you think we would have as much pro­gress as we do cur­rent­ly? I know that the study doesn’t sup­port any of the­se hypothe­ses direct­ly, but what’s your opin­ion. Thanks and good work!


  20. Dirk Riehle Post author

    Hi Collin,

    a recent Gartner study (report G00156659) has shown that by 2012 more than 90% of all companies that use IT will use open source. So yes, open source is critical to the functioning of these companies and the economy. From that significance follows the importance for computer science research and teaching on open source.

    Good luck with your research,

  27. Henric Bergenwall

    If source lines of code (SLOC) productivity per programmer is somewhat constant and the increase of engaged programmers is linear, wouldn’t the growth of SLOC naturally be quadratic?

      1. Henric Bergenwall

        I assume open source projects are version controlled, and that through the version control system it is possible to identify each programmer’s first commit. Measuring the number of first commits over time would maybe indicate how the number of engaged programmers varies over time.

        1. Dirk Riehle Post author

          Sure, but it is hard to get that data right — for one, pri­or to git, you couldn’t dis­tin­guish author from com­mit­ter.
