The Total Growth of Open Source

Amit Deshpande and Dirk Riehle

SAP Research, SAP Labs LLC

Abstract

Software development is undergoing a major change away from a fully closed software process towards a process that incorporates open source software in products and services. Just how significant is that change? To answer this question we need to look at the overall growth of open source as well as its growth rate. In this paper, we quantitatively analyze the growth of more than 5000 active and popular open source software projects. We show that both the total amount of source code and the total number of open source projects are growing at an exponential rate. Previous research showed linear and quadratic growth in lines of source code of individual open source projects. Our work shows that open source is expanding into new domains and applications at an exponential rate.

1. Introduction

Software development is undergoing a major change from a fully closed software development process towards a more community-driven open source software development process. Successful open source projects like Linux, Apache, PostgreSQL and many others are growing super-linearly. Previous research showed that linear and quadratic growth are the dominant growth patterns of open source software projects [5] [8] [15] [16] [18] [22].

In this paper, we analyze the combined growth of open source software in terms of lines of source code as well as number of projects. Our database contains more than 5000 active and popular open source projects. The database provides fine-grained data on developer actions over the 17 years from 1990 to 2006. We analyze the average amount of source code added per month for the time frame of January 1995 to December 2006 as well as the number of projects added over time.

We find that both the growth rate and the absolute amount of source code are best explained by an exponential model. Given that previous research showed that most open source projects grow at a polynomial rate, we hypothesize and then verify that the number of open source projects is growing at an exponential rate.

This paper is organized as follows. Section 2 discusses our motivation, the hypothesis, and its implications. Section 3 discusses our database and approach. Section 4 presents the results of the analysis. Section 5 discusses some limitations of the analysis and Section 6 discusses related work. Section 7 concludes the paper.

2. The Growth of Open Source

Open source software is having a major impact on the software industry and its production processes. Many software products today contain at least some open source software components. Some commercial products are completely open source software [9]. In some markets, for example, web servers, open source software holds a dominant market share [10].

Open source software today has a strong presence in industry and government. Walli et al. observe [19]: “Organizations are saving millions of dollars on IT by using open source software. In 2004, open source software saved large companies (with annual revenue of over $1 billion) an average of $3.3 million. Medium-sized companies (between $50 million and $1 billion in annual revenue) saved an average $1.1 million. Firms with revenues under $50 million saved an average $520,000.”

Commercially, the significance and growth of open source is measured in terms of revenue generated from it. Lawton and Notarfonzo state that packaged open source applications generated revenues of $1.8 billion in 2006 [9]. The software division of the Software & Information Industry Association estimates that total packaged software revenues were $235 billion in 2006 [4]. Thus, open source revenue, while still small compared to the overall market (~0.7%), is no longer trivial.

However, open source software today is part of many proprietary (closed) source products, and measuring its growth solely by packaged software revenue is likely to underestimate its size and growth by a wide margin. To measure the growth of open source we need to look at the total growth of open source projects and their source code.

Several studies have been undertaken to measure the growth and evolution of individual open source software projects [5] [15] [16] [18]. Most of these studies work by example, focusing on only a few selected projects. The exception is Koch’s work, which uses a large sample (>4000 projects) to determine overall growth patterns in open source projects, concluding that polynomial growth patterns provide good models for these projects [8] [20]. Such work is mostly motivated by trying to understand how individual open source projects grow and evolve.

The work presented in this paper, in contrast, analyzes the overall growth of open source, aggregating data from more than 5000 active and popular open source projects to determine the total growth of source code and number of projects. Assuming a positive correlation between work spent on open source, its total growth in terms of code and number of projects, and the revenue generated from it, understanding the overall growth of open source will give us a better indication of how significant a role open source will play in the future.

Understanding overall open source growth helps more easily answer questions about, for example, future product structures (how much code of an application is likely to be open source code?), labor economics (how much and which open source skills does a company need?), and revenue (what percentage of the software market’s revenue will come from open source?).

The work presented in this paper shows that the total amount of open source code and the total number of projects are growing exponentially. Assuming a base of 0.7% of the market’s revenue, exponential growth is a strong indicator that open source will be of significantly increasing commercial importance. The remainder of this paper discusses our study and validates the hypothesis of exponential growth of open source.

3. Data Source and Approach

On SourceForge, the dominant open source project hosting service, there are more than 150,000 projects registered, most of which are considered inactive [1] [17]. Daffara estimates that as of today there are only about 18,000 active open source projects in the world [3].

For our analysis, we use the database of the open source analytics firm Ohloh.net, which has been crawling open source software code repositories since 2005 [11]. Our database snapshot contains 5122 active and popular open source projects written in 30 different programming languages covering 103 open source licenses. All data is updated on at least a weekly basis.

The database contains the most popular open source projects as measured by the number of in-links to their website. The in-links are provided by the Yahoo! search engine. The database contains data from January 1990 until May 2007. Of this time horizon, we analyze the time frame from January 1995 to December 2006. We omit data before 1995 because it is too sparse to be useful.

Ohloh.net provides high-level data like project structures and developer information, but also data that goes down to the level of individual developer actions. Specifically, Ohloh provides each individual commit action of all projects over their entire history to the extent that they are publicly available.

A commit is the action with which a developer contributes a piece of code to the project’s repository. A developer’s workweek typically consists of a stream of commit actions by which he or she shares the results of their work with the team, contributing to the product or project under way.

We use the amount of source code added to (or removed from) a project as an approximation of the work contributed. We count code in source lines of code (SLoC), omitting empty or commented lines of code. Each commit action stored in the database lists the number of lines of code added and removed in the commit. These numbers are calculated using the Unix diff command applied to two consecutive versions. Using this data, we calculate the change in the size of a source code file by adding the number of lines added to, and subtracting the number of lines removed from, its existing size.

This data collection method gracefully handles file and directory renaming. Such renaming is modeled as if the file or directory was removed and then re-added under a new name. Both code added and code removed will have equal (large) values, so the net change is zero. This avoids any undue bias in the analysis.
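
The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the study: the Commit record and its field names are hypothetical, and the counts are assumed to already exclude empty and commented lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One commit record. Field names are hypothetical, not Ohloh's schema."""
    month: str          # e.g. "2006-12"
    lines_added: int    # SLoC added (empty and commented lines excluded)
    lines_removed: int  # SLoC removed (same exclusions)


def net_change(commit: Commit) -> int:
    """Net size change contributed by a single commit."""
    return commit.lines_added - commit.lines_removed


def total_size(commits, initial_size: int = 0) -> int:
    """Running size after applying every commit in order."""
    size = initial_size
    for commit in commits:
        size += net_change(commit)
    return size


history = [
    Commit("2006-10", lines_added=120, lines_removed=0),
    Commit("2006-11", lines_added=30, lines_removed=10),
    # A rename shows up as the same file removed and re-added:
    # large, equal values on both sides, so the net change is zero.
    Commit("2006-12", lines_added=140, lines_removed=140),
]
print(total_size(history))
```

The last record shows why a rename does not bias the totals: it contributes equally to lines added and lines removed.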

Libraries are typically used across many projects. For instance, the GIMP project and the GNOME project have many libraries in common. If the lines of code for both projects were added up independently we would be double-counting the libraries, leading to skewed results. We make sure that we are not double-counting code by considering each change to the original library.

However, we cannot unambiguously identify situations where a developer adds redundant source code to the code base. Copy and paste is a common practice in software development, independently of whether it is internal, external, planned or opportunistic. To deal with this issue, we adopt two approaches.

  1. In the first approach we ignore the copy and paste problem and analyze the source lines of code added. The argument is that copy and paste is a reality of software development and that the copied code is part of the project. Hence, copy and paste simply needs to be accepted.
  2. In the second approach we compute the average and the standard deviation of the lines of code added per commit. We ignore all commits in which the lines of code added exceed the average plus three times the standard deviation. The heuristic’s assumption is that by not considering such large commits we ignore all commits based on copy and paste.

An analysis of average code contribution size in commits provides a cut-off value of 3060 SLoC that we use for the heuristic. This second approach is conservative in that we ignore not only copy and paste but also commits containing new code added. So we err on the lower side of total open source contributions.
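
The cut-off heuristic of the second approach can be sketched as follows. This is an illustration under stated assumptions: the function names are ours, and the choice of the population standard deviation (statistics.pstdev) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def cutoff(lines_added_per_commit):
    """Average plus three standard deviations of lines added per commit.

    Assumption: population standard deviation; the paper does not
    specify which estimator it used.
    """
    return mean(lines_added_per_commit) + 3 * pstdev(lines_added_per_commit)


def filter_large_commits(lines_added_per_commit):
    """Keep only commits at or below the cut-off (Approach 2)."""
    limit = cutoff(lines_added_per_commit)
    return [n for n in lines_added_per_commit if n <= limit]


# Synthetic data: twenty ordinary commits and one very large one.
sample = [10] * 20 + [10_000]
kept = filter_large_commits(sample)
print(len(sample), len(kept))
```

On the real data this procedure yields the fixed cut-off reported in the text; the synthetic sample only shows the mechanics.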

We employ these two approaches to get an upper and a lower bound for the growth in source lines of code and number of projects. We can therefore say that properties like the exponential growth observed in both the upper and lower bound curve apply to the real curve as well.

4. Analysis and Results

We first analyze growth rate and total growth in open source software code and then analyze growth rate and total growth in open source software projects.

4.1 Growth in source code

Figures 1 and 2 show plots that represent the growth in source lines of code added using Approach 1 and 2 respectively. The Y-axis shows the number of lines of code added each month and the X-axis shows the time. Each data point on the plot represents the total number of lines of code added during that month. The time frame is 1995 through 2006 for all projects. We can see an upward trend in the amount of code added over time. Both Approach 1 and 2 show a similar pattern of growth.

Figure 1: Graph of source lines of code added [millions] (Approach 1)

Figure 2: Graph of source lines of code added [millions] (Approach 2)

Table 1 shows models for the two plots. In both cases, the best fitting model is an exponential curve with an R-square value of about 0.9, giving us confidence in the validity of the claim that the amount of code added is growing exponentially.

Table 1: Model of source lines of code added
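
One common way to fit and score an exponential model is ordinary least squares on the logarithm of the data. The sketch below illustrates that technique only; it is not necessarily the procedure behind Table 1, and it computes R-square on the original scale as an assumption. The data is synthetic.

```python
import math


def fit_exponential(ts, ys):
    """Fit y = a * exp(b * t) by least squares on ln(y). Requires all y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(ts)
    t_mean = sum(ts) / n
    log_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (l - log_mean) for t, l in zip(ts, logs))
    b = sxy / sxx
    a = math.exp(log_mean - b * t_mean)
    return a, b


def r_squared(ys, predicted):
    """Coefficient of determination, computed on the original scale."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic monthly series that is exactly exponential.
months = list(range(24))
added = [2.0 * math.exp(0.05 * t) for t in months]

a, b = fit_exponential(months, added)
fitted = [a * math.exp(b * t) for t in months]
print(round(a, 3), round(b, 3), round(r_squared(added, fitted), 3))
```

Comparing the R-square of this fit against linear and polynomial fits of the same series is how one model would be preferred over another.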

Figure 3 shows the total number of lines of open source code over time. Table 2 shows the statistical models for the two approaches. The doubling time for Approach 1 is 12.5 months, and the doubling time for Approach 2 is 14.9 months. We observe that the total code in Approach 2 is lower than in Approach 1 but follows a similar trend. This behavior is expected as we eliminated all large commits in the second approach to exclude copy and paste contributions.

Figure 3: Graph of total source lines of code [millions] (both approaches)

Table 2: Model of total source lines of code
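
For an exponential model of the form a * exp(b * t), the doubling time is ln(2) / b. The sketch below converts between the two; the function names are ours, and the rates it prints are implied by the doubling times stated in the text (about 0.055 and 0.047 per month), not read from Table 2.

```python
import math


def doubling_time(rate_per_month: float) -> float:
    """Months needed for a * exp(b * t) to double, for b > 0."""
    return math.log(2) / rate_per_month


def rate_from_doubling_time(months: float) -> float:
    """Inverse conversion: the monthly rate b implied by a doubling time."""
    return math.log(2) / months


for label, months in [("Approach 1", 12.5), ("Approach 2", 14.9)]:
    b = rate_from_doubling_time(months)
    print(f"{label}: b = {b:.4f} per month, doubling time = {doubling_time(b):.1f} months")
```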

4.2 Growth in projects

Figure 4 shows the number of projects added over time and Table 3 shows the model and its fit with the data. For each project, there is a first occurrence of a project action (for example, the initial commit action), and that point of time is considered the birth date of the project. This is the point of time when the project is counted as added to the overall set of projects.

Figure 4: Graph of number of open source projects added

Table 3: Model of number of open source projects added
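
The birth-date rule can be sketched as follows: a project is counted as added in the month of its earliest recorded action, and the running sum of those counts gives the total number of projects. The input format, a list of (project, "YYYY-MM") pairs, is a hypothetical simplification.

```python
from collections import Counter
from itertools import accumulate


def project_birth_months(actions):
    """Map each project to the month of its earliest action.

    actions: iterable of (project_name, "YYYY-MM") pairs (hypothetical
    format). ISO-style month strings compare correctly as plain strings.
    """
    births = {}
    for project, month in actions:
        if project not in births or month < births[project]:
            births[project] = month
    return births


def projects_added_per_month(births):
    """Number of projects born in each month, in chronological order."""
    counts = Counter(births.values())
    return dict(sorted(counts.items()))


def cumulative_projects(added_per_month):
    """Running total of projects over time."""
    months = list(added_per_month)
    totals = accumulate(added_per_month[m] for m in months)
    return dict(zip(months, totals))


actions = [
    ("a", "1995-03"), ("a", "1995-01"),
    ("b", "1995-01"), ("b", "1996-05"),
    ("c", "1995-02"),
]
added = projects_added_per_month(project_birth_months(actions))
print(added)
print(cumulative_projects(added))
```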

Large distributions like Debian are counted as one project. Popular projects such as GNU Emacs are counted as projects of their own, while little-known or obsolete packages such as the Zoo archive utility are ignored. Many of the projects that were included in a Debian distribution around 1998 are not popular enough today (as stand-alone projects) to be included in our copy of the Ohloh database.

For the number of projects added, the best fit is again an exponential model, with an R-square value of 0.88.

Figure 5 then shows the total number of projects and Table 4 shows the corresponding model and its fit with the data. Again, we get the best fit for an exponential model with an R-square value of 0.96. The doubling time is 13.9 months.

Figure 5: Graph of total number of open source projects

Table 4: Model of total number of open source projects

4.3 Review of findings

This section shows the growth of source code in open source projects as well as the growth in the number of open source projects. We consistently get the best fit for the data using exponential models. The doubling time based on the exponential models is about 14 months for both the total amount of source code and the total number of projects. It should be noted that if we were to break up the data sets into separate time periods, we might find better fits for models other than the exponential model. In future work we will analyze the overall growth in distinct phases, each of which may be best explained by a separate growth model.

In [13] we discuss the size and frequency of code contributions to open source projects. We can use those results to further increase our confidence in the results presented above. Specifically, the lines of code added can be assumed equal to the product of the average size of a commit in terms of source lines of code and the commit frequency. Our analysis shows that the average commit size is almost constant while the commit frequency (number of commits per week) increases exponentially between January 1995 and December 2006. This corroborates our findings about the exponential growth in open source.
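
The argument can be written out as a short derivation. The symbols are introduced here for illustration only: A(t) is the source lines of code added per unit of time, s the (roughly constant) average commit size, f(t) the commit frequency, and f_0 and b the constants of an exponential fit to the commit frequency.

```latex
A(t) = s \cdot f(t), \qquad f(t) = f_0 \, e^{b t}
\;\Longrightarrow\;
A(t) = (s \, f_0) \, e^{b t}
```

With s constant, the code added per unit of time grows exponentially at the same rate b as the commit frequency.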

5. Limitations of Analysis

The quantitative analysis and the conclusions we draw have the following shortcomings and limitations.

  • Sample size. We considered 5122 active and popular open source projects. The total number of open source projects in the world is much larger. However, Daffara estimates that of the total number only 18,000 projects (lower bound) are actually active [3]. So we believe that the sample we are using is relevant for analyzing trends and patterns in open source growth.
  • Data incompleteness. Some amount of revision control information in open source projects has already been lost forever, as projects have moved on from no configuration management (CM) to CM with CVS and on to other CM tools, frequently dropping the history with each move. Thus, the project history for each project is not always complete. However, for a current project, we have the most recent history, which is what is most relevant for our analysis. Thus, the lack of some of the early histories of some of the open source projects has little effect on the validity of our conclusions.
  • Project source. A current limitation of Ohloh is that it only connects to CVS, Subversion and Git source code repositories. We believe that this limitation is not a big issue for our purposes because almost all open source projects are maintained in one of these repositories and our sample size can be considered representative.
  • Copy and paste. Our approach to eliminating copy and paste issues (Approach 2) is limited in its effectiveness: The filter excludes a lot of good values while still allowing minor copy and paste to pass. For the purposes of our analysis, however, it is not a major issue, because we are interested in the overall trend, and even the conservative Approach 2 still validates our hypothesis of exponential growth.

We are continuing our work to iron out possible pitfalls based on these limitations. However, we believe that while the respective critiques can be made, the effects are rather limited, as argued above in each case.

6. Related Work

Several studies of the evolution of open source projects have been undertaken.

  • González-Barahona et al. estimated the lines of code in the Debian 2.2 release and concluded that the system represents an effort of more than 14,000 person-years, which translates to about 2 billion USD [6].
  • Succi et al. showed a linear growth rate for the GCC and Apache projects. They also showed that Linux has super-linear growth [18]. They found that Linux (in 2000) violates Lehman’s fourth law of software evolution.
  • In contrast to this, Roy and Cordy examined the evolution of the Barcode Library and the zlib project and showed that these two smaller projects follow Lehman’s laws of software evolution [16].
  • Godfrey and Tu showed a super-linear increase in source lines of code over time in the Linux kernel and the VIM text editor [5].
  • Robles et al. confirmed that the Linux kernel is growing super-linearly [15]. The NetBSD, FreeBSD, OpenBSD (until 2001) and 18 other projects showed an almost linear growth pattern.
  • Koch’s study of 4047 open source projects on SourceForge indicates that a quadratic growth model fits the growth of an individual project better than a linear growth model [8] [20].
  • Scacchi reviews prior results on open source evolution, suggesting that the growth patterns for large open source projects are not representative of all of open source [22]. His discussion of the evolution of open source software suggests that Lehman’s laws of software evolution, which are based on closed-source systems, do not apply to open source, and that further study is needed.

Most of the research listed above explores the evolution of individual projects. The growth models of projects are typically linear or quadratic. None of the related work quantitatively analyzes the total growth of open source software.

Our analysis does not focus on any particular project but on the general trend in open source software. The projects considered are independent of any particular license, language, topic or size.

7. Conclusion

The significance of open source has been continuously increasing over time. Our research validates this claim by looking at the total growth of open source. Our work shows that the additions to open source projects, the total project size (measured in source lines of code), the number of new open source projects, and the total number of open source projects are growing at an exponential rate. The total amount of source code and the total number of projects double about every 14 months.

Our results open the door to further research on the growth of open source and its acceptance in industry and government. Future research should explore questions such as which factors are influencing this exponential growth, how source code growth relates to the number of engaged software developers, and whether or how long open source can sustain this exponential growth.

Acknowledgments

We would like to thank Prem Devanbu and Gregorio Robles for their feedback on earlier versions of the paper as well as their encouragement for the work presented. We also would like to thank Oliver Arafat and Mario Fernandez for proofreading the paper.

References

[1] Comino, S, Manenti, F.M., Parisi, M. L. From Planning to Mature: On the Determinants of Open Source Take Off. Department of Economics Working Papers 0517, Department of Economics, University of Trento, Italia. 2005.

[2] Crowston, K. and Scozzi, B. Open Source Software Projects as Virtual Organizations: Competency Rallying for Software Development. IEE Proceedings—Software Engineering, vol. 149, no. 1, 2002: 3-17.

[3] Daffara, C. How Many Stable and Active Libre Software Projects? Retrieved on Sept 13, 2007, from http://flossmetrics.org/news/11.

[4] Software & Information Industry Association. Packaged Software Industry Revenue and Growth, 2006. Available from http://siia.net/software/

[5] Godfrey, M., Tu, M. Growth, Evolution, and Structural Change in Open Source Software. In Proceedings of the 4th International Workshop on Principles of Software Evolution. ACM Press, 2001: 103-106.

[6] González-Barahona, J., Ortuño Pérez, M., de las Heras Quirós, P., Centeno González, J., Matellán Olivera, V. Counting potatoes: The Size of Debian 2.2. Retrieved on Sept 13, 2007, from http://people.debian.org/~jgb/debian-counting/counting-potatoes/.

[7] Haruvy, E., Wu F. and Chakravarty S. Incentives for Developers’ Contributions and Product Performance Metric in Open Source Development: An Empirical Exploration. University of Texas Working Paper.

[8] Koch, S. Evolution of Open Source Software Systems—A Large-Scale Investigation. In Proceedings of the 1st International Conference on Open Source Systems (OSS 2005).

[9] Lawton, M., Notarfonzo, R. Worldwide Open Source Software Business Models 2007–2011 Forecast: A Preliminary View. IDC Inc.

[10] Netcraft. Netcraft Web Server Survey. Netcraft, 2007. Retrieved on Sept 13, 2007, from http://survey.netcraft.com/Reports/200708/byserver/.

[11] Ohloh Corporation. See http://www.ohloh.net.

[12] Raymond, E. S. The Cathedral and the Bazaar. O’Reilly & Associates, 1999.

[13] Deshpande, A., Riehle, D. Continuous Integration in Open Source Software Projects. Submitted to the 4th International Conference on Open Source Systems (OSS 2008).

[14] Robles, G., Gonzalez-Barahona, J. M., Michlmayr, M., and Amor, J. J. Mining Large Software Compilations Over Time: Another Perspective of Software Evolution. In Proceedings of the 2006 International Workshop on Mining Software Repositories (MSR 2006). ACM Press, 2006: 3-9.

[15] Robles, G., Amor, J. J., Gonzalez-Barahona, J. M., and Herraiz, I. Evolution and Growth in Large Libre Software Projects. In Proceedings of the Eighth International Workshop on Principles of Software Evolution (IWPSE 2005). IEEE Computer Society, 2005: 165-174.

[16] Roy, C. K. and Cordy, J. R. Evaluating the Evolution of Small Scale Open Source Software Systems. See http://citeseer.ist.psu.edu/761885.html.

[17] SourceForge. See http://www.sourceforge.net.

[18] Succi, G., Paulson, J., Eberlein, A. Preliminary Results From an Empirical Study on the Growth of Open Source and Commercial Software Products. In EDSER-3 Workshop (2001): 14-15.

[19] Walli, S., Gynn, D., Rotz, B. V. The Growth of Open Source Software in Organizations: A Report. Retrieved on Sept 13, 2007, from http://optaros.com/en/publications/white_papers_reports/the_growth_of_open_source_software_in_organizations.

[20] Koch, S. Software Evolution in Open Source Projects—A Large-Scale Investigation. In Journal of Software Maintenance and Evolution: Research and Practice 2007; 19: 361-382.

[21] Lakhani, K. R., Wolf, R. G. Why Hackers Do What They Do: Understanding Motivation and Effort in Free/Open Source Software Projects. In Perspectives on Free and Open Source Software. MIT Press, 2005: 3-22.

[22] Scacchi, W. Understanding Open Source Software Evolution. In Software Evolution and Feedback. John Wiley & Sons, 2006.

Addendum

Addendum to Total Growth of Open Source paper.

Comments

  1. […] found that although in the past open-source software was increasing at a linear rate, it’s now gaining ground exponentially throughout the […]

  2. […] And while these numbers are already impressive, it’s important to remember that open source is growing at an exponential rate, so these numbers are going to get much, much […]

  3. […] number of ways, but one is simply to analyze how many lines of open-source code are being written. Research by Dirk Riehle shows a massive hockey stick for open source […]

  6. […] 2008, researcher Dirk Riehle found[3] that the population of open-source projects was growing […]

  7. […] there’s open-source software, an industry experiencing exponential growth. Its developers go out of their way to voluntarily relinquish copyright protection that would […]

  9. […] is also known that this growth has led to another interesting fact: the number of free and open source applications released is growing too. Despite many saying that focusing on these kind of software generates almost no return in terms of […]

  10. […] Dirk Riehle’s analysis of the total growth in open source projects is a few years old, if anything the trend he plots has […]

  12. […] just don’t get. The open source movement is founded on this principle and it has generated unknown trillions of value in the last 10 […]

  13. […] Amit Deshpande and Dirk Riehle. “The Total growth of open source”. [2008] Dirkriehle, https://dirkriehle.com/publications/2008-2/the-total-growth-of-open-source/, 25 […]

  14. Henric Bergenwall

    If source lines of code (SLOC) productivity per programmer is somewhat constant and the increase of engaged programmers is linear, wouldn’t the growth of SLOC naturally be quadratic?

    1. Dirk Riehle

      Hi Henric,
      correct, but how do we know about linear growth in labor invested?
      Dirk

      1. Henric Bergenwall

        I assume open source projects are version controlled, and that, through that version control system, it is possible to identify each programmer’s first commit. Measuring the number of first commits over time might indicate how the number of engaged programmers varies over time.

        1. Dirk Riehle

          Sure, but it is hard to get that data right – for one, prior to git, you couldn’t distinguish author from committer.

  15. Yael Vaya

    Hi,
    Could you please check reference no. 19*? Couldn’t find this study on the website address provided nor by using Google search.
    Thanks in advance,
    Yael
    *Walli, S., Gynn, D., Rotz, B. V. The Growth of Open Source Software in Organizations: A Report. Retrieved on Sept 13, 2007, from http://optaros.com/en/publications/white_papers_reports/the_growth_of_open_source_software_in_organizations.

  16. […] source software today has a strong presence in industry and government. One of the more thorough studies concluded, “Organizations are saving millions of dollars on IT by using open source software. In 2004, open […]

  17. […] The Total Growth of Open Source by Amit Deshpande and Dirk Riehle at SAP Labs […]

  18. […] doubt that open-source development is on the rise.  In fact, according to Deshpande & Riehle (https://dirkriehle.com/publications/2008/the-total-growth-of-open-source/), it accounts for a large portion of the web server market.  To add to this, they describe the […]

  19. […] The Total Growth of Open Source A recent study of more than 5,000 open source projects indicates "open source is expand­ing into new domains and appli­ca­tions at an expo­nential rate." […]

  20. […] Posted by Greg Wilson on 2010/01/18 Amit Deshpande and Dirk Riehle think they have an answer. […]

  21. […] The Total Growth of Open Source […]

  22. Dirk Riehle

    Hi Collin,
    a recent Gartner study (http://www.gartner.com/; report G00156659) has shown that by 2012 more than 90% of all companies that use IT will use open source. So yes, open source is critical to the functioning of these companies and the economy. From that significance follows the importance for computer science research and teaching on open source.
    Good luck with your research,
    Dirk

  23. Collin Tewalt

    Hi,
    I’m Collin and a freshman computer science student at Colorado State University. I’m doing some research on the productivity and quality of open source projects versus closed source. Informally, would you say that open source projects like Linux and other programming IDEs are vital and necessary to the field of computer science? Without open source projects, do you think we would have as much progress as we do currently? I know that the study doesn’t support any of these hypotheses directly, but what’s your opinion. Thanks and good work!
    Collin

  24. […] the world of software is agile and adept. According to research by Amit Deshpande and Dirk Riehle at SAP Research Labs, during the past five years the number of […]

  25. Dirk Riehle

    Hi Harkirat, nothing new at hand. We hope to run the numbers again and will also extend it, but that’s a couple of months into the future. –Dirk

  26. Harkirat Singh Bedi

    Hi,
    Do you have something recent on this research that I can cite to my clients as a reference?
    Hope to hear from you soon.
    Thanks,
    Harkirat

  27. […] particularly liked that Red Hat uses Amit Deshpande’s and my work on the Total Growth of Open Source software as evidence of the significance (and significant growth) of open source. An added bonus is that our […]

  28. […] The graph above actually depicts the number of lines of source code (in the millions) added to open source projects over time.  The number of new projects is graphically similar.  Read more here. […]

  29. Rajeev Kumar

    really open source is not only necessary but also inevitable.

  30. […] let’s think big. The Open Source community already has more than a billion lines of source code at its disposal, and it’s doubling every 12.5 months, so I think it’s fair to say […]

  31. […] design, development, licensing, distribution, and user engagement models used today in the top 5000 open source projects). When we talk about an “open source community” I think of this as more like a movement towards […]

  32. Dirk Riehle

    @Rob: Yes, we should have done it in the paper right away (we were hitting the page limit though). You can find the semi-log graphs in this total growth paper addendum post.

  33. Rob

    Semilog or log-lin graphing (log quantity vs linear time) shows exponential rates as straight lines. Pretty handy; give it a try sometime.

  34. […] software. But here the code is not available: it is closed source software. In the discussion, Dirk Riehle speculates that the performance of open and closed source software developers […]

  35. […] The Total Growth of Open Source – Deshpande & Riehle Studied 5000 OS projects and shows that open source is expanding into new domains and applications at an exponential rate: doubling time of 14 month. Does not consider the implications and consequences . (tags: opensource research article 2008) […]

  36. Stephan Schmidt

    Excellent 🙂
    See you
    -stephan

  37. Dirk Riehle

    @Linas Vepstas: We are providing answers to your (and other people’s) questions in a separate blog post, the Addendum; follow the link or see above.

  38. Dirk Riehle

    @Tim Bunce: You may be right, but I don’t think these are the projects covered by our database. Since these are by and large the top 5000 projects, they tend to be more mature with a real community.

  39. Dirk Riehle

    @Linas Vepstas: We are at it and will probably post it as a separate blog entry. Give us a bit of time.

  40. Tim Bunce

    I believe a factor in the current rapid growth is a movement of code from private (often personal) code repositories to public ones. Moving from the old “email a patch to the author” development model, to a “commit a change to a branch” model. Sometimes the code moved doesn’t include all the old history so the project appears newer than it really is.
    Of course this is hard to quantify.

  41. Linas Vepstas

    Hi, could you redo the graphs as semi-log graphs? A semi-log graph would show exponential growth as a straight line, making it much easier to eyeball what is going on.
    Also, for the conclusion, please restate the exponential as a half-life, i.e. power of 2 instead of power-of-e. From your data, I get that the number of lines of code is doubling every 15 months or so. That’s a much punchier conclusion than “exp 0.-46x” which may as well be Greek to most readers.
    Thanks.

  42. Dirk Riehle

    @Peter Judge: One more thing. I don’t think everything is exponential. Programmers can only do so much work in a given week, and the number of programmers is definitely not growing exponentially—if it is growing at all! This means total growth of open + closed source is polynomial, perhaps not even better than linear.

  43. Dirk Riehle

    @Peter Judge: This is a very good question, and we have been struggling to get exactly that data so we can make a comparison. This paper shows the total growth, but the question is of course, growth on what base? Right now, the best we could reliably get were the revenue numbers. Getting total SLoC for all of open + closed source would give us a better indicator of how much of software development has already shifted to open source.

  44. Dirk Riehle

    @Stephen: It’s math that is telling us that an exponential model is the best fit. We tried different models, and using the r-square values as the indicator, we got the best fit with exponential models.
    One thing we would like to investigate next is to break up the history of open source into distinct phases. It is not clear that a single function is best in explaining total growth. Rather, open source may have gone through these different phases, each of which is best described with a different function.

  45. Peter Judge

    Do you have any estimates of the growth in proprietary code over the same period? Could it be the case that just “everything” in this business is exponential?

  46. Stephen

    You state without justification that “the best fitting model is an exponential curve”. However the line you’ve drawn in Figures 1 and 2 looks like a very poor fit. Honestly even a simple line crossing the x axis around 1998 looks like a better fit.
