What is the most common size of code contributions to open source? Maybe 30 lines of source code? 200 lines? Or just one line? What’s your guess?
In a recent paper on the commit size distribution in open source, we show that the most common size of code contributions is one line of source code. In our sample of more than 8 million commits, one-line commits account for more than 12% of all commits, two-line commits for 9%, and three-line commits for 5.5%. The following figure shows this data.
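To make percentages like these concrete, here is a minimal sketch of how each commit size's share of a sample can be computed. It assumes a hypothetical list `sizes` holding one size (in source code lines) per commit. How the paper counts lines is not described in this post, and the toy data below is made up for illustration.

```python
from collections import Counter


def size_shares(sizes):
    """Return {size: fraction of commits with that size}, ordered by size.

    `sizes` is a hypothetical list with one entry per commit, e.g. the number
    of source code lines the commit touches (the exact counting rule is an
    assumption, not taken from the paper).
    """
    counts = Counter(sizes)
    total = len(sizes)
    return {size: counts[size] / total for size in sorted(counts)}


# Toy data only, not the paper's sample.
toy_sizes = [1, 1, 1, 2, 2, 3, 150]
print(size_shares(toy_sizes))
```

On the real sample, the entries for sizes 1, 2 and 3 would correspond to the 12%, 9% and 5.5% figures quoted above.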
In general, small commits dominate open source. The following figure shows commits of 1 to 100 source code lines in our sample. These commits make up more than 83% of all commits. As one can see, the curve falls almost strictly as commit size grows. In fact, the paper shows that this curve can be closely modeled by a power law; for the details, you will have to dig into the paper itself.
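The fitting procedure used in the paper is not described in this post, so the following is only a generic sketch under stated assumptions, not the authors' method. `share_up_to` computes the kind of cumulative share quoted above (commits of 1 to 100 lines), and `estimate_alpha` applies a widely used approximate maximum-likelihood estimator for a discrete power law P(x) ∝ x^(-α) with x ≥ x_min (Clauset, Shalizi and Newman, 2009). Both function names and the toy input are hypothetical.

```python
import math


def share_up_to(sizes, limit):
    """Fraction of commits whose size lies between 1 and `limit` lines."""
    return sum(1 for s in sizes if 1 <= s <= limit) / len(sizes)


def estimate_alpha(sizes, x_min=1):
    """Approximate discrete power-law MLE for P(size = x) ~ x**(-alpha).

    Uses alpha ~= 1 + n / sum(ln(x / (x_min - 0.5))) over the n sizes that are
    >= x_min. This is a generic estimator, not necessarily the paper's method.
    """
    if x_min < 1:
        raise ValueError("x_min must be at least 1 for integer commit sizes")
    tail = [s for s in sizes if s >= x_min]
    if not tail:
        raise ValueError("no sizes at or above x_min")
    log_sum = sum(math.log(s / (x_min - 0.5)) for s in tail)
    return 1 + len(tail) / log_sum


# Toy data only, not the paper's sample.
toy_sizes = [1, 1, 1, 1, 2, 2, 3, 5, 12, 150]
print(share_up_to(toy_sizes, 100))
print(estimate_alpha(toy_sizes))
```

A maximum-likelihood estimate is used here rather than a straight-line fit on a log-log plot, because least-squares fits on log-log plots are known to give biased exponent estimates.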
What are your thoughts? Are you surprised, or is it obvious to you? What theories do you have? Maybe we have the data to validate or invalidate your hypotheses.