We just finished redoing our original analysis of paid vs. volunteer work in open source for Gitee, a code hosting platform from China with a predominantly Chinese user base. We wanted to understand where China stands in open source. Previous blog posts looked at base data, e.g. the half/half split between paid and volunteer work, as well as developer behavior, e.g. that predominantly paid developers still volunteer in their spare time.
In this third and final blog post, I would like to look at projects and how commercially dominated (or not) they are. For the purposes of this analysis, a developer is a (pure) paid developer if 95% or more of their commits are made during regular working hours, and a (pure) volunteer if 95% or more of their commits are made outside of these working hours. Obviously, this is a very conservative definition. How commercial a project is then depends on its percentage of (pure) paid developers, and how non-commercial it is depends on its percentage of (pure) volunteer developers. The following figures show how many projects exist for each percentage of either pure paid or pure volunteer developers. Please note the logarithmic y-axis.
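To make the definitions above concrete, here is a minimal sketch of how the classification could be computed. It is not the code used for the analysis. The working-hours window (Monday to Friday, 9:00 to 18:00 local time), the input format, and all function names are assumptions made for illustration, as is the choice to classify each developer across all of their commits rather than per project.

```python
from collections import defaultdict
from datetime import datetime

# Assumed working-hours window; the post does not state the exact hours.
WORK_START_HOUR = 9
WORK_END_HOUR = 18
PURE_PERCENT = 95  # threshold from the post: 95% or more of commits


def in_working_hours(ts: datetime) -> bool:
    """True if the (local) timestamp falls on Mon-Fri within the assumed window."""
    return ts.weekday() < 5 and WORK_START_HOUR <= ts.hour < WORK_END_HOUR


def classify_developer(timestamps) -> str:
    """Label a developer as 'paid', 'volunteer', or 'mixed' from commit times."""
    timestamps = list(timestamps)
    if not timestamps:
        raise ValueError("developer has no commits")
    total = len(timestamps)
    paid = sum(1 for ts in timestamps if in_working_hours(ts))
    # Integer comparison avoids floating-point edge cases at exactly 95%.
    if paid * 100 >= PURE_PERCENT * total:
        return "paid"
    if (total - paid) * 100 >= PURE_PERCENT * total:
        return "volunteer"
    return "mixed"


def project_shares(commits):
    """Map each project to the fraction of its developers per label.

    `commits` is an iterable of (project, developer, timestamp) tuples.
    Developers are classified over all their commits (an assumption).
    """
    times_by_dev = defaultdict(list)
    devs_by_project = defaultdict(set)
    for project, dev, ts in commits:
        times_by_dev[dev].append(ts)
        devs_by_project[project].add(dev)

    labels = {dev: classify_developer(ts) for dev, ts in times_by_dev.items()}

    shares = {}
    for project, devs in devs_by_project.items():
        n = len(devs)
        shares[project] = {
            label: sum(1 for d in devs if labels[d] == label) / n
            for label in ("paid", "volunteer", "mixed")
        }
    return shares
```

Binning the `paid` and `volunteer` fractions returned by `project_shares` across all projects would yield histograms of the kind described above.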
Again, we can observe the extremes dominating the data. The largest bin is always the other projects (other pure plus mixed), but the second-largest bin already consists of the pure-breed projects, either purely commercial or purely non-commercial. The absolute numbers are equal, as might have been expected from the half/half split in paid vs. volunteer work overall. Thus, we might suspect that paid developers tend to band together, define commercial projects, and crowd out volunteer developers, and that volunteer developers define their own projects without paid contributions.
But is this generally true? The following figure displays the contributions of mixed developers, who work during both paid and volunteer time, for large projects. (Here, a large project is a project with more than a thousand commits.)
Looking at this data, we find that the larger the project, the more dominant the mixed pattern of working both during and outside regular working hours. Thus, the larger the project, the broader its draw on all types of developers. Given that size and speed perhaps correlate with staying power, this is a good reminder of the need for a healthy and diverse community of contributors.
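The large-project view can be sketched in the same spirit. The snippet below is an illustration under stated assumptions, not the original analysis code: it assumes that "contributions" are measured as the share of a project's commits made by mixed developers, and that a developer label ("paid", "volunteer", or "mixed") has already been computed. The function name and input format are hypothetical.

```python
from collections import Counter

# From the post: a large project has more than a thousand commits.
LARGE_PROJECT_MIN_COMMITS = 1000


def mixed_commit_share(commits, labels, min_commits=LARGE_PROJECT_MIN_COMMITS):
    """Return, per large project, the fraction of commits by mixed developers.

    `commits` is an iterable of (project, developer) pairs, one per commit.
    `labels` maps each developer to 'paid', 'volunteer', or 'mixed'.
    Projects with `min_commits` commits or fewer are left out.
    """
    total = Counter()
    mixed = Counter()
    for project, dev in commits:
        total[project] += 1
        if labels.get(dev) == "mixed":
            mixed[project] += 1
    return {
        project: mixed[project] / count
        for project, count in total.items()
        if count > min_commits
    }
```

Plotting these shares against project size would be one way to check whether the mixed pattern grows more dominant as projects get larger.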