We just finished redoing our original analysis of paid vs. volunteer work in open source for Gitee, a Chinese-language code hosting platform based in China. We wanted to understand where China stands in open source. Previous blog posts looked at base data, e.g. the half/half split between paid and volunteer work, as well as developer behavior, e.g. that predominantly paid developers still volunteer in their spare time.
In this third and final blog post, I would like to look at projects and how commercially dominated (or not) they are. For the purposes of this analysis, a developer is a (pure) paid developer if 95% or more of their commits are made during regular working hours, and a (pure) volunteer if 95% or more of their commits are made outside of these working hours. Obviously, this is a very conservative definition. How commercial a project is then depends on its percentage of (pure) paid developers, and how non-commercial it is depends on its percentage of (pure) volunteer developers. The following figures show how many projects exist for each percentage share of pure paid or pure volunteer developers. Please observe the logarithmic y-axis.
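To make the definition concrete, here is a minimal sketch of how the classification could be computed. It is not the code used in the thesis. It assumes regular working hours of Mon-Fri, 9am-5pm in the committer's local time, it assumes a developer is classified over all of their commits rather than per project, and the function names and the (project, developer, timestamp) input format are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

PURE_PERCENT = 95  # threshold from the definition above


def in_working_hours(ts: datetime) -> bool:
    # Assumption: Mon-Fri, 9:00 up to (but not including) 17:00, local time.
    return ts.weekday() < 5 and 9 <= ts.hour < 17


def classify_developer(commit_times):
    """Return 'paid', 'volunteer', or 'mixed' for one developer's commits."""
    total = len(commit_times)
    if total == 0:
        return "mixed"
    inside = sum(1 for ts in commit_times if in_working_hours(ts))
    outside = total - inside
    # Integer comparison avoids floating-point trouble at exactly 95%.
    if inside * 100 >= PURE_PERCENT * total:
        return "paid"
    if outside * 100 >= PURE_PERCENT * total:
        return "volunteer"
    return "mixed"


def project_shares(commits):
    """commits: iterable of (project, developer, datetime) tuples.

    Returns {project: {'paid': share, 'volunteer': share}}, where each share
    is the fraction of the project's developers in that pure category.
    """
    times_by_dev = defaultdict(list)
    devs_by_project = defaultdict(set)
    for project, dev, ts in commits:
        times_by_dev[dev].append(ts)
        devs_by_project[project].add(dev)

    # Assumption: one label per developer, based on all of their commits.
    labels = {dev: classify_developer(ts) for dev, ts in times_by_dev.items()}

    shares = {}
    for project, devs in devs_by_project.items():
        n = len(devs)
        shares[project] = {
            "paid": sum(labels[d] == "paid" for d in devs) / n,
            "volunteer": sum(labels[d] == "volunteer" for d in devs) / n,
        }
    return shares
```

Counting how many projects fall into each range of the "paid" or "volunteer" share would then give the kind of distribution the figures plot.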
We just finished redoing our original analysis of paid vs. volunteer work in open source for Gitee, a Chinese-language code hosting platform based in China. We wanted to understand where China stands in open source. The previous blog post explained the half/half split between paid and volunteer time in terms of total work on open source.
So far, we have only discussed commits. Now I would like to discuss committer behavior, in particular whether there are pure paid developers, who only work Mon-Fri, 9am-5pm, i.e. during regular working hours, and pure volunteers, who work only outside those hours. Compared to our data for the Western world, the Chinese data is less conclusive. The following figure bins developers into the respective categories, and the following table spells out the bins (categories) explicitly. For the figure, please note the logarithmic scale of the y-axis.
In 2014 we published a study on paid vs. volunteer work in open source, using a representative sample of open source projects from 2008 (i.e. before GitHub). In 2008, open source activity was decidedly Western, with few contributions from China. In 2017, I finally found a student to redo the analysis for China. More specifically, the student was to use what we had identified as the most popular Chinese-language code hosting platform and perform the same analysis we had done years earlier. In this sequence of blog posts, I’ll present some of his results. The full thesis can be found on my research group’s blog.
The analysis is based on data from Gitee, a Chinese-language code hosting platform hosted in China, and one of the leading platforms. A first interesting piece of data is that despite its decidedly Chinese focus, 22.4% of all committers to Gitee projects work overseas. They may well be Chinese (at least they are capable of reading and writing Chinese), and I find this number surprisingly large, but we don’t know more than that.
Most interestingly, but perhaps not surprisingly, the weekly work pattern on Gitee is similar to the one in the Western world. The following figure displays this work rhythm. As we can see, work intensity is highest Monday to Friday during regular working hours.
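For readers who want to reproduce such a weekly rhythm from their own commit data, the following is a rough sketch of the aggregation behind a figure of this kind. It is not taken from the thesis. It assumes commit timestamps are already in the committer's local time, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def weekly_rhythm(commit_times):
    """Count commits per (weekday, hour) slot.

    Returns a 7 x 24 grid: rows are Monday (0) to Sunday (6),
    columns are the hours 0 to 23.
    """
    counts = Counter((ts.weekday(), ts.hour) for ts in commit_times)
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]


def working_hours_share(grid):
    """Fraction of all commits that fall into Mon-Fri, 9:00 to 17:00."""
    total = sum(sum(row) for row in grid)
    if total == 0:
        return 0.0
    # Assumption: working hours are the hour slots 9 to 16 on weekdays.
    inside = sum(sum(grid[day][9:17]) for day in range(5))
    return inside / total
```

Plotting the 168 slots of the grid in sequence, Monday 0:00 through Sunday 23:00, gives one curve for the whole week.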