Much open source research assumes that all open source projects are alike and that if you take enough of them, you can claim generalizability for your conclusions. GitHub is the main source of such mischief, because of its size and availability.
If GitHub were like Berlin, and projects on GitHub were like the people of Berlin, then treating all projects the same would be like saying that a person from Mitte is just like a person from Kreuzberg, who is just like a person from Spandau.
You can slice and dice as much as you want by how tall people are (say, lines of code), how wide they are (community size), or how old they are. None of that tells you anything about their soul (community governance).
It is community governance that guides, if not outright determines, behavior. While people have suggested models of governance, these models are used neither for classifying projects nor as independent variables in analyses.
Try telling a Berliner that people from different neighborhoods have the same needs, have the same goals, are all the same…