From Developer Networks to Verified Communities: A Fine-Grained Approach

Abstract: Effective software engineering demands a coordinated effort. Unfortunately, a comprehensive view on developer coordination is rarely available to support software-engineering decisions, despite the significant implications for software quality, software architecture, and developer productivity. We present a fine-grained, verifiable, and fully automated approach to capture a view on developer coordination, based on commit information and source-code structure, mined from version-control systems. We apply methodology from network analysis and machine learning to identify developer communities automatically. Compared to previous work, our approach is fine-grained and identifies statistically significant communities using order statistics and a community-verification technique based on graph conductance. To demonstrate the scalability and generality of our approach, we analyze ten open-source projects with complex and active histories, written in various programming languages. By surveying 53 open-source developers from the ten projects, we validate the authenticity of the inferred community structure with respect to reality. Our results indicate that developers of open-source projects form statistically significant community structures and that this particular view on collaboration largely coincides with developers’ perceptions of real-world collaboration.

Keywords: Open source, social network analysis, developer networks, developer communities, repository mining, conductance

Reference: Mitchell Joblin, Wolfgang Mauerer, Sven Apel, Janet Siegmund, Dirk Riehle. “From Developer Networks to Verified Communities: A Fine-Grained Approach.” In Proceedings of the 37th International Conference on Software Engineering (ICSE 2015). IEEE Press, to appear.

The paper is available as a PDF file.
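
The community-verification technique mentioned in the abstract is based on graph conductance, which measures how well a candidate community is separated from the rest of a network: the number of edges leaving the group divided by the smaller of the two groups’ volumes (sums of node degrees). The following sketch only illustrates that measure and is not the authors’ tool chain; it assumes an unweighted developer network built with networkx, with hypothetical node names.

    import networkx as nx

    def conductance(graph, community):
        # Conductance of a node set: edges crossing the boundary divided by
        # the smaller of the two volumes (sums of node degrees).
        community = set(community)
        rest = set(graph.nodes) - community
        cut_size = nx.cut_size(graph, community, rest)
        vol_community = sum(degree for _, degree in graph.degree(community))
        vol_rest = sum(degree for _, degree in graph.degree(rest))
        return cut_size / min(vol_community, vol_rest)

    # Toy developer network: nodes are developers, edges indicate joint work on artifacts.
    g = nx.Graph()
    g.add_edges_from([("alice", "bob"), ("bob", "carol"), ("alice", "carol"),   # cluster 1
                      ("dave", "erin"), ("erin", "frank"), ("dave", "frank"),   # cluster 2
                      ("carol", "dave")])                                       # single bridge

    print(conductance(g, {"alice", "bob", "carol"}))  # low value: well-separated group

A low conductance value means a group has many internal edges relative to edges leaving it; roughly, this is the kind of separation the verification step assesses for statistical significance.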

How Developers Acquire FLOSS Skills

Abstract: With the increasing prominence of open collaboration as found in free/libre/open source software projects and other joint production communities, potential participants need to acquire skills. How these skills are learned has received little research attention. This article presents a large-scale survey (5,309 valid responses) in which users and developers of the beta release of a popular file download application were asked which learning styles were used to acquire technical and social skills. We find that the extent to which a person acquired the relevant skills through informal methods tends to be higher for free/libre/open source code contributors, while being a professional software developer does not have this effect. Additionally, younger participants proved more likely to make use of formal methods of learning. These insights will help individuals, commercial companies, educational institutions, governments and open collaborative projects decide how they promote learning.

Keywords: Competencies, informal learning, non-formal learning, open source, skills, software developer

Reference: Ann Barcomb, Michael Grottke, Jan-Philipp Stauffert, Dirk Riehle, Sabrina Jahn. “How Developers Acquire FLOSS Skills.” In Proceedings of the 11th International Conference on Open Source Systems (OSS 2015). Springer Verlag, to appear.

The paper is available as a PDF file.

Improving Traceability of Requirements through Qualitative Data Analysis

Abstract: Traceability is an important quality aspect in modern software development. It facilitates the documentation of decisions and helps identify conflicts regarding the conformity of one artifact to another. We propose a new approach to requirements engineering that utilizes qualitative research methods, which have been well established in the domain of social science. Our approach integrates traceability between the original documentation, the requirements specification, the domain model, and the glossary, and it supports adaptability to change.

Keywords: Requirements analysis, requirements traceability, qualitative data analysis

Reference: Andreas Kaufmann, Dirk Riehle. “Improving Traceability of Requirements through Qualitative Data Analysis.” In Proceedings of the Software Engineering 2015 Conference (SE 2015). Springer Verlag, to appear.

The paper is available as a PDF file.
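
To make the idea of integrated trace links a bit more concrete, here is a minimal, hypothetical sketch of how a requirement can stay linked to the passage of the original documentation it was coded from and to the related domain-model concepts and glossary terms; the names and structure are illustrative and not the notation used in the paper.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class SourcePassage:
        document: str   # original documentation, e.g. an interview transcript
        quote: str      # the coded passage the requirement was derived from

    @dataclass
    class Requirement:
        identifier: str
        text: str
        sources: List[SourcePassage] = field(default_factory=list)  # trace to original documentation
        concepts: List[str] = field(default_factory=list)           # trace to domain-model concepts
        glossary_terms: List[str] = field(default_factory=list)     # trace to glossary entries

    req = Requirement(
        identifier="REQ-12",
        text="The system shall export reports as PDF.",
        sources=[SourcePassage("stakeholder-interview-3.txt", "we need printable reports")],
        concepts=["Report"],
        glossary_terms=["report", "export"],
    )

    # When the domain model or a source document changes, affected requirements
    # can be found by following the links in either direction.
    affected = [r for r in [req] if "Report" in r.concepts]

Keeping such links explicit is what allows a change in one artifact to be traced to the others, which is, roughly, the adaptability-to-change aspect the abstract mentions.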

The Wall Street Journal and Berlin Reporting

The Wall Street Journal provides a nice infographic on the “billion dollar club”, that is, startups with a valuation of $1B or above. In Europe, the WSJ counts six >$1B startups: one in Amsterdam (Adyen), one in Stockholm (Spotify), two in London (Powa, Shazam), and two in Berlin (Delivery Hero, Home24). In addition, the WSJ lists two more Berlin-based companies (Rocket Internet, Zalando), which have already exited (to the public markets). So, counting the exits, 50% of the companies worth counting are based out of Berlin.

Now comes the print version of the WSJ with a Feb 19, 2015, article on “Europe’s Tech Startup Landscape for 2015”. The writer of the article discusses the general situation and then proceeds to present the Wall Street Journal’s non-obvious picks of companies to watch, or in their words, “a useful map of the EMEA tech startup landscape for 2015”. The twelve-company list is all over the map and includes Stockholm (KnCMiner, Klarna, Truecaller), Berlin (Wooga), Serbia (Nordeus), France (SigFox, Withings), Great Britain (Kano, Oxbotica), and Israel (SiSense, Fiverr, Lumus).

Somehow, this list of companies to watch does not vibe with the $1B club… If anything, Berlin has been picking up much more speed since the time the companies in the $1B club were founded. Go figure.

Internal vs. External Validity of Research Funding

So far, most of my research funding has been from industry. Sometimes, I have to defend myself against colleagues who argue that their public funding is somehow superior to my industry funding. This is only a sentiment; they have not been able to give any particular reason for their position.

I disagree with this assessment, and for good reason. These two types of funding are not comparable and ideally you have both.

In research, there are several quality criteria, of which the so-called internal and external validity of a result are two important ones.

  • Internal validity, simplifying, is a measure of how consistent a result is within itself. Are there any contradictions within the result itself? If not, then you may have high internal validity (which is good).
  • External validity, simplifying, is a measure of how predictive or representative a result is of the reality outside the original research data. If it is, your result may have high external validity (which is also good).

Public grants have high internal validity but low external validity. In contrast, industry funding has high external validity but low internal validity.


Public Upcoming Talks on Open Source and Inner Source

A bit belatedly, I’m happy to announce two upcoming talks:

  • Tomorrow, 2015-02-05, 16:00, at Mills College (California Bay Area, United States) (flyer) about Sustainable Open Source
  • On 2015-02-19 at Lero, the Irish Software Engineering Research Centre (Galway, Ireland) (flyer) about Inner Source at SAP

Both talks are open to the public; see the flyers.