Latest in Industry and Research Publications
-
A systematic review of common beginner programming mistakes in data engineering [CSEE&T 2025]
Abstract The design of effective programming languages, libraries, frameworks, tools, and platforms for data engineering strongly depends on their ease and correctness of use. Anyone who ignores that it is humans who use these tools risks building tools that are useless, or worse, harmful. To ensure our data engineering tools are based on solid foundations,…
-
An empirical study of Jayvee, a domain-specific language for data engineering, on understanding data pipeline architectures [SP&E Journal]
Abstract A large part of data science projects is spent on data engineering. Especially in open data contexts, data quality issues are prevalent and are often tackled by non-professional programmers. We introduce and evaluate Jayvee, a domain-specific language for data engineering aimed at reducing barriers to building data pipelines. We show that a structured DSL…
-
Open-source software: The ultimate in reuse or a risk not worth taking? (Mead et al., IEEE Computer)
I’m happy to report that the 33rd article in the open source column of IEEE Computer has been published. As always, please consider writing an article proposal! Title: Open-source software: The ultimate in reuse or a risk not worth taking? Keywords: None. Authors: Nancy R. Mead, Carol Woody, Scott Hissam. Publication: Computer vol. 58, no.…
-
Why and how do organizations create user-led open source consortia? [INFSOF Journal]
Abstract Context: User-led open source (OS) consortia (foundations) consist of organizations from industries beyond the software industry collaborating to create open-source software solutions for their internal processes. Initially pioneered by higher education organizations in the 2000s, this concept has gained traction in recent years across various industries. Objective: This study has two research objectives (ROs).…
-
“Two hard things in computer science” explained
You may have heard the saying “There are only two hard things in computer science: Cache invalidation and naming things.” The web and Martin Fowler attribute this saying to one Phil Karlton; I actually thought it was Leslie Lamport, but maybe I’m confusing it with the saying “A distributed system is one in which the…
-
Governance practices for open source foundations in the healthcare sector [ICSOB 2024]
Abstract Open source (OS) foundations are non-profit organizations that support open-source software development projects. OS foundations can be categorized based on their membership and governance structures. This study focuses on vendor-led and user-led OS foundations operating in the healthcare sector. The study has two objectives. The first objective is to explore the similarities and differences…