<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Software Research and the Industry &#187; Open Source</title>
	<atom:link href="http://dirkriehle.com/category/open-collaboration/open-source/feed/" rel="self" type="application/rss+xml" />
	<link>http://dirkriehle.com</link>
	<description>Dirk Riehle&#039;s blog about everything computer science, applied and more</description>
	<lastBuildDate>Sun, 05 Feb 2012 20:26:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Why Open Source is Good for German Software Businesses</title>
		<link>http://dirkriehle.com/2012/01/16/why-open-source-is-good-for-german-software-businesses/</link>
		<comments>http://dirkriehle.com/2012/01/16/why-open-source-is-good-for-german-software-businesses/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 21:42:01 +0000</pubDate>
		<dc:creator>Dirk Riehle</dc:creator>
				<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://dirkriehle.com/?p=2760</guid>
		<description><![CDATA[I&#8217;m on the expert advisory committee of one of the German parties for the current &#8220;Internet Enquete&#8221;, a commission tasked by the German parliament with suggesting future directions for Germany&#8217;s stance toward the Internet and everything digital. At a meeting &#8230; <a href="http://dirkriehle.com/2012/01/16/why-open-source-is-good-for-german-software-businesses/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m on the expert advisory committee of one of the German parties for the current &#8220;Internet Enquete&#8221;, a commission tasked by the German parliament with suggesting future directions for Germany&#8217;s stance toward the Internet and everything digital. At a meeting this evening, a lobbyist confided in me: &#8220;Open source is bad for German software vendors!&#8221; I gasped. He couldn&#8217;t be further from the truth. If this were mechanical engineering or electrical engineering, he&#8217;d be right. ME? EE? Germany is top. Software? Not so. Beyond a few selected highlights, Germany is an also-ran internationally. When it comes to software product businesses, German companies would benefit significantly if the dice were rolled again. Anything that upsets the current order can only be an improvement over the current state of affairs. Open source does just that. More power to open source business models!</p>
]]></content:encoded>
			<wfw:commentRss>http://dirkriehle.com/2012/01/16/why-open-source-is-good-for-german-software-businesses/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Top-Cited Research Articles on This Site</title>
		<link>http://dirkriehle.com/2012/01/14/top-cited-research-articles-on-this-site/</link>
		<comments>http://dirkriehle.com/2012/01/14/top-cited-research-articles-on-this-site/#comments</comments>
		<pubDate>Sat, 14 Jan 2012 09:24:51 +0000</pubDate>
		<dc:creator>Dirk Riehle</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Software Engineering]]></category>

		<guid isPermaLink="false">http://dirkriehle.com/?p=2753</guid>
		<description><![CDATA[According to Google Scholar, in terms of citations, my leading research paper is: Understanding and using patterns in software development (with Heinz Züllighoven) It just reached the 200-citation boundary. Hard on its heels are these: Role model based framework design &#8230; <a href="http://dirkriehle.com/2012/01/14/top-cited-research-articles-on-this-site/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>According to <a href="http://scholar.google.com/citations?user=LUd2FkUAAAAJ">Google Scholar</a>, in terms of citations, my leading research paper is:</p>
<ul>
<li><a href="http://dirkriehle.com/computer-science/research/1996/tapos-1996-survey.html">Understanding and using patterns in software development</a> (with Heinz Züllighoven)
</ul>
<p>It just reached the 200-citation boundary. Hard on its heels are these:</p>
<p><span id="more-2753"></span></p>
<ul>
<li><a href="http://dirkriehle.com/computer-science/research/1998/oopsla-1998.html">Role model based framework design and integration</a> (with Thomas Gross)
<li><a href="http://dirkriehle.com/computer-science/research/1997/oopsla-1997.html">Composite design patterns</a>
<li><a href="http://dirkriehle.com/computer-science/research/2000/plopd-4.html">Role object</a> (with Dirk Bäumer, Wolf Siberski, and Martina Wulf)
</ul>
<p>The fastest growing paper (in terms of citations) is this 2007 paper:</p>
<ul>
<li><a href="http://dirkriehle.com/computer-science/research/2007/computer-2007.html">The economic motivation of open source software: Stakeholder perspectives</a>
</ul>
<p>The &#8220;leading&#8221; papers are all older papers, as implied by using citations as a measure of relevance. Of course I&#8217;m looking forward to my new open source publications catching up with the software engineering papers. Now back to my employer&#8217;s year-end report, sigh.</p>
]]></content:encoded>
			<wfw:commentRss>http://dirkriehle.com/2012/01/14/top-cited-research-articles-on-this-site/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Developer Belief vs. Reality: The Case of the Commit Size Distribution</title>
		<link>http://dirkriehle.com/2011/12/17/developer-belief-vs-reality-the-case-of-the-commit-size-distribution/</link>
		<comments>http://dirkriehle.com/2011/12/17/developer-belief-vs-reality-the-case-of-the-commit-size-distribution/#comments</comments>
		<pubDate>Sat, 17 Dec 2011 14:04:11 +0000</pubDate>
		<dc:creator>Dirk Riehle</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Publication]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Software Engineering]]></category>

		<guid isPermaLink="false">http://dirkriehle.com/?p=2698</guid>
		<description><![CDATA[Abstract:&#160;The design of software development tools follows from what the developers of such tools believe is true about software development. A key aspect of such beliefs is the size of code contributions (commits) to a software project. In this paper, &#8230; <a href="http://dirkriehle.com/2011/12/17/developer-belief-vs-reality-the-case-of-the-commit-size-distribution/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Abstract:</strong>&nbsp;The design of software development tools follows from what the developers of such tools believe is true about software development. A key aspect of such beliefs is the size of code contributions (commits) to a software project. In this paper, we show that what tool developers think is true about the size of code contributions is different by more than an order of magnitude from reality. We present this reality, called the commit size distribution, for a large sample of open source and selected closed source projects. We suggest that these new empirical insights will help improve software development tools by aligning underlying design assumptions closer with reality.</p>
<p><strong>Reference:</strong>&nbsp;Dirk Riehle, Carsten Kolassa, Michel A. Salim. &#8220;Developer Belief vs. Reality: The Case of the Commit Size Distribution.&#8221; In <i>Proceedings of Software Engineering 2012</i> (SE &#8217;12). Springer Verlag, 2012.</p>
<p>The paper is available as a <a href="/wp-content/uploads/2011/11/se12-sdbr-v14-short-rev-v4-final1.pdf">PDF file</a>. The survey used in the paper is also available as a <a href="/wp-content/uploads/2011/11/Survey-Printout.pdf">PDF file</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dirkriehle.com/2011/12/17/developer-belief-vs-reality-the-case-of-the-commit-size-distribution/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Business Risks and Governance of Open Source in Software Products (in German)</title>
		<link>http://dirkriehle.com/2011/12/17/business-risks-and-governance-of-open-source-in-software-products-in-german/</link>
		<comments>http://dirkriehle.com/2011/12/17/business-risks-and-governance-of-open-source-in-software-products-in-german/#comments</comments>
		<pubDate>Sat, 17 Dec 2011 14:03:22 +0000</pubDate>
		<dc:creator>Dirk Riehle</dc:creator>
				<category><![CDATA[Industry]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Publication]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Software Engineering]]></category>

		<guid isPermaLink="false">http://dirkriehle.com/?p=2712</guid>
		<description><![CDATA[Title:&#160;Geschäftsrisiken und Governance von Open-Source in Softwareprodukten (Business Risks and Governance of Open Source in Software Products) Abstract:&#160;Almost every software product today, including large standard software packages, contains open source components. The vendors of this software must understand and sensibly &#8230; <a href="http://dirkriehle.com/2011/12/17/business-risks-and-governance-of-open-source-in-software-products-in-german/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Title:</strong>&nbsp;Geschäftsrisiken und Governance von Open-Source in Softwareprodukten (Business Risks and Governance of Open Source in Software Products)</p>
<p><strong>Abstract:</strong>&nbsp;Almost every software product today, including large standard software packages, contains open source components. The vendors of this software must understand and sensibly manage the business risks associated with integrating open source software into commercial products. This article presents a model of the various legal, technical, and social risks that arise from the uncontrolled use of open source software, and explains selected best practices of open source governance that leading companies apply. The model is the result of analyzing five interviews conducted with large German software vendors, together with additional literature research.</p>
<p><span id="more-2712"></span></p>
<p><strong>Keywords:</strong>&nbsp;Open source components, open source governance, intellectual property, code scanners, software products</p>
<p><strong>Reference:</strong>&nbsp;Martin Helmreich, Dirk Riehle. &#8220;Geschäftsrisiken und Governance von Open-Source in Softwareprodukten&#8221;. In <i>Praxis der Wirtschaftsinformatik</i> (HMD 283), volume 49, February 2012.</p>
<h1>Table of Contents</h1>
<ol>
<li>Open source components in commercial products</li>
<li>Methodology</li>
<li>Fundamentals of intellectual property</li>
<li>Identified business risks
<ol>
<li>Uncontrolled and unregulated use of open source components</li>
<li>Active contributions to the open source community</li>
<li>Involvement in a lawsuit</li>
<li>Obligation to disclose source code</li>
<li>Adverse judgment for patent infringement</li>
</ol>
</li>
<li>Examples of best practices
<ol>
<li>Monitoring the supplier interface</li>
<li>Use of code scanners</li>
<li>Developer training</li>
</ol>
</li>
<li>Integration into the development cycle</li>
<li>References</li>
</ol>
<p>The article is currently not freely available. However, you can obtain a preprint from me; please get in touch by email via the <a href="/about/contact/">contact</a> page. Six months after publication, the article will be available here directly as a PDF.</p>
]]></content:encoded>
			<wfw:commentRss>http://dirkriehle.com/2011/12/17/business-risks-and-governance-of-open-source-in-software-products-in-german/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Call for Papers: OSS 2012</title>
		<link>http://dirkriehle.com/2011/12/07/call-for-papers-oss-2012/</link>
		<comments>http://dirkriehle.com/2011/12/07/call-for-papers-oss-2012/#comments</comments>
		<pubDate>Wed, 07 Dec 2011 16:37:54 +0000</pubDate>
		<dc:creator>Dirk Riehle</dc:creator>
				<category><![CDATA[Announcement]]></category>
		<category><![CDATA[Industry]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Software Engineering]]></category>

		<guid isPermaLink="false">http://dirkriehle.com/?p=2706</guid>
		<description><![CDATA[For your convenience, the OSS 2012 call for papers (I’m on the program committee). THE 8th INTERNATIONAL CONFERENCE ON OPEN SOURCE SYSTEMS Hammamet, Tunisia, 10-13 September 2012 Scope of OSS 2012 Over the past two decades, Free/Libre Open Source Software &#8230; <a href="http://dirkriehle.com/2011/12/07/call-for-papers-oss-2012/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>For your convenience, the OSS 2012 call for papers (I’m on the program committee).</p>
<hr />
<h1><a href="http://oss2012.org/">THE 8th INTERNATIONAL CONFERENCE ON OPEN SOURCE SYSTEMS</a></h1>
<p>Hammamet, Tunisia, 10-13 September 2012</p>
<h2>Scope of OSS 2012</h2>
<p>Over the past two decades, Free/Libre Open Source Software (FLOSS) has introduced new successful models for creating, distributing, acquiring and using software and software-based services. Inspired by the success of FLOSS, other forms of open initiatives have been gaining momentum. Open source systems (OSS) now extend beyond software to include open access, open documents, open science, open education, open government, open cloud, open hardware, open artworks and museum exhibits, open innovation and more. On the one hand, the openness movement has created new kinds of opportunities such as the emergence of new business models, knowledge exchange mechanisms, and collective development approaches. On the other hand, the movement has introduced new kinds of challenges, especially as different problem domains embrace openness as a pervasive problem solving strategy. OSS can be complex yet widespread and often cross-cultural. Consequently, they require an interdisciplinary understanding of their technical, economic, legal and socio-cultural dynamics.</p>
<p><span id="more-2706"></span></p>
<p>The goal of the 8th International Conference on Open Source Systems, OSS 2012, the first to be held in Africa, is to provide an international forum where a diverse community of professionals from academia, industry and public sector, and diverse OSS initiatives can come together to share research findings and practical experiences. The conference is also meant to provide information and education to practitioners, identify directions for further research, and to be an ongoing platform for technology transfer, no matter which form of OSS is being pursued.</p>
<p>OSS 2012 accepts submissions in the following categories: research papers, industry papers, formal tool demonstrations, lightning talks and posters. OSS 2012 also invites proposals for tutorials and workshops, submissions to the doctoral symposium, and submissions of panels. Accepted papers will be included in the conference proceedings, which are published by Springer. The major conference theme is long-term sustainability with OSS.</p>
<h2>Topics of Interest</h2>
<h3>OSS sustainability</h3>
<ul>
<li>Sustainability models of OSS</li>
<li>Building sustainable OSS communities</li>
<li>Role of OSS in ICT and sustainable development</li>
<li>Mining sustainability related data from OSS communities</li>
<li>Experience reports and lessons on sustainable OSS ecosystems</li>
</ul>
<h3>OSS as innovation</h3>
<ul>
<li>Adoption / use / acceptance of OSS</li>
<li>Dissemination / redistribution / crowdsourcing of OSS systems</li>
<li>Expanding scientific research and technology development methods through openness</li>
<li>Adopting innovation in OSS projects</li>
</ul>
<h3>OSS practices and methods</h3>
<ul>
<li>OSS and traditional / agile development methods</li>
<li>OSS and decentralized development</li>
<li>Knowledge and documentation management in OSS</li>
</ul>
<h3>OSS technologies</h3>
<ul>
<li>OSS over the Internet</li>
<li>Security of OSS</li>
<li>Interoperability / portability / scalability of OSS</li>
<li>Open standards / open data / open cloud / open hardware / open exhibits</li>
<li>Reuse in OSS</li>
<li>OSS for entertainment</li>
<li>OSS for education</li>
<li>Architecture and design of OSS</li>
</ul>
<h3>Economic / organizational / social issues on OSS</h3>
<ul>
<li>Economic analysis of OSS</li>
<li>Business models of OSS</li>
<li>Maturity models of OSS</li>
<li>OSS in public sector</li>
<li>OSS intellectual property, copyrights and licensing</li>
<li>Non-Governmental Organizations and OSS</li>
</ul>
<h2>Important Dates (Deadlines)</h2>
<ol>
<li>Submissions due: March 9, 2012</li>
<li>Workshop proposals: March 16, 2012</li>
<li>Panels and tutorials proposals: May 25, 2012</li>
<li>Results to authors: April 13, 2012</li>
<li>Camera-ready copy due: May 11, 2012</li>
<li>Early registration: June 15, 2012</li>
</ol>
<h2>Submission</h2>
<p>Upload contributions in PDF format at http://oss2012.org/.</p>
<h2>Organization</h2>
<h3>General Chairs</h3>
<ul>
<li>Walt Scacchi, University of California, Irvine, USA</li>
<li>Tommi Mikkonen, Tampere University of Technology, Finland</li>
</ul>
<h3>Program Chairs</h3>
<ul>
<li>Imed Hammouda, Tampere University of Technology, Finland</li>
<li>Björn Lundell, University of Skövde, Sweden</li>
</ul>
<h3>Local Organizing Chairs</h3>
<ul>
<li>Said Ouerghi, University of Manouba, Tunisia</li>
<li>Khaled Sammoud, University of Tunis el Manar, Tunisia</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://dirkriehle.com/2011/12/07/call-for-papers-oss-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cloud Computing is not a Business Model</title>
		<link>http://dirkriehle.com/2011/08/11/cloud-computing-is-not-a-business-model/</link>
		<comments>http://dirkriehle.com/2011/08/11/cloud-computing-is-not-a-business-model/#comments</comments>
		<pubDate>Thu, 11 Aug 2011 07:34:29 +0000</pubDate>
		<dc:creator>Dirk Riehle</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Industry]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://dirkriehle.com/?p=2596</guid>
		<description><![CDATA[I&#8217;m at the Dagstuhl Seminar &#8220;Information Management in the Cloud&#8221; where I keynoted about cloud computing business models. Given that I&#8217;m hardly a cloud computing expert this may seem like a stretch; however, the organizers had asked me to talk &#8230; <a href="http://dirkriehle.com/2011/08/11/cloud-computing-is-not-a-business-model/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m at the <a href="http://www.dagstuhl.de">Dagstuhl</a> Seminar &#8220;Information Management in the Cloud&#8221; where I keynoted about cloud computing business models. Given that I&#8217;m hardly a cloud computing expert this may seem like a stretch; however, the organizers had asked me to talk about my open source experience and relate this to cloud computing. This perspective turned out to be surprisingly fruitful. By realizing that both open source and cloud computing are disruptive innovations that enable a new generation of business models, I believe I was able to draw reasonable conclusions on the future of cloud computing from the history of open source. I reason by analogy, and here are the main conclusions:</p>
<p><span id="more-2596"></span></p>
<ol>
<li>Cloud computing, like open source, is not a business model in itself, but an enabler of business models</li>
<li>Cloud computing is not a business model but a distribution (read: sales and marketing) strategy</li>
<li>Cloud computing, like open source, will have a novel type of business model built solely from commodities (distributors and utility computing, respectively)</li>
<li>Cloud computing, like open source, will have a novel type of business model using proprietary software (single-vendor/open core and single-source clouds, respectively)</li>
<li>Truly new businesses built using cloud computing need to educate their customers, i.e. rapidly grow the market; while they do so, it is a land grab</li>
<li>Cloud computing, like open source, will be commoditized over time, where a commoditization frontier drives an innovation frontier to keep expanding</li>
<li>Open source and cloud computing work synergistically, helping each other, as examples like SugarCRM show</li>
</ol>
<p>I expect point 2 above to be the most controversial. That&#8217;s because many cloud experts talk about the cost of providing the cloud service before they talk about customer value, implying that customer value is a consequence of cost. That is obviously getting it backwards. The core cloud computing customer values of try-before-you-buy, pay-as-you-go, higher quality of service, etc. are enabled by novel technology, which can also come with a lower cost structure.</p>
<p>Cloud computing is a sales and distribution strategy because the fine-grained provisioning and release of resources, together with the matching fine-grained pricing schedule, drive adoption of cloud services through the line of business rather than the IT department. Open source strategy, anyone?</p>
<p>The slides + notes from the talk are available as PDFs (<a href="/wp-content/uploads/2011/08/Business-Model-Slides-Web.pdf">slides</a>, <a href="/wp-content/uploads/2011/08/Business-Model-Notes-Web.pdf">slides + notes</a>). I recommend you read the <a href="/wp-content/uploads/2011/08/Business-Model-Notes-Web.pdf">slides + notes</a> version.</p>
]]></content:encoded>
			<wfw:commentRss>http://dirkriehle.com/2011/08/11/cloud-computing-is-not-a-business-model/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia</title>
		<link>http://dirkriehle.com/2011/07/29/design-and-implementation-of-the-sweble-wikitext-parser/</link>
		<comments>http://dirkriehle.com/2011/07/29/design-and-implementation-of-the-sweble-wikitext-parser/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 15:47:45 +0000</pubDate>
		<dc:creator>Dirk Riehle</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Publication]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[Wikimedia]]></category>
		<category><![CDATA[Wikis]]></category>

		<guid isPermaLink="false">http://dirkriehle.com/?p=2581</guid>
		<description><![CDATA[Abstract:&#160;The heart of each wiki, including Wikipedia, is its content. Most machine processing starts and ends with this content. At present, such processing is limited, because most wiki engines today cannot provide a complete and precise representation of the wiki’s &#8230; <a href="http://dirkriehle.com/2011/07/29/design-and-implementation-of-the-sweble-wikitext-parser/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Abstract:</strong>&nbsp;The heart of each wiki, including Wikipedia, is its content. Most machine processing starts and ends with this content. At present, such processing is limited, because most wiki engines today cannot provide a complete and precise representation of the wiki’s content. They can only generate HTML. The main reason is the lack of well-defined parsers that can handle the complexity of modern wiki markup. This applies to MediaWiki, the software running Wikipedia, and most other wiki engines. This paper shows why it has been so difficult to develop comprehensive parsers for wiki markup. It presents the design and implementation of a parser for Wikitext, the wiki markup language of MediaWiki. We use parsing expression grammars where most parsers used no grammars or grammars poorly suited to the task. Using this parser it is possible to directly and precisely query the structured data within wikis, including Wikipedia. The parser is available as open source from <a href="http://sweble.org">http://sweble.org</a>.</p>
<p><strong>Keywords:</strong>&nbsp;Wiki, Wikipedia, Wiki Parser, Wikitext Parser, Parsing Expression Grammar, PEG, Abstract Syntax Tree, AST, WYSIWYG, Sweble.</p>
<p><strong>Reference:</strong>&nbsp;Hannes Dohrn and Dirk Riehle. &#8220;Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia.&#8221; In <em>Proceedings of the 7th International Symposium on Wikis and Open Collaboration</em> (WikiSym 2011). ACM Press, 2011.</p>
<p>The paper is available as a <a href="/wp-content/uploads/2011/07/diwp.pdf">PDF file</a> (preprint).</p>
]]></content:encoded>
			<wfw:commentRss>http://dirkriehle.com/2011/07/29/design-and-implementation-of-the-sweble-wikitext-parser/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Technical Report on WOM: An Object Model for Wikitext</title>
		<link>http://dirkriehle.com/2011/07/29/technical-report-on-wom-an-object-model-for-wikitext/</link>
		<comments>http://dirkriehle.com/2011/07/29/technical-report-on-wom-an-object-model-for-wikitext/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 15:40:08 +0000</pubDate>
		<dc:creator>Dirk Riehle</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Publication]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[Wikimedia]]></category>
		<category><![CDATA[Wikis]]></category>

		<guid isPermaLink="false">http://dirkriehle.com/?p=2576</guid>
		<description><![CDATA[Abstract:&#160;Wikipedia is a rich encyclopedia that is not only of great use to its contributors and readers but also to researchers and providers of third party software around Wikipedia. However, Wikipedia&#8217;s content is only available as Wikitext, the markup language &#8230; <a href="http://dirkriehle.com/2011/07/29/technical-report-on-wom-an-object-model-for-wikitext/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Abstract:</strong>&nbsp;Wikipedia is a rich encyclopedia that is not only of great use to its contributors and readers but also to researchers and providers of third party software around Wikipedia. However, Wikipedia&#8217;s content is only available as Wikitext, the markup language in which articles on Wikipedia are written, and whoever needs to access the content of an article has to implement their own parser or has to use one of the available parser solutions. Unfortunately, those parsers which convert Wikitext into a high-level representation like an abstract syntax tree (AST) define their own format for storing and providing access to this data structure. Further, the semantics of Wikitext are only defined implicitly in the MediaWiki software itself. This situation makes it difficult to reason about the semantic content of an article or exchange and modify articles in a standardized and machine-accessible way. To remedy this situation we propose a markup language, called XWML, in which articles can be stored and an object model, called WOM, that defines how the contents of an article can be read and modified.</p>
<p><strong>Keywords:</strong>&nbsp;Wiki, Wikipedia, Wikitext, Wikitext Parser, Open Source, Sweble, Mediawiki, Mediawiki Parser, XWML, HTML, WOM</p>
<p><strong>Reference:</strong>&nbsp;Hannes Dohrn and Dirk Riehle. <em>WOM: An Object Model for Wikitext.</em> University of Erlangen, Technical Report CS-2011-05 (July 2011).</p>
<p>The technical report is available as a <a href="/wp-content/uploads/2011/07/wom-tr.pdf">PDF file</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dirkriehle.com/2011/07/29/technical-report-on-wom-an-object-model-for-wikitext/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On the Open Cloud Principles: Every Real-World Specification is an Underspecification</title>
		<link>http://dirkriehle.com/2011/07/29/on-the-open-cloud-principles-every-real-world-specification-is-an-underspecification/</link>
		<comments>http://dirkriehle.com/2011/07/29/on-the-open-cloud-principles-every-real-world-specification-is-an-underspecification/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 10:32:11 +0000</pubDate>
		<dc:creator>Dirk Riehle</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Industry]]></category>
		<category><![CDATA[Open Content]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Software Engineering]]></category>

		<guid isPermaLink="false">http://dirkriehle.com/?p=2557</guid>
		<description><![CDATA[Trying to wrap my head around the Open Cloud Principles put out by the revamp of the Open Cloud Initiative, I&#8217;m happy to note that software engineering research has something to say about the challenges these principles will face. Every &#8230; <a href="http://dirkriehle.com/2011/07/29/on-the-open-cloud-principles-every-real-world-specification-is-an-underspecification/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Trying to wrap my head around the <a href="http://www.opencloudinitiative.org/principles">Open Cloud Principles</a> put out by the revamp of the <a href="http://www.opencloudinitiative.org/">Open Cloud Initiative</a>, I&#8217;m happy to note that software engineering research has something to say about the challenges these principles will face.</p>
<blockquote><p>Every real-world specification is an underspecification.</p></blockquote>
<p>So, well, I say that, but I doubt that I&#8217;m the first one to have learned this from 30+ years of software engineering research. This principle leads us directly to the challenges facing anyone who tries to stay true to the intentions behind the Open Cloud Principles.</p>
<p><span id="more-2557"></span></p>
<p>The principles ask that all data be available using open formats and accessible through open interfaces, all based on open standards. If so, a cloud computing provider can call its services an open cloud. The intention is right. The issue is open standards, though. The hope is that you can completely specify the format of and access to data so that another cloud provider can replicate it. That is not going to play out so easily.</p>
<h1>Underspecification</h1>
<p>Given that every specification is an underspecification, any open standard will be an underspecification. It will leave out relevant aspects. What is missing is unlikely to be the data layout; usually it is the semantics, the meaning of the data. Between an SAP Business Suite and an underlying Oracle database, who controls the data? It is SAP, because its code realizes the interpretation of the data, not the plain storage.</p>
<p>If a specification is well-intentioned, it will simply not be complete enough. If a specification is ill-intentioned, all it will specify is a format for key/value pairs, leaving the interpretation of such data to an application. Reading the principles does not make clear to me how to avoid such intentions. (It is probably neither possible nor intended. Players who deliberately play badly will eventually be recognized as such.)</p>
<p>I don&#8217;t know but I&#8217;m assuming that the OCI is trying to address this issue by requiring an open source implementation for handling the data. This is the last bullet item in the definition of open standard. It is debatable whether this gets you around key/value pairs; I can imagine an open source library for handling key/value pairs that stops right where it gets interesting, i.e. the data gets interpreted. But lets assume that the open source library provides decent abstractions, e.g. object-oriented classes, whose implementation truthfully captures the semantics of the underlying domain concept. The principle of underspecification above stipulates that subtle semantics will escape those classes and will be caught by surrounding code interpreting the data. That code is unlikely to be available as open source as it is likely to be competitively differentiating.</p>
<h1>Necessary Extensions</h1>
<p>The second problem is that application providers simply won&#8217;t stop with standardized data types. Have you ever tried to get two business units of some company to agree on the notion of &#8220;customer&#8221;? You won&#8217;t succeed. It is the reason why we have design patterns like <a href="/computer-science/research/2000/plopd-4.html">Role Object</a>. The definition of &#8220;customer&#8221; will differ between different companies and even between different business units of the same company. So you need to provide extension mechanisms and you are back to storage using key/value pairs and/or running client-specific code to properly interpret client-specific extensions.</p>
<p>The principles are well-intentioned and send people down the right road, but they are not a guarantee that you can take your data from one cloud to another.</p>
<h1>A Pragmatic Response</h1>
<p>It is not that this isn&#8217;t a known problem. Anyone who has worked on standardization efforts has run into this. You may think that the C programming language was specified a long time ago and is rock-solid. But that&#8217;s not true; it is still evolving, as the recent ambiguities around the volatile keyword showed. However, long-running standardization efforts do show a pragmatic way forward: Effective standardization is not paperwork but effective working groups, that is, experts and community debating and documenting specifications, and moving forward, mole-whacking the loopholes and bugs as they keep occurring. It is a never-ending effort, but a necessary one.</p>
<p>I&#8217;m missing this notion of a working group in the list of requirements for an open standard, but I&#8217;m sure it won&#8217;t take long for such groups to appear or get channeled there.</p>
]]></content:encoded>
			<wfw:commentRss>http://dirkriehle.com/2011/07/29/on-the-open-cloud-principles-every-real-world-specification-is-an-underspecification/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Controlling and Steering Open Source Projects</title>
		<link>http://dirkriehle.com/2011/07/18/controlling-and-steering-open-source-projects/</link>
		<comments>http://dirkriehle.com/2011/07/18/controlling-and-steering-open-source-projects/#comments</comments>
		<pubDate>Mon, 18 Jul 2011 16:51:53 +0000</pubDate>
		<dc:creator>Dirk Riehle</dc:creator>
				<category><![CDATA[Industry]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Publication]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://dirkriehle.com/?p=2286</guid>
		<description><![CDATA[The IEEE just published a short version of the &#8220;control points and steering mechanisms&#8221; article. Here is the abstract. Please see the original for more details. Abstract:&#160;Open source software has become an important part of the software business. In a &#8230; <a href="http://dirkriehle.com/2011/07/18/controlling-and-steering-open-source-projects/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The IEEE just published a short version of the <a href="/publications/2010/control-points-and-steering-mechanisms-in-open-source-software-projects/">&#8220;control points and steering mechanisms&#8221;</a> article. Here is the abstract. Please see the original for more details.</p>
<p><strong>Abstract:</strong>&nbsp;Open source software has become an important part of the software business. <a href="http://www.forrester.com/rb/Research/open_source_software_goes_mainstream/q/id/54205/t/2">In a 2009 survey, Forrester Research</a> found that 46 percent of all responding enterprises were using or implementing open source software. Moreover, <a href="http://www.gartner.com/DisplayDocument?id=1359127">in 2009, the Gartner Group estimated</a> that by 2012, at least 80 percent of all software product firms will use open source software. Thus, it’s important to understand how software product firms depend on open source and how they manage that dependency to meet their business goals. There are three main types of software product firms. [...]</p>
<p><span id="more-2286"></span></p>
<p><strong>Keywords:</strong> Open source, open source projects, single-vendor open source, community open source, commercial open source, open source business models.</p>
<p><strong>Reference:</strong> Dirk Riehle. &#8220;Controlling and Steering Open Source Projects.&#8221; <em>IEEE Computer</em> vol. 44, no. 7 (July 2011). Pages 93-96.</p>
<p>The paper is available as a <a href="/wp-content/uploads/2011/07/Controlling-and-Steering-Open-Source-Projects-Preprint-r7iab2.pdf">PDF file</a> and in the aforementioned longer <a href="/publications/2010/control-points-and-steering-mechanisms-in-open-source-software-projects/">web-based version</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://dirkriehle.com/2011/07/18/controlling-and-steering-open-source-projects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

