Call for Proposals: 2nd Open Data Dialog

November 18-19, 2013, Berlin

“Open data has the potential to transform society, government and the economy, from how we travel to work to how we decide to vote,” declared Rufus Pollock, co-founder of the Open Knowledge Foundation, at the 1st International Open Data Dialog, which took place in December 2012 in Berlin.

With this year’s motto, THINK OPEN, THINK BUSINESS, the Dialog emphasizes the high potential of open data for businesses. The Dialog aims to promote the view that open data is a matter not only for public administration, but also for enterprises, NGOs and science. No one will be able to take this step alone. Administrations, economies, and societies must come together to unlock the potential of data.

As we did last year, we invite all free thinkers from industry, civil society, government and research institutes to join the Dialog and to share their ideas and projects with other open data enthusiasts. We invite you to present your ideas, approaches or results on topics including, but not limited to: opening, transforming or visualizing data; data research or journalism; data to support transparency and participation; open data platforms and tools; data-intensive services and applications; secure integration of open, closed and private data; business cases and legal settings.

We are seeking proposals for presentations, demonstrations, workshops and tutorials for the 2nd Open Data Dialog, November 18-19, 2013.

Submission Deadline: July 15, 2013

Read more…

On the Technology Behind the Wikipedia Sexism Debate on “American Women Novelists”

The English Wikipedia is currently embroiled in a debate on sexism (local copy) over the classification of female American novelists as “American Women Novelists” while male American novelists remain in the more general category “American Novelists”, which suggests a subordinate role for female novelists. I find this debate regrettable for the apparent sexism but also interesting for the technology underlying such changes, which I would like to focus on here.

By technology, I mean bureaucratic practices, conceptual modeling of the world and Wikipedia content, and software tools to support changes to those models.

Continue reading “On the Technology Behind the Wikipedia Sexism Debate on “American Women Novelists””

Call for Participation: OC13 – Open Commons Kongress in Linz, Austria, 2013-05-14

Please consider participating in the Open Commons Kongress, OC13, in Linz, Austria (I’m on the advisory board). More information below (translated from German). [DR]

OC13 – Open Commons Kongress

May 14, 2013, 9:00 – 16:30

Wissensturm Linz, Austria

Learning and Living with Digital Commons

For the second time, Johannes Kepler University Linz and the Open Commons Region Linz are hosting the Open Commons Kongress. This year’s title is “OC13: Learning and Living with Digital Commons”. The event takes place on Tuesday, May 14, at the Wissensturm.

Read on…

Looking Back on One Year of Public Policy Consulting

2012 was the year when I first did some serious public policy consulting. I found it quite informative to see how politicians work and what the impact of lobbyists is.

I’m a professor of computer science at a German technical university. I also have an M.B.A. from Stanford. I consult on open source, software development, and the software industry. I’m also a civil servant of the state of Bavaria in Germany. Thus, I try to maintain a policy-neutral stance, consulting on mechanism more than on policy. The German people elect politicians, politicians choose policy, and I help politicians choose and define mechanisms that will turn those policies into reality.

Continue reading “Looking Back on One Year of Public Policy Consulting”

Why Open Source is Good for German Software Businesses

I’m on the expert advisory committee of one of the German parties for the current “Internet Enquete”, a commission tasked by the German parliament with suggesting future directions for Germany’s stance toward the Internet and everything digital. At a meeting this evening, a lobbyist confided in me: “Open source is bad for German software vendors!” I gasped. He couldn’t be further from the truth. If this were mechanical engineering or electrical engineering, he’d be right. ME? EE? Germany is top. Software? Not so. Beyond a few selected highlights, Germany is an also-ran internationally. When it comes to software product businesses, German companies would benefit significantly if the dice were rolled again. Anything that upsets the current order can only be an improvement over the current state of affairs. Open source does just that. More power to open source business models!

Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia

Abstract: The heart of each wiki, including Wikipedia, is its content. Most machine processing starts and ends with this content. At present, such processing is limited, because most wiki engines today cannot provide a complete and precise representation of the wiki’s content. They can only generate HTML. The main reason is the lack of well-defined parsers that can handle the complexity of modern wiki markup. This applies to MediaWiki, the software running Wikipedia, and most other wiki engines. This paper shows why it has been so difficult to develop comprehensive parsers for wiki markup. It presents the design and implementation of a parser for Wikitext, the wiki markup language of MediaWiki. We use parsing expression grammars where most parsers used no grammars or grammars poorly suited to the task. Using this parser it is possible to directly and precisely query the structured data within wikis, including Wikipedia. The parser is available as open source from http://sweble.org.

Keywords: Wiki, Wikipedia, Wiki Parser, Wikitext Parser, Parsing Expression Grammar, PEG, Abstract Syntax Tree, AST, WYSIWYG, Sweble.

Reference: Hannes Dohrn and Dirk Riehle. “Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia.” In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym 2011). ACM Press, 2011. Pages 72-81.

The paper is available as a PDF file (preprint).
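
To illustrate the general idea of a parsing expression grammar, here is a minimal sketch in Python. It is a toy and not Sweble's actual grammar or API: the node classes (`Text`, `Bold`, `Link`) and function names are hypothetical, and the markup covered is assumed to be only bold text and internal links. Each rule either succeeds and returns a node plus the new position, or fails with `None`; the ordered choice in `parse_inline` tries the alternatives in a fixed order, so malformed markup falls back to plain text instead of raising an error.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    value: str


@dataclass
class Link:
    target: str
    label: str


@dataclass
class Bold:
    children: List[Union["Text", "Link"]]


def parse_link(src, pos):
    # Link <- "[[" target ("|" label)? "]]"
    if not src.startswith("[[", pos):
        return None
    end = src.find("]]", pos + 2)
    if end == -1:
        return None  # unclosed link: fail so another rule can try
    target, _, label = src[pos + 2:end].partition("|")
    return Link(target, label or target), end + 2


def parse_bold(src, pos):
    # Bold <- "'''" Inline+ "'''"
    if not src.startswith("'''", pos):
        return None
    children, cur = [], pos + 3
    while cur < len(src) and not src.startswith("'''", cur):
        node, cur = parse_inline(src, cur)
        children.append(node)
    if not children or not src.startswith("'''", cur):
        return None  # empty or unclosed bold: fail and backtrack
    return Bold(children), cur + 3


def parse_text(src, pos):
    # Text <- one character, then everything up to the next markup start
    end = pos + 1
    while end < len(src) and not src.startswith(("[[", "'''"), end):
        end += 1
    return Text(src[pos:end]), end


def parse_inline(src, pos):
    # Inline <- Link / Bold / Text   (ordered choice: first match wins)
    return parse_link(src, pos) or parse_bold(src, pos) or parse_text(src, pos)


def parse(src):
    # Document <- Inline*
    nodes, pos = [], 0
    while pos < len(src):
        node, pos = parse_inline(src, pos)
        nodes.append(node)
    return nodes


ast = parse("See '''the [[Sweble|parser]]''' now")
```

With the result held as a tree of objects, a query such as "all link targets in this article" becomes a walk over `Link` nodes instead of a search through generated HTML.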

Technical Report on WOM: An Object Model for Wikitext

Abstract: Wikipedia is a rich encyclopedia that is not only of great use to its contributors and readers but also to researchers and providers of third party software around Wikipedia. However, Wikipedia’s content is only available as Wikitext, the markup language in which articles on Wikipedia are written, and whoever needs to access the content of an article has to implement their own parser or has to use one of the available parser solutions. Unfortunately, those parsers which convert Wikitext into a high-level representation like an abstract syntax tree (AST) define their own format for storing and providing access to this data structure. Further, the semantics of Wikitext are only defined implicitly in the MediaWiki software itself. This situation makes it difficult to reason about the semantic content of an article or exchange and modify articles in a standardized and machine-accessible way. To remedy this situation we propose a markup language, called XWML, in which articles can be stored and an object model, called WOM, that defines how the contents of an article can be read and modified.

Keywords: Wiki, Wikipedia, Wikitext, Wikitext Parser, Open Source, Sweble, Mediawiki, Mediawiki Parser, XWML, HTML, WOM

Reference: Hannes Dohrn and Dirk Riehle. WOM: An Object Model for Wikitext. University of Erlangen, Technical Report CS-2011-05 (July 2011).

The technical report is available as a PDF file.

The Parser That Cracked The MediaWiki Code

I am happy to announce that we finally open sourced the Sweble Wikitext parser. You can find the announcement on my research group’s blog or directly on the Sweble project site. This is the work of Hannes Dohrn, my first Ph.D. student, whom I hired in 2009 to implement a Wikitext parser.

So what about this “cracking the MediaWiki code”?

Wikipedia aims to bring the (encyclopedic) knowledge of the world to all of us, for free. While already ten years old, the Wikipedia community is just getting started, and we have seen only the tip of the iceberg; there is so much more to come. All that wonderful content is being written by volunteers using a (seemingly) simple language called Wikitext (the stuff you type in once you click on edit). Until today, Wikitext had been poorly defined.

Continue reading “The Parser That Cracked The MediaWiki Code”

Revamping German Copyright Law #EIDG

The German Enquete commission “Internet and Digital Society” is a multilateral commission instituted by the German parliament to discuss and make recommendations on, well, Internet and digital society. I’m a member of an expert advisory council for one of the parties involved in the commission. I received the following catalog of questions and thought I’d share the questions here and maybe we can have a good discussion. For international readers, it may be helpful to read Wikipedia on German copyright law. So, here are the questions.

Continue reading “Revamping German Copyright Law #EIDG”

My Position on Privacy (Seven Things About Me)

Stormy Peters recently tagged me to post seven items about my life. This is a “viral” pyramid scheme; you are supposed to write these seven items and then tag seven other people to do the same. It is not the first time I got such a request; I also got tagged on Facebook to post 25 items about my life, and in general it is quite tempting to let your personal thoughts hang out on a blog like this.

I usually ignore such requests for reasons of privacy. Everything you do or say on the Internet can be used at some future point in time. The saying “on the Internet, nobody knows you are a dog” is completely wrong; on the Internet, anyone with enough resources can not only know you are a dog but can also know everything about you down to hereditary diseases, even things you may not know yourself. Or, as Scott McNealy is famous for saying: “You have no privacy. Get over it.”

Here, then, are seven things about my take on privacy in the Internet age:

Continue reading “My Position on Privacy (Seven Things About Me)”