Call for Papers: WikiSym 2012

8th International Symposium on Wikis and Open Collaboration

August 27-29, 2012 | Linz, Austria

The International Symposium on Wikis and Open Collaboration (WikiSym) is the premier conference on open collaboration and related technologies. In 2012, WikiSym celebrates its 8th year of scholarly, technical and community innovation in Linz, Austria. We are excited this year to be co-located with Ars Electronica, the premier digital art and science meeting that attracts over 35,000 attendees per year.

Submissions are invited for the following categories:

Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia

Abstract: The heart of each wiki, including Wikipedia, is its content. Most machine processing starts and ends with this content. At present, such processing is limited, because most wiki engines today cannot provide a complete and precise representation of the wiki’s content. They can only generate HTML. The main reason is the lack of well-defined parsers that can handle the complexity of modern wiki markup. This applies to MediaWiki, the software running Wikipedia, and most other wiki engines. This paper shows why it has been so difficult to develop comprehensive parsers for wiki markup. It presents the design and implementation of a parser for Wikitext, the wiki markup language of MediaWiki. We use parsing expression grammars where most parsers used no grammars or grammars poorly suited to the task. Using this parser it is possible to directly and precisely query the structured data within wikis, including Wikipedia. The parser is available as open source from http://sweble.org.

Keywords: Wiki, Wikipedia, Wiki Parser, Wikitext Parser, Parsing Expression Grammar, PEG, Abstract Syntax Tree, AST, WYSIWYG, Sweble.

Reference: Hannes Dohrn and Dirk Riehle. “Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia.” In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym 2011). ACM Press, 2011.

The paper is available as a PDF file (preprint).
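
To make the parsing expression grammar idea from the abstract more concrete, here is a minimal sketch in Python. It is not the Sweble grammar or API (Sweble itself is not shown here, and all function names below are hypothetical); it only assumes a tiny subset of Wikitext (bold, italics, and internal links) and illustrates the two PEG properties that matter: alternatives are tried in a fixed order, and a failed alternative consumes no input.

```python
from typing import Callable, List, Optional, Tuple

# A rule takes (source, position) and returns (node, new_position) or None.
# Hypothetical node shapes: plain str, ("bold", children), ("italic", children),
# ("link", target, label).
Rule = Callable[[str, int], Optional[Tuple[object, int]]]


def _delimited(tag: str, opener: str, closer: str) -> Rule:
    """Build a rule matching opener ... closer and parsing the inside recursively."""
    def rule(src: str, pos: int) -> Optional[Tuple[object, int]]:
        if not src.startswith(opener, pos):
            return None  # no match, nothing consumed
        end = src.find(closer, pos + len(opener))
        if end == -1:
            return None  # unclosed markup: fail so a later alternative can apply
        inner = src[pos + len(opener):end]
        return (tag, parse_inline(inner)), end + len(closer)
    return rule


def _link(src: str, pos: int) -> Optional[Tuple[object, int]]:
    """Match [[target]] or [[target|label]]."""
    if not src.startswith("[[", pos):
        return None
    end = src.find("]]", pos + 2)
    if end == -1:
        return None
    target, _, label = src[pos + 2:end].partition("|")
    return ("link", target, label or target), end + 2


# Ordered choice: bold (''') must be tried before italic ('').
RULES: Tuple[Rule, ...] = (
    _delimited("bold", "'''", "'''"),
    _delimited("italic", "''", "''"),
    _link,
)


def parse_inline(src: str) -> List[object]:
    """Parse inline markup into a small tree of nodes."""
    nodes: List[object] = []
    pos = 0
    while pos < len(src):
        for rule in RULES:
            result = rule(src, pos)
            if result is not None:
                node, pos = result
                nodes.append(node)
                break
        else:
            # Last alternative: one character of plain text, merged into
            # the preceding text node if there is one.
            if nodes and isinstance(nodes[-1], str):
                nodes[-1] += src[pos]
            else:
                nodes.append(src[pos])
            pos += 1
    return nodes


tree = parse_inline("The '''Sweble''' parser reads ''Wikitext'' from [[MediaWiki|the wiki engine]].")
```

The resulting tree can be walked or queried directly instead of scraping generated HTML. Real Wikitext needs far more than this sketch handles (templates, tables, nested and overlapping apostrophe markup), which is the difficulty the paper addresses.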

Technical Report on WOM: An Object Model for Wikitext

Abstract: Wikipedia is a rich encyclopedia that is not only of great use to its contributors and readers but also to researchers and providers of third party software around Wikipedia. However, Wikipedia’s content is only available as Wikitext, the markup language in which articles on Wikipedia are written, and whoever needs to access the content of an article has to implement their own parser or has to use one of the available parser solutions. Unfortunately, those parsers which convert Wikitext into a high-level representation like an abstract syntax tree (AST) define their own format for storing and providing access to this data structure. Further, the semantics of Wikitext are only defined implicitly in the MediaWiki software itself. This situation makes it difficult to reason about the semantic content of an article or exchange and modify articles in a standardized and machine-accessible way. To remedy this situation we propose a markup language, called XWML, in which articles can be stored and an object model, called WOM, that defines how the contents of an article can be read and modified.

Keywords: Wiki, Wikipedia, Wikitext, Wikitext Parser, Open Source, Sweble, MediaWiki, MediaWiki Parser, XWML, HTML, WOM

Reference: Hannes Dohrn and Dirk Riehle. WOM: An Object Model for Wikitext. University of Erlangen, Technical Report CS-2011-05 (July 2011).

The technical report is available as a PDF file.

The Parser that Cracked the MediaWiki Code

I am happy to announce that we have finally open sourced the Sweble Wikitext parser. You can find the announcement on the OSR Group blog or directly on the Sweble project site. This is the work of Hannes Dohrn, my first Ph.D. student, whom I hired in 2009 to implement a Wikitext parser.

So what about this “cracking the MediaWiki code”?

Wikipedia aims to bring the (encyclopedic) knowledge of the world to all of us, for free. While already ten years old, the Wikipedia community is just getting started. We have barely seen the tip of the iceberg; there is so much more to come. All that wonderful content is being written by volunteers using a (seemingly) simple language called Wikitext (the stuff you type in once you click on edit). Until today, Wikitext had been poorly defined.

Call for Papers: WikiSym 2011, the 7th International Symposium on Wikis and Open Collaboration

The 7th International Symposium on Wikis and Open Collaboration

October 3-5, 2011 | Mountain View, California

The International Symposium on Wikis and Open Collaboration (WikiSym) is the premier conference on open collaboration and related technologies. In 2011, WikiSym celebrates its 7th year of scholarly, technical and community innovation in Mountain View, California at the Microsoft Research Campus in Silicon Valley.

Submissions are invited for the following categories:

MediaWiki and Commercial Open Source Innovation

You may be surprised to hear that the dominant public Internet wiki engine, MediaWiki, only plays a minor role in the enterprise. Within the corporate firewalls, TWiki, Confluence, DokuWiki, TikiWiki, and others are running the show. Why is that? It is certainly not the lack of commercial customer interest in MediaWiki, which everyone already knows as the software running Wikipedia. It is also not an anti-commercial stance by the creators of MediaWiki (and its effective owner, the Wikimedia Foundation).

Call for Papers: ACM CHIMIT 2010

The ACM CHIMIT 2010 organizers are soliciting submissions for Papers, Short Papers, Panels, Courses, Posters, and presentations of papers recently published in other venues. Please see the submission page for detailed submission instructions on each kind of contribution. I’m on the program committee.

The Paper & Short Paper Deadline is July 3.

ACM CHIMIT ’10

Computer-Human Interaction for Management of Information Technology

November 12-13, 2010, San Jose, CA (co-located with USENIX LISA in San Jose)

WikiSym 2010 Program Announced!

The WikiSym 2010 program has been announced. Keynotes are by Cliff Lampe and Andrew Lih, and the program is full of research talks, workshops, posters, and demos. And, of course, there is a continuous track of open space available for everyone to discuss their wiki and open collaboration interests and issues. Check it out! And see you at WikiSym 2010, July 7-9, in Gdansk, Poland!

My Open Source Research Agenda (as of 2009)

As you may have seen in an earlier blog post, I’m starting in a new position as a professor of software engineering focussing on open source software at the University of Erlangen. In this post, I’m laying out my abbreviated research agenda as of September 2009.

The overarching goal of my group’s research is to comprehensively define “the next big” software development method. To that end, we will work to unify agile software development methods with open source software development. Agile methods can cope with changing requirements but don’t scale up well. Open source methods can cope with changing requirements and also scale up well. However, open source remains poorly understood as a development method and practices vary significantly from project to project. Agile methods are increasingly being adopted in the enterprise, but it is open source methods that innovate intra- and inter-company collaboration as well as vendor-customer relationships. Given prior significant research on agile methods, the focus of my group’s work will be on understanding open source methods and practices in both an engineering and a business context.

Professor for Open Source Software at University of Erlangen

After 12 years of working in the high-tech industry, I’m changing gears. I left my prior industry job and am starting today, September 1st, as the “professor for open source software” in the computer science department of the Friedrich Alexander University of Erlangen-Nuremberg in Bavaria, Germany. This is a free (not tied to a chair) full (fully tenured) professorship. I’m looking forward to joining the department and collaborating with my new colleagues at the university, local industry, and beyond.

The professorship is well-funded and I’ll be seeking to hire Ph.D. students right away. For my research plans, please see the upcoming blog post. For now, I’ll let my favorite (ex-)Stanford comic strip do the talking. If you aren’t reading Ph.D. comics yet, check it out.