Dirk Riehle's Industry and Research Publications

WOM: An Object Model for Wikitext [Technical Report]

Abstract: Wikipedia is a rich encyclopedia that is not only of great use to its contributors and readers but also to researchers and providers of third party software around Wikipedia. However, Wikipedia’s content is only available as Wikitext, the markup language in which articles on Wikipedia are written, and whoever needs to access the content of an article has to implement their own parser or has to use one of the available parser solutions. Unfortunately, those parsers which convert Wikitext into a high-level representation like an abstract syntax tree (AST) define their own format for storing and providing access to this data structure. Further, the semantics of Wikitext are only defined implicitly in the MediaWiki software itself. This situation makes it difficult to reason about the semantic content of an article or exchange and modify articles in a standardized and machine-accessible way. To remedy this situation we propose a markup language, called XWML, in which articles can be stored and an object model, called WOM, that defines how the contents of an article can be read and modified.

Keywords: Wiki, Wikipedia, Wikitext, Wikitext Parser, Open Source, Sweble, Mediawiki, Mediawiki Parser, XWML, HTML, WOM

Reference: Hannes Dohrn and Dirk Riehle. WOM: An Object Model for Wikitext. University of Erlangen, Technical Report CS-2011-05 (July 2011).

The technical report is available as a PDF file.

Subscribe!

Comments

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Navigation

Share the content

Share on LinkedIn

Share by email

Share on X (Twitter)

Share on WhatsApp

Featured startups

QDAcity makes collaborative qualitative data analysis fun and easy.

Featured projects

Open data, easy and social
Engineering intelligence unleashed
Open source in products, easy and safe