Some Progress on Wikipedia Editing

Wikipedia has long suffered from its rather raw “wiki markup” editing experience. The reason is that the underlying software is stuck in the mud and any progress is slow and painful. Right now there is some excitement over progress on the “visual editor” of MediaWiki. As you can see in the video below, the look and feel is 2016, while the functionality is still 1999. How we will catch up with Google Docs or Medium or any reasonable editing experience this way remains a mystery to me.

Why You Should Not Cite Research Work on Wikipedia That is Not Freely Available

I recommend that Wikipedia articles do not reference research papers that are not freely available, just like research papers should not cite research work that is not freely available. Anyone who cites non-open-access, non-free research bases their work and argument on materials not accessible to the vast majority of people on this planet. By doing so, authors exclude almost everyone else from verifying and critiquing their work. They thereby stop science and progress dead in their tracks.

My advice is that authors need to understand that non-open-access, non-free research articles have not been published; they have been buried behind a paywall. With the vast majority of people not having access to such paid-for materials, any such buried article is not a contribution to the progress of science and should be ignored.

Call for Papers: OpenSym 2015, the 11th International Symposium on Open Collaboration

OpenSym 2015, the 11th International Symposium on Open Collaboration

August 19-21, 2015 | San Francisco, California, U.S.A.

http://opensym.org/os2015

About the Conference

The 11th International Symposium on Open Collaboration (OpenSym 2015) is the premier conference on open collaboration research and practice, including free/libre/open source software, open data, IT-driven open innovation research, wikis and related open collaborative media, and Wikipedia and related Wikimedia projects.

OpenSym brings together the different strands of open collaboration research and practice, seeking to create synergies and inspire new collaborations between computer science and information systems researchers, social scientists, legal scholars, and everyone interested in understanding open collaboration and how it is changing the world.

OpenSym 2015 will be held in San Francisco, California, on August 19-21, 2015.

World Views Are Not Data Inconsistencies

I’m at Wikimania 2013, listening in on the Wikidata session. Wikidata is the Wikimedia Foundation’s attempt to go beyond prose in Wikipedia pages and provide a reference data source. An obvious problem is that any such data source needs an underlying model of the world, and gaining consensus on that model is sometimes not just hard but impossible. Basically, different world-views are simply incompatible. When asked about this fundamental problem, the audience was told that such inconsistencies are handled using multi-valued properties. Ignoring for a second that world-views cannot be reduced to individual properties, my major point here is that world-views are not inconsistencies in the data. Different world-views are real and justified, and there will never be only one world-view. The moment we all agree on one world-view, we have become the Borg.

Update: Daniel Kinzler corrected me that this must be a misunderstanding: Wikidata can handle multiple world-views well by way of multi-valued properties.
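
To make the idea of multi-valued properties concrete, here is a minimal sketch in Python. It is not the project's actual data model or API; the class `Item`, its methods, and the example values ("Country X", "Source A", and so on) are hypothetical and chosen only for illustration. It shows a property that holds several values side by side, each tagged with the source that asserts it.

```python
from collections import defaultdict


class Item:
    """Hypothetical data item whose properties may hold several values."""

    def __init__(self, label):
        self.label = label
        # property name -> list of (value, source) pairs
        self.claims = defaultdict(list)

    def add_claim(self, prop, value, source):
        # Claims are appended, never overwritten, so competing values coexist.
        self.claims[prop].append((value, source))

    def values(self, prop):
        # Return every value asserted for the property, in insertion order.
        return [value for value, _ in self.claims.get(prop, [])]


# Placeholder data, not real-world facts.
region = Item("Disputed Region")
region.add_claim("country", "Country X", source="Source A")
region.add_claim("country", "Country Y", source="Source B")
print(region.values("country"))
```

Both values are stored without either being marked as the inconsistent one, which is the mechanism the update refers to. Whether a whole world-view can be captured property by property is a separate question that the sketch does not settle.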

On the Technology Behind the Wikipedia Sexism Debate on “American Women Novelists”

The English Wikipedia is currently embroiled in a debate on sexism (local copy) because female American novelists were classified as “American Women Novelists” while male American novelists were left in the more general category “American Novelists”, suggesting a subordinate role for female novelists. I find this debate regrettable for the apparent sexism but also interesting for the technology underlying such changes, which I would like to focus on here.

By technology, I mean bureaucratic practices, conceptual modeling of the world and of Wikipedia content, and software tools to support changes to those models.

Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia

Abstract: The heart of each wiki, including Wikipedia, is its content. Most machine processing starts and ends with this content. At present, such processing is limited, because most wiki engines today cannot provide a complete and precise representation of the wiki’s content. They can only generate HTML. The main reason is the lack of well-defined parsers that can handle the complexity of modern wiki markup. This applies to MediaWiki, the software running Wikipedia, and most other wiki engines. This paper shows why it has been so difficult to develop comprehensive parsers for wiki markup. It presents the design and implementation of a parser for Wikitext, the wiki markup language of MediaWiki. We use parsing expression grammars where most parsers used no grammars or grammars poorly suited to the task. Using this parser it is possible to directly and precisely query the structured data within wikis, including Wikipedia. The parser is available as open source from http://sweble.org.

Keywords: Wiki, Wikipedia, Wiki Parser, Wikitext Parser, Parsing Expression Grammar, PEG, Abstract Syntax Tree, AST, WYSIWYG, Sweble.

Reference: Hannes Dohrn and Dirk Riehle. “Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia.” In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym 2011). ACM Press, 2011. Pages 72-81.

The paper is available as a PDF file (preprint).
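
To illustrate what the abstract means by parsing wiki markup into a tree that can be queried, here is a small sketch in Python. It is not the Sweble parser, which is a separate and far more complete implementation. The sketch is a hand-written parser in the style of a parsing expression grammar (ordered choice with backtracking) for a tiny, assumed subset of Wikitext: bold, italics, and internal links. All names (`Node`, `parse_inline`, `find_links`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                 # "text", "bold", "italic", or "link"
    value: str = ""           # text content, or the link target
    children: list = field(default_factory=list)


def parse_link(src, pos):
    # [[target]] or [[target|label]]
    if not src.startswith("[[", pos):
        return None
    end = src.find("]]", pos + 2)
    if end == -1:
        return None
    target, _, label = src[pos + 2:end].partition("|")
    return Node("link", target.strip(), [Node("text", label or target)]), end + 2


def parse_delimited(src, pos, marker, kind):
    if not src.startswith(marker, pos):
        return None
    children, end = parse_inline(src, pos + len(marker), stop=marker)
    if not src.startswith(marker, end):
        return None           # no closing marker: the rule fails, caller backtracks
    return Node(kind, children=children), end + len(marker)


def parse_bold(src, pos):
    return parse_delimited(src, pos, "'''", "bold")


def parse_italic(src, pos):
    return parse_delimited(src, pos, "''", "italic")


def parse_inline(src, pos=0, stop=None):
    """Parse inline markup from pos until the stop marker or end of input."""
    nodes = []
    while pos < len(src):
        if stop and src.startswith(stop, pos):
            break
        # Ordered choice: the first rule that matches wins (bold before italic).
        for rule in (parse_link, parse_bold, parse_italic):
            result = rule(src, pos)
            if result is not None:
                node, pos = result
                nodes.append(node)
                break
        else:
            # No rule matched: consume one character as plain text.
            if nodes and nodes[-1].kind == "text":
                nodes[-1].value += src[pos]
            else:
                nodes.append(Node("text", src[pos]))
            pos += 1
    return nodes, pos


def find_links(nodes):
    """Query the tree: yield the target of every internal link."""
    for node in nodes:
        if node.kind == "link":
            yield node.value
        yield from find_links(node.children)


ast, _ = parse_inline("'''Sweble''' parses [[Wikitext|wiki markup]] and [[MediaWiki]] ''pages''.")
print(list(find_links(ast)))
```

The point of the sketch is the last two lines: once the markup is available as a tree rather than as generated HTML, a question such as "which pages does this text link to?" becomes a simple traversal.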

Technical Report on WOM: An Object Model for Wikitext

Abstract: Wikipedia is a rich encyclopedia that is not only of great use to its contributors and readers but also to researchers and providers of third party software around Wikipedia. However, Wikipedia’s content is only available as Wikitext, the markup language in which articles on Wikipedia are written, and whoever needs to access the content of an article has to implement their own parser or has to use one of the available parser solutions. Unfortunately, those parsers which convert Wikitext into a high-level representation like an abstract syntax tree (AST) define their own format for storing and providing access to this data structure. Further, the semantics of Wikitext are only defined implicitly in the MediaWiki software itself. This situation makes it difficult to reason about the semantic content of an article or exchange and modify articles in a standardized and machine-accessible way. To remedy this situation we propose a markup language, called XWML, in which articles can be stored and an object model, called WOM, that defines how the contents of an article can be read and modified.

Keywords: Wiki, Wikipedia, Wikitext, Wikitext Parser, Open Source, Sweble, Mediawiki, Mediawiki Parser, XWML, HTML, WOM

Reference: Hannes Dohrn and Dirk Riehle. WOM: An Object Model for Wikitext. University of Erlangen, Technical Report CS-2011-05 (July 2011).

The technical report is available as a PDF file.

Learning from Wikipedia: Open Collaboration within Corporations

Wikipedia is the free online encyclopedia that has taken the Internet by storm. It is written and administered solely by volunteers. How exactly did this come about and how does it work? Can it keep working? And maybe more importantly, can you transfer its practices to the workplace to achieve similar levels of dedication and quality of work? In this presentation I describe the structure, processes and governance of Wikipedia and discuss how some of its practices can be transferred to the corporate context.

This presentation represents the next step in the evolution of two Wikimania tutorials/workshops, see Presentations/Tutorials.

Reference: Dirk Riehle. “Learning from Wikipedia: Open Collaboration within Corporations.” Invited talk at Talk the Future 2008. Krems, Austria: 2008.

The slides are available as a PDF file.

Bringing Wikipedia to Work: Open Collaboration within Corporations

This upcoming Wikimania 2008 tutorial discusses the three principles of “open collaboration” which I believe underlie wikis, open source, and other forms of peer production. It is a follow-up to last year’s tutorial about open collaboration at Wikimania 2007.

Reference: Dirk Riehle. “Bringing Wikipedia to Work: Open Collaboration in Corporations.” In Proceedings of Wikimania 2008, forthcoming.

Also available as a PDF file.