The One Publisher to Boycott @ReedElsevierHQ

If there is one for-profit publisher to boycott, it is Elsevier. Here is the proof. My university, the Friedrich-Alexander-University Erlangen-Nürnberg, just published a list of the most expensive journals it subscribes to. 19 out of 20 are Elsevier journals (page in German). My university’s library is in a negotiation stalemate with Elsevier, which is not budging on the price of these journals. And this is for research papers that the scientific community delivers to Elsevier for free, and reviews and edits for free.

I ask you not to submit your research work to Elsevier journals. I ask you not even to cite papers from Elsevier journals unless you absolutely have to. Please. In the name of science, scientific freedom, and equal access for all to research publications.

Thank you for fighting the good fight!


Publishers, E-Books, and DRM

2012-02-18: Updated the post with translations from the original letter.

I’m an Addison-Wesley author and just received a letter from Pearson, the owner of Addison-Wesley, informing me about their thoughts on, and steps towards, e-books and the digital age. The letter is written as an open letter with no apparent secrets, so I’m making it available here for anyone interested to read and comment on.

In general, I sympathize with companies trying to sustain their revenue streams. I do expect them, however, to understand that change is inevitable, and to react flexibly to that change and lead it for their customers’ sake, not just their shareholders’. As an author, I’m naturally in a similar, or at least related, situation.

The PDF is marked up with numbers. The following list relates to what the (German) letter says on the respective issues:


Call for Papers: WikiSym 2012

8th International Symposium on Wikis and Open Collaboration

August 27-29, 2012 | Linz, Austria

The International Symposium on Wikis and Open Collaboration (WikiSym) is the premier conference on open collaboration and related technologies. In 2012, WikiSym celebrates its 8th year of scholarly, technical and community innovation in Linz, Austria. We are excited this year to be co-located with Ars Electronica, the premier digital art and science meeting that attracts over 35,000 attendees per year.

Submissions are invited for the following categories:


Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia

Abstract: The heart of each wiki, including Wikipedia, is its content. Most machine processing starts and ends with this content. At present, such processing is limited, because most wiki engines today cannot provide a complete and precise representation of the wiki’s content. They can only generate HTML. The main reason is the lack of well-defined parsers that can handle the complexity of modern wiki markup. This applies to MediaWiki, the software running Wikipedia, and most other wiki engines. This paper shows why it has been so difficult to develop comprehensive parsers for wiki markup. It presents the design and implementation of a parser for Wikitext, the wiki markup language of MediaWiki. We use parsing expression grammars where most parsers used no grammars or grammars poorly suited to the task. Using this parser it is possible to directly and precisely query the structured data within wikis, including Wikipedia. The parser is available as open source from http://sweble.org.
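To make the idea of a parsing expression grammar (PEG) more concrete, here is a toy sketch in Python. It is not Sweble's actual grammar or code. It covers a hypothetical, heavily simplified Wikitext subset: bold ('''text'''), italic (''text''), and internal links ([[target|label]]). The class and node names (MiniWikitextParser, Text, Bold, Italic, Link) are illustrative assumptions. The PEG property it shows is ordered choice: alternatives are tried in a fixed order, and a failing alternative backtracks to the same input position so the next one can be tried. The result is a small abstract syntax tree rather than HTML.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Text:
    value: str


@dataclass
class Bold:
    children: list


@dataclass
class Italic:
    children: list


@dataclass
class Link:
    target: str
    label: Optional[str] = None


Node = Union[Text, Bold, Italic, Link]
Result = Optional[Tuple[Node, int]]  # (node, next position), or None on failure


class MiniWikitextParser:
    """Toy PEG-style recursive-descent parser for a tiny, hypothetical Wikitext subset.

    Grammar (simplified; not MediaWiki's real apostrophe rules):
        Content <- Element*
        Element <- Bold / Italic / Link / Char      (ordered choice)
        Bold    <- "'''" Content "'''"
        Italic  <- "''" Content "''"
        Link    <- "[[" Target ("|" Label)? "]]"
    """

    def __init__(self, src: str) -> None:
        self.src = src

    def parse(self) -> List[Node]:
        nodes, _ = self._content(0, stop=None)
        return nodes

    def _content(self, pos: int, stop: Optional[str]) -> Tuple[List[Node], int]:
        nodes: List[Node] = []
        while pos < len(self.src):
            if stop is not None and self.src.startswith(stop, pos):
                break  # leave the closing delimiter to the enclosing rule
            # Ordered choice: first success wins. Bold precedes Italic so that
            # "'''" is not consumed as an italic opener. _char always succeeds.
            for rule in (self._bold, self._italic, self._link, self._char):
                result = rule(pos)
                if result is not None:
                    node, pos = result
                    break
            # Merge adjacent characters into a single Text node.
            if isinstance(node, Text) and nodes and isinstance(nodes[-1], Text):
                nodes[-1] = Text(nodes[-1].value + node.value)
            else:
                nodes.append(node)
        return nodes, pos

    def _delimited(self, pos: int, delim: str, cls) -> Result:
        if not self.src.startswith(delim, pos):
            return None
        children, end = self._content(pos + len(delim), stop=delim)
        if not self.src.startswith(delim, end):
            return None  # unclosed: fail, so the caller backtracks to `pos`
        return cls(children), end + len(delim)

    def _bold(self, pos: int) -> Result:
        return self._delimited(pos, "'''", Bold)

    def _italic(self, pos: int) -> Result:
        return self._delimited(pos, "''", Italic)

    def _link(self, pos: int) -> Result:
        if not self.src.startswith("[[", pos):
            return None
        end = self.src.find("]]", pos + 2)
        if end == -1:
            return None
        target, sep, label = self.src[pos + 2:end].partition("|")
        if not target:
            return None
        return Link(target, label if sep else None), end + 2

    def _char(self, pos: int) -> Result:
        return Text(self.src[pos]), pos + 1  # fallback: any single character


ast = MiniWikitextParser("See '''[[Erlangen|the city]]''' and ''more''.").parse()
```

An unclosed "'''x" simply falls back to plain text, because every failed alternative backtracks. Real MediaWiki apostrophe handling is far more intricate than this toy rule set, which hints at the complexity the paper describes; a production PEG parser would typically also memoize rule results (packrat parsing) to avoid exponential backtracking.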

Keywords: Wiki, Wikipedia, Wiki Parser, Wikitext Parser, Parsing Expression Grammar, PEG, Abstract Syntax Tree, AST, WYSIWYG, Sweble.

Reference: Hannes Dohrn and Dirk Riehle. “Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia.” In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym 2011). ACM Press, 2011. Pages 72-81.

The paper is available as a PDF file (preprint).

Technical Report on WOM: An Object Model for Wikitext

Abstract: Wikipedia is a rich encyclopedia that is of great use not only to its contributors and readers but also to researchers and providers of third-party software around Wikipedia. However, Wikipedia’s content is only available as Wikitext, the markup language in which articles on Wikipedia are written, and whoever needs to access the content of an article has to implement their own parser or use one of the available parser solutions. Unfortunately, those parsers that convert Wikitext into a high-level representation like an abstract syntax tree (AST) each define their own format for storing and providing access to this data structure. Further, the semantics of Wikitext are only defined implicitly in the MediaWiki software itself. This situation makes it difficult to reason about the semantic content of an article, or to exchange and modify articles in a standardized and machine-accessible way. To remedy this situation, we propose a markup language, called XWML, in which articles can be stored, and an object model, called WOM, that defines how the contents of an article can be read and modified.
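As a rough illustration of what a storage markup plus an object model buys you, the following Python sketch stores a tiny article as XML and reads and modifies it through a generic tree API (the standard library's xml.etree.ElementTree). The element and attribute names (article, p, link, target) and the helper functions are hypothetical; they are not the actual XWML schema or the WOM interfaces defined in the report.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical storage markup; element names are NOT the real XWML schema.
ARTICLE_XML = (
    '<article title="Linz">'
    '<p>Linz lies on the <link target="Danube">Danube</link>.</p>'
    "</article>"
)


def link_targets(root: ET.Element) -> List[str]:
    """Read access: collect the target of every link, in document order."""
    return [link.get("target") for link in root.iter("link")]


def retarget_links(root: ET.Element, old: str, new: str) -> int:
    """Write access: point every link aimed at `old` to `new`; return how many changed."""
    changed = 0
    for link in root.iter("link"):
        if link.get("target") == old:
            link.set("target", new)
            changed += 1
    return changed


article = ET.fromstring(ARTICLE_XML)
count = retarget_links(article, "Danube", "Danube River")
xml_out = ET.tostring(article, encoding="unicode")
```

The design point is the division of labor the abstract argues for: once articles live in one well-defined format behind one object model, a tool can query or edit links without shipping its own Wikitext parser.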

Keywords: Wiki, Wikipedia, Wikitext, Wikitext Parser, Open Source, Sweble, MediaWiki, MediaWiki Parser, XWML, HTML, WOM

Reference: Hannes Dohrn and Dirk Riehle. “WOM: An Object Model for Wikitext.” University of Erlangen, Technical Report CS-2011-05 (July 2011).

The technical report is available as a PDF file.

Teaching Note for Case "User-Generated Content Systems at Intuit(A)" E-381(A)

Abstract: This is a teaching note for the free case “User-Generated Content Systems at Intuit(A)”, E-381(A), from the Stanford Free Case collection available at ECCH. The original case is a product management case in which Intuit, maker of consumer and small business financial software, faces the decision to “go social or not” for user help in its tax preparation software. The original case discusses the pros and cons of such a disruptive innovation. This teaching note provides pertinent questions to ask your students as well as my summary answers to these questions. I could not find an original teaching note, hence I wrote this one. This is my first such note, so any suggestions for improvement are welcome. The note is licensed CC-BY-SA 3.0; feel free to use it in your own teaching. The note’s home is my website. For attribution, please link to it.


The Java IP Story

Every year, I teach the AMOS class, a lab course on “Agile Methods and Open Source” that combines lectures with a real software project that ideally turns into a startup (see the AMOS Project concept, in German). To explain open source, I have to introduce students to intellectual property rights, of which most have been blissfully unaware until then. Nothing teaches concepts better than a colorful story, and so I have been using the IP strategies around Java to make this dry topic come alive. For fun, comments, and corrections, I’m providing the short version of my talk below, including commentary. (You can also download a PDF version of the talk, licensed as CC-BY 3.0. If you find this useful for teaching, please tell me.) Students at this point have a basic working understanding of intellectual property and exclusion rights. Please let me know what you think! Finally, IANAL (I am not a lawyer).

Java is an important technology powering the modern web and in particular enterprise applications. It has a checkered intellectual property history, and with the recent acquisition of Sun, the Java creator and owner, by Oracle, things only stand to heat up. This slide set discusses some of the more interesting issues around Java intellectual property and its strategic use in business.

  1. What is Java?
  2. Short Java IP Story Time-Line
  3. Three Substories
  4. Java’s Challenge to the Windows Platform
  5. Microsoft and Java
  6. The OpenJDK Strategy (Open Core Model)
  7. Certification of Compatible Implementations
  8. Threats to Commercial Revenue
  9. Main Tools to Curtail “Competitors”
  10. Problems for Alternative Implementations
  11. Problems for OpenJDK Forks
  12. Thank you! and References


The Open Source Big Bang

Open source is not only software, but also an approach to software development. The public nature of open source projects lets us show how open source software development scales to the largest project sizes. The following figure illustrates the scalability of open source software development. I call it the big bang of open source.


The Parser that Cracked the MediaWiki Code

I am happy to announce that we finally open sourced the Sweble Wikitext parser. You can find the announcement on the OSR Group blog or directly on the Sweble project site. This is the work of Hannes Dohrn, my first Ph.D. student, whom I hired in 2009 to implement a Wikitext parser.

So what about this “cracking the MediaWiki code”?

Wikipedia aims to bring the (encyclopedic) knowledge of the world to all of us, for free. Although Wikipedia is already ten years old, its community is just getting started: we have barely seen the tip of the iceberg, and there is so much more to come. All that wonderful content is being written by volunteers using a (seemingly) simple language called Wikitext (the stuff you type in once you click on edit). Until now, Wikitext has been poorly defined.


Open Commons Region Linz is Starting

The region of and around Linz, Austria, has declared itself the Open Commons Region Linz. The opening festivities, including talks, are free of charge and will take place on April 11th, 2011, in Linz (naturally). Read more about it on the blog of the Open Commons Region Linz! I’m a member of the academic advisory council of the Open Commons Region Linz and applaud and support the effort. I’m also happy to say that it will bring me to Linz in person once in a while.