Wiki Creole Grammar, Schema, and Transformations

Please find below an EBNF grammar, XML schema, and XSLT transformations for Wiki Creole, currently in version 1.0. Wiki Creole is the first (and only) community standard for wiki markup.

These specifications were taken from the following two technical reports:

Wiki Creole EBNF grammar

Wiki Creole XML schema definition

Wiki Creole to XHTML transformation

XML to Wiki Creole transformation

Despite our test suites, some bugs very likely remain. If you find any, please let us know!

The best way to get in contact with us is to address Martin Junghans through the SourceForge WikiCreole project and cc: Dirk Riehle.

21 thoughts on “Wiki Creole Grammar, Schema, and Transformations”

  1. The EBNF links are broken: From the first one “wiki-creole/” must be removed, from the latter the quotation marks at the end.

  2. Generating code using ANTLRWorks 1.1.7 results in errors in rule text_bolditalcontent.

    I am an absolute novice on the subject and appreciate your help.

  3. @Martien
    ANTLRWorks does not always behave the same way ANTLR does. I couldn’t use ANTLRWorks for development once the grammar grew large. You will probably also have to increase the heap size for code generation, e.g. with the JVM option -Xmx1024M.

  4. I’m trying to implement WikiCreole using ocamlyacc or menhir (an LR(1) parser generator for OCaml). But WikiCreole’s EBNF uses some ANTLR-specific features (I think only this input.LA thing) that make it difficult to adapt to another parser generator … Does somebody have an idea about that?

  5. V, first you have to consider that ANTLR generates LL(*) parsers, in contrast to your LR(1).
    We had to use the explicit input.LA statements because ANTLR commits to a derivation too early, i.e. while other derivations are still consistent with the current look-ahead. It does not extend the look-ahead as far as necessary to identify the only applicable derivation. Choosing the wrong derivation then results in an exception, even though the grammar is correct.

    The _grammar_ does not need these explicit look-ahead queries. But the _ANTLR grammar specification_ seems to need them as an aid.
    If I recall LR(1) parsers correctly, you can’t rely on the grammar specification alone for your parser generator, because the WikiCreole language needs more than one token of look-ahead.
    Nevertheless, good luck with your work!
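
To illustrate the explicit look-ahead checks discussed in comment 5, here is a minimal Java sketch. It is not the ANTLR runtime and is not taken from the published grammar: the Tok enum, the TokenStream class, and chooseAlternative are hypothetical names, and only LA(k) is loosely modeled on ANTLR's input.LA(k). It assumes that a Creole horizontal rule is a line of exactly four dashes and that dashes arrive as individual DASH tokens, and it shows a parser peeking up to five tokens ahead before committing to an alternative.

```java
import java.util.Arrays;
import java.util.List;

final class LookaheadSketch {
    // Hypothetical token types; the real grammar defines many more.
    enum Tok { DASH, TEXT, NEWLINE, EOF }

    /** Toy token stream: LA(k) peeks at the k-th upcoming token without consuming it. */
    static final class TokenStream {
        private final List<Tok> tokens;
        private int pos = 0;

        TokenStream(List<Tok> tokens) { this.tokens = tokens; }

        Tok LA(int k) {
            int i = pos + k - 1;
            return i < tokens.size() ? tokens.get(i) : Tok.EOF;
        }

        void consume() { pos++; }
    }

    /**
     * Picks an alternative only after looking far enough ahead:
     * four DASH tokens followed by end of line or end of input
     * are treated as a horizontal rule, anything else as a paragraph.
     */
    static String chooseAlternative(TokenStream in) {
        for (int k = 1; k <= 4; k++) {
            if (in.LA(k) != Tok.DASH) {
                return "paragraph";
            }
        }
        Tok after = in.LA(5);
        return (after == Tok.NEWLINE || after == Tok.EOF) ? "horizontal_rule" : "paragraph";
    }

    public static void main(String[] args) {
        TokenStream rule = new TokenStream(
                Arrays.asList(Tok.DASH, Tok.DASH, Tok.DASH, Tok.DASH, Tok.NEWLINE));
        TokenStream text = new TokenStream(
                Arrays.asList(Tok.DASH, Tok.TEXT, Tok.NEWLINE));
        System.out.println(chooseAlternative(rule));
        System.out.println(chooseAlternative(text));
    }
}
```

A parser that decides after seeing only the first DASH cannot tell these two inputs apart, which is the kind of premature choice the explicit checks are meant to prevent. A generator with a fixed look-ahead of one token may need this decision moved into the lexer or expressed differently.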

  6. Hi guys. I’m interested in using this XML format as my intermediate (stored) stage. However, it seems that I’ll have to write the code (Java) to translate from Creole to ‘X-Creole’. I realise there’s an XSD, but would some of you have some example XML files for me to test with? Thanks!

  7. Hi Greg, you are right. We do rely on some code to transform the representation of the wiki page after parsing into the XML representation. I’m not sure whether you can transform the parse tree (or AST) to XML using ANTLR’s domain-specific language. We implemented a Java DFS parse tree walker that generates the XML file as it traverses the tree. Regards, and I will mail some files to you.
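
The authors' tree walker is not published here, so the following is only a minimal Java sketch of the general idea in comment 7: a depth-first traversal that writes XML while visiting each node. The Node class, the method names, and the element names (page, heading, paragraph, bold) are hypothetical; they are not taken from the Wiki Creole XSD or from ANTLR's tree classes.

```java
import java.util.ArrayList;
import java.util.List;

final class XmlTreeWalkerSketch {
    /** Hypothetical parse tree node: an element name, optional text, and children. */
    static final class Node {
        final String name;
        final String text; // may be null for purely structural nodes
        final List<Node> children = new ArrayList<>();

        Node(String name, String text) {
            this.name = name;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Escapes the characters that are not allowed verbatim in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Depth-first walk: open tag on entry, children in order, close tag on exit. */
    static void walk(Node node, StringBuilder out) {
        out.append('<').append(node.name).append('>');
        if (node.text != null) {
            out.append(escape(node.text));
        }
        for (Node child : node.children) {
            walk(child, out);
        }
        out.append("</").append(node.name).append('>');
    }

    static String toXml(Node root) {
        StringBuilder out = new StringBuilder();
        walk(root, out);
        return out.toString();
    }

    public static void main(String[] args) {
        Node page = new Node("page", null)
                .add(new Node("heading", "Title"))
                .add(new Node("paragraph", null)
                        .add(new Node("bold", "a < b")));
        System.out.println(toXml(page));
    }
}
```

A real walker would map the grammar's node types to the element names defined in the schema and would likely stream to a file rather than build a string in memory.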

  8. Greg, Martin: I had been wondering about this for a while—we knew that the tree walker is missing from our publications, but then you can’t really publish some Java code.

    What you can do, of course, is create an open source project. So for that reason I created a SourceForge project called wikicreole a while back. Is there any interest in writing Tree Walker code for that project?

  9. Hi,
    I am very interested in your work. I went to the SourceForge project and saw that no files had been committed. Is there anywhere I could get the Tree Walker for the ANTLR grammar that produces the XML?
    Regards

  10. Hello all,

    I have one question, which might sound trivial to you. I would like to know what the difference is between “with_extension” and “without_extension” in the above files.

    Second, Martin, you mailed Greg some XML documents he asked for (see the comments above). Could you mail me the same, please?

    Thanks!

  11. The difference between with_extension and without_extension is that the latter is a grammar (and related files) for the plain original Wiki Creole, while with_extension is prepared, by way of an additional token, to be extended with new syntactic elements. So if all you want is what Wiki Creole 1.0 can give you, use without_extension; if you are thinking about adding your own new syntactic elements, try using with_extension. The differences are minimal, just the hook for the extended syntax.

    As to the files, I’ll ask Martin, I think we might have to clean up this page; maybe you can already find what you are looking for at sf.net/projects/wikicreole?

  12. Hey people, great work..

    .. but:
    I have a problem with ANTLR and this grammar. When I try to generate the code with ANTLR 2.7.7, it always reports nondeterminism between a large number of rules.
    Which lookahead (k) should I use to eliminate these warnings while keeping efficiency high?

    thanks!
    luke

  13. OK, there’s another thing I wanted to know.

    When I look past those warnings and try to compile the .java files in the directory, I get 13 errors, every single one about:

    Cannot find symbol
    Symbol: input

    What do I have to do to avoid this one?
    Why doesn’t
    input.LA(1)==DASH
    or something similar work?

    thanks
    luke

  14. Processing the ‘without extensions’ .g file
    using ANTLRWorks 1.2.3, I’m getting 5 errors and 180 warnings.

    That’s from the command line.

    Has something changed?

    TIA DaveP

  15. Hi Dave,
    we recommend not using ANTLRWorks for generating the scanner/parser. I do not remember why, but using ANTLR from the command line is the proper way to go. You may also have to increase the amount of memory available for generation.

    Cheers, Martin

  16. I’m having trouble compiling this with ANTLR 3.2 (from the command line as recommended with increased heap size).

    The repeated error I’m seeing (in addition to about a hundred warnings) is “error(201): creole10.g:61:2: The following alternatives can never be matched: 2”.

    Is there a previous version of ANTLR that will compile this without errors? Or am I barking up a dead tree?
