chris blogs

April 2004

20apr2004 · A file format for integrating non-XML data in XML workflows

Basing the data of a CMS or documentation project on XML isn’t a bad idea, as there are many useful tools for handling it. From a single XML source, documents can be converted to (X)HTML, to PDF using XSL-FO, to TeX using TeXML, and to many other formats.

However, XML is tedious to write by hand, even with advanced tools like Emacs and its nxml-mode.

But there is a solution: most documents have only a simple structure, and there are many ways to mark them up in plain ASCII, for example:

  • Markdown
  • Textile
  • Wiki-like syntaxes
  • self-made formats

Ruby, my favorite implementation language (apart from XSLT, which has many advantages for XML processing), supports all of these: Markdown through BlueCloth, Textile through RedCloth, Wiki-like markup through RDoc, and self-made formats are easy to implement thanks to Ruby’s powerful string processing.
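As an illustration of how little code a self-made format needs, here is a minimal sketch of an invented markup. The rules and the `selfmade_to_xhtml` name are hypothetical, chosen only for this example: paragraphs are separated by blank lines, a paragraph starting with `= ` becomes a heading, and `*text*` becomes emphasis.

```ruby
# Escape the characters that are special in XML text.
def xml_escape(s)
  s.gsub(/[&<>"]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;')
end

# Convert an invented (hypothetical) plain-text markup to XHTML:
#   - paragraphs are separated by blank lines
#   - a paragraph starting with "= " becomes <h1>
#   - *text* becomes <em>text</em>
def selfmade_to_xhtml(text)
  text.strip.split(/\n\s*\n/).map do |para|
    # Escape first, so the only tags in the output are the ones added below.
    html = xml_escape(para.strip).gsub(/\*([^*]+)\*/) { "<em>#{$1}</em>" }
    if html.start_with?('= ')
      "<h1>#{html[2..-1]}</h1>"
    else
      "<p>#{html}</p>"
    end
  end.join("\n")
end

puts selfmade_to_xhtml("= Intro\n\nThis is *important* & short.")
# <h1>Intro</h1>
# <p>This is <em>important</em> &amp; short.</p>
```

Escaping before applying the markup rules keeps the output well-formed even when the source text contains `&` or `<`.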

Therefore, I propose an RFC 2822-like format that makes it easy to turn all these formats into XML and also supports metadata. Here is a sample document showing some of its features:

Title: A sample document
Author: Christian Neukirchen <>
Date: Tue, 20 Apr 2004 13:54:54 +0200
X-Comment: make better version

This is some example document of
marking up non XML data...

This will get transformed to something like this:

<document xmlns="..." xmlns:dc="...">
  <dc:title>A sample document</dc:title>
  <dc:creator>Christian Neukirchen</dc:creator>
  <comment>make better version</comment>
  <body xmlns="...">
    <p>This is some example document of
    marking up non XML data...</p>
  </body>
</document>

… or something like that. (The format isn’t final, but it will be something in this style.) By default, all backends will produce simple XHTML, which can easily be transformed to DocBook. The *Cloth libraries allow embedding arbitrary tags, so every feature of the XML workflow remains available (for example, generating a ToC or inserting automatically generated data).
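To make the proposal a bit more concrete, here is a minimal Ruby sketch of the whole transformation under several assumptions: the header block ends at the first empty line and lines starting with whitespace continue the previous field (as in RFC 2822); `Title`, `Author` and `X-Comment` map to the elements shown in the example output, while mapping `Date` to `dc:date` is an extra guess; the body backend is picked by a hypothetical `Format` field; and `urn:example:document` is a placeholder, since the real document namespace isn’t given. Only a built-in `plain` backend is registered so the sketch runs without extra gems; RedCloth could be plugged in through `RedCloth.new(text).to_html`.

```ruby
DOC_NS   = 'urn:example:document'              # hypothetical placeholder
DC_NS    = 'http://purl.org/dc/elements/1.1/'  # Dublin Core elements
XHTML_NS = 'http://www.w3.org/1999/xhtml'

# Header fields that become metadata elements (assumed mapping);
# unknown fields are skipped.
FIELD_MAP = {
  'Title'     => 'dc:title',
  'Author'    => 'dc:creator',
  'Date'      => 'dc:date',
  'X-Comment' => 'comment',
}.freeze

# Escape the characters that are special in XML text.
def xml_escape(s)
  s.gsub(/[&<>"]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;')
end

# Body backends: each turns plain text into an XHTML fragment.
BACKENDS = {
  # Fallback: blank-line separated paragraphs become <p> elements.
  'plain' => ->(text) {
    text.strip.split(/\n\s*\n/).map { |para| "<p>#{xml_escape(para)}</p>" }.join("\n")
  },
}
# With the gem installed, Textile could be added like this:
#   require 'redcloth'
#   BACKENDS['textile'] = ->(text) { RedCloth.new(text).to_html }

# Split the header block (up to the first empty line) from the body.
# Lines starting with a space or tab continue the previous field.
def parse_document(text)
  head, body = text.split(/\r?\n\r?\n/, 2)
  headers = {}
  last = nil
  head.to_s.each_line do |line|
    line = line.chomp
    if line =~ /\A[ \t]/ && last
      headers[last] << ' ' << line.strip
    elsif line =~ /\A([^:\s]+):\s*(.*)\z/
      last = $1
      headers[last] = $2
    else
      raise ArgumentError, "malformed header line: #{line.inspect}"
    end
  end
  [headers, body.to_s]
end

# Convert a whole document to XML, choosing the body backend by the
# (hypothetical) Format field and defaulting to "plain".
def to_xml(text)
  headers, body = parse_document(text)
  format = headers.fetch('Format', 'plain').downcase
  backend = BACKENDS.fetch(format) { |name| raise ArgumentError, "unknown format: #{name}" }
  out = [%(<document xmlns="#{DOC_NS}" xmlns:dc="#{DC_NS}">)]
  headers.each do |field, value|
    name = FIELD_MAP[field] or next
    out << "  <#{name}>#{xml_escape(value)}</#{name}>"
  end
  out << %(  <body xmlns="#{XHTML_NS}">)
  out << backend.call(body)
  out << '  </body>'
  out << '</document>'
  out.join("\n")
end

puts to_xml(<<~DOC)
  Title: A sample document
  X-Comment: make better version

  This is some example document of
  marking up non XML data...
DOC
```

Keeping the backends in a hash of lambdas means BlueCloth, RedCloth or a self-made converter can be added without touching `to_xml`; each one only has to return an XHTML fragment.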

More to follow…

Copyright © 2004–2016