leah blogs: The Dark Side of Atom

Yesterday antifuchs told me about a problem with the Atom feed of Anarchaia, which now and then includes IRC quotes like this:

#ruby-de

12:18 <ionas_> alles was nicht analog ist ist lossy ;p

12:18 <ionas_> und alles was analog ist geht schnell kaputt ,p

In raw HTML, it looks like this (taken directly from the generated page):

<div class="ircquote">
<span class="channel">#ruby-de</span>
<div class="line">12:18 &lt;ionas_&gt;  alles was nicht analog ist ist lossy ;p</div>
<div class="line">12:18 &lt;ionas_&gt;  und alles was analog ist geht schnell kaputt ,p</div>
</div>

In the usual IRC style, I wrap the nickname in < and >, but antifuchs tells me he doesn’t see any nicks when he looks at my blog with Bloglines. Weird, I think, and decide to have a look at it.

Just for fun, I subscribe to my blog in NetNewsWire and I see… no nicknames! Now, how is my Atom feed generated? The relevant snippet looks roughly like this:

<entry>
<title>25</title>
<!-- ... -->
<content mode="xml" xmlns="http://www.w3.org/1999/xhtml">
  <div class="ircquote">
  <span class="channel">#ruby-de</span>
  <div class="line">12:18 &lt;ionas_&gt;  alles was nicht analog ist ist lossy ;p</div>
  <div class="line">12:18 &lt;ionas_&gt;  und alles was analog ist geht schnell kaputt ,p</div>
  </div>
</content>
</entry>

And I start to wonder. My Atom feed is perfectly valid, and I just inserted the raw (and valid) XHTML as-is. This should be OK. To quote the Atom specification (emphasis mine):

3) If the value of “type” is “xhtml”, the content of atom:content MUST be a single XHTML div element [XHTML], and SHOULD be suitable for handling as XHTML. The XHTML div element itself MUST NOT be considered part of the content. Atom Processors that display the content MAY use the markup to aid in displaying it. The escaped versions of characters such as “&” and “>” represent those characters, not markup.
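Read literally, the last sentence of that rule means a consumer should parse the content as XML once and then treat the resulting text nodes as plain characters, not as more markup. Here is a minimal Python sketch of that reading, using only the standard library. The XHTML namespace is left out for brevity and the variable names are purely illustrative; this is not how any particular feed reader works.

```python
import xml.etree.ElementTree as ET

# One line of the IRC quote, exactly as it appears in the feed source.
source = '<div class="line">12:18 &lt;ionas_&gt;  alles was nicht analog ist ist lossy ;p</div>'

# Parsing as XML decodes the entities exactly once.
node = ET.fromstring(source)
text = node.text

# A consumer that keeps working on the tree (or serializes it again)
# escapes the text on the way out, so the nick stays plain text.
reserialized = ET.tostring(node, encoding="unicode")

print(text)          # the nick is ordinary character data here
print(reserialized)  # and is escaped again when written back out
```

As long as the text node is never fed back into a parser as a string, the nickname cannot turn into a tag.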

Now, apparently, both Bloglines and NetNewsWire somehow pass the XHTML on to a rendering engine: my browser in one case, HTMLKit in the other. And those seem to parse it again, thereby creating the tag <ionas_>. I fixed that by escaping every & in my Atom feeds as &amp;amp;, so in the feed source the nick now reads &amp;lt;ionas_&amp;gt;. Which is more than ugly and really pisses me off.
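To make the double parsing concrete, here is a small Python sketch. It assumes the reader decodes the XML once and then hands the resulting string to an HTML parser; that is my guess at what happens, not something I have verified in either product. `feed_text`, `Collector` and `naive_render` are made-up names for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def feed_text(line_markup: str) -> str:
    """Parse one line of feed content as XML; entities are decoded once."""
    return ET.fromstring(f"<div>{line_markup}</div>").text


class Collector(HTMLParser):
    """Records which start tags and which text an HTML parser sees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.chunks.append(data)


def naive_render(text: str):
    """Hypothetical reader step: treat already-decoded text as HTML again."""
    parser = Collector()
    parser.feed(text)
    parser.close()
    return parser.tags, "".join(parser.chunks)


# Escaped once, as in my original feed: the nick becomes a bogus tag.
once = "12:18 &lt;ionas_&gt;  alles was nicht analog ist ist lossy ;p"
tags_once, shown_once = naive_render(feed_text(once))

# Escaped twice, as in the workaround: the nick survives the second parse.
twice = "12:18 &amp;lt;ionas_&amp;gt;  alles was nicht analog ist ist lossy ;p"
tags_twice, shown_twice = naive_render(feed_text(twice))

print(tags_once, repr(shown_once))
print(tags_twice, repr(shown_twice))
```

In the first case the second parser swallows the nick as an unknown element; in the second it only ever sees entities, which it decodes back into visible angle brackets.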

When I see such stuff, I sometimes think RSS really did it better when they just decided to escape the whole thing and strew their entities all over. That would be consistent, at least.
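For comparison, the escape-everything approach can be sketched in a few lines of Python. The `<description>` wrapper here is a bare stand-in for an RSS item, not a complete feed, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The HTML fragment as it appears on the page.
line_html = '<div class="line">12:18 &lt;ionas_&gt;  alles was nicht analog ist ist lossy ;p</div>'

# Producer: escape the whole fragment and ship it as plain text.
item = f"<description>{escape(line_html)}</description>"

# Consumer: one XML parse gives back the original HTML string,
# which is then handed to the HTML renderer as-is.
recovered = ET.fromstring(item).text

print(item)
print(recovered == line_html)
```

Every layer of parsing is matched by exactly one layer of escaping, so there is no question of who decodes what. Note that the nick ends up as `&amp;lt;ionas_&amp;gt;` on the wire here too, the same bytes my workaround produces.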

The civilization of today surely will go down because escaping doesn’t work (and don’t even get me started on encodings, oh my…).

NP: Le Tigre—Phanta

27jun2005 · The Dark Side of Atom