Thursday, April 17, 2003

Characters slipping away


On the sleeve of Netflix's American Psycho DVD I noticed 'fianc&eacute;e' printed where 'fiancée' should have been: a character entity had escaped into print, apparently a slip-up in publication production.  I wonder which part of the publication process went wrong.  If we assume their publication tool used HTML as its output format, http://www.w3.org/TR/REC-html40/sgml/entities.html says

<!ENTITY eacute CDATA "&#233;" -- latin small letter e with acute,
                                  U+00E9 ISOlat1 -->

as expected.  The simplest explanation is that the output generator botched the job, or that it was 7-bit-limited junk.  A more convoluted conjecture: one tool in the pipeline decided to play it safe and turned & into &amp;.  In the case of XSLT pipelining, maybe an unnecessary disable-output-escaping attribute was set on an xsl:text element, and subsequent tools never had a chance to correct the mistake.  And of course, proofreaders and QA missed the obvious blot.
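For what it's worth, the "play it safe" conjecture is easy to reproduce.  The sketch below is only an illustration in Python's standard library, not a reconstruction of whatever tools Netflix's printer actually used; the variable names and the two-stage pipeline are my own assumptions.

```python
import html

# Entity-encoded text, as an HTML-producing tool might hand it downstream.
source = "fianc&eacute;e"

# Hypothetical cautious stage: escape every ampersand, including the one
# that already begins a character entity.
double_escaped = html.escape(source, quote=False)

# The final renderer decodes entities exactly once.
rendered_ok = html.unescape(source)           # entity resolved to the real character
rendered_bad = html.unescape(double_escaped)  # entity survives as literal text

# The 7-bit conjecture: a tool limited to ASCII can still emit the character
# safely as a numeric reference, matching the declaration quoted above.
ascii_safe = "fianc\u00e9e".encode("ascii", "xmlcharrefreplace")

print(double_escaped)
print(rendered_ok)
print(rendered_bad)
print(ascii_safe)
```

Once the ampersand has been escaped a second time, a single decoding pass at the end can only undo one layer, so the entity name lands on paper verbatim.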

This kind of glitch carries serious implications for information exchange, retrieval, archiving, etc.  Constant insistence on treating characters as integers will generate aftershocks long after ASCII is dead and buried.  Pity those archaeologists from the 25th century.

I hope Radio will publish these funky character entities right...

P.S. I stopped the movie after 20 minutes.


11:02:28 PM