Y. B. Normal
Ziv Caspi can't keep his mouth shut.
Updated: 2002-09-22; 2:33:31 PM.
 

Tuesday, August 13, 2002
Painfully Parsing RSS 3:51:07 PM

Mark Pilgrim writes (and rants) about the pain that is parsing RSS.

For some reason, even people well aware of XML, HTML and the gap between them don't bother to check that their RSS feeds are actually well-formed XML. People leave stray '&' characters around; people include HTML entities (such as ") although XML only has five built-in entities; they make other mistakes.

This is, in fact, one of the first changes I made in Aggie (and perhaps the largest) -- I added a "massage" stage before loading each RSS feed into the .NET XML parser to perform partial HTML entity decoding. I can tell you that was a pain to debug.
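To give a feel for what such a "massage" stage involves, here is a minimal sketch in Python. It is not Aggie's actual code (which is .NET); the function name `massage` and the helper names are hypothetical, and it assumes the feed has already been decoded to text and contains no CDATA sections, where ampersands are legitimately literal.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML predefines; these must pass through untouched.
XML_BUILTIN = {"amp", "lt", "gt", "quot", "apos"}

# Something shaped like a reference: &name; or &#123; or &#x1F;
_REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")
# An ampersand that does not start anything shaped like a reference.
_STRAY_AMP = re.compile(r"&(?!#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)")


def massage(feed_text: str) -> str:
    """Rewrite HTML-isms so that a strict XML parser accepts the feed."""

    def fix_reference(match):
        name = match.group(1)
        if name.startswith("#") or name in XML_BUILTIN:
            return match.group(0)          # already legal XML
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: escape the ampersand and keep the text literally.
            return "&amp;" + name + ";"
        return "&#%d;" % codepoint         # e.g. &nbsp; becomes &#160;

    # Escape stray ampersands first; the resulting &amp; is left alone below.
    escaped = _STRAY_AMP.sub("&amp;", feed_text)
    return _REFERENCE.sub(fix_reference, escaped)


raw = "<rss><channel><title>Q&A &nbsp;&copy; &lt;b&gt;</title></channel></rss>"
root = ET.fromstring(massage(raw))
print(repr(root.find("channel/title").text))
```

Converting HTML entities to numeric character references, instead of to the characters themselves, keeps the output independent of whatever encoding the feed ends up in.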

Then again, perhaps that's not people's fault? A lot of this pain would have been eliminated had XML supported HTML entities and tolerated stray ampersands. Not to mention our favorite subject of encoding. If a person like Dave Winer has an RSS feed that MSXML/IE refuses to display (apparently, his current feed lacks an encoding declaration, which means that the parser assumes UTF-8, yet some bytes in the feed itself are not valid UTF-8), perhaps the tools need to be modified, not people.
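As an illustration of what "modifying the tools" could mean for the encoding case, here is a rough Python sketch of a more forgiving loader. The names `parse_feed` and `declares_encoding` are hypothetical, and the windows-1252 fallback is purely my assumption about what an undeclared non-UTF-8 feed is likely to contain; it also ignores UTF-16 feeds altogether.

```python
import xml.etree.ElementTree as ET


def declares_encoding(raw: bytes) -> bool:
    """True if the feed starts with an XML declaration naming an encoding."""
    head = raw.lstrip()[:200]
    return head.startswith(b"<?xml") and b"encoding" in head.split(b"?>", 1)[0]


def parse_feed(raw: bytes, fallback: str = "windows-1252") -> ET.Element:
    if declares_encoding(raw):
        # The author told us the encoding; let the parser honor it.
        return ET.fromstring(raw)
    try:
        # No declaration: XML says to assume UTF-8 (a BOM, if any, is dropped).
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8, so guess instead of refusing the feed.
        text = raw.decode(fallback, errors="replace")
    return ET.fromstring(text)


# A feed with no encoding declaration and a Latin-1 style byte (0xE9).
feed = b"<rss><channel><title>Caf\xe9</title></channel></rss>"
print(parse_feed(feed).find("channel/title").text)
```

Guessing is of course exactly what the XML spec tells parsers not to do, which is the trade-off in question here: the feed displays, but a wrong guess silently produces wrong characters.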

Ignore 12:29:25 AM

Radio is going south on me again. Please ignore this test.

© Copyright 2002 Ziv Caspi.

 