Y. B. Normal

Ziv Caspi can't keep his mouth shut.

blogchalk: Ziv/Male/31-35. Lives in Israel/Tel Aviv/Central and speaks Hebrew. Spends 20% of daytime online. Uses a Normal (56k) connection.

Updated: 2002-09-22; 2:33:31 PM.

Tuesday, August 13, 2002

Painfully Parsing RSS 3:51:07 PM

Mark Pilgrim writes (and rants) of the pain that is parsing RSS.

For some reason, even people well aware of XML, HTML, and the gap between them don't bother to check that their RSS feeds are actually well-formed XML. People leave stray '&' characters around; people include HTML entities even though XML has only five built-in entities (&amp;, &lt;, &gt;, &quot; and &apos;); they make other mistakes.

This is, in fact, one of the first changes I made in Aggie (and perhaps the largest) -- I added a "massage" stage before loading each RSS feed into the .NET XML parser to perform partial HTML entity decoding. I can tell you that was a pain to debug.
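To give a feel for what such a "massage" stage involves, here is a minimal sketch in Python. It is not Aggie's actual code (Aggie runs on .NET), and the names `massage`, `AMP` and `XML_BUILTIN` are made up for illustration. It assumes the feed has already been decoded to text, rewrites HTML named entities as numeric character references, and escapes ampersands that do not start a recognizable entity. It does not special-case CDATA sections, which a real implementation would have to skip.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML predefines; these are left untouched.
XML_BUILTIN = {"amp", "lt", "gt", "quot", "apos"}

# An '&', optionally followed by an entity body and ';'.
# The body is a decimal reference, a hex reference, or a name.
AMP = re.compile(r"&(?:(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)?")


def massage(text):
    """Make entity usage XML-safe before handing the text to an XML parser."""

    def fix(match):
        body = match.group(1)
        if body is None:
            # A stray ampersand that does not start an entity: escape it.
            return "&amp;"
        if body.startswith("#") or body in XML_BUILTIN:
            # Numeric references and XML's own entities are already fine.
            return match.group(0)
        if body in name2codepoint:
            # HTML-only named entity: rewrite as a numeric reference.
            return "&#%d;" % name2codepoint[body]
        # Unknown name: treat the ampersand as literal text.
        return "&amp;" + body + ";"

    return AMP.sub(fix, text)


feed = "<item><title>Tom & Jerry&nbsp;&mdash; &quot;hi&quot; &amp; more</title></item>"
title = ET.fromstring(massage(feed)).find("title").text
```

Feeding the original `feed` string straight to the parser fails on the stray ampersand; after `massage` it parses, and `title` holds the decoded text. Rewriting to numeric references rather than literal characters keeps the stage independent of whatever encoding the feed ends up using.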

Then again, perhaps that's not people's fault? A lot of this pain would have been eliminated had XML supported HTML entities and tolerated stray ampersands. Not to mention our favorite subject of encoding. If a person like Dave Winer has an RSS feed that MSXML/IE refuses to display (apparently, his current feed lacks an encoding declaration, which means that the parser assumes UTF-8; some characters in the feed itself are not valid UTF-8), perhaps the tools need to be modified, not people.
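The encoding failure is easy to reproduce. The sketch below uses Python's standard XML parser rather than MSXML, and it assumes, purely for illustration, that the offending bytes are ISO-8859-1; the actual encoding of the feed in question is not known here. The helper name `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET

# "café" encoded as ISO-8859-1: the byte 0xE9 is not a valid UTF-8 sequence here.
raw = "<title>caf\u00e9</title>".encode("iso-8859-1")

# The same bytes, preceded by an explicit encoding declaration.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + raw


def parses(data):
    """Return True if the bytes are accepted by the XML parser."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


without_declaration = parses(raw)       # parser assumes UTF-8 and rejects the bytes
with_declaration = parses(declared)     # parser is told the real encoding
```

Without the declaration the parser falls back to UTF-8 and refuses the document; with it, the same bytes parse cleanly. A tolerant aggregator could retry with a guessed encoding, at the cost of sometimes guessing wrong.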

Ignore 12:29:25 AM

Radio is going south on me again. Please ignore this test.

© Copyright 2002 Ziv Caspi.
