The following is a post, and a response to myself, that I made in the Radio Userland Discussion Board about double decoding in Radio's Aggregator. When pulling in an RSS feed, some entities are decoded once when xml.compile breaks up the XML into a table internal to Radio, and then again when xml.rss.decodeString calls xml.decodeEntity. The most glaring symptom appears when a post in the RSS feed contains markup examples. These would be encoded with an amp entity followed by lt;, for example, so that when the amp entity is decoded the result is an lt entity that ought to be displayed as a left angle bracket in the browser. Instead, the lt entity is caught in the second decode, which turns the code example meant for display into actual markup and likely messes up the whole page.
I can try to fix this by reaching into radio.root and changing things, but that will only last until the next update.
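To make the symptom concrete, here is a minimal Python sketch of what two decoding passes do to an encoded markup example. The function `decode_once` and the sample string are hypothetical stand-ins for illustration only; this is not Radio's code, just a model of one entity-decoding pass applied twice.

```python
# Hypothetical stand-in for a single entity-decoding pass (not Radio's code).
def decode_once(text):
    # amp is handled last so one pass never decodes its own output.
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&quot;", '"'), ("&amp;", "&")):
        text = text.replace(entity, char)
    return text

# A markup example as it would travel inside an RSS item:
# the lt and gt entities have themselves been encoded with amp.
feed_text = "Use &amp;lt;b&amp;gt; for bold"

first = decode_once(feed_text)   # what the browser should receive
second = decode_once(first)      # what a second pass turns it into

print(first)
print(second)
```

After one pass the text still holds lt and gt entities, which a browser would display as angle brackets. After the second pass they have become a real tag.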
===
While the comments in the script suggest that the additional decoding of entities in radio.html.viewNewsItems was removed on 5/31/02, it seems that there is still double decoding going on somewhere in xml.rss.compileService or something that it calls. I am noticing this specifically with the feed for Dive into Mark, which has had a lot of code examples lately.
The compilation.xmlstruct for a service looks like it has already had one pass of decoding done to it (with the exception of "quot" entities) when compared to the actual RSS that has been compiled as well as to xmltext for the service. Does xml.compile, the only call between xmltext and xmlstruct, do any decoding of its own? If so, compileService needs a decoder that will take care of the quot entities but not lt, gt, and amp. If not, then I have no idea what could be doing this decoding.
===
Now that I've had time to try it out, I've found that xml.compile does in fact decode lt, gt, and amp entities (but not quot entities). I also made a feed, posted something with quoted entities for displayed markup, and then subscribed to myself to demonstrate that the problem is in fact with Radio's aggregator.
When making the outgoing RSS, Radio sensibly runs the contents through xml.encode. However, when the aggregator pulls an incoming RSS feed it calls
1) xml.compile, which decodes any lt, gt, and amp entities, and then
2) xml.decode, which will then proceed to decode any entities that now exist by virtue of amp entities having just been decoded.
This means that what originally were entities for display have been turned into markup by Radio's aggregator, resulting in Bad Things. Fixing this would require supplying xml.rss.compileService with a decode function that does NOT decode lt, gt, and amp entities, since the call to xml.compile has already done this. The decode function would still need to decode quot entities.
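The proposed fix can be sketched in Python as follows. Both functions are hypothetical models, not Radio's actual scripts: `compile_pass` imitates the decoding I observed from xml.compile (lt, gt, and amp, but not quot), and `decode_quot_only` is the kind of restricted decoder that compileService would need.

```python
# Hypothetical model of the decoding observed in xml.compile:
# lt, gt, and amp are decoded; quot is left alone.
def compile_pass(text):
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&amp;", "&")):
        text = text.replace(entity, char)
    return text

# Hypothetical restricted decoder for the second step:
# it touches only quot, so entities meant for display survive.
def decode_quot_only(text):
    return text.replace("&quot;", '"')

# An item body containing a markup example, as it might appear in the feed.
raw = "&amp;lt;a href=&quot;x&quot;&amp;gt;"

compiled = compile_pass(raw)
shown = decode_quot_only(compiled)

print(compiled)
print(shown)
```

With the restricted decoder as the second step, the lt and gt entities reach the browser intact and are displayed as angle brackets instead of being interpreted as a tag.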
1:26:06 AM Categories: Radio