Clarifications on RSS 0.94 Proposal
Dave raises some good points which help clarify my understanding of current practice for RSS. In response to his notes (Dave's in italics):
1. Let's just deal with descriptions and not titles.
I agree. It has never been clear to me whether or not titles were intended to be strictly plain text or could have mark-up in them. If it's the former, then there is no need to extend the "type" attribute to cover titles. If titles can contain mark-up, then they should be identical to descriptions, I think.
There is one caveat: the type attribute could also specify the character set or language in addition to the media type. This is important for non-Western character sets and languages. In that case, you can argue that any text the user may see should be able to carry a type attribute that specifies the character set/language as per RFC 2616. Otherwise, RSS is only for syndicating Western content.
An example is:
Of course, an argument against this is that the character set specified in the enclosing XML container defines the character set for all text within titles and descriptions, so maybe it's unnecessary to specify it for individual elements.
I guess the question more properly is "Can RSS documents contain titles and descriptions in more than one language?"
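To make the per-element idea concrete, here is a minimal Python sketch of how a reader might split such a type value into its media type and character set. It assumes the attribute value follows the RFC 2616 media-type syntax (for example "text/plain; charset=Shift_JIS"); that value format and the function name parse_type_attribute are illustrative assumptions, not part of any RSS specification.

```python
from email.message import Message


def parse_type_attribute(value):
    """Split a media-type string into (media_type, charset_or_None).

    Assumption: the RSS "type" attribute uses the same syntax as an
    HTTP Content-Type value, e.g. "text/plain; charset=Shift_JIS".
    """
    msg = Message()
    # Reuse the standard library's header parser rather than splitting by hand.
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


if __name__ == "__main__":
    print(parse_type_attribute("text/plain; charset=Shift_JIS"))
    print(parse_type_attribute("text/html"))
```

When no charset parameter is present, the function returns None for it, and a reader would presumably fall back to the encoding declared by the enclosing XML document.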
2. There's a temptation to turn RSS into something other than what it is.
Yep. I assumed there may be arguments to cloud elements or other RSS elements that might be non-text formats, hence the perceived need for the encoding attribute. If everything that can appear in a description field is restricted to textual data of some form that can be encoded using XML entity encoding, then there is no need for an encoding attribute.
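As a rough illustration of why entity encoding is enough for textual content, the following Python sketch escapes an arbitrary string into a description element and recovers it unchanged with an ordinary XML parser. The helper names wrap_description and unwrap_description are hypothetical, and the bare description element is a simplification of a real RSS item.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_description(text):
    """Entity-encode text (&, <, >) and place it inside a description element."""
    return "<description>" + escape(text) + "</description>"


def unwrap_description(xml_fragment):
    """Parse the element; the XML parser undoes the entity encoding."""
    return ET.fromstring(xml_fragment).text


if __name__ == "__main__":
    original = '<p class="x">Tom & Jerry</p>'
    encoded = wrap_description(original)
    print(encoded)
    print(unwrap_description(encoded) == original)
```

Nothing beyond the XML parser is needed on the receiving side, which is the sense in which a separate encoding attribute would be redundant for text.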
3. Then the question is, what are the types?
I think the types are "text/*", that is, all registered MIME types under the top-level text type. Some burden rests with the syndicator to provide a universally readable form if they want the widest distribution. But special-purpose syndications (i.e., those aimed at specific clients or devices) shouldn't be precluded from using other text-based mark-ups like RTF, XML, or even something crufty like troff or TeX.
Regarding the question about what "treated as plain text" means, I meant text that is assumed to be free of any stylistic or structural mark-up beyond white space. It can be "printf'ed" to the screen without appearing to have embedded tags or extraneous syntax. No parser required. Once the XML parsing process has decoded the XML entities (e.g., angle bracket and quote conversions), the resulting text string is devoid of any mark-up, unlike text/html, text/rtf, text/xml, or any other structured text format.
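The distinction can be seen in a small Python sketch. It assumes a description element carrying a type attribute as discussed here; the sample item, with two descriptions side by side, is invented purely for comparison and is not meant as valid RSS.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two descriptions in one item, only for comparison.
SAMPLE = """<item>
  <description type="text/plain">Profits &lt; losses at AT&amp;T</description>
  <description type="text/html">&lt;b&gt;Profits&lt;/b&gt; fell</description>
</item>"""


def decoded_descriptions(xml_text):
    """Return (type, text) pairs after the XML parser has decoded entities."""
    root = ET.fromstring(xml_text)
    return [(d.get("type"), d.text) for d in root.iter("description")]


if __name__ == "__main__":
    for media_type, text in decoded_descriptions(SAMPLE):
        print(media_type, "->", text)
```

The text/plain string can be printed as-is, while the text/html string still contains tags after XML decoding and needs a second, HTML-aware step before display.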
4. Example 4 provides a good example for comment 2 above.