Recent talk about RSS feed splicing and the ineluctable need for filtering open feeds got me thinking about the variety of operations one might want to perform on feeds.
Taking a cue from the operations of set theory, we could, for instance, define the following:
Splicing (union): I want feed C to be the result of merging feeds A and B.
Intersecting: Given primary feeds A and B, I want feed C to consist of all items that appear in both primary feeds.
Subtracting (difference): I want to remove from feed A all of the items that also appear in feed B. Put the result in feed C.
Splitting (subset selection): I want to split feed D into feeds D1 and D2, according to some binary selection criterion on items.
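To make the four operations concrete, here is a minimal sketch in Python. It assumes each feed has already been parsed into a list of items, and that every item carries a stable identifier under an "id" key (for instance an RSS guid or the item's link). That item shape and the function names are my own illustration, not part of any existing tool.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical item shape: a dict with at least an "id" key.
Item = Dict[str, str]
Feed = List[Item]


def splice(a: Feed, b: Feed) -> Feed:
    """Union: items of a followed by items of b, skipping duplicate ids."""
    seen = set()
    merged = []
    for item in a + b:
        if item["id"] not in seen:
            seen.add(item["id"])
            merged.append(item)
    return merged


def intersect(a: Feed, b: Feed) -> Feed:
    """Intersection: items of a whose id also appears in b."""
    ids_in_b = {item["id"] for item in b}
    return [item for item in a if item["id"] in ids_in_b]


def subtract(a: Feed, b: Feed) -> Feed:
    """Difference: items of a whose id does not appear in b."""
    ids_in_b = {item["id"] for item in b}
    return [item for item in a if item["id"] not in ids_in_b]


def split(d: Feed, criterion: Callable[[Item], bool]) -> Tuple[Feed, Feed]:
    """Subset selection: (items matching the criterion, all the rest)."""
    d1 = [item for item in d if criterion(item)]
    d2 = [item for item in d if not criterion(item)]
    return d1, d2
```

Everything here hinges on item identity. Two feeds only intersect usefully if they refer to the same item by the same id, which is probably the hardest part to get right in practice.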
The ultimate RSS bricolage tool would give users an interface to derive feeds from other feeds using the above operations, and spit out a working URL for the resulting feed.
I'm not sure how all of it would work, or even if all of it can work in practice. I'm completely abstracting out technical considerations here. While I'm not sure how large the space of useful applications of this could be, here are a couple of example uses:
Splicing: All of the posts on the Many-to-many blog have to do with social software, so it would make sense to send its posts over to the social software channel. Now, since the blogging tool we use for that blog doesn't support TrackBack, it can't automatically ping the Topic Exchange. A workaround would be to merge both channels into a new one. In general, this would enable any combination of category feeds from various sources to be constructed very simply. A feed splicer can also serve as a poor man's aggregator.
Intersecting: Say I want to subscribe to all of Mark's posts that make the Blogdex Top 40; I'd just have to intersect the feeds. Or I could filter a Waypath keyword search feed in the same manner.
Subtracting: I'm interested in some topic that has an open channel, but find the items by one particular author uninteresting. (This is equivalent to the killfile idea from good ol' USENET.) Subtraction could also be used if you don't want to see your own contributions to a feed.
Splitting: One might want to manually split a feed into "good" and "bad" subfeeds according to a subjective assessment of quality or relevance, or automatically split according to language, author, etc. Note that this one doesn't qualify as an example of pure feed algebra, as it involves inputs beyond feeds.
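The killfile case can be sketched as a filter on author, again in Python. This assumes items are dicts with an optional "author" key; the function name and item shape are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical item shape: a dict that may carry an "author" key.
Item = Dict[str, str]
Feed = List[Item]


def killfile(feed: Feed, blocked_authors: Iterable[str]) -> Feed:
    """Drop every item written by a blocked author (case-insensitive)."""
    blocked = {name.lower() for name in blocked_authors}
    return [
        item for item in feed
        if item.get("author", "").lower() not in blocked
    ]
```

Passing your own name as the only blocked author gives the "hide my own contributions" variant. Items with no author are kept, which seems like the safer default for an open channel.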
The recall engine is quite neat; it generates graphs à la Buzz Maker, but on a longer timescale (the archive started back in 1996). Not sure how accurate these graphs are, though. To the right you can see the graph for the query "all your base".
And this one on the left is the result for the query "weblog". Makes sense. But the additional graphs of (I presume) co-occurring terms that appear on the results pages seem a little odd. Then again, I haven't been eyeing the net constantly since 1996, as these guys' robots have.
It's also interesting to search for the names of folks who have been active on the Net for a while and see the associated topics that show up in the right sidebar; try it.
The Information Today article ends thus:
The Internet Archive relies on corporate donations, government and foundation grants, and donations from generous and talented individuals. It represents one of the great success stories on the Love side of the “For Love or Money” saga of the Internet. With the addition of an effective search engine, it also represents a site that serious Internet searchers should carry high on their lists of Favorites or Bookmarks.