Programming Projects

You asked for it! (Ok, maybe you didn't)

Tuesday, February 18, 2003

Personalized Collaborative Ranking

Give me personalized collaborative ranking. This is what I want: resources of all kinds that are filtered and ranked according to people I trust and respect. I assume it is a complicated problem, since I don't have it yet, because I'm for sure not the first person to think about it. But I believe it can be solved, if some capable person can work out the math... [Ming the Mechanic]

Sometimes I think I'm on the same wavelength as Flemming. And yeah... sometimes that scares me a bit ;-)

Anyway, again he's struck a chord that resonates in my own source code tree something fierce: statistical algorithms that match sets of "interesting" items, weight them according to individual rating, and then push and pull the results based on both the "trust" level and what Ming refers to as "qualitative judgements of people" who are the sources of that information. (The two ARE orthogonal. For example: I trust my sister implicitly, but she wouldn't know a good movie if it bit her in the ass.)
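Here's a rough sketch of how trust and judgement could be kept as two separate knobs. Everything in it is made up for illustration (the `Source` class, the `weighted_score` function, the 0-to-1 scales, the idea of multiplying the two numbers together); it isn't lifted from any real code, just one plausible way to do the weighting.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A person whose ratings I see. Both fields are hypothetical 0..1 scales."""
    name: str
    trust: float  # do I believe this person is being straight with me?
    taste: float  # do I value their judgement in this particular area?


def weighted_score(ratings, sources):
    """Weighted average of ratings for one item.

    Each rating is weighted by trust * taste of the person who gave it
    (an assumed combination rule; other rules are just as defensible).
    Returns None when nobody with a nonzero weight has rated the item.
    """
    total = 0.0
    weight_sum = 0.0
    for name, rating in ratings.items():
        src = sources[name]
        weight = src.trust * src.taste
        total += weight * rating
        weight_sum += weight
    return total / weight_sum if weight_sum else None


sources = {
    "sister": Source("sister", trust=1.0, taste=0.1),  # trusted, bad at movies
    "critic": Source("critic", trust=0.6, taste=0.9),  # less trusted, good eye
}
ratings = {"sister": 5.0, "critic": 2.0}  # ratings for one movie, 1..5 scale
score = weighted_score(ratings, sources)
print(round(score, 2))
```

The plain average of those two ratings would be 3.5; with the weights above the critic dominates and the score lands much closer to 2.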

Then there's the idea of recommending across media. If you and I have movie recommendations that are precisely matched, should that promote our recommendations in other media, or should that be configurable? (Well, maybe and yes.)

Enter The Interest Engine. I've done a lot of interface-free work over the last several years in hammering these traits and behaviors into shape. Most of the work I've done has to do with automatic classification of text based on content analysis, but it's precisely the same problem. (In essence the Engine "recommends" text bodies to categories based on lists of key words and phrases common to other 'documents' in that category.)
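To make the "recommending text to categories" idea concrete, here's a deliberately naive sketch. It is not the Engine's actual code: the function name `recommend_categories`, the `min_hits` threshold and the plain substring matching are all assumptions picked to keep the example short.

```python
def recommend_categories(text, category_terms, min_hits=1):
    """Rank categories by how many of their key terms appear in the text.

    category_terms maps a category name to a list of key words/phrases.
    Matching is a case-insensitive substring test, which is crude (it will
    match "track" inside "soundtrack"), but enough to show the shape.
    """
    haystack = text.lower()
    scores = {}
    for category, terms in category_terms.items():
        hits = sum(1 for term in terms if term.lower() in haystack)
        if hits >= min_hits:
            scores[category] = hits
    # Most hits first; ties broken alphabetically so the order is stable.
    return sorted(scores, key=lambda c: (-scores[c], c))


category_terms = {
    "music": ["album", "track", "playlist"],
    "film": ["movie", "director"],
    "food": ["restaurant", "menu"],
}
text = "This album has a great opening track, better than the movie."
print(recommend_categories(text, category_terms))
```

Swap "text" for "person's list of stuff" and "category" for "other person" and it's the same matching problem as recommendations.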

What I've been doing is working with iTunes track databases (you'd be surprised the kind of stuff people post on p2p networks) and matching music recommendations based on collaborative filtering (hate that term). So far my biggest problem isn't the code, it's the fact that I can't find track & album listings that are close enough to mine (or to each other) to produce meaningful recommendations.

The set math and statistics are only a step beyond trivial once you've set up the abstraction layers right. More importantly, once those semantics are hammered out, nothing binds you to a particular type of information: bookmarks, songs, movies, books, restaurants, art, events, etc. Everything's fair game.
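For the curious, the set math can be about this small. This sketch uses Jaccard overlap (size of the intersection over size of the union) as the similarity measure, which is my pick for illustration and only one of several reasonable choices; the `jaccard` and `recommend` names and the sample data are likewise invented. Note that nothing in it cares what the items are, as long as they can go in a set.

```python
def jaccard(a, b):
    """Overlap between two collections: |A & B| / |A | B|, or 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(mine, others, top_n=5):
    """Suggest items I don't have, scored by how similar their owners are to me.

    mine   -- my items (songs, bookmarks, restaurants, anything hashable)
    others -- dict mapping a person to their collection of items
    """
    mine = set(mine)
    scores = {}
    for owner, items in others.items():
        similarity = jaccard(mine, items)
        if similarity == 0.0:
            continue  # no overlap at all, so their list tells me nothing
        for item in set(items) - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores, key=lambda item: (-scores[item], str(item)))
    return ranked[:top_n]


mine = {"A", "B", "C"}
others = {
    "ann": {"A", "B", "D"},  # overlaps with me on A and B
    "bob": {"C", "E"},       # overlaps with me on C only
    "cat": {"X", "Y"},       # no overlap, ignored
}
print(recommend(mine, others))
```

It also shows why sparse overlap hurts: if nobody's list intersects mine, every similarity is zero and nothing comes back.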

The problem? Data collection and aggregation. It's not a technical issue at all. But it's a massive buy-in problem. You've got to get people to recommend all these things, to post their interests in some common format (say RSS and I'll smack you. RDF maybe.) I'm perfectly willing to have an automated engine posting my iTunes.xml file, a merge of my bookmark databases, etc.

But all of these other features Ming is talking about, like the configurable weighting of data sources based upon arbitrarily complex, personally configurable criteria, are really no big deal.
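One way to picture "personally configurable criteria": treat each criterion as a small function that returns a multiplier, and let the user pick which ones apply. This is only a sketch under that assumption. The rule names, the dict layout for a source, and the 0.5 cross-media discount are all hypothetical.

```python
def source_weight(source, media, rules):
    """Combine any number of user-chosen rules into one weight for a source."""
    weight = 1.0
    for rule in rules:
        weight *= rule(source, media)
    return weight


def trust_rule(source, media):
    """Scale by how much I trust the person (assumed 0..1 field)."""
    return source.get("trust", 0.0)


def media_match_rule(source, media, cross_media_factor=0.5):
    """Full weight in media where our tastes are known to match,
    a discounted weight everywhere else (the cross-media question)."""
    return 1.0 if media in source.get("agrees_on", ()) else cross_media_factor


friend = {"trust": 0.8, "agrees_on": {"movies"}}
rules = [trust_rule, media_match_rule]

movie_weight = source_weight(friend, "movies", rules)
book_weight = source_weight(friend, "books", rules)
print(movie_weight, book_weight)
```

Adding a new criterion means writing one more function and appending it to the list, which is why I don't think this part is the hard bit.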

Anybody have thoughts on the presentation layer? (ugghh, why does it always come down to gui work :-/ )

1:18:38 PM
