GIGO: words unreadable aloud

Shun Spam

Don't "Assassinate" Spam: just shun it. Paul Graham has a plan that seems to work pretty well for him.

A few months ago, Propel was hiring someone to implement a spam-shunning service built around decoy email addresses. I wonder how well that effort is working.

I wonder how well Bayesian content-based filtering, the kind Graham uses to separate out his Spam, would work in a "whitelist" sense. His algorithm learns from a corpus of messages in which he has separated the wheat from the chaff (i.e., the Spam), so to speak. For how many different categories of "wheat" could this work? For example, I'm interested in functional programming, the Python programming language, music, mountain biking, hiking, and backpacking. If I set up those six categories as separate points of interest among the messages I receive, how well would Graham's techniques scale to these "six degrees of separation" :-) instead of his two? It would be a fun project to study.
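One plausible starting point for that experiment is sketched below. This is not Graham's algorithm: his two-class scoring relies on several hand-tuned heuristics. Instead, the sketch swaps in textbook multinomial naive Bayes with add-one smoothing, which extends naturally from two categories to six. The tokenizer, category names, and training snippets are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, apostrophes, $ and dashes.
    return re.findall(r"[a-z0-9'$-]+", text.lower())

class MultiCategoryFilter:
    """Multinomial naive Bayes over any number of user-chosen categories (a sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> number of training messages
        self.vocab = set()                       # every token seen in training

    def train(self, text, category):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def scores(self, text):
        # Log prior plus summed log likelihoods, with add-one (Laplace) smoothing.
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        result = {}
        for cat, counts in self.word_counts.items():
            total_tokens = sum(counts.values())
            s = math.log(self.doc_counts[cat] / total_docs)
            for tok in tokenize(text):
                s += math.log((counts[tok] + 1) / (total_tokens + v))
            result[cat] = s
        return result

    def classify(self, text):
        s = self.scores(text)
        return max(s, key=s.get)

# Usage with six hypothetical categories, one tiny training message each.
f = MultiCategoryFilter()
f.train("monads and lazy evaluation in Haskell", "functional-programming")
f.train("list comprehensions and generators in Python", "python")
f.train("new fugue recording on the harpsichord", "music")
f.train("singletrack ride with a full-suspension bike", "mountain-biking")
f.train("day hike up the ridge trail to the summit", "hiking")
f.train("ultralight tent and sleeping bag for a backcountry trip", "backpacking")
label = f.classify("lazy evaluation and monads")  # "functional-programming"
```

Going from two classes to six changes nothing structural here; the open question is how much hand-sorted mail each category needs before the scores become trustworthy.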

My eventual goal is to be able to treat weblogs (and/or RSS feeds) and mailing lists the same way that MT-Newswatcher treats newsgroups. But weblog entries aren't pre-categorized like newsgroups are. Could these personal, adaptive Bayesian filters eventually grow to help me automate the work of doing that categorization? The key is to get them to grow into that role without me (or other potential users) having to do a whole lot of extra categorization work.
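One way to keep the extra work small is to learn only from entries the user files by hand anyway, and to auto-file a new entry only when the filter is confident, leaving the rest in a holding queue. The sketch below assumes the same naive Bayes model as a stand-in. The `UNSORTED` folder, the `threshold` value, and the `learn`/`file` names are hypothetical, not features of MT-Newswatcher or of Graham's filter.

```python
import math
import re
from collections import Counter, defaultdict

UNSORTED = "unsorted"  # hypothetical holding folder for entries the filter is unsure about

def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, apostrophes, $ and dashes.
    return re.findall(r"[a-z0-9'$-]+", text.lower())

class AdaptiveFiler:
    """Learns from hand-filed entries; auto-files new ones only when confident (a sketch)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold               # hypothetical confidence cutoff
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> number of filed entries
        self.vocab = set()

    def learn(self, text, category):
        # Called whenever the user files an entry: no separate training session.
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def posteriors(self, text):
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        logs = {}
        for cat, counts in self.word_counts.items():
            n = sum(counts.values())
            s = math.log(self.doc_counts[cat] / total_docs)
            for tok in tokenize(text):
                s += math.log((counts[tok] + 1) / (n + v))
            logs[cat] = s
        # Turn log scores into probabilities that sum to 1 (log-sum-exp for stability).
        top = max(logs.values())
        z = sum(math.exp(s - top) for s in logs.values())
        return {cat: math.exp(s - top) / z for cat, s in logs.items()}

    def file(self, text):
        if not self.doc_counts:
            return UNSORTED  # nothing learned yet
        post = self.posteriors(text)
        best = max(post, key=post.get)
        return best if post[best] >= self.threshold else UNSORTED

# Usage: two hand-filed entries, then two new ones.
filer = AdaptiveFiler(threshold=0.9)
filer.learn("monads and lazy evaluation in Haskell", "functional-programming")
filer.learn("list comprehensions and generators in Python", "python")
confident = filer.file("lazy evaluation of monads in haskell")  # "functional-programming"
unsure = filer.file("weekend plans")                            # "unsorted"
```

The threshold trades effort for accuracy: a higher cutoff sends more entries to the queue for hand filing, and every hand-filed entry becomes new training data.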
9:27:09 PM