Shun Spam
Don't "Assassinate" Spam — just
shun it.
Paul Graham has
a plan that seems to work pretty well for him.
A few months ago,
Propel was looking to hire
someone to implement a spam-shunning service that involved
decoy email addresses. I wonder how well that effort is working.
I wonder how well Bayesian content-based filtering,
like the kind Graham uses to separate out his Spam,
would work in a
"whitelist" sort of sense. His algorithm uses a corpus of
messages where he has separated the Wheat from the Chaff (i.e., the Spam),
so to speak. For how many different categories of "Wheat" could this work?
For example, I have interests in Functional Programming,
in the Python programming language,
in music, in mountain biking, and in hiking and backpacking. If I choose to
set those six categories as separate points of interest in the world of
messages that I receive, how well would Graham's techniques scale
to these "six degrees of separation" :-) instead of his two degrees?
It would be a fun project to study.
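To make the multi-category idea a bit more concrete, here is a minimal sketch of what such a filter might look like. It is a plain multinomial naive Bayes classifier with add-one smoothing, not Graham's exact token-scoring scheme, and everything in it (the CategoryFilter class, the tokenize helper, the category names, and the tiny training strings) is a made-up assumption for illustration only. A real corpus would need far more messages per category before the scores meant anything.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class CategoryFilter:
    """Hypothetical multinomial naive Bayes filter over any number of categories."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # category -> token -> count
        self.message_counts = Counter()           # category -> messages seen
        self.vocabulary = set()                   # every token seen in training

    def train(self, category, text):
        """Record one hand-sorted message under the given category."""
        tokens = tokenize(text)
        self.token_counts[category].update(tokens)
        self.message_counts[category] += 1
        self.vocabulary.update(tokens)

    def scores(self, text):
        """Return a log-score per category (higher means more likely)."""
        tokens = tokenize(text)
        total_messages = sum(self.message_counts.values())
        vocab_size = len(self.vocabulary) or 1  # avoid dividing by zero
        result = {}
        for category, seen in self.message_counts.items():
            counts = self.token_counts[category]
            total_tokens = sum(counts.values())
            # Start from the prior: how often this category shows up at all.
            score = math.log(seen / total_messages)
            for token in tokens:
                # Add-one smoothing so an unseen token cannot zero out a category.
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            result[category] = score
        return result

    def classify(self, text):
        """Return the best-scoring category for a new message."""
        scores = self.scores(text)
        if not scores:
            raise ValueError("train at least one category first")
        return max(scores, key=scores.get)


# Toy training data (invented for this sketch, one message per category).
demo = CategoryFilter()
demo.train("python", "list comprehension generator decorator import module")
demo.train("functional", "monad lambda calculus haskell type inference")
demo.train("biking", "trail singletrack suspension fork derailleur")
demo.train("spam", "free money offer click now guaranteed")

print(demo.classify("new haskell monad tutorial"))
```

Nothing in the arithmetic cares whether there are two categories or six; each extra category is just one more set of token counts. Whether the results stay useful as the categories start to overlap (Python and Functional Programming share a lot of vocabulary, for instance) is exactly the open question.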
My eventual goal is to be able to treat weblogs (and/or RSS feeds)
and mailing lists the same way that
MT-Newswatcher treats newsgroups. But weblog
entries aren't pre-categorized like newsgroups are. Could these
personal, adaptive Bayesian filters eventually grow to help me
automate the work of doing that categorization?
The key is to get them to grow into that role without
me (or other potential users) having to do a whole lot of extra
categorization work.
9:27:09 PM