Data-mining, or finding the right information (and only the right information), is the issue.
Sebastian Fiedler has initiated discussion around a new RSS prototyping tool. He discusses the idea here.
My contribution could not be that of creating new code. Frankly, my coding vocabulary and thinking style are undeveloped--and, shame on me, I'm not even sure that getting better in that arena would be the best investment of my interests and skills as we struggle for collaborative uplift. Thus, I suggest another form of contribution: a sketch of the specs of a [Tower of] Babel buster.
My focus will be on two questions: "How do I get information that perfectly suits my inquiry?" and "How do I categorize a submission so that those who enter the category information will always find it?" The general answer, "Tell me what category to search in (or to place your submission in), as specifically as you possibly can," is the rub. What category do we use? Shouldn't we use the same one? If we have a category system that suits 98% of all situations, general or specific, shouldn't we all be using it?
Because of this problem we, or at least I, have trouble mining personal writings for platforms for further growth. We have even more difficulty communicating with others (live, or asynchronously through their submitted entries), because our thought systems, grammar, and categories of thought are not compatible. These already large troubles explode when we try to work across cultural and linguistic divides. Enough, I hope, of my sketch of the general background.
You see, now, that I believe a good category system would help in personal data-mining, and would help exponentially more as we move to data-mining and submission shared by more than one person, or by members of more than one culture.
At this point in my klogging life I use Matt Mower's liveTopics categorization system. I am going to suggest that we each adopt (after it's developed) an AI assistant--labelled the Universal Category Robot.
What's a Universal Category Robot? Some initial notes:
An AI software system on your own machine. It takes you through a structured interview (like Via Voice-- the interviews will get shorter and the software's decisions faster as your profile is developed) about an entry you wish to make, a weblog item you wish to publish. It will then affix a universal category designator to the item you submit... to the RSS version of a weblog-- the category space would be filled in by the software. Ultimately it would be able to make category decisions from phrase analysis of the entry itself and from cross-checking its judgements against the cognitive/categorical map of thoughts you have expressed or noted in the past.
Suppose you are searching your own material for all X that deal with such-and-such a phenomenon [here you cast about a bit, then remember your assistant -- it has perhaps prodded you via an officious but not too annoying pop-up at this point]. The robot's interview produces the exact Universal Category tag, which then becomes the parameter of your search. As a result, all material that has been placed in the "UCS" space would be searched and the findings passed along to you. The process would be more complex but conceptually similar if you were searching all RSS space or all web space.
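For those who do think in code, here is a rough sketch of the two mechanical halves of that idea: affixing a category to an RSS item, and then searching on the exact tag. It leaves out the hard part (the interview and the AI). RSS 2.0 does provide a category element with an optional domain attribute; everything else here (the UCS_DOMAIN identifier, the tag names, the sample feed, the function names) is made up for illustration and is only one way it might go.

```python
import xml.etree.ElementTree as ET

# Hypothetical identifier for the shared category scheme; no such registry exists.
UCS_DOMAIN = "urn:example:universal-category-system"

# A tiny made-up feed with two uncategorized items.
FEED = """<rss version="2.0"><channel>
<title>Example klog</title>
<item><title>Notes on liveTopics</title><description>How I file entries.</description></item>
<item><title>Weekend hike</title><description>A walk in the hills.</description></item>
</channel></rss>"""


def affix_category(item, tag):
    """Add a <category> element carrying the universal tag to one RSS item."""
    cat = ET.SubElement(item, "category", {"domain": UCS_DOMAIN})
    cat.text = tag
    return item


def search_by_category(channel, tag):
    """Return titles of items whose universal category matches the tag exactly."""
    hits = []
    for item in channel.findall("item"):
        for cat in item.findall("category"):
            if cat.get("domain") == UCS_DOMAIN and cat.text == tag:
                hits.append(item.findtext("title"))
                break  # one match per item is enough
    return hits


root = ET.fromstring(FEED)
channel = root.find("channel")
items = channel.findall("item")

# In the imagined system the robot's interview would choose these tags.
affix_category(items[0], "knowledge-management/categorization")
affix_category(items[1], "recreation/hiking")

print(search_by_category(channel, "knowledge-management/categorization"))
```

The exact-match search is the point: if everyone's robot hands out the same tag for the same idea, the searcher and the author meet in the same category without having to guess each other's vocabulary.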
Extension: Crawlers and spiders are let loose and, where expressly allowed, will interview the author and/or use context clues to categorize RSS items that have not been categorized by the author. The tag is added to the item, the author is notified, and the body of cleanly classified entries expands.
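The crawler extension might look something like the following, again only as a sketch. A simple keyword count stands in for real "context clues", and the CLUES table, the author_allows flag, the UCS_DOMAIN identifier and the function names are all assumptions of mine, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical identifier for the shared category scheme.
UCS_DOMAIN = "urn:example:universal-category-system"

# Hypothetical context clues: words that suggest a category.
CLUES = {
    "weblog": "publishing/weblogs",
    "rss": "publishing/syndication",
    "hike": "recreation/hiking",
}


def guess_category(text):
    """Pick the category with the most clue-word hits, or None if no clue matches."""
    scores = {}
    for word in text.lower().split():
        tag = CLUES.get(word.strip(".,;:!?"))
        if tag:
            scores[tag] = scores.get(tag, 0) + 1
    return max(scores, key=scores.get) if scores else None


def classify_uncategorized(channel, author_allows):
    """Tag items that lack a category; return titles whose authors should be notified."""
    if not author_allows:
        return []  # only act where expressly allowed
    notified = []
    for item in channel.findall("item"):
        if item.find("category") is not None:
            continue  # the author already classified this one
        title = item.findtext("title") or ""
        text = title + " " + (item.findtext("description") or "")
        tag = guess_category(text)
        if tag:
            ET.SubElement(item, "category", {"domain": UCS_DOMAIN}).text = tag
            notified.append(title)
    return notified


channel = ET.fromstring(
    "<channel>"
    "<item><title>My RSS feed</title><description>About rss tools.</description></item>"
    "<item><title>Already filed</title><category>misc</category></item>"
    "</channel>"
)
print(classify_uncategorized(channel, author_allows=True))
```

Items the author has already categorized are left alone, and items with no recognizable clue stay untagged rather than being guessed at.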
Mind you, I'm not fixated on this as a panacea. But to have aid in classifying what we've written and what it is we're searching for may well add measurably to the utility and accessibility of what we publish as well as to the quality and utility of our web searches.
Such a system should be applicable in any culture, any language.
More, depending on response and/or follow-up issues.