Wednesday, July 31, 2002
More content categorizers. Information overload. We cope, but it isn't getting much better. And sometimes finding what we're looking for is like finding a needle in a field of haystacks. Or a leaf in a forest of trees. Search alone is rarely enough to find what you need in very large data spaces. For example, Google search results and Monster candidate listings often return thousands of close hits. Matching engines efficiently apply criteria to a two-sided search (both employer and worker have demands to be met and supply ways to meet the other's demands). Taxonomies are another approach. Yahoo! and Open Directory show the value of navigating through clumps and clusters of related sites. But you have your own data to mine. And creating a taxonomy by hand is expensive and slow. Enter taxonomy helpers. They do several things:
Here's a roundup of some shipping categorizers. First, I noted Quiver, a tool that recommends topics for human review and approval. Back in April, eContent Magazine wrote a piece on Taxonomy's Role in Content Management.
They mentioned taxonomy vendors:
They also pointed out taxonomy visualization sites.
Now eWeek reviews three more products in this space:
eWeek's overview of the comparison findings is worth reading, as is their eVal Scorecard: Content Categorization. Note they used very small record sets, in the low thousands. Even a small company will organize hundreds of thousands of records, if not millions. One last note. Standards in this area are few and rarely implemented. These few are RDF (Resource Description Framework), DAML (DARPA Agent Markup Language), and DAML+OIL (Ontology Inference Layer). Now where should I categorize this post? [a klog apart] 12:59:22 PM
What She Means Is "Access" Will Cost. Factiva CEO: News Will Cost in Two Years
If Hart is right about this to any degree, then it's going to cause an even bigger rift between publishers/aggregators and libraries. Expect libraries to continue being the hot cyber-battleground as everyone works through the digital rights management & fee-based model versus fair use & information in the "commons" debate. [The Shifted Librarian] 12:47:58 PM
PHP Class 'AmazonLiteXMLParser' released. Given XML from Amazon's Web Service, this class parses the XML and creates an array of product information. [XML News by CodingTheWeb.com] 12:39:16 PM
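For a feel of what "parse the XML and create an array of product information" can look like, here is a minimal sketch in Python using the standard library. It is not the AmazonLiteXMLParser code, and the sample document, the element names (`ProductInfo`, `Details`, `Asin`, `ProductName`, `OurPrice`) and the `parse_products` helper are all assumptions made up for illustration, not Amazon's actual response format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample response. The element names are illustrative
# assumptions, not Amazon's actual schema.
SAMPLE_XML = """<ProductInfo>
  <Details>
    <Asin>0000000001</Asin>
    <ProductName>Example Book</ProductName>
    <OurPrice>$19.95</OurPrice>
  </Details>
  <Details>
    <Asin>0000000002</Asin>
    <ProductName>Another Example</ProductName>
  </Details>
</ProductInfo>"""


def parse_products(xml_text):
    """Return a list of dicts, one per <Details> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    products = []
    for details in root.findall("Details"):
        # Map each child tag to its text content, trimmed of whitespace.
        product = {child.tag: (child.text or "").strip() for child in details}
        products.append(product)
    return products


if __name__ == "__main__":
    for product in parse_products(SAMPLE_XML):
        print(product)
```

Each product ends up as a plain dictionary keyed by tag name, so a missing field (such as the price on the second item) is simply absent rather than an error.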
What is a weblog? Some good news. I've been given permission to republish Meg Hourihan's excellent essay on weblogs. At the time it came out I was getting ready to write something similar; it was the right time for the weblog world to define weblogs, because so many journalists had been trying to do it. Meg did such a great job, and I want to carry more voices through DaveNet, so I asked her, and then her editor at O'Reilly for permission, and this morning they said yes. From there, I want to start an outline about what a weblog is, because there's more to say. Maybe it'll be a three-column table. In column 1, a topic. For example: Fact-checking. In the second column, how centralized journalism does it; and in the third column, how it works in the weblog world. That way, if someone understands how fact-checking works in the print world, they have a basis for understanding how it works when done in the open. Perhaps you see more errors in weblogs, but they can get corrected quickly. I guess the diff is that you can see the process in weblogs. Some people say this is a bad thing, but I think it's good. When I see writing that's too polished, where the grammar is too perfect, I am suspicious that at a deeper level it has been sanitized and dumbed-down. I like getting my news and opinion straight from the source without the middleman. Another row. In column 1, "Research". In column 2, "A reporter spends two weeks interviewing experts, with transcription errors, dumbing-down, etc. added." In column 3, "Experts spend a lifetime trying new ideas, learning from their mistakes, and learning how to explain their philosophy. Weblogs let them publish their ideas without intermediaries." [Scripting News] 12:33:33 PM