Thoughts on the Google "Personal Search Engine" Option
It's so interesting how ideas come around again in high tech. The current notion that what might save Google is a local version that integrates Internet searching with desktop searching has been around before. It's been around so much that I'd call it promiscuous. First, let's talk about the vendors that have tried this:
- Dataware (to some extent)
- Seymour Rubinstein's flopped company (can't remember the name). Historical note -- he started WordStar and then the spreadsheet company bought out by Borland
- Several newer Internet metasearch companies
So the first thing to note is that this isn't a new idea -- at all. Everyone who has tried it has had some teeny tiny, minimal level of success -- if you can even call it that. I'd call them total flops. I don't think most of these products are even available anymore -- certainly Verity abandoned the desktop, Fulcrum was acquired, etc.
So, the obvious question is why? Here are some reasons why this is just a lousy business as well as some technical issues that nail everyone badly:
- File Format Support. Indexing the net really means HTML, PDF, maybe .DOC. For the desktop you need:
  - Outlook PST (and that's just a damn nightmare)
  - Project (icky)
  - Outlook Express
  - Formats from vendors other than Microsoft
  - MP3 ID tags
- All these file formats are difficult at best. Most are undocumented or incorrectly documented (e.g. Microsoft Word's format documentation is wrong).
- Microsoft includes free indexing with every copy of Windows 2K Pro and higher; it just has to be turned on. True, no one does turn it on, and it blows little green chunks when you use it, but any large corporate sale is going to run into this argument from IT ("why don't we just ...").
- It's a low-priced product at best.
- Running an indexer in the background is annoying and time consuming (even though machines are dramatically faster, remember that the data volumes are much larger too).
- Index files are large -- they're a percentage of the overall content volume. Let's say you have a 20 gig drive of which 2 gigs are programs and 8 gigs are documents. Your index file could be as large as 4 gigs if a 50% ratio is assumed (it could also be tiny since some formats are bloated).
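To make the index size arithmetic concrete, here is a minimal sketch of the estimate. The function name and the idea of treating the ratio as a single tunable number are my own simplifications for illustration; real index sizes depend heavily on the file formats involved.

```python
def estimate_index_size_gb(document_gb: float, index_ratio: float) -> float:
    """Estimate index size as a fraction of the indexed document volume.

    Both the function and the flat ratio are illustrative assumptions,
    not a measurement of any real indexer.
    """
    if document_gb < 0 or index_ratio < 0:
        raise ValueError("sizes and ratios must be non-negative")
    return document_gb * index_ratio


# Figures from the example above: a 20 gig drive, 2 gigs of programs,
# 8 gigs of documents. Only the documents get indexed.
drive_gb = 20.0
documents_gb = 8.0

worst_case_gb = estimate_index_size_gb(documents_gb, 0.50)
share_of_drive = worst_case_gb / drive_gb

print(f"{worst_case_gb:.1f} GB index, {share_of_drive:.0%} of the drive")
```

At the assumed 50% ratio the index alone eats a fifth of the drive, which is a hard sell for a low-priced utility.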
Another issue to consider is that Google may not actually have the technical chops to do this on the desktop -- remember, their unique ranking isn't designed for this. Desktop files don't have hyperlinks to analyze. Additionally, their multi-CPU approach doesn't exist on the desktop.
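A rough way to see the hyperlink problem is to run a link-based ranking over a pile of files that don't link to each other. The sketch below uses a textbook PageRank-style iteration as a stand-in; it is not Google's actual ranking, and the tiny graphs and file names are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank-style power iteration (illustrative stand-in).

    links maps each node to the list of nodes it links to; every link
    target is assumed to also appear as a key.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the same baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outgoing in links.items():
            # A node with no outgoing links spreads its rank over everyone.
            targets = outgoing if outgoing else nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web pages that link to each other.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
# Hypothetical desktop files: no links at all.
desktop = {"report.doc": [], "budget.xls": [], "notes.txt": []}

web_rank = pagerank(web)
desktop_rank = pagerank(desktop)

print(sorted(web_rank, key=web_rank.get, reverse=True))
print(desktop_rank)
```

With links, the pages separate into a clear order. Without them, every file ends up with the same score, so the ranking says nothing about which document matters.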
Finally, always remember, we are getting along without this right now. We all may "need it" but we get by without it. This is one of the things that makes it a low-priced product.
If Google really wanted to do this and asked me for a recommendation, then I'd say perhaps take a look at an Outlook indexer only. Constrain the problem, solve a small bit first and see if people will really pay for it.
NOTE: Everyone should also keep in mind that Microsoft is publishing some of the best papers in search right now. Check out http://research.microsoft.com/