In the August 19 issue of The Scientist, Mignon Fogarty and Christine Bahls survey a range of methods for coping with information overload (in the science literature). "Better search engines, free journal access, proprietary databases and E-mail alerts are all helping scientists get what they want. But some worry that they are not getting all they need." [FOS News]
A vendor to a large pharmaceutical company says that the firm wasted almost two years trying to isolate a compound, not realizing that colleagues within the same company had already obtained a patent for it. Researchers at the University of Minnesota, like many others, discovered after three years of work that the results they were writing up had already been published.
Information overload is the central knowledge management issue in the scientific sphere. Hundreds of thousands of scientists push out millions of scientific articles every year. For any topic there's no lack of relevant material. The problem is finding it, and then, even if you can do that, selecting the pieces actually worth reading.
The article rightly points out that the need for "secondary" (i.e. review or roadmap) content is increasingly acute. And indeed, organizations have sprung up to fulfill that need. But most of them, it seems, are private companies, and as a result the information is not freely accessible. The consequence is that small players with little funding are locked out.
Really, who is best positioned to filter the literature? I'd say the scientists themselves know their immediate turf better than third parties. An organized community of scientists who each openly chart out their little corner of knowledge (say, using a weblog) could provide very useful (and free) secondary material.
[...] murmurings are surfacing that online searching is stifling serendipitous discovery. Keyword searching has serious limitations, many agree. "We should be focusing on tools for discovery, which include search engines and alerting services that go well beyond keyword searching,"
Another reason for scientists to jump in. Human beings generate serendipity in a way that machines cannot.
A company called Collexis, based in the Netherlands, uses a system not dependent on keyword searching, but on concept searching. [...] each concept--regardless of whether it is a disease, protein, or gene--is assigned a number. Epstein-Barr and human herpes virus 4 would have the same number, as would a gene that is known by one name in the United States, and another name in Spain or France. These concepts are then combined to form a fingerprint, which is sold to users.
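To make the idea concrete, here is a minimal sketch of concept fingerprinting as the article describes it. All names, concept IDs, and the similarity measure (Jaccard overlap) are my own invention for illustration; Collexis's actual system is proprietary and surely far more sophisticated.

```python
# Hypothetical concept index: different names for the same thing
# map to a single concept ID, so "Epstein-Barr" and "human herpes
# virus 4" are the same concept. IDs and terms are invented.
SYNONYMS = {
    "epstein-barr": 1001,
    "human herpes virus 4": 1001,
    "p53": 2002,
    "tp53": 2002,
}

def fingerprint(text: str) -> set[int]:
    """Return the set of concept IDs whose names occur in the text."""
    text = text.lower()
    return {cid for term, cid in SYNONYMS.items() if term in text}

def similarity(fp_a: set[int], fp_b: set[int]) -> float:
    """Jaccard overlap between two fingerprints."""
    if not (fp_a or fp_b):
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Two abstracts using different terminology for the same concepts:
doc_us = "New findings on TP53 and Epstein-Barr virus interaction."
doc_eu = "Human herpes virus 4 modulates p53 activity."

print(fingerprint(doc_us))  # {1001, 2002}
print(similarity(fingerprint(doc_us), fingerprint(doc_eu)))  # 1.0
```

The point of the sketch: keyword search would see almost no overlap between the two documents, while concept-level matching finds them identical.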
Conceptual indexing, with precisely defined concepts, is the holy grail as far as I'm concerned. (This is what I've been describing here and there.) But information like this has so much value that it should not be hoarded by a private entity.
What I'm dreaming of will not happen on a large scale unless scientists are rewarded for such hard work. Some may object that reviewing, distilling and mapping out others' findings is not actually doing original work, but I vehemently disagree. New, much-needed knowledge is created in the process. The value in such activity is obvious, and eloquently illustrated, for example, by this series of comments to Stephen Downes.