Friday, May 02, 2003

Hartford Paper Tells Employee to Kill Blog. After losing his column, Denis Horgan decided to set up his own Web page. The Courant had another idea. Update: OJR's Mark Glaser reports on the Courant and dealing with blogophobia. [Hypergene MediaBlog]
9:38:29 PM    

TECH TALK: Constructing the Memex: What’s Missing.

In our daily quest for information a few years ago, in the early days of the Internet, we used to go to Yahoo, navigating through the multiple levels of its directory to reach the site(s) we wanted. As time passed, we started using search engines - first AltaVista and Excite, and now Google, which has become our “other memory”. It can, in fact, be thought of as a “knowledge operating system”, according to Elwyn Jenkins of Microdoc News:


In general terms, an operating system is a management system. The operating system that runs your computer manages the demands of each of the different programs you are running at the same time, handles your filing system, hard drives, printers and more. Applying the concept of "operating system" to Google, a Knowledge Operating System (KOS) manages your knowledge activity on the Internet. Google, as a KOS, manages your requests for information, indexes your web pages, responds to applications you may be running on your computer that interface to it via the Google APIs, and integrates knowledge and information from millions of computers into a single large managed database.

Website owners and webmasters who build more static websites do not gain the same degree of operating system-ness of Google as do bloggers, who have a closer relationship with Google. I can write a page today, and have my page indexed and readily available for recall in the Google Database within a day.

This is like a massive disk drive directory -- only there is a time lag between when I saved the file and when it is accessible. As Google becomes more adept at sending Googlebot around to collect new pages, this sense of "saving something to disk" will increase, thus making Google not only indispensable for others to find my pages, but also, a great tool for me to locate my own pages.

Already I use Google as a bookmark manager. No longer do I remember URLs - it is much simpler to remember how to obtain a site's listing by remembering a word to locate that site…I go to all my favorite sites with a single word or two-word combination.

What are the benefits of considering Google an operating system? From a user perspective, it places Google in a position of centrality to my tasks. It is where my knowledge is indexed, it is where I locate new knowledge, and it is the system that underlies my writing in Word, preparation of weblogs, and so on.


Yahoo and Google, in some ways, represent the two extremes. Navigating through directories like Yahoo has its limitations. There is a single global directory (or at best, country-level directories). Also, directories do not take us to the document; they leave us at the site's home page. Most directories are also not scalable, because they are centralised and updated manually. In fact, this is what created the opportunity for automated search engines like Google: the web had simply grown too big.

In relying on Google so extensively now, we are also losing out on something important. Of course, Google is reasonably accurate in finding what we are looking for most of the time. Or at least that is what we think, because we have no way to tell. But the results are the same irrespective of who does the search. We do not have an easy way of restricting a search to a cluster of documents, or to a time period. In short, what is missing is a "context" for the search.
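To make the idea of search "context" a little more concrete, here is a minimal sketch in Python, assuming a tiny in-memory collection rather than a real search index. The `Document` class, its `cluster` label and the `contextual_search` function are hypothetical names used only for illustration; the point is simply that a chosen cluster and a date window narrow the set of documents before any keyword matching happens.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    url: str
    text: str
    cluster: str      # hypothetical grouping label, e.g. a site or topic cluster
    published: date


def contextual_search(docs, query, clusters=None, start=None, end=None):
    """Return URLs of docs whose text contains every query term (plain
    substring match), restricted to the given clusters and to the
    [start, end] date window when those are supplied."""
    terms = query.lower().split()
    hits = []
    for doc in docs:
        # Apply the context filters first: cluster, then the time window.
        if clusters is not None and doc.cluster not in clusters:
            continue
        if start is not None and doc.published < start:
            continue
        if end is not None and doc.published > end:
            continue
        text = doc.text.lower()
        if all(term in text for term in terms):
            hits.append(doc.url)
    return hits
```

The same query then gives different answers depending on the context supplied: a search for "wifi india" limited to an `ict4d` cluster and to 2003 skips matching pages from other clusters or earlier years, which a single results page shared by everyone cannot do.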

Google has centralised search, which is good, because we do need a single place to turn to. But the Web and the people who have built it are much more complex and distributed. Documents and websites have people and ideas associated with them. As search has become narrower and we have come to rely on Google for our results, the wider view of the world which a directory used to offer has been somewhat lost.

What the Web and Google have done is expose us to the amazing richness and depth of information that is out there. This has only made us hungrier to create a memory which extends our own – and is our own.

Tomorrow: Imagine

[E M E R G I C . o r g]
9:15:50 PM    

TECH TALK: Constructing the Memex: Imagine.

We have our own memory and we have Google as our other memory. (We also have the option of the Yahoo and DMOZ directories.) Now imagine if we could bridge the chasm between directories and search engines, customizing search to our likes and to the trails we leave as we surf the Internet, and also taking into account everything we write in emails, blogs or elsewhere.

Imagine a system that uses our memory and knowledge as the starting point. We begin by outlining our interest areas - the topics that form the ecosystem of our lives. This is akin to the Yahoo or DMOZ directory of topics – only much more relevant to us. For example, in my case the main categories of this list would be something like this: Affordable Computing, ICT for Development, Emerging Markets, Enterprise Software, Information Management, New Technologies and India.

If one were to search for these topics on Google, the resulting set of links would be helpful only to a small degree, and only for the first few searches (since the results would be nearly the same each time within a short span of time).

These topics are broad, and need to be narrowed down. What is needed is a taxonomy for each of the topics, which helps in further refining our interests. The Google search results, perhaps the Yahoo (or DMOZ) directory, and our own knowledge form the basis of this hierarchy. For example, my outline for Affordable Computing could look like this: Hardware (Thin Clients, Refurbished PCs, PDAs), Software (Linux, Applications, Language Computing), Communications (Ethernet, WiFi, WLL, VSAT).
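An outline like this is just a tree of topics. As a rough sketch, assuming nested Python dicts are good enough to hold it, the Affordable Computing outline above could be stored and queried as follows. `topic_paths`, `find_topic` and `add_topic` are hypothetical helpers written for illustration, not part of any existing tool.

```python
# The Affordable Computing outline as a nested dict: each key is a topic and
# its value holds the subtopics (an empty dict marks a leaf topic).
AFFORDABLE_COMPUTING = {
    "Affordable Computing": {
        "Hardware": {"Thin Clients": {}, "Refurbished PCs": {}, "PDAs": {}},
        "Software": {"Linux": {}, "Applications": {}, "Language Computing": {}},
        "Communications": {"Ethernet": {}, "WiFi": {}, "WLL": {}, "VSAT": {}},
    }
}


def topic_paths(tree, prefix=()):
    """Yield every root-to-node path in the taxonomy as a tuple of topic names,
    parents before their children."""
    for topic, children in tree.items():
        path = prefix + (topic,)
        yield path
        yield from topic_paths(children, path)


def find_topic(tree, name):
    """Return the path to the first topic called `name`, or None if absent."""
    for path in topic_paths(tree):
        if path[-1] == name:
            return path
    return None


def add_topic(tree, path, name):
    """Add `name` under the topic at `path`, creating missing branches as needed."""
    node = tree
    for topic in path:
        node = node.setdefault(topic, {})
    node.setdefault(name, {})
```

Because `add_topic` creates missing branches as it goes, the outline can grow as interests change without rebuilding the whole tree, and `find_topic` places any term (say, "WiFi") within the larger hierarchy of one's interests.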

This hierarchy of topics serves as the basis for our interests. It gives a unique lens and context to the information that we browse on the Web, write in emails and receive as attachments. These topics will evolve as our interests change and as we come across experts who may have done a better job in building out a certain part of the information ecosystem.

This is an evolving information base – built not by a centralised organization, but in a distributed manner by each of us. We all have expertise in specific areas. This was manifested in the early days of the Web through the millions of home pages created on GeoCities and Tripod. At that time, the only way to build out these pages was through explicit and time-consuming personal involvement – something few of us were prepared to do. (Basically, the web was good for reading, but not as friendly for writing.)

So, now, imagine if each of us could build out these personal directories – outlines of topics and connections to other directories, people and documents. Much of this would happen automatically as we browsed and marked pages of interest, embellishing them with our comments. When we search, the system would first scan our world of relevant information rather than the World Wide Web of documents.
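The "search my world first" behaviour could be sketched roughly as below. This is a toy under stated assumptions: `PersonalMemex` and `MarkedPage` are hypothetical names, matching is plain substring matching over the page title, the user's comment and topic tags, and `web_search` stands in for whatever web search engine one would actually call (it is passed in as an ordinary function, not a real API).

```python
from dataclasses import dataclass, field


@dataclass
class MarkedPage:
    url: str
    title: str
    comment: str = ""                          # the user's own annotation
    topics: set = field(default_factory=set)   # tags from the personal taxonomy


class PersonalMemex:
    """A toy personal directory: pages are recorded as the user marks them,
    and searches look here before falling back to the wider web."""

    def __init__(self, web_search):
        self.pages = []
        self.web_search = web_search   # hypothetical callable: query -> list of URLs

    def mark(self, url, title, comment="", topics=()):
        self.pages.append(MarkedPage(url, title, comment, set(topics)))

    def search(self, query, limit=10):
        terms = query.lower().split()
        # Personal matches: every term must appear in title, comment or tags.
        local = [
            p.url for p in self.pages
            if all(t in f"{p.title} {p.comment} {' '.join(p.topics)}".lower() for t in terms)
        ]
        if len(local) >= limit:
            return local[:limit]
        # Top up with web results the user has not already marked.
        extra = [u for u in self.web_search(query) if u not in local]
        return local + extra[: limit - len(local)]
```

Pages the user has marked always rank ahead of web results, and web results that duplicate a marked page are dropped, so the personal microcosm acts as the first filter on the wider web.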

In other words, each of us would have a microcosm of the information space, created and updated continuously by what we did. It would ensure that our ideas would have a context, that we would never forget anything, and that we could leverage similar work done by millions of others like us. This is the real two-way web – linking not just documents, but people, ideas and information.

Vannevar Bush imagined just such a system – in 1945. He called it the Memex.

Next Week: Constructing the Memex (continued)

[E M E R G I C . o r g]
9:08:20 PM