Found Objects as collected by John Lawlor :: business blog marketing consultant ::



Thursday, February 13, 2003


AllTheWeb/FAST and Google: In Practice AllTheWeb Best.

A bench test study of Google and contender AllTheWeb/FAST reveals some weaknesses in Google's offering. From a user's perspective, AllTheWeb/FAST is, in practice, a sounder search engine than Google. However, Google delivers fresher content, with a greater percentage of each site's pages present in its database.

The study was completed on Thursday 13 February 2003 by university students hired by Verity Intellectual Properties Pty Ltd, an Australian Internet publishing company. The objectives of the test were to evaluate in practice how ready each of these two search engines is to host user searches of a target site, and how efficiently each search engine would reveal a target page on that site using known search terms that people have already used to reach that particular page through a search engine.

In the study design, six key variables were considered that may alter whether or not a potential user may arrive at a target page and/or a target site where that page is located:

  1. How many pages from the target site are present in the search engine's database?
  2. Is the target page, upon which the search test is going to be performed, present in the database?
  3. Which pages of the target site are not listed and how recent are those pages that are present?
  4. Are there any noted problems with the listing of these pages that could hinder someone searching?
  5. When using multi-word conceptual search terms, in what position does our target search page arrive in the results pages?
  6. What is the total number of documents realized in that search?

There are two major parts of a search engine we are testing: numbers one to four above have to do with robot collection of pages across the Internet, and five and six have to do with the algorithms and tools used by the search company to obtain a set of results from the search engine's database.

Robot Activity
The target site used in this particular study is a new weblog site listed in the Radio Weblog pages. According to Webmaster World, a robot dubbed 'freshbot' visits weblogs on a semi-daily basis, indexing pages. The site, Google Village, has been in operation since 2 January 2003 and contains 44 pages. Essentially about a page a day is produced, and on some days two pages. A study of the logs shows that the robots for both Google and AllTheWeb have visited the site.

AllTheWeb has databased 61% of the total 44 pages that could have been located by the robot; the databased pages represent every page produced between January 2 and 26, 2003. No pages produced since January 26 are listed in AllTheWeb. That means that 18 pages produced by the site webmaster cannot be found in AllTheWeb.

                                       AllTheWeb                        Google
Percentage of Pages in Database        61%                              79%
Target Page Present (Global Village)   Yes                              Yes
Pages not listed                       18/44                            10/44
Problems in page listings              No pages after 26 Jan;           8 pages missed in Jan;
                                       no pages missed before 26 Jan    duplicates in Jan

Google has databased more pages than AllTheWeb; every page produced in February has been databased and most pages in January. There are, however, 7 pages from January and 1 page from February that are missing. There are also 9 duplicate pages in the Google system. Effectively, the information on those 8 missed pages does not exist as far as Google is concerned.

Essentially, AllTheWeb has a much cleaner and more methodical listing of this site's pages than Google has. However, AllTheWeb does not have the most recent 18 pages. Google shines in listing fresh material, while AllTheWeb shines in listing every page within the period it has covered. Should AllTheWeb increase the number of visits to sites, it would have a much more accurate listing than Google.
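The kind of listing audit described above (pages present, pages missed, duplicates) can be sketched in a few lines of Python. This is only an illustration, not the tooling the students used: the function name `audit_listing` and the page paths are hypothetical, and the sample data is invented to show one missed page and one duplicate.

```python
from collections import Counter

def audit_listing(site_pages, indexed_urls):
    """Compare the pages a site has published with the URLs an engine lists."""
    counts = Counter(indexed_urls)          # how often each URL appears in the index
    published = set(site_pages)
    listed = published & set(counts)        # published pages the engine knows about
    missing = sorted(published - listed)    # published pages the engine lacks
    duplicates = sorted(url for url, n in counts.items() if n > 1)
    coverage = len(listed) / len(published) if published else 0.0
    return {"coverage": coverage, "missing": missing, "duplicates": duplicates}

# Hypothetical data: four published pages, one missed, one listed twice.
site_pages = ["/2003/01/02.html", "/2003/01/03.html",
              "/2003/01/04.html", "/2003/02/01.html"]
indexed = ["/2003/01/02.html", "/2003/01/03.html",
           "/2003/01/03.html", "/2003/02/01.html"]

report = audit_listing(site_pages, indexed)
print(report)
```

A duplicate entry does not raise coverage here, since coverage counts distinct published pages only; that matches the point that duplicates add clutter without adding information.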

Pinpointing the Target Page
The next stage of the study involved a set of 50 multi-word concept searches that ideally would bring a target page from the site into one of the first five listings in a search result page. The target page we selected is a text-rich page that has a number of concepts we can use in a search test.
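As a rough sketch of what "pinpointing" means in this test, the position of the target page can be read off an ordered list of result URLs. The function name `target_position` and the example URLs below are hypothetical; the study does not say how positions were recorded.

```python
def target_position(results, target_url):
    """Return the 1-based position of target_url in results, or None if absent."""
    for position, url in enumerate(results, start=1):
        if url == target_url:
            return position
    return None

# Hypothetical result list for one multi-word search.
results = [
    "http://example.com/a",
    "http://example.org/target",
    "http://example.net/c",
]

pos = target_position(results, "http://example.org/target")
in_top_five = pos is not None and pos <= 5
print(pos, in_top_five)
```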

Examples of the types of searches carried out among these 50 include:

Search Terms                                   AllTheWeb               Google
                                               Position   No of Docs   Position   No of Docs
autocratic traditional media dominate          1          21,671       3          1,970
mcluhan analysis media global village          7          2,930        52         3,570
mcluhan information across national borders    4          1,884        37         1,370

These samples are representative of the 50 searches conducted for this test. That is, AllTheWeb consistently brought the target document within the first ten documents and in 65% of the tests the target document was located in the first 5 results. The worst positioning for one of the searches of the target page was position 35. The target page was listed within the first 35 results 100% of the time.

Google's positioning of the target document was far more variable and inconsistent than AllTheWeb's. In just 22% of the tests, Google listed the target document in the first ten results; in 28% of the tests Google listed the target document in the first twenty results. In 50% of the tests the target page was listed in position 45 or worse, and in two instances the target page was listed in positions 181 and 189.

AllTheWeb would seem to be a far more predictable search tool to locate information from within the FAST database than Google's search tools.
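Summary figures such as "in the first 5 results" or "worst position" follow directly from a list of recorded positions. The sketch below uses only the three sample searches published earlier, not the full set of 50, so its output illustrates the calculation and does not reproduce the study's percentages; the helper name `share_within` is an assumption.

```python
def share_within(positions, k):
    """Fraction of searches whose target page ranked at position k or better."""
    if not positions:
        return 0.0
    return sum(1 for p in positions if p <= k) / len(positions)

# Positions of the target page in the three published sample searches.
alltheweb = [1, 7, 4]
google = [3, 52, 37]

for name, positions in (("AllTheWeb", alltheweb), ("Google", google)):
    print(
        name,
        "top 5:", round(share_within(positions, 5), 2),
        "top 10:", round(share_within(positions, 10), 2),
        "worst:", max(positions),
    )
```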

Assessment of Results
In practice, Google delivers approximately 22% of Google Village's traffic, while AllTheWeb delivers none that we have detected to date. We regularly receive referrals from Google where multi-word search terms such as those in the examples above are used. The test page has been found with each of the 50 test examples used in the study.

AllTheWeb/FAST has a sounder search engine in practice. It lists pages without fault, and it does not miss pages. In addition, AllTheWeb is far better at pinpointing a specific document close to a multi-word conceptual term. In theory, AllTheWeb should be delivering referrals to Google Village.

On the other hand, Google is far better at actually listing current pages. Google is not perfect in actually locating pages and listing them, but it gets the majority of them there within a day of a page being released onto the Internet. Google misses pages and carries duplicate pages.

For some of these referrals to actually arrive at Google Village, searchers have had to work their way through as far as the 189th result. Google searchers, it seems, are dedicated and work hard. If those same people had used AllTheWeb they would have had to work through to the 35th result at most -- a lot less work. It could be concluded that if Google had a better device to pinpoint relevant pages, it could deliver many more people to this site -- the ones who gave up in the process.

Conclusion
Simply put, AllTheWeb just needs to increase the frequency of their robot crawls and Google would have some formidable competition. AllTheWeb could, it seems, give searchers in practice a more efficient service than could Google.

AllTheWeb                                   Google
Lists every page                            Misses some pages
No duplicates                               Carries duplicates of other pages
Visits less frequently                      Visits each second day
Does not have fresh pages                   Has pages a day old
More efficient in locating a target page    Less efficiency and reliability in locating a target page

However, Google has the 'techo' appeal the market seems to be looking for, and its staff are the stars of a well-branded search engine. It would seem that the sheer volume of people using Google delivers referrals to Google Village. In addition, Google Village is an independent online magazine that critiques Google, which gives Google an unfair starting advantage over AllTheWeb.

Incidentally, Google Village has never been submitted to either of these two search engines. Any listings are there because the respective robot found the site and its pages.

Look out Google, AllTheWeb/FAST does have some good stuff there. In practice, it seems that AllTheWeb could deliver -- it is that close; it remains to be seen whether AllTheWeb can become a market darling. Perhaps we can conclude that it is not just a matter of getting it right in practice. There is also mind share to be won.

[Elwyn Jenkins: Google™ Village]


Why We Blog -- BlogLogic is a new site that is asking "Why do you blog?"

So many people are now into online journals/weblogs. It seems to be the latest and quickest growing trend in the endless Internet community. Whether you're a teenager in high school and you want to let all of your friends know about the cute guy that you met over the weekend, or you love to stand over an oven and you love new, spicy recipes, weblogs provide communication for everyone, for any reason imaginable.


Technorati: Top 100 Interesting Newcomers.
"A list of interesting blogs that you may not know about, but people are talking about. It is biased towards blogs that have a moderate number of people linking to them, but who have had some interesting original content in the last few days." [Scripting News]


Can the spam
Can the Spam, Say Office Workers. A recent two-part survey finds that 9 out of 10 Americans who access e-mail at work want Congress to enforce the elimination of spam e-mail. [internetnews.com: Internet Advertising Report]
