Genesis of PageRank.
I've been playing with Grokker Preview Release 2 this evening. It's a big improvement over PR1 in many ways, although I still wouldn't recommend that anyone other than a search-tool nut buy it at this point. It did, however, lead me to an interesting paper by Sergey Brin and Lawrence Page about Google, written back in 1998, i.e. well before Google became the monster it is today.
It also has the best description of the PageRank algorithm and how it is calculated that I have seen so far. I'm guessing the algorithm is a good deal more sophisticated these days, but this might be of interest to others like myself who wonder about the inner workings.
To quote from that paper:
2.1.1 Description of PageRank Calculation
Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page's importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows:
We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.
PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper. [Curiouser and curiouser!]
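For anyone who wants to see that recursion in action, here is a minimal sketch of the simple iterative calculation the paper mentions: just the quoted formula applied repeatedly over a toy three-page web. The graph, page names, starting values and iteration count are my own illustrative assumptions, not anything from the paper.

def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links out to."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # start every page with an equal rank
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # sum PR(T)/C(T) over every page T that links to this page
            incoming = sum(pr[t] / len(links[t]) for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Toy three-page web: A and B link to each other, C links only to A.
toy_web = {"A": ["B"], "B": ["A"], "C": ["A"]}
print(pagerank(toy_web))

One small wrinkle worth noting: with the formula written exactly as quoted, the values converge to numbers that sum to the number of pages rather than to one; the "probability distribution" property described above holds for the variant in which the (1-d) term is divided by the total number of pages.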
11:35:54 AM