Download google-pagerank.zip - 515.4 KB
Introduction
Google's PageRank (PR) is a "link analysis algorithm measuring the relative importance" of a page (PR @ Wikipedia). PR matters far less today than it did one or two years ago. Nevertheless, it is the only ranking value that is public, which makes it the only factor with some transparency. For those who don't know: a PR of 10 is the highest (apple.com, for example) and 0 is the lowest. Sites that don't even have a PR of 0 are either in a kind of sandbox (a special filter to punish the site) or not indexed by Google.
Please forgive me for being lazy in my English lessons at school; I'm trying my best :)
Background
Google tries to measure the relevance of a domain/site by counting the links that point to it. The weight of each link is in turn influenced by the number of links pointing to the linking site, so the calculation is an iterative process that needs a lot of computing power.
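To make the iterative idea concrete, here is a minimal sketch of a PageRank-style calculation on a tiny link graph. It is an illustration only, not Google's implementation: the class name PageRankSketch, the method Compute, the damping value of 0.85 (a commonly cited default) and the fixed iteration count are all assumptions made for this example.

```csharp
using System;

static class PageRankSketch
{
    // links[i] lists the pages that page i links to (hypothetical input format).
    public static double[] Compute(int[][] links, double damping = 0.85, int iterations = 50)
    {
        int n = links.Length;
        double[] rank = new double[n];
        for (int i = 0; i < n; i++) rank[i] = 1.0 / n;   // start with equal weight

        for (int it = 0; it < iterations; it++)
        {
            double[] next = new double[n];
            double dangling = 0.0;   // weight held by pages without outgoing links

            for (int i = 0; i < n; i++)
            {
                if (links[i].Length == 0) { dangling += rank[i]; continue; }
                // A page splits its current weight evenly among its outgoing links.
                double share = rank[i] / links[i].Length;
                foreach (int target in links[i]) next[target] += share;
            }

            // Combine the base weight with the damped weight received through links.
            for (int i = 0; i < n; i++)
                next[i] = (1.0 - damping) / n + damping * (next[i] + dangling / n);

            rank = next;
        }
        return rank;
    }
}
```

With three pages where pages 0 and 1 both link to page 2 and page 2 links back to page 0, page 2 ends up with the highest value and page 1 (which nobody links to) with the lowest. Each pass has to touch every link, which is why the real calculation over the whole web is so expensive.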
Many webmasters believe their ranking depends on the PR of their site. Today this is not true. PR was never the only factor in Google's ranking, but it used to be the most important one. Right now it isn't. Many people also believe that Google is trying to get rid of PageRank, because link traders measure the value (in $) of a link by its PR, which is just stupid.
If you're interested in buying links, go by the following factors:
- Link popularity (how often is the site you want to buy a link from linked to?)
- Domain popularity (as above, counted by distinct linking domains)
- IP popularity (as above, counted by distinct IPs)
- Does the domain have an "authority status"?
- Is the content of the domain relevant to your content?
- Does the domain rank well for the keywords you want to rank well for?
- How many outgoing links does the site have?
Because PR is the one and only factor we can actually look at, it's pretty nice to check it. And it's even nicer if we can do that on more than one Google datacenter at the same time.
Requesting the PR
Well, the easy part is how the PR gets requested: it's just a simple HTTP request, with one little problem in it. Here's the request for www.codeproject.com:
http:
&oe=UTF-8&features=Rank&q=info:http%3A%2F%2Fwww.codeproject.com%2F
Well, this seems to be easy, but there's this little
ch=6771535612
which is a hash value referencing the domain we want to get the PR for. This hashing algorithm was NOT developed by Google; it's the perfect hashing algorithm by Bob Jenkins.
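As a small illustration of how such a request can be put together, the sketch below builds the query string from a host, an already computed ch value and the page URL. The names PrRequestSketch and BuildRequestUrl are made up for this example; the parameter set is the one used in this article, and escaping the page URL with Uri.EscapeDataString is an assumption that mirrors the escaped form (http%3A%2F%2F...) in the request shown above.

```csharp
using System;

static class PrRequestSketch
{
    // Hypothetical helper: the checksum is passed in, because computing it
    // is the job of the hash function, not of the URL builder.
    public static string BuildRequestUrl(string host, string checksum, string pageUrl)
    {
        return string.Format(
            "http://{0}/search?client=navclient-auto&ch={1}&features=Rank&q=info:{2}",
            host,
            checksum,
            Uri.EscapeDataString(pageUrl));   // ":" becomes %3A, "/" becomes %2F
    }
}
```

Calling BuildRequestUrl("toolbarqueries.google.com", "6771535612", "http://www.codeproject.com/") yields a URL ending in q=info:http%3A%2F%2Fwww.codeproject.com%2F. Keeping the host as a parameter makes it easy to point the same request at a different server later.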
After some folks ported the code to PHP, I tried porting it to C#, and here we go.
First I need to mention that (after I had finished my coding) I found another port by Miroslav Stampar, which you can find here
To be honest, his port was better, so I modified my version. Here is the solution that is my favorite:
Ported to C#
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

namespace GooglePR
{
    class GetPR
    {
        // Initial value of c (the hash seed).
        private const UInt32 myConst = 0xE6359A60;

        // Mixing step of the Bob Jenkins hash: scrambles a, b and c in place.
        private static void _Hashing(ref UInt32 a, ref UInt32 b, ref UInt32 c)
        {
            a -= b; a -= c; a ^= c >> 13;
            b -= c; b -= a; b ^= a << 8;
            c -= a; c -= b; c ^= b >> 13;
            a -= b; a -= c; a ^= c >> 12;
            b -= c; b -= a; b ^= a << 16;
            c -= a; c -= b; c ^= b >> 5;
            a -= b; a -= c; a ^= c >> 3;
            b -= c; b -= a; b ^= a << 10;
            c -= a; c -= b; c ^= b >> 15;
        }

        // Returns the "ch" value for the given URL: the hash of "info:<URL>", prefixed with "6".
        public static string PerfectHash(string theURL)
        {
            string url = string.Format("info:{0}", theURL);
            int length = url.Length;
            UInt32 a, b;
            UInt32 c = myConst;
            int k = 0;
            int len = length;
            a = b = 0x9E3779B9;

            // Consume the input in blocks of 12 characters (4 each for a, b and c).
            while (len >= 12)
            {
                a += (UInt32)(url[k + 0] + (url[k + 1] << 8) + (url[k + 2] << 16) + (url[k + 3] << 24));
                b += (UInt32)(url[k + 4] + (url[k + 5] << 8) + (url[k + 6] << 16) + (url[k + 7] << 24));
                c += (UInt32)(url[k + 8] + (url[k + 9] << 8) + (url[k + 10] << 16) + (url[k + 11] << 24));
                _Hashing(ref a, ref b, ref c);
                k += 12;
                len -= 12;
            }

            c += (UInt32)length;

            // Fold in the remaining 0 to 11 characters; every case falls through to the next.
            switch (len)
            {
                case 11:
                    c += (UInt32)(url[k + 10] << 24);
                    goto case 10;
                case 10:
                    c += (UInt32)(url[k + 9] << 16);
                    goto case 9;
                case 9:
                    c += (UInt32)(url[k + 8] << 8);
                    goto case 8;
                case 8:
                    b += (UInt32)(url[k + 7] << 24);
                    goto case 7;
                case 7:
                    b += (UInt32)(url[k + 6] << 16);
                    goto case 6;
                case 6:
                    b += (UInt32)(url[k + 5] << 8);
                    goto case 5;
                case 5:
                    b += (UInt32)(url[k + 4]);
                    goto case 4;
                case 4:
                    a += (UInt32)(url[k + 3] << 24);
                    goto case 3;
                case 3:
                    a += (UInt32)(url[k + 2] << 16);
                    goto case 2;
                case 2:
                    a += (UInt32)(url[k + 1] << 8);
                    goto case 1;
                case 1:
                    a += (UInt32)(url[k + 0]);
                    break;
                default:
                    break;
            }

            _Hashing(ref a, ref b, ref c);
            return string.Format("6{0}", c);
        }

        // Requests the PR for the given URL. Returns -1 if it could not be retrieved.
        public static int MyPR(string myURL)
        {
            string strDomainHash = PerfectHash(myURL);
            string myRequestURL = string.Format(
                "http://toolbarqueries.google.com/search?client=navclient-auto&ch={0}&features=Rank&q=info:{1}",
                strDomainHash, myURL);
            try
            {
                HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(myRequestURL);
                string myResponse;
                using (WebResponse response = myRequest.GetResponse())
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    myResponse = reader.ReadToEnd();
                }

                // An empty response is treated as a PR of 0.
                if (myResponse.Length == 0)
                    return 0;

                // The response looks like "Rank_1:1:5"; the last number is the PR.
                return int.Parse(Regex.Match(myResponse, "Rank_1:[0-9]:([0-9]+)").Groups[1].Value);
            }
            catch (Exception)
            {
                return -1;
            }
        }
    }
}
So many thanks to Miroslav, who did the better job :)
Example: an ASP.NET Version
Here you can find the ASP.NET version of a PR checker. It checks the PR of the given domain/site against different IPs, which means different Google datacenters. Because Google only updates the displayed PR (Toolbar PR) about every 3 months, this tool is handy for checking whether an update is running: while the update runs, you'll get different PRs for the same page (if the PR is rising or falling). Interesting, isn't it?
To check more than one datacenter, I just created a loop that dynamically replaces the
toolbarqueries.google.com
part of the request with a Google IP. A list of IPs can be found via Google :)
If the tool shows "-1", the PR couldn't be retrieved for some reason.
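The host replacement described above can be sketched roughly as follows. This is not the code from the download: DatacenterSketch and RequestUrlsFor are hypothetical names, and the addresses in the usage note are documentation placeholders, not real Google datacenter IPs.

```csharp
using System;
using System.Collections.Generic;

static class DatacenterSketch
{
    // Produces one request URL per datacenter by swapping the default host
    // for each entry of datacenterHosts (a caller-supplied list, assumed here).
    public static List<string> RequestUrlsFor(string requestUrl, IEnumerable<string> datacenterHosts)
    {
        const string defaultHost = "toolbarqueries.google.com";
        var urls = new List<string>();
        foreach (string host in datacenterHosts)
        {
            urls.Add(requestUrl.Replace(defaultHost, host));
        }
        return urls;
    }
}
```

For example, passing the hosts "192.0.2.1" and "192.0.2.2" returns two URLs that differ only in the host part. Each of them can then be requested in turn, and differing results for the same page hint at a running PR update.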
History
0.2 Uploaded source code
I have to mention something first: if you're using the uploaded example, you're using the code by Miro Stampar. For some reason my code is bloated with other things I'm still working on. So don't worry about why the code differs from the code in this article.
0.1 Corrected a variable name (myURL to url), thanks to CP user ploufs :)
Google is disabling the IPs | empee | 21:06 18 Feb '10
Hi,
I have been using this for about 6 months.
Since Google updates its ranks every 3 months, I have to do a mass update of my database too (about 112,000 sites). My program tests all the IPs and then uses only the ones that responded.
Now I have a problem: none of the 620 Google IPs I have is responding. They all return -1.
I think Google is blocking requests from my IP. I cannot change my own IP so easily (maybe put the DB on a notebook and go plug it in somewhere...).
What do you suggest?
Java | Kris Reid | 8:41 27 Jan '10
That looks fantastic. Any chance of a port to Java? Would be a life saver!
hi nice work | abu subh | 11:25 29 Mar '09
hi
Can anyone help me implement the Weighted PageRank algorithm?
truru
What about keywords? | shaychen | 2:20 4 Mar '09
Hi,
I've noticed that some PR tools can take a keyword along with the site URL and calculate the PR. Do you know how they do it? Is there code for that too?
See the following link with the keywords: http://www.googlerankings.com/
thanks, Shay
Reply:
Well, PageRank has nothing to do with keywords; it's just about URLs. The service you mentioned is a Google SERP scraper that displays the PageRank of the first results for a specific keyword (search term). To achieve the same functionality, you need to parse Google's search engine result pages (SERPs) and get the PageRank for every URL in the result set.
Reply:
No, PageRank is a metric invented by Google to measure the link weight of a URL (based on the number of inbound links and the weight of the URLs those links are placed on). What you're looking for is a ranking check, for which, as I described, you'll need to scrape Google's result pages (SERPs).
403 on Server 2003 | joehinder | 8:15 5 Jan '09
I've come across an odd problem with this code that I've seen on PHP message boards, but not in .NET code. Interestingly, it works fine on my dev machines (Windows Vista), but I get 403 Forbidden errors in production (Windows 2003). From what I've seen in PHP (http://www.hm2k.com/projects/pagerank), this has to do with bitwise operations being handled differently on different OSes. Do you think this could be a problem on Windows as well?
Great source | jeffwow | 14:13 2 Jul '08
Many thanks!
As my solution is in VB.NET, I embedded this in a class and it works perfectly. I just had to make the class public.
Great work
Jeff Tardif
Code doesn't work, error on (int length = url.Length;) | ploufs | 14:58 16 Aug '07
The url variable is not set.
using System.Collections.Generic; is not used in the code.
Reply:
Well,
I'm sorry, I mixed up my source files and copied from the wrong one.
I corrected the snippet in the article. Thanks for pointing it out.
Sorry for that.
Nice. But one problem | Irfan Faruki | 5:51 16 Aug '07
Hi. This is really nice, exactly what I was looking for, but I am getting an error with this statement:
string myResponse = new StreamReader(myRequest.GetResponse().GetResponseStream()).ReadToEnd();
It always fails on this line and the exception is caught. But if I run the generated URL directly, the PR is retrieved correctly.
Any ideas? Irfan
I am still learning
Reply:
Hi,
sorry for your trouble. Please post the exception, as this code works fine for me.
Best regards
Reply:
Hi. It does not throw an exception, as it's handled by the code below. But whenever the program gets to that line of code, it just jumps directly to the exception handler.
I am still learning
Reply:
Okay, my thought was that you catch the thrown exception, like
... catch(Exception exp) { Console.WriteLine(exp.ToString()); }
A different problem is that Google's server sometimes answers correctly and sometimes doesn't. It has nothing to do with the hashing, so maybe our HTTP request needs to be improved?
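To see why a request fails instead of only getting -1 back, the caught exception can be inspected a bit more closely. The following is a minimal sketch under the assumption that the failure surfaces as a WebException; RequestDiagnostics and Describe are hypothetical names and not part of the article's code.

```csharp
using System;
using System.Net;

static class RequestDiagnostics
{
    // Turns a caught exception into a short, readable description.
    public static string Describe(Exception ex)
    {
        WebException webEx = ex as WebException;
        if (webEx == null)
            return "Non-HTTP failure: " + ex.Message;

        // If the server answered with an error status (403, for example), report it.
        HttpWebResponse response = webEx.Response as HttpWebResponse;
        if (response != null)
            return string.Format("HTTP {0} ({1})", (int)response.StatusCode, response.StatusDescription);

        // Otherwise the request never got a response (timeout, DNS failure, ...).
        return "Request failed: " + webEx.Status;
    }
}
```

Called from the catch block, for instance as Console.WriteLine(RequestDiagnostics.Describe(exp)), this distinguishes a server that refuses the request from one that never answers.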
Suggestion | Glen Harvy | 4:16 16 Aug '07
Hi,
I would have liked this article to include a complete working project. I presume you have one, so why not include it?
Without it the article is not really complete, although it was informative about what Google is using for its hash code and how to use it.
Also, why is this article so wide on the screen?
Glen Harvy
Reply:
Hi,
thanks for your suggestion. I really thought about publishing my little project.
At first my idea was to develop the best PR checker there is, but then I found several good tools that really fit my needs: not only PR, but ranking, backlinks, and so on.
So I thought my little project was just not worth publishing. The intention of my article was to show how this little PR hashing problem can be solved and how we can play with Google to identify the values that have the highest influence on a site's ranking. Maybe one of you will now find a way to build the best "I want to check my site's ranking factors" tool there is. Go for it!
So, if you really want the code, I'll upload it, but you won't find a single line of code worth publishing except the lines already published. *wondering* I hope it's clear what I mean.
I tried to make the article narrower. Sorry for that, it's my first article on CP.
Reply:
Okay, Glen, you got me. I added the zip file to the article with a stable version of the code. I deleted some additional things I'm working on, but the main idea of this approach is in the zip file.
Nice work. | Secrets | 4:07 16 Aug '07
A really nice article. I had been wondering how this stuff works for a long time. Thanks, man.
Reply:
Secrets wrote: a really nice article. i was wondering about how this stuff works from a long time. Thanks Man.
Thanks a lot
Last Updated 16 Aug 2007
Copyright © CodeProject, 1999-2010