
Download google-pagerank.zip - 515.4 KB
Screenshot - pagerank-cp.jpg

Introduction

Google's PageRank (PR) is a "link analysis algorithm measuring the relative importance" (PR @ Wikipedia). PR matters a lot less today than it did one or two years ago. Nevertheless, PR is the only ranking value that is public to everyone, which makes it the only factor with some transparency. For those who don't know: a PR of 10 is the highest there is (apple.com, for example) and 0 is the lowest. Sites that don't even have a PR of 0 are either in a kind of sandbox (a special filter that punishes the site) or not indexed by Google.

Please forgive me for being lazy in my English lessons at school; I'm trying my best :)

Background

Google tries to measure the relevance of a domain/site by counting the links pointing to it. The weight of each link in turn depends on the number of links pointing to the linking site, so the calculation is an iterative process that needs a lot of computing power.
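
The iterative process described above can be sketched as a simplified power iteration. This is only an illustration, not Google's actual implementation: it assumes a tiny hand-made link graph, a damping factor of 0.85 (the value usually quoted from the original PageRank paper) and a fixed number of rounds. The names `PageRankSketch` and `Compute` are made up for this example.

```csharp
using System;
using System.Linq;

static class PageRankSketch
{
    // links[i] lists the pages that page i links to (hypothetical toy graph).
    public static double[] Compute(int[][] links, double damping = 0.85, int iterations = 50)
    {
        int n = links.Length;
        // Start with the same rank for every page.
        double[] rank = Enumerable.Repeat(1.0 / n, n).ToArray();

        for (int it = 0; it < iterations; it++)
        {
            // Every page gets a small base share, independent of its inbound links.
            double[] next = Enumerable.Repeat((1.0 - damping) / n, n).ToArray();

            for (int page = 0; page < n; page++)
            {
                if (links[page].Length == 0)
                {
                    // A page without outbound links spreads its rank evenly over all pages.
                    for (int j = 0; j < n; j++)
                        next[j] += damping * rank[page] / n;
                }
                else
                {
                    // Otherwise its rank is split evenly among the pages it links to.
                    foreach (int target in links[page])
                        next[target] += damping * rank[page] / links[page].Length;
                }
            }
            rank = next;
        }
        return rank;
    }
}
```

For example, `PageRankSketch.Compute(new[] { new[] { 1, 2 }, new[] { 2 }, new[] { 0 } })` ranks a three-page graph. The ranks always sum to 1, and a page gains rank both from the number of pages linking to it and from the rank of those pages.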

Many webmasters believe their ranking depends on the PR of their site. Today this is not true. PR was never the only factor in Google's ranking, but it used to be the most important one. Right now it's not. Many people also believe that Google is trying to get rid of PageRank because link traders measure the value (in $) of a link by its PR, which is just stupid.

If you're interested in buying links, go with the following factors:

Because PR is the one and only factor we can look at, it's pretty nice to check it. And it's even nicer if we can do that on more than one Google datacenter at the same time.

Requesting the PR

The easy part is how the PR gets requested: it's just a simple HTTP request, with one little problem in it. Here's the request for www.codeproject.com:

    http://toolbarqueries.google.com/search?client=navclient-auto&hl=en&ch=6771535612&ie=UTF-8&oe=UTF-8&features=Rank&q=info:http%3A%2F%2Fwww.codeproject.com%2F

This seems easy, but there's this little

ch=6771535612

which is a hash value referencing the URL we want to get the PR for. This hashing algorithm was NOT developed by Google; it's the perfect hashing algorithm by Bob Jenkins.
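
To make the shape of that request explicit, here is a minimal sketch that assembles it from its parts. It assumes the `ch` value has already been computed elsewhere; `ToolbarRequest` and `Build` are hypothetical names used only for this illustration.

```csharp
using System;

static class ToolbarRequest
{
    // Builds a request URL of the shape shown above.
    // 'hash' is the ch value (computed elsewhere); 'pageUrl' is the page to look up.
    public static string Build(string hash, string pageUrl)
    {
        // Percent-encode the page URL so ':' and '/' survive inside the query string.
        string encoded = Uri.EscapeDataString(pageUrl);
        return "http://toolbarqueries.google.com/search?client=navclient-auto&hl=en"
             + "&ch=" + hash
             + "&ie=UTF-8&oe=UTF-8&features=Rank&q=info:" + encoded;
    }
}
```

Calling `ToolbarRequest.Build("6771535612", "http://www.codeproject.com/")` reproduces the sample request for www.codeproject.com, including the encoded `http%3A%2F%2F` prefix.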

After some folks ported the code to PHP, I tried to do a port to C#, and here we go.

First I need to mention that (after I finished my coding) I found another port by Miroslav Stampar, which you can find here.

To be honest, his port was better, so I modified my version, and here comes the solution that's my favorite:

Ported to C#

using System;
using System.Collections.Generic;
using System.Text;  
using System.Text.RegularExpressions;
using System.Net;
using System.IO;

namespace GooglePR
{
    class GetPR
    {
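        // Initial value for c, taken over unchanged from the existing ports.
        private const UInt32 myConst = 0xE6359A60;

        // Mixing step of the hash: scrambles a, b and c in place.
        private static void _Hashing(ref UInt32 a, ref UInt32 b, ref UInt32 c)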
        {
            a -= b; a -= c; a ^= c >> 13;
            b -= c; b -= a; b ^= a << 8;
            c -= a; c -= b; c ^= b >> 13;
            a -= b; a -= c; a ^= c >> 12;
            b -= c; b -= a; b ^= a << 16;
            c -= a; c -= b; c ^= b >> 5;
            a -= b; a -= c; a ^= c >> 3;
            b -= c; b -= a; b ^= a << 10;
            c -= a; c -= b; c ^= b >> 15;
        }
        public static string PerfectHash(string theURL)
        {
            // The hash is computed over the same "info:<url>" string that is sent as q.
            string url = string.Format("info:{0}", theURL);

            int length = url.Length;
            
            UInt32 a, b;
            UInt32 c = myConst;

            int k = 0;
            int len = length;

            a = b = 0x9E3779B9;

            while (len >= 12)
            {
                a += (UInt32)(url[k + 0] + (url[k + 1] << 8) + (url[k + 2] << 16) + (url[k + 3] << 24));
                b += (UInt32)(url[k + 4] + (url[k + 5] << 8) + (url[k + 6] << 16) + (url[k + 7] << 24));
                c += (UInt32)(url[k + 8] + (url[k + 9] << 8) + (url[k + 10] << 16) + (url[k + 11] << 24));
                _Hashing(ref a, ref b, ref c);
                k += 12;
                len -= 12;
            }
            c += (UInt32)length;
            switch (len) 
            {
                case 11: 
                    c += (UInt32)(url[k + 10] << 24); 
                    goto case 10;
                case 10: 
                    c += (UInt32)(url[k + 9] << 16); 
                    goto case 9;
                case 9: 
                    c += (UInt32)(url[k + 8] << 8); 
                    goto case 8;
                case 8: 
                    b += (UInt32)(url[k + 7] << 24); 
                    goto case 7;
                case 7: 
                    b += (UInt32)(url[k + 6] << 16); 
                    goto case 6;
                case 6: 
                    b += (UInt32)(url[k + 5] << 8); 
                    goto case 5;
                case 5: 
                    b += (UInt32)(url[k + 4]); 
                    goto case 4;
                case 4: 
                    a += (UInt32)(url[k + 3] << 24); 
                    goto case 3;
                case 3: 
                    a += (UInt32)(url[k + 2] << 16); 
                    goto case 2;
                case 2: 
                    a += (UInt32)(url[k + 1] << 8); 
                    goto case 1;
                case 1: 
                    a += (UInt32)(url[k + 0]); 
                    break;
                default: 
                    break;
            }
            
            _Hashing(ref a, ref b, ref c);

            return string.Format("6{0}", c);
        }

        public static int MyPR(string myURL)
        {
            string strDomainHash = PerfectHash(myURL);
            string myRequestURL = string.Format(
                "http://toolbarqueries.google.com/search?client=navclient-auto&ch={0}&features=Rank&q=info:{1}",
                strDomainHash, myURL);

            try
            {
                HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(myRequestURL);
                string myResponse = new StreamReader(myRequest.GetResponse().GetResponseStream()).ReadToEnd();
                if (myResponse.Length == 0)
                    return 0;
                else
                    return int.Parse(Regex.Match(myResponse, "Rank_1:[0-9]:([0-9]+)").Groups[1].Value);
            }
            catch (Exception)
            {
                return -1;
            }
        }

    }
}
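
The parsing step inside `MyPR` can also be looked at on its own. The following is a small sketch that uses the same regular expression as the code above; `RankParser` is a hypothetical helper, and the sample response string `Rank_1:1:7` is an assumption modeled on that regular expression, not a captured server reply. Unlike `MyPR`, this sketch returns -1 rather than 0 for an empty response.

```csharp
using System.Text.RegularExpressions;

static class RankParser
{
    // Returns the rank found in a response body, or -1 if the body has no rank line.
    public static int Parse(string response)
    {
        Match m = Regex.Match(response ?? "", "Rank_1:[0-9]:([0-9]+)");
        return m.Success ? int.Parse(m.Groups[1].Value) : -1;
    }
}
```

With this helper, `RankParser.Parse("Rank_1:1:7")` yields 7, and a body without a rank line yields -1 without relying on an exception.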
 
So many thx to Miroslav, who did the better job :)

Example: an ASP.NET Version

Here you can find the ASP.NET version of a PR checker. It checks the PR of the given domain/site against different IPs, which means different Google datacenters. Because Google only updates the displayed PR (Toolbar PR) about every 3 months, this tool is handy for checking whether an update is running: while the update runs, you'll get different PRs for the same page (if the PR rises or falls). Interesting, isn't it?

To check more than one datacenter, I just created a loop that dynamically replaces the

toolbarqueries.google.com

part of the request with a Google IP. A list of IPs can be found via Google :)

If the tool shows "-1", the PR couldn't be retrieved for some reason.
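
The loop can be sketched roughly like this. It is not the code from the download: `DatacenterCheck` and `RequestUrls` are hypothetical names, and the addresses in the usage note are placeholders from a documentation-only range, not real Google datacenters.

```csharp
using System;
using System.Collections.Generic;

static class DatacenterCheck
{
    const string DefaultHost = "toolbarqueries.google.com";

    // Yields one request URL per host by swapping the host part of the original request.
    public static IEnumerable<string> RequestUrls(string requestUrl, IEnumerable<string> hosts)
    {
        foreach (string host in hosts)
            yield return requestUrl.Replace(DefaultHost, host);
    }
}
```

For example, `DatacenterCheck.RequestUrls(myRequestURL, new[] { "192.0.2.10", "192.0.2.11" })` produces two URLs that differ only in their host. Each one can then be requested in turn and the returned values compared.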

History

0.2 Uploaded source code

I have to mention something first: if you're using the uploaded example, you're using the code by Miro Stampar. For various reasons my own code is bloated with other things I'm still working on. So don't worry about why the download differs from the code in this article.

0.1 Correction of a variable name (myURL to url), thanks to CP user ploufs :)

General: Google is disabling the IPs
empee
21:06 18 Feb '10
Hi,

I have been using this for about 6 months.

Since Google updates its ranks every 3 months, I have to do a mass update of my database too (about 112,000 sites).
My program tests all the IPs and then uses only the ones that responded.

Now I have a problem. None of the 620 Google IPs I have is responding.
They all return -1.

I think Google is blocking requests from my IP.
I cannot change my own IP so easily.
(maybe put the DB on a notebook and go plug it somewhere...)

What do you suggest?
General: Java
Kris Reid
8:41 27 Jan '10  
That looks fantastic. Any chance of a port to Java?
Would be a life saver!
General: hi nice work
abu subh
11:25 29 Mar '09
Hi,
can anyone help me implement the Weighted PageRank algorithm?


truru

General: What about keywords?
shaychen
2:20 4 Mar '09
Hi,

I've noticed that some PR tools can take a keyword along with the site URL and calculate the PR. Do you know how they do it? Is there code for that too?

See the following link with the keywords: http://www.googlerankings.com/

thanks,
Shay
General: Re: What about keywords?
hartertobak
2:49 4 Mar '09
Well, PageRank has nothing to do with keywords; it's just about URLs.
The service you've mentioned is a Google SERP scraper that displays the PageRank of the first results for a specific keyword (search term).
To achieve the same functionality, you need to parse Google's search engine result pages (SERPs) and get the PageRank for every URL in the result set.


General: Re: What about keywords?
shaychen
3:32 4 Mar '09
Hi,

Maybe the URL I gave as an example was not what I meant.

Please look at this URL: http://www.mikes-marketing-tools.com/ranking-reports/

Is it the same thing??
hartertobak wrote:
Google SERP-Scraper

General: Re: What about keywords?
hartertobak
5:28 4 Mar '09
No. PageRank is a metric invented by Google to measure the link weight of a URL (determined by the number of inbound links and the weight of the URLs those links are placed on).
What you're looking for is a ranking check, for which, as I described, you'll need to scrape Google's result pages (SERPs).


General: Re: What about keywords?
shaychen
6:52 4 Mar '09  
Got it.

thanks dude
General: 403 on Server 2003
joehinder
8:15 5 Jan '09
I've come across an odd problem with this code that I've seen on PHP message boards, but not in .NET code. Interestingly, it works fine on my dev machines (Windows Vista), but I get 403 Forbidden errors in production (Windows 2003). From what I've seen in PHP (http://www.hm2k.com/projects/pagerank), this has to do with bitwise operations being handled differently on different OSes. Think this could be a problem for Windows as well?
General: Great source
jeffwow
14:13 2 Jul '08
Many thanks!

As my solution is in VB.NET, I embedded this in a class and it works perfectly. I just had to make my class public.

Great work

Jeff Tardif
General: Code doesn't work and has an error in it (int length = url.Length;)
ploufs
14:58 16 Aug '07
The url variable is not set.

using System.Collections.Generic; is not used in the code.

Answer: Re: Code doesn't work and has an error in it (int length = url.Length;)
hartertobak
20:19 16 Aug '07
Well,

I'm sorry. I mixed up my source files and copied from the wrong one.

I corrected the snippet in the article. Thanks for pointing me to it.

Sorry for that.
General: Nice. But one problem
Irfan Faruki
5:51 16 Aug '07
Hi
This is really nice, exactly what I was looking for, but I am getting an error with this statement:

string myResponse = new StreamReader(myRequest.GetResponse().GetResponseStream()).ReadToEnd();

It always fails on this line and the exception is caught. But if I open the generated URL directly, the PR is retrieved correctly.

Any ideas?
Irfan

I am still learning

General: Re: Nice. But one problem
hartertobak
10:27 16 Aug '07
Hi,

sorry for your trouble. Please post the exception, as this code works fine for me.

Best regards
General: Re: Nice. But one problem
Irfan Faruki
12:04 16 Aug '07
Hi. It does not throw a visible exception, as it's handled by the code below. But whenever the program gets to that line of code, it jumps directly to the catch block.

I am still learning

General: Re: Nice. But one problem
hartertobak
20:16 16 Aug '07
Okay, my thought was that you catch the thrown exception and print it, like

...
catch(Exception exp)
{
Console.WriteLine(exp.ToString());
}

A different problem is that Google's server sometimes answers correctly and sometimes does not. It has nothing to do with the hashing, so maybe our HTTP request needs to be better?
General: Suggestion
Glen Harvy
4:16 16 Aug '07  
Hi,

I would like to have seen this article include a completed working project. I presume you have one so why not include it?

Without this the article is not really complete although it was informative as to what google are using for their hash code and how to use it.

Also, why is this article so wide on the screen?

Glen Harvy

General: Re: Suggestion
hartertobak
4:31 16 Aug '07
Hi,

thanks for your suggestion. I really thought about publishing my little project.

At first, my idea was to develop the best PR checker there is, but then I found several good tools that really fit my needs: not only PR, but ranking, backlinks, and so on.

So I thought my little project was just not worth being published. The intention of my article was to show how this little PR hashing thing can be solved and how we can play with Google to identify the values that have the highest influence on a site's ranking. Maybe one of you will now find a way to build the best "I want to check my site's ranking factors" tool there is :)
Go for it!

So, if you really want to have the code, I'll upload it, but you won't find a single line of code worth publishing except the lines already published.
*wondering* I hope it's clear what I mean :confused:

I tried to make the article narrower. Sorry for that, it's my first article on CP.
General: Re: Suggestion
Glen Harvy
11:58 16 Aug '07
I appreciate your reasons for not publishing, but I would have preferred to see a complete article so that I can learn how others construct the programs they use and how they go about achieving their goals.

There doesn't seem to be a great deal of programming code on the web that deals with hashing, and your project would be an excellent example of its use.

Your program doesn't have to be the best *anything* but rather a tool that others can learn from and no doubt improve upon if they wish. No doubt, the feedback you get may even improve your own programming skills.

The extra effort in uploading a working example is minimal.

You should let the user decide as to the worth of the code.

Thank you for contributing to CP. I don't know what I would have done without CP's help, and all contributions should be valued. They are by me.

Cheers.

BTW: Don't worry about your English. If I can teach myself C#, then broken English isn't going to worry me :)

Also, can you please find out how to keep your article from spreading outside of the normal window? I hate horizontal scrolling :mad:



Glen Harvy

General: Re: Suggestion
hartertobak
20:38 16 Aug '07
Okay, Glen, you got me.
I added the zip file to the article with a stable version of that code. I deleted some additional things I'm working on, but the main idea of this approach is in this zip file.

:)
General: Nice work.
Secrets
4:07 16 Aug '07
A really nice article. I had been wondering how this stuff works for a long time. Thanks, man.
General: Re: Nice work.
hartertobak
4:24 16 Aug '07
Secrets wrote:
A really nice article. I had been wondering how this stuff works for a long time. Thanks, man.

Thanks a lot :)



Last Updated 16 Aug 2007 | Copyright © CodeProject, 1999-2010