
Introduction
Since I began running sites of my own, I have become (rabidly) interested in how well they rank in the major search engines. This script lets you check where your site ranks for given keywords in Google, MSN and Yahoo!.
You can try an online demo here.
Background
The basic idea behind the code is pretty simple: submit a search query to the search engine, then scrape the results to see where your site placed. There are plenty of problems with scraping, some technical and some ethical, but we will conveniently ignore those for now and just enjoy the code.
Using the code
I have included the source code for a single page (both the .aspx and the .aspx.vb). To use them, create a project and add both files to it.
The main calls look like this:
sRetHTML = sGetPostData("http://search.yahoo.com/search?p=" & _
    sWordsToCheck & "&n=100", eRequestType.ePost)
sPlaces = sPlaces & "Yahoo: "
sPlaces = sPlaces & sFindPlace(sRetHTML, "about this page", _
    "<b>results page:</b>", sSite, "<a class=yschttl")
First, sGetPostData is called with the URL we want to scrape.
The data is then parsed. The parameters to the parsing function are:
- sInput As String: The HTML returned by the search engine.
- sStart As String: Text that indicates the start of the results in the HTML.
- sEnd As String: Text that indicates the end of the results in the HTML.
- sSite As String: The site URL to look for in the HTML.
- sSeparator As String: Text that separates one search result from the next.
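To make the role of sStart and sEnd concrete, here is a minimal sketch of how the results region could be cut out of the page before any counting is done. The article's full sFindPlace is not listed, so the ResultsRegion helper, its module and the sample HTML are assumptions made for illustration only.

```vbnet
Imports System

Module ResultsRegionSketch
    ' Hypothetical helper (not from the article): returns the text between
    ' sStart and sEnd, or "" when the start marker is missing.
    Function ResultsRegion(ByVal sInput As String, ByVal sStart As String, _
                           ByVal sEnd As String) As String
        Dim iStart As Integer = sInput.IndexOf(sStart, StringComparison.OrdinalIgnoreCase)
        If iStart < 0 Then Return ""
        iStart += sStart.Length

        ' If the end marker is missing, keep everything up to the end of the page.
        Dim iEnd As Integer = sInput.IndexOf(sEnd, iStart, StringComparison.OrdinalIgnoreCase)
        If iEnd < 0 Then iEnd = sInput.Length

        Return sInput.Substring(iStart, iEnd - iStart)
    End Function

    Sub Main()
        ' Made-up page fragment using the Yahoo! markers from the call above.
        Dim sHtml As String = "<p>about this page</p><a class=yschttl href=x>One</a>" & _
                              "<b>results page:</b> 1 2 3"
        ' Prints </p><a class=yschttl href=x>One</a>
        Console.WriteLine(ResultsRegion(sHtml, "about this page", "<b>results page:</b>"))
    End Sub
End Module
```

Trimming first means that anything in the page header or footer that happens to contain the separator text cannot be mistaken for a search result.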
The scraping code is quite simple, but does include a parameter to allow for GET as well as POST requests. It also retries (up to 10 attempts in total) if a request fails or returns nothing.
' Requires Imports System.IO, System.Net and System.Text at the top of the file.
Private Function sGetPostData(ByVal sRequestURL As String, _
        ByVal RequestType As eRequestType) As String
    Dim Writer As StreamWriter = Nothing
    Dim WebRequestObject As HttpWebRequest
    Dim sr As StreamReader
    Dim WebResponseObject As HttpWebResponse
    Dim iTries As Integer = 0
    Dim bOK As Boolean = False
    Dim sbResultsBuilder As StringBuilder
    Dim sTemp As String
    Dim sBuffer(8192) As Char
    Dim iRetChars As Integer

    ' Keep trying until a non-empty page comes back or 10 attempts have failed.
    Do While Not bOK AndAlso iTries < 10
        Try
            WebRequestObject = CType(WebRequest.Create(sRequestURL), HttpWebRequest)
            WebRequestObject.ContentType = "application/x-www-form-urlencoded"
            WebRequestObject.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; " & _
                "Windows NT 5.2; .NET CLR 1.0.3705;)"
            If RequestType = eRequestType.ePost Then
                ' POST with an empty body; the query itself travels in the URL.
                WebRequestObject.Method = "POST"
                Writer = New StreamWriter(WebRequestObject.GetRequestStream())
                Writer.Close()
            Else
                WebRequestObject.Method = "GET"
            End If
            WebResponseObject = CType(WebRequestObject.GetResponse(), HttpWebResponse)
            sr = New StreamReader(WebResponseObject.GetResponseStream())
            ' Start every attempt with an empty builder so a failed try leaves nothing behind.
            sbResultsBuilder = New StringBuilder()
            Do
                iRetChars = sr.Read(sBuffer, 0, sBuffer.Length)
                If iRetChars > 0 Then
                    sbResultsBuilder.Append(sBuffer, 0, iRetChars)
                    ' Look only at the characters just read, not stale buffer contents.
                    sTemp = New String(sBuffer, 0, iRetChars)
                    ' The marker text is missing from the published listing;
                    ' "</HTML>" (the end of the page) is assumed here.
                    If InStr(UCase(sTemp), "</HTML>") <> 0 Then
                        Exit Do
                    End If
                End If
            Loop While iRetChars > 0
            sGetPostData = sbResultsBuilder.ToString()
            sr.Close()
            WebResponseObject.Close()
            If sGetPostData <> "" Then
                bOK = True
            Else
                iTries = iTries + 1
            End If
        Catch ex As Exception
            ' Any failure (timeout, HTTP error and so on) counts as one attempt.
            iTries = iTries + 1
        End Try
    Loop
End Function
Points of Interest
Possibly the only even slightly tricky parts are the following. Server.UrlPathEncode is used to replace spaces and similar characters with their URL-encoded equivalents:
sWordsToCheck = Server.UrlPathEncode(sWordsToCheck)
In the sFindPlace function, the position is then worked out with:
sInput = sInput.Replace(sSeparator, Chr(31))
sFindPlace = CStr(sInput.Split(Chr(31)).GetUpperBound(0))
This takes the HTML (sInput) and replaces every instance of the separator text with Chr(31), a control character that should never appear in an HTML results page. The Split routine then turns the string into an array of strings, and the upper bound of that array equals the number of separators found. Provided sInput has already been trimmed so that it ends at the site's entry, that number is my search engine position!
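Both points can be tried outside a web page with a small console sketch. It rests on several assumptions: EncodeWords and PlaceOf are hypothetical helpers rather than the article's functions, Uri.EscapeDataString stands in for Server.UrlPathEncode (which needs a running page), the sample results string is made up, and the text is cut off at the site before counting so that the number of separators equals the place.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module PointsOfInterestSketch
    ' Hypothetical helper: percent-encodes the keywords for use in a query string.
    ' Uri.EscapeDataString is a stand-in for Server.UrlPathEncode.
    Function EncodeWords(ByVal sWords As String) As String
        Return Uri.EscapeDataString(sWords)
    End Function

    ' Hypothetical helper: 1-based place of sSite in sResults, or 0 if it is absent.
    Function PlaceOf(ByVal sResults As String, ByVal sSite As String, _
                     ByVal sSeparator As String) As Integer
        Dim iSite As Integer = sResults.IndexOf(sSite, StringComparison.OrdinalIgnoreCase)
        If iSite < 0 Then Return 0

        ' Keep only the text up to the site, then count separators as the article does.
        Dim sBefore As String = sResults.Substring(0, iSite).Replace(sSeparator, Chr(31))
        Return sBefore.Split(Chr(31)).GetUpperBound(0)
    End Function

    Sub Main()
        ' Prints cheap%20web%20hosting
        Console.WriteLine(EncodeWords("cheap web hosting"))

        Dim sResults As String = "<a>one.example</a><a>two.example</a>" & _
                                 "<a>mysite.example</a><a>four.example</a>"
        ' Three separators come before the site, so this prints 3.
        Console.WriteLine(PlaceOf(sResults, "mysite.example", "<a>"))
    End Sub
End Module
```

Returning 0 for a missing site keeps "not ranked" distinct from any real position, which the bare upper-bound count cannot do on its own.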
So that's it. I hope you rule the SERPs!
History
- 12/Dec/2005 - First release.