Introduction
One day I realized that it would be nice to have a desktop search engine available when connecting to my computer via SSH. In this article, I will describe how easy it is to write a C# application that performs a desktop search.
Background
The goal is to use the Google Desktop API to perform search queries. As described in the Google Desktop Query API Developer Guide, there are a few ways of using the GD engine. The easiest of them is the HTML/XML-based Query API. The only thing you need to know is how to deal with HTTP requests and XML processing in .NET. We will send the query over HTTP and receive the search results in XML format.
Requesting a Desktop Search
Example query (from the Google Desktop API guide):
http://127.0.0.1:4664/search&s=1ftR7c_hVZKYvuYS-RWnFHk91Z0?q=Google&format=xml
Google's description of the above query is shown below:
http://127.0.0.1:4664/
is the localhost address and Google Desktop port.
search&s=1ftR7c_hVZKYvuYS-RWnFHk91Z0
is the search command and a security token.
?q=Google
is the query term(s) parameter.
- If you want to search for more than one term, separate the terms with +s. For example, to search for both "Google" and "Desktop", use ?q=Google+Desktop
- If you want to search for a specific phrase, separate the terms with +s and surround the phrase with %22s. For example, to search for the phrase "Google Desktop", use ?q=%22Google+Desktop%22
- To search for the two phrases "Google Desktop" and "Copyright 2007", use ?q=%22Google+Desktop%22+%22Copyright+2007%22
&format=xml
specifies that the HTTP response returns the search results in XML format, as described in the next section.
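The encoding rules above can be sketched in a few lines of C#. This is only an illustration: QueryBuilder, EncodeTerm and BuildQueryValue are hypothetical names that do not come from the Google guide, and the sketch assumes that any term containing a space should be treated as a phrase.

```csharp
using System;
using System.Linq;

public static class QueryBuilder
{
    // Encodes a single term or phrase for the q= parameter.
    // Assumption: a term that contains a space is a phrase and is wrapped in %22.
    public static string EncodeTerm(string term)
    {
        string[] words = term.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        // Escape each word on its own, then join the words with '+'.
        string joined = string.Join("+", words.Select(w => Uri.EscapeDataString(w)));
        return words.Length > 1 ? "%22" + joined + "%22" : joined;
    }

    // Joins all encoded terms with '+' to form the value that follows "?q=".
    public static string BuildQueryValue(params string[] terms)
    {
        return string.Join("+", terms.Select(t => EncodeTerm(t)));
    }

    public static void Main()
    {
        // Two separate terms.
        Console.WriteLine("?q=" + BuildQueryValue("Google", "Desktop") + "&format=xml");
        // Two phrases, each surrounded by %22.
        Console.WriteLine("?q=" + BuildQueryValue("Google Desktop", "Copyright 2007") + "&format=xml");
    }
}
```

Escaping each word separately keeps the + separators and the %22 markers literal, while any other special character inside a word is still percent-encoded.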
The query URL is stored in the Windows registry:
HKEY_CURRENT_USER\Software\Google\Google Desktop\API\search_url
The algorithm consists of:
- Getting the query URL from the registry
- Performing the HTTP request (HttpWebRequest, or the simpler WebClient used in the code below)
- Parsing the received XML into objects
An example of XML result is shown below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<results count="24945">
<result>
<category>web</category>
<title>Developer Knowledge Base - Read the Google Desktop blog</title>
<url>http://code.google.com/support/bin/answer.py?answer=66869&amp;topic=10434</url>
<time>128243290079530000</time>
<snippet>Desktop engineers regularly post development articles and
announce updates to th...</snippet>
<thumbnail>/thumbnail?id=6%5Fvwps3QA4FIYGAAAA&amp;s=wgQCmjGl0VEzw3KVhm3mxBG_x48
</thumbnail>
<icon>/icon?id=http%3A%2F%2Fcode%2Egoogle%2Ecom%2F&amp;s=kKR1by-QXDMlb5vEhxkDZhCv3eE
</icon>
<cache_url>http://127.0.0.1:4664/...</cache_url>
</result>
...
</results>
In this example, I will use only the title, url, and category nodes. Here are the classes used to deserialize the XML:
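using System.Xml.Serialization;

namespace googleDesktop
{
    // Maps the artificial <Added> wrapper element placed around <results>.
    [XmlRoot("Added")]
    public class deserialize
    {
        // <results> is the array container; each <result> child becomes one Result.
        [XmlArray("results")]
        [XmlArrayItem("result")]
        public Result[] Results;

        public deserialize()
        {
        }
    }

    // One search hit; only the nodes used in this article are mapped.
    public class Result
    {
        public Result()
        {
        }

        [XmlElement("url")]
        public string location;

        [XmlElement("category")]
        public string cat;

        [XmlElement("title")]
        public string title;
    }
}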
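As you can see, the deserialize class expects the <results></results> content to be enclosed in another node, because [XmlArray("results")] describes a member of the root element rather than the root itself. That's why we will add the <Added></Added> nodes around it. Now you are ready to analyze the code: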
// Requires: using System; using System.IO; using System.Net;
//           using System.Xml.Serialization; using Microsoft.Win32;
static void Main(string[] args)
{
    if (args.Length > 0)
    {
        // Read the query URL (it includes the security token) from the registry.
        RegistryKey api = Registry.CurrentUser.OpenSubKey
            ("Software\\Google\\Google Desktop\\API");
        object key = (api == null) ? null : api.GetValue("search_url");
        if (key == null)
        {
            Console.WriteLine("Google Desktop search_url was not found in the registry.");
            return;
        }

        // Join the command-line arguments with '+' to form the query terms.
        string query = string.Join("+", args);
        string connection = key.ToString() + query + "&format=xml";

        WebClient wc = new WebClient();
        string result = wc.DownloadString(connection);

        // Wrap <results>...</results> in <Added>...</Added> for the serializer.
        // Appending the closing tag avoids splitting the final character of the response.
        result = result.Insert(result.IndexOf("<results"), "<Added>");
        result = result + "</Added>";

        XmlSerializer ser = new XmlSerializer(typeof(deserialize));
        TextReader reader = new StringReader(result);
        deserialize myObj = (deserialize)ser.Deserialize(reader);
        print(myObj);
    }
}
And here is the printing method. You can experiment without the HtmlDecode method to observe the difference. The Regex is used to delete any HTML tags from the string:
// HttpUtility lives in System.Web, so the project needs a reference to System.Web.dll.
private static void print(deserialize myObj)
{
    // Matches anything that looks like an HTML tag, e.g. <b> or </b>.
    System.Text.RegularExpressions.Regex tagPattern =
        new System.Text.RegularExpressions.Regex("<[^>]*>");
    foreach (Result item in myObj.Results)
    {
        Console.WriteLine
            (" --------------------------------------------------------- ");
        // Strip the tags first, then decode entities such as &amp;.
        Console.WriteLine("Title: " +
            System.Web.HttpUtility.HtmlDecode(tagPattern.Replace(item.title ?? "", "")));
        Console.WriteLine("Category: " + item.cat);
        Console.WriteLine("Location: " +
            System.Web.HttpUtility.HtmlDecode(item.location));
        Console.WriteLine("\n\n");
    }
}
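To see what the tag stripping and the HTML decoding each contribute, here is a small standalone sketch. The sample title is made up for illustration, TitleCleaner is a hypothetical helper, and the sketch uses System.Net.WebUtility.HtmlDecode instead of System.Web.HttpUtility.HtmlDecode only so that it runs without a reference to System.Web.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TitleCleaner
{
    // Same pattern as in the article: anything between '<' and '>'.
    private static readonly Regex TagPattern = new Regex("<[^>]*>");

    // Removes the tags only; entities such as &amp; stay encoded.
    public static string StripTags(string raw)
    {
        return TagPattern.Replace(raw ?? string.Empty, "");
    }

    // Removes the tags and then decodes the HTML entities.
    public static string Clean(string raw)
    {
        return WebUtility.HtmlDecode(StripTags(raw));
    }

    public static void Main()
    {
        // Made-up sample; real titles come from the <title> node.
        string sample = "<b>Google</b> Desktop &amp; API";
        Console.WriteLine(StripTags(sample)); // Google Desktop &amp; API
        Console.WriteLine(Clean(sample));     // Google Desktop & API
    }
}
```

Stripping the tags before decoding matters: if the order were reversed, an encoded &lt; in the text would turn into a real < and could then be removed as if it were a tag.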
Summary
The purpose of this article was only to show the idea of using the Google Desktop engine. The Google Desktop API has more features that I've omitted. The code is obviously not error-proof. If someone finds a way to construct the 'deserialize' class without adding additional nodes, please post the solution. I'd love to see that.
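One possible answer to that last question, offered as a sketch rather than a tested solution against a live Google Desktop instance: make <results> itself the root with [XmlRoot("results")], and put [XmlElement("result")] on the array so that the repeated <result> children are mapped directly. The class names SearchResults and SearchResult and the sample XML are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// <results> is the root element, so no extra wrapper node is needed.
[XmlRoot("results")]
public class SearchResults
{
    [XmlAttribute("count")]
    public int Count;

    // XmlElement on an array maps repeated <result> children without a container.
    [XmlElement("result")]
    public SearchResult[] Items;
}

public class SearchResult
{
    [XmlElement("url")] public string Location;
    [XmlElement("category")] public string Category;
    [XmlElement("title")] public string Title;
}

public static class DirectDeserializeDemo
{
    public static SearchResults Parse(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SearchResults));
        using (StringReader reader = new StringReader(xml))
        {
            return (SearchResults)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        // Made-up response in the same shape as the example in this article.
        string xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
            "<results count=\"2\">" +
            "<result><category>web</category><title>First</title>" +
            "<url>http://example.com/a</url><time>1</time></result>" +
            "<result><category>file</category><title>Second</title>" +
            "<url>http://example.com/b</url></result>" +
            "</results>";

        SearchResults parsed = Parse(xml);
        foreach (SearchResult r in parsed.Items)
        {
            Console.WriteLine(r.Category + ": " + r.Title + " -> " + r.Location);
        }
    }
}
```

Nodes that are not mapped, such as <time> or <snippet>, are skipped by the serializer, so the downloaded string could be passed to Parse unchanged and the two Insert calls would no longer be needed.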
History
- 15th September, 2007: Initial post