Google Desktop Command Line Search






How to perform a Google Desktop search query in C#
Introduction
One day I realized it would be nice to have a search engine available when connecting to my computer via SSH. In this article, I describe how easy it is to write a C# application that performs a desktop search.
Background
The goal is to use the Google Desktop API to perform search queries. As described in the Google Desktop Query API Developer Guide, there are a few ways of using the Google Desktop (GD) engine. The easiest is the HTML/XML-based Query API. All you need to know is how to send an HTTP request and process XML in .NET: we send the query over HTTP and receive the search results in XML format.
Requesting a Desktop Search
Example query (from the Google Desktop API guide): http://127.0.0.1:4664/search&s=1ftR7c_hVZKYvuYS-RWnFHk91Z0?q=Google&format=xml
Google's description of the above query:
- http://127.0.0.1:4664/ is the localhost address and Google Desktop port.
- search&s=1ftR7c_hVZKYvuYS-RWnFHk91Z0 is the search command and a security token.
- ?q=Google is the query term(s) parameter.
  - If you want to search for more than one term, separate the terms with +s. For example, to search for both "Google" and "Desktop", use ?q=Google+Desktop
  - If you want to search for a specific phrase, separate the terms with +s and surround the phrase with %22s. For example, to search for the phrase "Google Desktop", use ?q=%22Google+Desktop%22
  - To search for the two phrases "Google Desktop" and "Copyright 2007", use ?q=%22Google+Desktop%22+%22Copyright+2007%22
- &format=xml specifies that the HTTP response returns the search results in XML format, as described in the next section.
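The rules above can be captured in a small helper that builds the value of the q= parameter. This is a minimal sketch: QueryBuilder and Build are hypothetical names, and it assumes that a term containing a space should be searched as a phrase. HttpUtility.UrlEncode (System.Web) already turns spaces into + and double quotes into %22, so the helper only has to add the surrounding %22 for phrases.

```csharp
using System;
using System.Linq;
using System.Web;

// Hypothetical helper (not part of the Google Desktop API): builds the q= value
public static class QueryBuilder
{
    // Terms are joined with '+'; a term containing a space is treated as a phrase
    public static string Build(params string[] terms)
    {
        return string.Join("+", terms.Select(t => EncodeTerm(t)));
    }

    private static string EncodeTerm(string term)
    {
        // UrlEncode turns spaces into '+' and escapes characters such as '&' and '#'
        string encoded = HttpUtility.UrlEncode(term);

        // Phrases are surrounded with %22 (an encoded double quote)
        return term.IndexOf(' ') >= 0 ? "%22" + encoded + "%22" : encoded;
    }
}
```

For example, QueryBuilder.Build("Google Desktop", "Copyright 2007") returns %22Google+Desktop%22+%22Copyright+2007%22, which matches the two-phrase example above; the result is what gets appended after ?q= in the query URL.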
The query URL is stored in the Windows registry:
HKEY_CURRENT_USER\Software\Google\Google Desktop\API\search_url
The algorithm consists of three steps:
- Getting the query URL from the registry
- Sending the HTTP request (the code below uses WebClient, which wraps HttpWebRequest)
- Deserializing the received XML into objects
An example of the XML result:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<results count="24945">
<result>
<category>web</category>
<!-- ... optional internal,
implementation-specific elements such as ids and flags... -->
<title>Developer Knowledge Base - Read the Google Desktop blog</title>
<url>http://code.google.com/support/bin/answer.py?answer=66869&amp;topic=10434</url>
<time>128243290079530000</time>
<snippet>Desktop engineers regularly post development articles and
announce updates to th...</snippet>
<thumbnail>/thumbnail?id=6%5Fvwps3QA4FIYGAAAA&amp;s=wgQCmjGl0VEzw3KVhm3mxBG_x48
</thumbnail>
<icon>/icon?id=http%3A%2F%2Fcode%2Egoogle%2Ecom%2F&amp;s=kKR1by-QXDMlb5vEhxkDZhCv3eE
</icon>
<cache_url>http://127.0.0.1:4664/...</cache_url>
</result>
...
</results>
In this example, I use only the title, url, and category nodes. Here is the class used to deserialize the XML:
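using System.Xml.Serialization;

namespace googleDesktop
{
    // Root wrapper: <Added><results><result>...</result></results></Added>
    [XmlRoot("Added")]
    public class deserialize
    {
        // The <results> element becomes an array of <result> items
        [XmlArray("results")]
        [XmlArrayItem("result")]
        public Result[] Results;

        public deserialize()
        {
        }
    }

    // One <result> element; child elements not mapped here are ignored
    public class Result
    {
        public Result()
        {
        }

        [XmlElement("url")]
        public string location;

        [XmlElement("category")]
        public string cat;

        [XmlElement("title")]
        public string title;
    }
}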
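As you can see, the <results></results> element must be enclosed in another node: [XmlArray("results")] maps the Results field to a <results> child of the root element, so the root has to be something else. That's why we add an <Added></Added> node around it. Now you are ready to analyze the code: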
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Web;                  // HttpUtility (reference System.Web.dll on .NET Framework)
using System.Xml.Serialization;
using Microsoft.Win32;

// Main and print() belong to the same class in the googleDesktop namespace
static void Main(string[] args)
{
    // Search keywords are passed as command-line arguments
    if (args.Length == 0)
    {
        Console.WriteLine("Usage: pass one or more search terms as arguments.");
        return;
    }

    // Obtain the query URL from the Windows registry
    string searchUrl = null;
    using (RegistryKey apiKey = Registry.CurrentUser.OpenSubKey(
        @"Software\Google\Google Desktop\API"))
    {
        if (apiKey != null)
            searchUrl = apiKey.GetValue("search_url") as string;
    }
    if (searchUrl == null)
    {
        Console.WriteLine("search_url not found: is Google Desktop installed?");
        return;
    }

    // Join the keywords with '+' as the query scheme requires; UrlEncode
    // escapes characters such as '&' and turns spaces into '+'
    string query = string.Join("+", args.Select(a => HttpUtility.UrlEncode(a)));

    // Request the results in XML format
    string connection = searchUrl + query + "&format=xml";

    string result;
    using (WebClient wc = new WebClient())
    {
        // Obtain the XML as a string
        result = wc.DownloadString(connection);
    }

    // Insert <Added> before <results> and append </Added> after the closing
    // </results>, which ends the document (trailing whitespace is trimmed)
    result = result.Insert(result.IndexOf("<results"), "<Added>");
    result = result.TrimEnd() + "</Added>";

    // Deserialize into a 'deserialize' object and print the results
    XmlSerializer ser = new XmlSerializer(typeof(deserialize));
    using (TextReader reader = new StringReader(result))
    {
        deserialize myObj = (deserialize)ser.Deserialize(reader);
        print(myObj);
    }
}
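Because the wrapping step is easy to get wrong, it can help to exercise it without Google Desktop running. The following sketch repeats the two mapping classes, wraps a hard-coded response in <Added>...</Added>, and deserializes it. The namespace OfflineDemo and the class ResponseParser are hypothetical names, and the sketch assumes that <results> is the root element and that only whitespace follows its closing tag.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

namespace OfflineDemo // hypothetical namespace for this sketch
{
    // Same mapping as the article's classes: <Added><results><result>...
    [XmlRoot("Added")]
    public class deserialize
    {
        [XmlArray("results")]
        [XmlArrayItem("result")]
        public Result[] Results;
    }

    public class Result
    {
        [XmlElement("url")] public string location;
        [XmlElement("category")] public string cat;
        [XmlElement("title")] public string title;
    }

    // Hypothetical helper class, not part of any library
    public static class ResponseParser
    {
        // Puts <Added> before <results> and </Added> at the end of the document.
        // Assumes <results> is the root element and only whitespace follows it.
        public static string Wrap(string xml)
        {
            int start = xml.IndexOf("<results", StringComparison.Ordinal);
            if (start < 0)
                throw new FormatException("The response contains no <results> element.");
            return xml.Insert(start, "<Added>").TrimEnd() + "</Added>";
        }

        // Wraps the raw response and deserializes it; unmapped elements such as
        // <time> and the count attribute are ignored by XmlSerializer
        public static deserialize Parse(string xml)
        {
            XmlSerializer ser = new XmlSerializer(typeof(deserialize));
            using (TextReader reader = new StringReader(Wrap(xml)))
            {
                return (deserialize)ser.Deserialize(reader);
            }
        }
    }
}
```

Feeding Parse a response shaped like the XML example above, with one <result> element, yields an object whose Results array holds a single item with its title, category, and url filled in.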
And here is the printing method. The regular expression removes any HTML tags from the title, and HtmlDecode converts HTML entities such as &amp; back to plain characters (you can try it without HtmlDecode to observe the difference):
private static void print(deserialize myObj)
{
    // Matches any HTML tag, e.g. <b> or </b>
    System.Text.RegularExpressions.Regex tagPattern =
        new System.Text.RegularExpressions.Regex("<[^>]*>");

    // Nothing to print if the response contained no results
    if (myObj.Results == null)
        return;

    foreach (Result r in myObj.Results)
    {
        Console.WriteLine(" --------------------------------------------------------- ");
        // Strip the tags first, then decode entities such as &amp;
        Console.WriteLine("Title: " +
            System.Web.HttpUtility.HtmlDecode(tagPattern.Replace(r.title ?? "", "")));
        Console.WriteLine("Category: " + r.cat);
        Console.WriteLine("Location: " +
            System.Web.HttpUtility.HtmlDecode(r.location));
        Console.WriteLine("\n\n");
    }
}
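To see what the tag-stripping regular expression and HtmlDecode do without running a search, the same two steps can be isolated in a small function. TitleCleaner and Clean are hypothetical names, and the sample titles in the usage note are made up for illustration.

```csharp
using System.Text.RegularExpressions;
using System.Web;

// Hypothetical helper mirroring the title handling in print()
public static class TitleCleaner
{
    // Same pattern as in print(): anything from '<' up to the next '>'
    private static readonly Regex TagPattern = new Regex("<[^>]*>");

    public static string Clean(string title)
    {
        if (title == null)
            return "";
        // Remove tags first, then decode entities such as &amp; and &quot;
        return HttpUtility.HtmlDecode(TagPattern.Replace(title, ""));
    }
}
```

For instance, Clean("<b>Google</b> Desktop &amp; C#") returns Google Desktop & C#; without the HtmlDecode call the &amp; entity would be printed literally.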
Summary
The purpose of this article was only to show the basic idea of using the Google Desktop engine. The Google Desktop API has more features that I have omitted, and the code is obviously not error-proof. If someone finds a way to construct the 'deserialize' class without adding the extra nodes, please post the solution. I'd love to see it.
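One way to avoid the extra nodes, offered here as a sketch rather than a recipe tested against a live Google Desktop install, is to make <results> itself the root: [XmlRoot("results")] on the container class and [XmlElement("result")] on the array field, which tells XmlSerializer that the <result> items appear directly under the root with no wrapper element. The namespace NoWrapperDemo and the names SearchResults, Parser, and Count are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

namespace NoWrapperDemo // hypothetical namespace for this sketch
{
    // <results> is the root element, so no <Added> node is needed
    [XmlRoot("results")]
    public class SearchResults
    {
        // Optional: the count attribute of <results>
        [XmlAttribute("count")]
        public int Count;

        // XmlElement on an array: each <result> child is one item (no wrapper element)
        [XmlElement("result")]
        public Result[] Results;
    }

    public class Result
    {
        [XmlElement("url")] public string location;
        [XmlElement("category")] public string cat;
        [XmlElement("title")] public string title;
    }

    public static class Parser
    {
        // Deserializes the raw response string exactly as downloaded
        public static SearchResults Parse(string xml)
        {
            XmlSerializer ser = new XmlSerializer(typeof(SearchResults));
            using (TextReader reader = new StringReader(xml))
            {
                return (SearchResults)ser.Deserialize(reader);
            }
        }
    }
}
```

With this mapping, the string returned by DownloadString can be passed to Parser.Parse unchanged, and the print method only needs to iterate over SearchResults.Results instead of deserialize.Results.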
History
- 15th September, 2007: Initial post