Google Desktop Command Line Search

piotr.kolodziej

14 Sep 2007

How to perform a Google Desktop search query in C#

Introduction

One day I realized that it would be handy to have a search engine available when connecting to my computer over SSH. In this article, I describe how easy it is to write a C# application that performs a desktop search.

Background

The goal is to use the Google Desktop API to perform search queries. As described in the Google Desktop Query API Developer Guide, there are a few ways of using the GD engine. The easiest of them is the HTML/XML-based Query API. The only thing you need to know is how to deal with HTTP requests and XML processing in .NET. We will send the query over HTTP and receive the search results in XML format.

Requesting a Desktop Search

Example query (from the Google Desktop API guide):

http://127.0.0.1:4664/search&s=1ftR7c_hVZKYvuYS-RWnFHk91Z0?q=Google&format=xml

Google's description of the above query is shown below:

http://127.0.0.1:4664/ is the localhost address and Google Desktop port.
search&s=1ftR7c_hVZKYvuYS-RWnFHk91Z0 is the search command and a security token.
?q=Google is the query term(s) parameter.

If you want to search for more than one term, separate the terms with +s. For example, to search for both "Google" and "Desktop", use ?q=Google+Desktop
If you want to search for a specific phrase, separate the terms with +s and surround the phrase with %22s. For example, to search for the phrase "Google Desktop", use ?q=%22Google+Desktop%22
To search for the two phrases "Google Desktop" and "Copyright 2007", use ?q=%22Google+Desktop%22+%22Copyright+2007%22

&format=xml specifies that the HTTP response returns the search results in XML format, as described in the next section.
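
The rules above for terms and phrases can be sketched as a small helper. This is only an illustration: QueryBuilder and Build are hypothetical names that are not part of the Google Desktop API, and treating any argument that contains a space as a phrase is an assumption made for this sketch.

```csharp
using System;
using System.Collections.Generic;

static class QueryBuilder
{
    // Hypothetical helper: builds the value of the q parameter.
    // An entry containing spaces is treated as a phrase and wrapped in %22
    // (a URL-encoded double quote); all words are joined with '+'.
    public static string Build(IEnumerable<string> terms)
    {
        List<string> parts = new List<string>();
        foreach (string term in terms)
        {
            string[] words = term.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (words.Length == 0)
            {
                continue; // skip empty entries
            }

            // Escape each word on its own so the '+' separators stay literal
            for (int i = 0; i < words.Length; i++)
            {
                words[i] = Uri.EscapeDataString(words[i]);
            }

            string joined = string.Join("+", words);
            parts.Add(words.Length > 1 ? "%22" + joined + "%22" : joined);
        }
        return string.Join("+", parts.ToArray());
    }

    static void Main()
    {
        // prints "Google+Desktop"
        Console.WriteLine(Build(new[] { "Google", "Desktop" }));
        // prints "%22Google+Desktop%22+%22Copyright+2007%22"
        Console.WriteLine(Build(new[] { "Google Desktop", "Copyright 2007" }));
    }
}
```

The returned string is meant to be placed after ?q= in the query URL, followed by &format=xml.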

The query URL is stored in the Windows registry:

HKEY_CURRENT_USER\Software\Google\Google Desktop\API\search_url

The algorithm consists of three steps:

Getting the query URL from the registry
Sending the HTTP request (WebClient is used here)
Deserializing the received XML into objects

An example of the XML result is shown below:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<results count="24945">
<result>
  <category>web</category> 
  <!-- ... optional internal, 
	implementation-specific elements such as ids and flags... --> 
  <title>Developer Knowledge Base - Read the Google Desktop blog</title> 
  <url>http://code.google.com/support/bin/answer.py?answer=66869&topic=10434</url> 
  <time>128243290079530000</time> 
  <snippet>Desktop engineers regularly post development articles and 
	announce updates to th...</snippet> 
  <thumbnail>/thumbnail?id=6%5Fvwps3QA4FIYGAAAA&s=wgQCmjGl0VEzw3KVhm3mxBG_x48
  </thumbnail> 
  <icon>/icon?id=http%3A%2F%2Fcode%2Egoogle%2Ecom%2F&s=kKR1by-QXDMlb5vEhxkDZhCv3eE
  </icon> 
  <cache_url>http://127.0.0.1:4664/...</cache_url> 
</result>

...

</results>

In this example, I will use only the title, url, and category nodes. Here is the class used to deserialize the XML:

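using System.Xml.Serialization;

namespace googleDesktop
{
    // Root wrapper: the <results> element must be enclosed in an <Added> node
    [XmlRoot("Added")]
    public class deserialize
    {
        // Maps <results><result>...</result></results> to an array
        [XmlArray("results")]
        [XmlArrayItem("result")]
        public Result[] Results;

        public deserialize()
        {
        }
    }

    // A single <result> element; only three of its child nodes are mapped
    public class Result
    {
        public Result()
        {
        }

        [XmlElement("url")]
        public string location;

        [XmlElement("category")]
        public string cat;

        [XmlElement("title")]
        public string title;
    }
}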

As you can see, the deserialize class expects the <results></results> content to be enclosed in another node. That's why we add the <Added></Added> nodes in the proper place. Now you are ready to analyze the code:

// Requires: using System; using System.IO; using System.Net;
//           using System.Xml.Serialization; using Microsoft.Win32;
static void Main(string[] args)
{
    // Search keywords are passed as command line arguments
    if (args.Length == 0)
    {
        Console.WriteLine("Usage: pass one or more search keywords");
        return;
    }

    // Obtain the query URL from the Windows registry
    object searchUrl;
    using (RegistryKey apiKey = Registry.CurrentUser.OpenSubKey(
        "Software\\Google\\Google Desktop\\API"))
    {
        searchUrl = (apiKey == null) ? null : apiKey.GetValue("search_url");
    }
    if (searchUrl == null)
    {
        Console.WriteLine("Google Desktop search_url not found in the registry");
        return;
    }

    // Add the search keywords to the query URL according to the scheme:
    // URL-escaped keywords separated by '+'
    string[] terms = new string[args.Length];
    for (int i = 0; i < args.Length; i++)
    {
        terms[i] = Uri.EscapeDataString(args[i]);
    }
    string query = string.Join("+", terms);

    // Complete the URL: stored search URL + keywords + output format.
    // The keywords are appended directly, so the stored URL is assumed
    // to already end with the q parameter.
    string connection = searchUrl.ToString() + query + "&format=xml";

    // Download the XML as a string (the response declares UTF-8)
    string result;
    using (WebClient wc = new WebClient())
    {
        wc.Encoding = System.Text.Encoding.UTF8;
        result = wc.DownloadString(connection);
    }

    // Insert <Added> before <results>
    result = result.Insert(result.IndexOf("<results"), "<Added>");
    // Append </Added> after </results>; appending is safe whether or not
    // the response ends with trailing whitespace
    result = result.TrimEnd() + "</Added>";

    // Prepare the serializer and deserialize from a text reader
    XmlSerializer ser = new XmlSerializer(typeof(deserialize));
    deserialize myObj;
    using (TextReader reader = new StringReader(result))
    {
        myObj = (deserialize)ser.Deserialize(reader);
    }

    // Print the results
    print(myObj);
}

And here is the printing method. The regular expression deletes any HTML tags from the title; you can experiment by removing the HtmlDecode call to observe the difference:

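// Requires a reference to the System.Web assembly for HttpUtility
private static void print(deserialize myObj)
{
    // Matches any HTML tag, e.g. <b> or </b>
    System.Text.RegularExpressions.Regex htmlTag =
        new System.Text.RegularExpressions.Regex("<[^>]*>");

    // Nothing to print when the response contains no results
    if (myObj.Results == null)
    {
        return;
    }

    foreach (Result item in myObj.Results)
    {
        Console.WriteLine(" --------------------------------------------------------- ");
        // Strip the tags first, then decode HTML entities such as &amp;
        Console.WriteLine("Title: " +
            System.Web.HttpUtility.HtmlDecode(htmlTag.Replace(item.title ?? "", "")));
        Console.WriteLine("Category: " + item.cat);
        Console.WriteLine("Location: " +
            System.Web.HttpUtility.HtmlDecode(item.location));
        Console.WriteLine("\n\n");
    }
}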
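
To see what the tag stripping and HtmlDecode steps do in isolation, here is a minimal sketch. The sample title is made up for illustration and is not real Google Desktop output. TitleCleaner is a hypothetical name, and System.Net.WebUtility.HtmlDecode is swapped in for System.Web.HttpUtility so the snippet needs no reference to the System.Web assembly.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class TitleCleaner
{
    // Same pattern as in the printing method: matches any HTML tag
    static readonly Regex HtmlTag = new Regex("<[^>]*>");

    // Removes the tags first, then decodes HTML entities
    public static string Clean(string raw)
    {
        return WebUtility.HtmlDecode(HtmlTag.Replace(raw, ""));
    }

    static void Main()
    {
        // Made-up sample title containing markup and an entity
        string raw = "<b>Google</b> Desktop &amp; more";

        // prints "Google Desktop &amp; more" (tags removed, entity untouched)
        Console.WriteLine(HtmlTag.Replace(raw, ""));

        // prints "Google Desktop & more"
        Console.WriteLine(Clean(raw));
    }
}
```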

Summary

The purpose of this article was only to show the idea of using the Google Desktop engine. The Google Desktop API has more features, which I've omitted. The code is obviously not error-proof. If someone finds a way to construct the 'deserialize' class without adding the additional nodes, please post the solution. I'd love to see that.
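
One possible answer to the question above, offered as a sketch rather than as the author's method: XmlSerializer can treat <results> itself as the root when the array is marked with XmlElement instead of XmlArray, so each repeated <result> child maps directly to an array item. The names SearchResults, SearchResult, and Demo are hypothetical, and the XML string is a shortened, made-up sample in the shape of the result shown earlier.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// <results> is the root element, so no wrapper node is needed
[XmlRoot("results")]
public class SearchResults
{
    [XmlAttribute("count")]
    public int Count;

    // XmlElement on an array maps each repeated <result> child
    // straight to an array item, without an enclosing element
    [XmlElement("result")]
    public SearchResult[] Items;
}

public class SearchResult
{
    [XmlElement("url")]
    public string Location;

    [XmlElement("category")]
    public string Category;

    [XmlElement("title")]
    public string Title;
}

public static class Demo
{
    public static SearchResults Parse(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SearchResults));
        using (StringReader reader = new StringReader(xml))
        {
            return (SearchResults)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        // Made-up sample; unmapped elements such as <time> are skipped
        string xml =
            "<results count=\"1\"><result>" +
            "<category>web</category>" +
            "<title>Example</title>" +
            "<url>http://example.com/</url>" +
            "<time>128243290079530000</time>" +
            "</result></results>";

        SearchResults parsed = Parse(xml);
        Console.WriteLine(parsed.Count);          // prints "1"
        Console.WriteLine(parsed.Items[0].Title); // prints "Example"
    }
}
```

With this mapping, the string returned by DownloadString could be passed to the serializer unchanged, without the two Insert calls.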

History

15th September, 2007: Initial post

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.