Posted 1 Oct 2013

Using LEADTOOLS PDF File Features to Enhance Google Drive Search

In the white paper that follows, we will show how to read and write the PDF keywords metadata, update the file on Google Drive, interface with your local Google Drive database, and do all of this within a single right-click context menu in Windows Explorer.

Editorial Note

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

Cloud services such as Google Drive continue to grow in popularity each year as a safe, secure and convenient way to store and back up your documents, images, music and other files. For users with a large amount of data in the cloud, finding files again can become a problem. Most search features are limited in scope and use only the file name or, for file formats such as PDF, the text within the file itself. Some level of customization or enhancement may therefore be necessary to take full advantage of your Google Drive cloud storage.

Searching for a PDF may be easier than searching for an MP3 or JPEG, but Google Drive has some limitations with the format as well. For example, let’s say you scan an invoice or bank statement and save it as a PDF. Even if you have a scanner or software that extracts the text with OCR, you still might not have a reliable way of searching for that document. The text would likely have words for the name of the company and date, but it might be lacking keywords that you find useful for archiving and finding the document later such as "bank", "insurance", "paid with PayPal" and so on.

This is exactly the kind of information you would want to include in the Keywords metadata of your PDF file when saving it, but Google Drive doesn’t use this metadata in its search index. Therefore you can use the LEADTOOLS PDF SDK to read and edit the file metadata, and then update the file’s IndexableTextData property in Google Drive. In the white paper that follows, we will show how to read and write the PDF keywords metadata, update the file on Google Drive, interface with your local Google Drive database, and do all of this within a single right-click context menu in Windows Explorer.

Creating the Right-Click Context Menu

A service such as Google Drive comes with a desktop application that automatically syncs files on your computer with your online cloud drive, so a full-blown application isn’t necessary. A more practical approach is to add a context menu item that appears when you right-click on a PDF file. After the command is added to the registry, you can right-click on any PDF file and select "Update File Keywords," which will pass the file name as an argument to the application.

using (RegistryKey pdfTypeRegKey = Registry.ClassesRoot.OpenSubKey(".pdf"))
{
   // Create path to registry location
   string regPath = string.Format(@"{0}\shell\{1}",
      (String)pdfTypeRegKey.GetValue(null), "UpdateFileKeywords");
 
   // Add context menu to the registry
   using (RegistryKey key = Registry.ClassesRoot.CreateSubKey(regPath))
   {
      key.SetValue(null, "Update File Keywords");
   }
 
   // Add command that is invoked to the registry
   string menuCommand = string.Format("\"{0}\" \"%L\"", 
       Application.ExecutablePath);
   using (RegistryKey key = Registry.ClassesRoot.CreateSubKey(
       string.Format(@"{0}\command", regPath)))
   {
      key.SetValue(null, menuCommand);
   }
}
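
On the receiving side, the application has to pick the file name out of its command-line arguments. The following is a minimal sketch of that step, not code from the demo: the ShellArguments class and GetPdfPath method are hypothetical names, and it assumes the shell passes the path as the first argument, as the "%L" placeholder in the command above suggests.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the original demo): extracts the PDF path
// that the context menu command passes as the first command-line argument.
static class ShellArguments
{
    // Returns the path if the first argument names a .pdf file; otherwise null.
    public static string GetPdfPath(string[] args)
    {
        if (args == null || args.Length == 0)
            return null;

        // Strip stray quotes in case the argument arrives still quoted.
        string path = args[0].Trim('"');
        if (string.IsNullOrWhiteSpace(path))
            return null;

        // Only the extension is checked here; the file is not opened.
        bool isPdf = string.Equals(Path.GetExtension(path), ".pdf",
            StringComparison.OrdinalIgnoreCase);
        return isPdf ? path : null;
    }
}
```

A Main method could call ShellArguments.GetPdfPath(args) and exit early when it returns null, for example when the program is launched directly rather than from the context menu.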

Using LEADTOOLS to Update PDF File Keywords Metadata

Now that the foundation for our application is laid, we must update the keywords within the PDF file. LEADTOOLS includes comprehensive PDF reading, writing and editing capabilities in a programmer-friendly SDK which allows for the direct modification of PDF file properties, searchable text, bookmarks and more. When our application loads from the right-click menu shell command, it will use the LEADTOOLS PDFFile object to retrieve the keywords and display them in the textbox for editing.

PDFFile _document = new PDFFile(fileName, password);
_document.Load();
_txtKeywords.Text = _document.DocumentProperties.Keywords;

Saving is just as simple, requiring only a few lines of code. Once saved, the document properties of the PDF reflect the new keywords.

_document.DocumentProperties.Keywords = _txtKeywords.Text;
_document.SetDocumentProperties(fileName);
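
Because the keywords come from a free-form textbox, it may help to tidy them before writing them back. The sketch below is an optional extra, not part of the LEADTOOLS API or the demo: the KeywordText class is a hypothetical name, and the choice of comma or semicolon as the separator is an assumption about how users type keywords.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of LEADTOOLS): cleans up keyword text typed
// by the user before it is stored in the PDF document properties.
static class KeywordText
{
    // Splits on commas and semicolons, trims each keyword, drops empty
    // entries and case-insensitive duplicates, and joins with ", ".
    public static string Normalize(string raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
            return string.Empty;

        IEnumerable<string> keywords = raw
            .Split(new[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(k => k.Trim())
            .Where(k => k.Length > 0)
            .Distinct(StringComparer.OrdinalIgnoreCase);

        return string.Join(", ", keywords);
    }
}
```

With this in place, the textbox contents could be passed through KeywordText.Normalize before being assigned to DocumentProperties.Keywords.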

Updating Google Drive

Finally, a few more steps must be taken in order to wrap up our enhancement to Google Drive’s PDF search. The keywords and other metadata properties within PDF files are useful and powerful features, but Google Drive does not use them within its search algorithm. However, each file in Google Drive has the IndexableTextData property which can be modified when using the Google Drive API.

The Google Drive desktop sync application for Windows uses a local SQLite database to keep track of the local files and their online information. To complete this operation, we must get the fileId that matches the local file we just updated. Depending on how your Google Drive folder is organized, you may need additional queries to recursively find the file within subfolders. However, once you acquire the inode_number that matches the PDF file name you passed through the right-click menu command, you can get the fileId from the database and call the Google Drive web service.

// Get resource_id for the target file (formatted "type:resource_id")
sqLitecmd.CommandText = "SELECT resource_id FROM mapping " +
   "WHERE inode_number='" + fileInodeNumber + "'";
 
reader = sqLitecmd.ExecuteReader();
reader.Read();
 
String fileResourceId = reader["resource_id"].ToString().Split(':')[1];
reader.Close();
 
File file = googleDriveHelper.GetFile(fileResourceId);
file.IndexableText = new File.IndexableTextData();
file.IndexableText.Text = _document.DocumentProperties.Keywords;
googleDriveHelper.UpdateFileMetadata(file);
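
The snippet above takes the second element of Split(':'), which throws if the stored value has no colon. A slightly more defensive way to pull the file ID out of a "type:resource_id" string might look like the following sketch. ResourceIdParser and TryGetFileId are hypothetical names, and the only assumption about the stored value is the "type:resource_id" layout described in the comment above.

```csharp
using System;

// Hypothetical helper (not part of the original demo): extracts the file ID
// from a value formatted as "type:resource_id".
static class ResourceIdParser
{
    // Returns true and sets fileId when a non-empty ID follows the first colon.
    public static bool TryGetFileId(string resourceId, out string fileId)
    {
        fileId = null;
        if (string.IsNullOrEmpty(resourceId))
            return false;

        int separator = resourceId.IndexOf(':');
        if (separator < 0 || separator == resourceId.Length - 1)
            return false;

        // Everything after the first colon is treated as the ID.
        fileId = resourceId.Substring(separator + 1);
        return true;
    }
}
```

The caller can then skip the Google Drive update, or report an error, when TryGetFileId returns false instead of failing with an exception.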

Now you can search your Google Drive for your custom PDF keywords, increasing the already incredible value of Google Drive’s free cloud storage service.

Download the Full PDF Example

You can download the fully functional demo which includes the features discussed above. To run this example you will need the following:

Support

Need help getting this sample up and going? Contact our support team for free technical support! For pricing or licensing questions, you can contact our sales team (sales@leadtools.com) or call us at 704-332-5532.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

LEADTOOLS Support
Help desk / Support LEAD Technologies, Inc.
United States
Article Copyright 2013 by LEADTOOLS Support