
Microsoft Indexing Service How-To

9 Jul 2007
This article describes how to provide full text search using Microsoft Indexing Service in .NET applications.

Introduction

Many websites provide search: you type a few words, press a "Search" button, and receive a list of pages that contain those words. It's simple. But how can you implement this feature in your own web application? You need an indexing service that indexes your files or web pages. Once they are indexed, you can run full text searches against them.

There are many products that provide this functionality. One of them is Microsoft Indexing Service, which is part of Windows 2000 and later Windows versions. So, if you only build Windows solutions (ASP.NET web applications, Windows Forms applications, etc.), this Microsoft product is worth a look.

One of the biggest advantages of Indexing Service is that it's free: you can use it without any restrictions or additional licenses. I think this matters a lot, because other indexing products can be expensive. If you are developing a small or medium-sized application, you don't want to pay thousands of dollars for a full text search tool.

If you choose to use Indexing Service, remember that it can only index files in the file system. For example, you can't use it to index documents stored in your database. This is a significant drawback of Microsoft Indexing Service, but I believe that you can easily work around this limitation.

In this article, I'll try to describe how to install, configure, and use the Microsoft Indexing Service. We'll develop a simple application which will allow us to use full text search features for web pages located on our local file system.

Installing and configuring the Microsoft Indexing Service

If you are using Windows XP or later, you'll be using Microsoft Indexing Service 3.0; on Windows 2000, you'll be using Microsoft Indexing Service 2.0. The service is installed by default, but it may have been deselected when the operating system was installed, so check that it is present. Go to "Add or Remove Programs" in the Control Panel and choose "Add/Remove Windows Components". Make sure that "Indexing Service" is checked. If it isn't installed, install it.

Microsoft Indexing Service Installation

Now, Microsoft Indexing Service has been installed, and you can configure it. Open the "Computer Management" configuration tool. Choose "Services and Applications", "Indexing Service". In this entry, you can manage your Microsoft Indexing Service.

First of all, you should create a new catalog in Indexing Service; the catalog is stored in a folder which will contain the indexes. Open the context menu for "Indexing Service" and choose "Catalog" in the "New" submenu. Enter a name, choose a location for the catalog, and press "OK".

New Catalog Creation

After that, you have to add the folders which will be indexed. To do this, choose the "Directories" entry, open its context menu, and choose "Directory" from the "New" submenu. In the dialog box that opens, choose the folder with your documents and press "OK" to include the selected directory in the index. If you want to exclude a folder from an existing index instead, choose "No" for the "Include in Index?" parameter in this dialog box. This parameter is "Yes" by default.

New Directory Creation

If Indexing Service is running, it will index the new catalog. Otherwise, start Indexing Service and it will index the catalog automatically. You can also build or rebuild the index for a folder manually. To do this, open the context menu for the folder in the existing catalog and choose "Rescan (Full)" or "Rescan (Incremental)" in the "All Tasks" submenu. Of course, Indexing Service has to be running at this time.

If you choose the "Indexing Service" entry in "Computer Management", you will see the state of Indexing Service. This information can help when you have a large document store and a file you expect to find does not show up in the search results.

There is another important setting for Indexing Service: "Indexing Service Usage". This setting tells Indexing Service how often it should update the indexes. For example, if your application only uses static storage, the service doesn't need to update the index often; if your data changes frequently, the index should be updated more often. To configure this parameter, open the context menu for the "Indexing Service" entry and choose "Tune Performance" in the "All Tasks" submenu.

Indexing Service Usage Configuration

Now, you can check the index. To do this, choose "Query the Catalog" in your catalog. You'll see a form which allows you to search your index. First of all, test a simple full text search: enter something in the query field and press the "Search" button. You will see the files which contain the entered words. You can also run more complex queries with this tool: choose "Advanced query" and use the Indexing Service query language to get the required information. This query language is similar to SQL, but it contains some syntax extensions.

Query Microsoft Indexing Service

You can use SQL to query Microsoft Indexing Service. But, there are several extensions for Indexing Service's SQL dialect which you have to know about.

The most useful command, when you use Microsoft Indexing Service, is the SELECT command. This makes sense, because you don't add, delete, or update information in the indexes yourself. You use SELECT to retrieve information about indexed files. Let's see an example query:

SELECT Path FROM SCOPE() WHERE FREETEXT(Contents, 'Hello World')

This query returns the paths of all files which contain the "Hello World" text. It also helps to illustrate Microsoft Indexing Service's SQL extensions.

First of all, let's look at the FROM expression. The SCOPE() function tells Indexing Service which part of the index to examine. Without parameters, as in this example, it examines all the data in your index. Narrowing the scope can speed up your queries, because it limits the part of the index that is searched. For example, with SCOPE('"/books"') you query only the "/books" folder, not all the folders in your index, so the query runs faster than with a plain SCOPE(). To limit the search further, you can specify a traversal type. With SCOPE('DEEP TRAVERSAL OF "/books"'), Indexing Service searches the "/books" directory and all the directories beneath it. With SCOPE('SHALLOW TRAVERSAL OF "/books"'), it examines only the "/books" directory itself.
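
The scope expressions above can be assembled in code. The following is only a minimal sketch: the ScopeClause class and its Build method are hypothetical helper names, not part of Indexing Service or ADO.NET, and the sketch simply rejects paths that contain quotes rather than trying to escape them.

```csharp
using System;

// Hypothetical helper (not part of any Microsoft API): builds the SCOPE(...)
// expression used in the FROM clause of an Indexing Service query.
public static class ScopeClause
{
    // path: a directory such as "/books"; null or empty means the whole catalog.
    // deep: true for DEEP TRAVERSAL (the directory and everything beneath it),
    //       false for SHALLOW TRAVERSAL (the directory only).
    public static string Build(string path, bool deep)
    {
        if (string.IsNullOrEmpty(path))
        {
            return "SCOPE()";
        }

        // Assumption of this sketch: quotes in a path are rejected instead of
        // escaped, because they would break out of the quoted scope string.
        if (path.IndexOf('"') >= 0 || path.IndexOf('\'') >= 0)
        {
            throw new ArgumentException("Path must not contain quotes.", "path");
        }

        string traversal = deep ? "DEEP" : "SHALLOW";
        return string.Format("SCOPE('{0} TRAVERSAL OF \"{1}\"')", traversal, path);
    }
}
```

For example, ScopeClause.Build("/books", false) produces the shallow traversal expression shown above, and the result can be placed after FROM in a SELECT statement.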

The WHERE expression works as in SQL, but there are a few extensions for it too. The comparison predicates are listed in this table:

Operator                    Symbol      Example
Equals                      =           WHERE DocAuthor = 'John Doe'
Not equals                  != or <>    WHERE DocTitle != 'Finance'
Less than                   <           WHERE WordCount < 1000
Greater than                >           WHERE WordCount > 500
Less than or equal to       <=          WHERE WordCount <= 500
Greater than or equal to    >=          WHERE WordCount >= 500

You also can use Boolean operators which are evaluated using the following rules:

  • NOT is evaluated before AND. NOT can only occur after AND (as in AND NOT; the combination OR NOT is not allowed).
  • AND is evaluated before OR.
  • AND expressions are associative, and can be applied in any order. For example, A AND B AND C is the same as (A AND B) AND C, which is the same as A AND (B AND C).
  • OR expressions are associative, and can be applied in any order.

There is a LIKE predicate too. In addition, several predicates extend the SQL language:

  • ARRAY. This predicate compares two arrays using logical operators. For example, ... WHERE username = SOME ARRAY ['Admin' , 'root']. This example returns files whose username property is 'Admin' or 'root'.
  • CONTAINS. This predicate is used for full text search. For example, …WHERE CONTAINS(country,'"USA" OR "Russia"'). This example returns files whose country property contains "USA" or "Russia".
  • FREETEXT. This predicate allows you to find words and phrases in indexed files. It's the better choice if you need to find anything in the contents of your files. For example, …WHERE FREETEXT(Contents,'Hello World !!!').
  • MATCHES. This predicate performs queries using a regular-expression pattern. It's more powerful than the LIKE predicate. For example, … WHERE MATCHES (Contents, '|(USA|)|{1|}' ). This example matches any string in which exactly one instance of the pattern "USA" occurs.
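
As an illustration of the CONTAINS predicate, here is a small sketch that combines several phrases with OR, in the same form as the country example above. The ContainsPredicate class and its AnyOf method are hypothetical names introduced for this example. The sketch rejects phrases that contain quotes instead of escaping them, and it assumes that the column name comes from your own code, not from user input.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds a CONTAINS predicate that matches any of the
// given phrases, e.g. CONTAINS(country, '"USA" OR "Russia"').
public static class ContainsPredicate
{
    public static string AnyOf(string column, IEnumerable<string> phrases)
    {
        List<string> quoted = new List<string>();
        foreach (string phrase in phrases)
        {
            // Assumption of this sketch: reject quotes rather than escape them.
            if (phrase.IndexOf('"') >= 0 || phrase.IndexOf('\'') >= 0)
            {
                throw new ArgumentException("Phrase must not contain quotes.", "phrases");
            }
            // Each phrase is wrapped in double quotes inside the search condition.
            quoted.Add("\"" + phrase + "\"");
        }

        if (quoted.Count == 0)
        {
            throw new ArgumentException("At least one phrase is required.", "phrases");
        }

        return string.Format("CONTAINS({0}, '{1}')",
                             column, string.Join(" OR ", quoted.ToArray()));
    }
}
```

The returned text goes after WHERE, for example "SELECT Path FROM SCOPE() WHERE " + ContainsPredicate.AnyOf("country", new string[] { "USA", "Russia" }).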

For additional information, see the Indexing Service articles on the MSDN website.

Now you know how to write queries for Microsoft Indexing Service, but you still need a list of the properties which can be used in your queries. Each index has many default properties, which you can find in the following table.

Friendly Name | Data type | Description
A_HRef DBTYPE_WSTR | DBTYPE_BYREF Text of HTML HREF. This property name was created for the Microsoft® Site Server, and corresponds with the Indexing Service property name HtmlHRef. Can be queried, but not retrieved.
Access VT_FILETIME Last time a file was accessed.
All (not applicable) Searches every property for a string. Can be queried, but not retrieved.
AllocSize DBTYPE_I8 Size of disk allocation for a file.
Attrib DBTYPE_UI4 File attributes. Documented in the Win32 SDK.
ClassId DBTYPE_GUID Class ID of an object, for example, WordPerfect, Word, and so on.
Characterization DBTYPE_WSTR | DBTYPE_BYREF Characterization, or abstract, of a document. Computed by Indexing Service.
Contents (not applicable) Main contents of the file. Can be queried, but not retrieved.
Create VT_FILETIME The time the file was created.
Directory DBTYPE_WSTR | DBTYPE_BYREF The physical path to the file, not including the file name.
DocAppName DBTYPE_WSTR | DBTYPE_BYREF Name of the application that created the file.
DocAuthor DBTYPE_WSTR | DBTYPE_BYREF Author of the document.
DocByteCount DBTYPE_I4 Number of bytes in a document.
DocCategory DBTYPE_STR | DBTYPE_BYREF Type of a document such as a memo, schedule, or whitepaper.
DocCharCount DBTYPE_I4 Number of characters in a document.
DocComments DBTYPE_WSTR | DBTYPE_BYREF Comments about the document.
DocCompany DBTYPE_STR | DBTYPE_BYREF Name of the company for which the document was written.
DocCreatedTm VT_FILETIME The time the document was created.
DocEditTime VT_FILETIME Total time spent editing the document.
DocHiddenCount DBTYPE_I4 Number of hidden slides in a Microsoft® PowerPoint document.
DocKeywords DBTYPE_WSTR | DBTYPE_BYREF Document keywords.
DocLastAuthor DBTYPE_WSTR | DBTYPE_BYREF Most recent user who edited the document.
DocLastPrinted VT_FILETIME The time the document was last printed.
DocLastSavedTm VT_FILETIME The time the document was last saved.
DocLineCount DBTYPE_I4 Number of lines contained in a document.
DocManager DBTYPE_STR | DBTYPE_BYREF Name of the manager of the document's author.
DocNoteCount DBTYPE_I4 Number of pages with notes in a PowerPoint document.
DocPageCount DBTYPE_I4 Number of pages in a document.
DocParaCount DBTYPE_I4 Number of paragraphs in a document.
DocPartTitles DBTYPE_STR | DBTYPE_VECTOR Names of document parts. For example, in Excel, part titles are the names of spread sheets; in PowerPoint, slide titles, and in Word for Windows, the names of the documents in the master document.
DocPresentationTarget DBTYPE_STR | DBTYPE_BYREF Target format (35mm, printer, video, and so on) for a presentation in PowerPoint.
DocRevNumber DBTYPE_WSTR | DBTYPE_BYREF Current version number of the document.
DocSlideCount DBTYPE_I4 Number of slides in a PowerPoint document.
DocSubject DBTYPE_WSTR | DBTYPE_BYREF Subject of the document.
DocTemplate DBTYPE_WSTR | DBTYPE_BYREF Name of template for a document.
DocTitle DBTYPE_WSTR | DBTYPE_BYREF Title of the document.
DocWordCount DBTYPE_I4 Number of words in the document.
FileIndex DBTYPE_I8 Unique ID of the file.
FileName DBTYPE_WSTR | DBTYPE_BYREF Name of the file.
HitCount DBTYPE_I4 Number of hits (words matching a query) in the file.
HtmlHRef DBTYPE_WSTR | DBTYPE_BYREF Text of HTML HREF. Can be queried, but not retrieved.
HtmlHeading1 DBTYPE_WSTR | DBTYPE_BYREF Text of HTML document in style H1. Can be queried, but not retrieved.
HtmlHeading2 DBTYPE_WSTR | DBTYPE_BYREF Text of HTML document in style H2. Can be queried, but not retrieved.
HtmlHeading3 DBTYPE_WSTR | DBTYPE_BYREF Text of HTML document in style H3. Can be queried, but not retrieved.
HtmlHeading4 DBTYPE_WSTR | DBTYPE_BYREF Text of HTML document in style H4. Can be queried, but not retrieved.
HtmlHeading5 DBTYPE_WSTR | DBTYPE_BYREF Text of HTML document in style H5. Can be queried, but not retrieved.
HtmlHeading6 DBTYPE_WSTR | DBTYPE_BYREF Text of HTML document in style H6. Can be queried, but not retrieved.
Img_Alt DBTYPE_WSTR | DBTYPE_BYREF Alternate text for <IMG> tags. Can be queried, but not retrieved.
Path DBTYPE_WSTR | DBTYPE_BYREF Full physical path to a file, including file name.
Rank DBTYPE_I4 Rank of row. Ranges from 0 to 1000. Larger numbers indicate better matches.
RankVector DBTYPE_I4 | DBTYPE_VECTOR Ranks of individual components of a vector query.
ShortFileName DBTYPE_WSTR | DBTYPE_BYREF Short (8.3) file name.
Size DBTYPE_I8 Size of file, in bytes.
USN DBTYPE_I8 Update Sequence Number. NTFS drives only.
VPath DBTYPE_WSTR | DBTYPE_BYREF Full virtual path to a file, including the file name. If more than one possible path, then the best match for the specific query is chosen.
WorkId DBTYPE_I4 Internal ID for a file. Used within Indexing Service.
Write VT_FILETIME Last time the file was written.

As you can see, there are a lot of indexed properties for each file, but sometimes, you want to extend this list.

How to add new properties for an indexed file

First of all, this feature works only for web pages, because it is based on the HTML <meta> tag.

Let's say, you have several indexed web pages and you want to add several special properties for them. For example, if you want to add "country" and "city" properties, you should add <meta> tags to all files which will contain these new properties:

<meta name="country" content="Russia" />
<meta name="city" content="Moscow" />

After these changes, you have to restart Indexing Service. Now, you can open the entry "Properties" and see that Microsoft Indexing Service knows about your special parameters for files. But still, you can't use these new parameters in your queries.

Select the "Properties" node of your catalog and choose the property which you added to the files using the <meta> tag. Double click on the property, switch on the "Cached" checkbox, and choose the data type for the new property from the opened dialog box.

Microsoft Indexing Service Installation

After that, you should create a Column Definition File which contains information about your newly added properties. The file can have an ".idq" extension, but the extension isn't important. A Column Definition File uses the following format:

[Names]
Propertyname( Data type ) = GUID ["Name" | Property ID]

The data type parameter is optional. If you don't define it, Microsoft Indexing Service takes the data type from the property definition in your catalog.

For my example, it contains this:

[Names]
country = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "country"
city = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "city"

All of this data can be taken from the property configuration dialog box.
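
If you have many <meta> properties, you may prefer to generate this file. The sketch below builds the file text for a list of property names. The ColumnDefinitionFile class is a hypothetical helper, and it assumes that every property uses the GUID shown in the example above; check the value shown in the property dialog box for your own catalog.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: produces the text of a Column Definition File
// for properties that come from HTML <meta> tags.
public static class ColumnDefinitionFile
{
    // GUID taken from the example in this article. Assumption: all of your
    // <meta> properties are listed under this GUID in the "Properties" node.
    public const string MetaPropertySet = "d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1";

    public static string Build(IEnumerable<string> metaNames)
    {
        StringBuilder text = new StringBuilder();
        text.Append("[Names]\r\n");
        foreach (string name in metaNames)
        {
            // One line per property: friendly name = GUID "property name"
            text.AppendFormat("{0} = {1} \"{0}\"\r\n", name, MetaPropertySet);
        }
        return text.ToString();
    }
}
```

The result can be saved with System.IO.File.WriteAllText, for example to a path such as C:\Catalogs\columns.idq (a made-up location; any path works as long as it matches the Registry setting described next).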

After the Column Definition File is created, Indexing Service has to be told where it is through its Registry settings. Add a string entry named "DefaultColumnFile" to the Registry key "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndexCommon". "DefaultColumnFile" should contain the full path to your Column Definition File.

Restart Microsoft Indexing Service. After that, run a full rescan of your indexed folder. Now, you will be able to use the new parameters in your queries.

Using Microsoft Indexing Service in WinForms applications

Microsoft Indexing Service exposes itself to the developer as an OLE DB provider. Its name is MSIDXS. You can use ADO.NET for querying your Indexing Service. To do this, you have to create a new System.Data.OleDb.OleDbConnection object using this sample connection string:

Provider= "MSIDXS";Data Source="Documents"

In the Data Source parameter, you should use the name of your catalog in Indexing Service.

Let's write sample code that queries Indexing Service for a few words in the file contents. The sample uses a queryString variable, which is an instance of the SearchParameters structure. This structure holds the data source (the catalog name) and the query text. Here is the definition of this structure:

struct SearchParameters
{
    private string storage;

    public string Storage
    {
        get { return storage; }
        set { storage = value; }
    }

    private string query;

    public string Query
    {
        get { return query; }
        set { query = value; }
    }
}

First of all, you create a new OleDbConnection object:

string connectionString = 
  string.Format("Provider= \"MSIDXS\";Data Source=\"{0}\";", 
  queryString.Storage);
OleDbConnection connection = new OleDbConnection(connectionString);

After that, you have to create a new OleDbCommand associated with this connection:

string query = string.Format(@"SELECT Path FROM scope() " + 
               @"WHERE FREETEXT(Contents, '{0}')", queryString.Query);
OleDbCommand command = new OleDbCommand(query, connection);

Note that the MSIDXS provider doesn't support commands with parameters, so the query text has to be built by string formatting. This is unfortunate. I hope that Microsoft will fix this issue in the next version of Microsoft Indexing Service.
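
Because the search text is inserted directly into the query, a single quote typed by the user would end the string literal early. The sketch below shows one way to guard against that. The QueryText class is a hypothetical helper, and the sketch assumes that the Indexing Service SQL dialect follows the standard SQL convention of doubling a single quote inside a literal; verify this against the provider before relying on it.

```csharp
using System;

// Hypothetical helper: builds the FREETEXT query from this article with
// the user's text escaped for use inside a single-quoted literal.
public static class QueryText
{
    // Assumption: a single quote inside a literal is written as two single
    // quotes, as in standard SQL.
    public static string EscapeLiteral(string value)
    {
        if (value == null)
        {
            throw new ArgumentNullException("value");
        }
        return value.Replace("'", "''");
    }

    public static string BuildFreeTextQuery(string userText)
    {
        return string.Format(
            "SELECT Path FROM SCOPE() WHERE FREETEXT(Contents, '{0}')",
            EscapeLiteral(userText));
    }
}
```

With this helper, the command could be created as new OleDbCommand(QueryText.BuildFreeTextQuery(queryString.Query), connection).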

You are now able to execute this command and retrieve a list of files which contain the selected text:

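ArrayList result = new ArrayList();

try
{
    connection.Open();

    // Dispose the reader as soon as all rows have been read.
    using (OleDbDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Column 0 is Path, the only column in the SELECT list.
            result.Add(reader.GetString(0));
        }
    }
}
finally
{
    // Close the connection even if the query throws.
    connection.Close();
}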

In this code, checking the returned value for NULL is not necessary, because Indexing Service always returns a path to a found file.

Summary

Microsoft Indexing Service is a free and powerful product which is included with Windows 2000 and later versions. It's simple to use: you can easily create indexes, and you can query them through an OLE DB data provider, which makes it easy to use from Microsoft .NET. In this article, I have tried to describe how to install, configure, and query Microsoft Indexing Service. I also recommend that you look at the example attached to this article, which shows how to use the full text search features. I hope that this article will help you to start using Indexing Service effectively.

When I prepared this article, I used these materials:

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Ilya Verbitskiy
Software Developer (Senior)
Czech Republic

Last Updated 10 Jul 2007
Article Copyright 2007 by Ilya Verbitskiy