Click here to Skip to main content
13,000,469 members (56,167 online)
Click here to Skip to main content
Add your own
alternative version


52 bookmarked
Posted 29 Aug 2006

Full-Text Indexing Files with Microsoft SQL Server

, 2 Feb 2007
Rate this:
Please Sign up or sign in to vote.
Indexing Word, Excel and other types of documents is easy with SQL Server


This article is about the full-text search capability of SQL Server 2000 (2005). It is an easy to use, very fast and extensible solution to index and search in various types of documents' content. For example in Word, Excel, Adobe portable document format (PDF) and HTML files.


To go through with this example, you will need Microsoft SQL Server 2000 Server (at least) access, a database with DB owner right, and of course the client tools.

Creating a Table

In order to index files stored in a database table, we have to create two table fields. In the first field, we will store the content of the document in binary format, in the second we will store the extension of the file, for example ".doc" or ".xls".
It's a good idea to store the full name and the size information of the files, because probably you'll need these in real situations, but you won't need them for full-text indexing.

Storing the sizes of the files can be optional, because you can query them with the DataLength function, but this could take a long time, much longer than reading these values from a field.

So our create table script can be this:

CREATE TABLE [dbo].[Doc] (
 [ID]  uniqueidentifier ROWGUIDCOL  NOT NULL ,
 [Extension] [varchar] (10) NOT NULL ,
 [Content] [image] NOT NULL ,
 [FileSize] [int] NOT NULL ,
 [FileName] [nvarchar] (500) NOT NULL ,
 [Stamp] [timestamp] NOT NULL 


ALTER TABLE [dbo].[Doc] ADD 

A simple idea is to use a unique identifier type as the primary key of the document table. It can be useful in many cases if you are a web developer. Maybe you would like to use this id in the URL (query string) when you make a download page, and you don't want to give the chance to your users to download all the documents without seeing all your advertisements, etc. by simply increasing a parameter at the end of the URL.

Please notice that a primary key has been created by the script. It is always very important because this can guarantee that you can't insert two records with the same id into the database (a table is a set of records, you can't make the difference between records with the same values, except with cursors), and a good clustered index can really improve the performance of the SELECT statements.

Also, there is a timestamp field – called stamp – on the table. It will be necessary for us and we will talk about it later.

Creating Full-text Index

I think nobody knows the proper format of the full-text index creation statement by head. So, we will create this index from Microsoft SQL Server Enterprise Manager.
First choose your server, then your database. Choose tables then on the right, search for the table you've created (Doc).
After right clicking on the table, choose "Define Full-Text indexing on a table" submenu from "Full-Text Index table" menu.

After these, a wizard will appear.
In the "Select an index step", choose your primary key. Later, when we query the table, we will get back this (unique) index's values from the full-text search engine as a result.

In the "Select Table Columns", step mark the "Content" column, and in that row in the document type column, choose the extension field. Then step away to another row, because the Next button will be activated only after you change the selection.

In this step, we gave the indexable column to the server and also defined the types of each document.

In the "Select a Catalog" step, you can create a new full-text catalog or choose an existing one. I offer that you make a new catalog for these data, because the collected indexes can be huge if the count of the documents increase.

In the "Select or Create Population Schedules", you can schedule the incremental and the full population of the catalog or the table. You can leave it empty in this case, because we will use the change tracking feature of the SQL Server. It means, we tell the SQL server to update its full-text catalog "immediately" after an insert or an update occurs.
In real situations, it is generally offered to make a full population every week or daily and an incremental daily or more often, but it depends on your server utilization.
The cause is that the change tracking and incremental update feature doesn't recognize if someone uses WriteText or UpdateText statements.

When we designed the table, we've put a timestamp field into it. The timestamp field is needed for incremental index updates, without that field all the incremental updates do full updates. The full-text index engine can find the updates on this property change. This is because the WriteText and UpdateText statements are not noticed by the full-text engine. These statements don't update the timestamp field like normal Insert and Update do.

So finish the wizard, by pressing the Next button few times and then Finish.

After the full-text index has been defined, we have to turn the change tracking functionality on. We can do it by right clicking on the table name in Enterprise Manager and then choose Change Tracking then the "Update index in background" from Full-Text Index Table menu. From this time, the full-text index engine will watch our table for changes and update its index if necessary.

An interesting result of our index creation:

if (select DATABASEPROPERTY(DB_NAME(), N'IsFullTextEnabled')) <> 1 
exec sp_fulltext_database N'enable' 

if not exists (select * from dbo.sysfulltextcatalogs where name = N'Doc')
exec sp_fulltext_catalog N'Doc', N'create' 

exec sp_fulltext_table N'[dbo].[Doc]', N'create', N'Doc', N'PK_Doc'

exec sp_fulltext_column N'[dbo].[Doc]', N'Content', N'add', 1033, N'Extension' 

exec sp_fulltext_table N'[dbo].[Doc]', N'activate'  


Now we insert some records and then we create a few select queries. Let's run this insert script in Query Analyzer:

INSERT INTO [DOC] ([Extension], [Content], [FileSize], [FileName]) 
    VALUES ('.txt', 'Hello John! It''s me: Garfield!', 30, 'Cartoon1.txt')

INSERT INTO [Doc] ([Extension], [Content], [FileSize], [FileName]) 
    VALUES ('.txt', 'Oh my god!', 30, 'Shout.txt')

INSERT INTO [DOC] ([Extension], [Content], [FileSize], [FileName]) 
    VALUES ('.txt', 'NOWAN's web site: <a href="%22'%22">'</a>, 30, 'nowan.txt')

These are simple inserts, you can examine them yourself. After you inserted the records, you can run a SELECT statement to check the table:


Full-Text Search

To create a full-text query, you have to get closer with the next statements:

The first two statements have two parameters. The first is the column name and the second is the searched string. These functions give back Boolean values.

The second two statements are more interesting. These functions return tables that have two columns: Key and Rank. It means that we can get back the unique id of the searched record or records and also we can get back the hit probability (Rank):

SELECT * FROM ContainsTable([doc], Content, '"nowan"')

Another good trait of containstable and freetexttable is that you can give difficult expressions as searched string. For example, you can use "OR" and "AND" logical terms:

Of course, the result table of these statements can be joined to real tables. So if we would like to get the original data row from the original table, we could use a select statement like this:

SELECT Doc.* FROM [Doc] 
    INNER JOIN ContainsTable([doc], Content, '"nowan"') AS FT 
    ON Doc.ID = FT.[Key]

Indexed File Types

The SQL Server can create indexes typically from text files and Microsoft Office files. This type list is extensible by iFilters designed for the old Indexing Service. You can get an iFilter from Adobe too for indexing PDF files.

Hungarian Version

You can find the Hungarian version of this article here.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

You may also be interested in...

Comments and Discussions

QuestionDo you have an example on how to search on Word document? Pin
jdavidmtzc2-Aug-12 12:18
memberjdavidmtzc2-Aug-12 12:18 
QuestionHow to use the Transact-SQL functions CONTAINSTABLE and CONTAINS to do a search on a full-text enabled table Pin
elizas23-Mar-10 23:38
groupelizas23-Mar-10 23:38 
QuestionHow to use the Transact-SQL functions CONTAINSTABLE and CONTAINS to do a search on a full-text enabled table Pin
elizas23-Mar-10 23:35
groupelizas23-Mar-10 23:35 
GeneralFew questions Pin
Pratik.Patel9-Feb-09 14:44
memberPratik.Patel9-Feb-09 14:44 
GeneralRe: Few questions Pin
Boci5-Dec-11 10:09
memberBoci5-Dec-11 10:09 
Generalfull-text index tab is not active Pin
Member 220985825-May-08 22:33
memberMember 220985825-May-08 22:33 
AnswerRe: full-text index tab is not active Pin
boci26-May-08 9:36
memberboci26-May-08 9:36 
QuestionDatabase size .. is it an issue for full text catalog ?? Pin
reej26-Mar-08 0:30
memberreej26-Mar-08 0:30 
AnswerRe: Database size .. is it an issue for full text catalog ?? Pin
boci26-Mar-08 13:52
memberboci26-Mar-08 13:52 
QuestionCan you explain how to search on Word document? Pin
dadvir11-Jul-07 22:02
memberdadvir11-Jul-07 22:02 
AnswerRe: Can you explain how to search on Word document? Pin
boci11-Jul-07 23:12
memberboci11-Jul-07 23:12 
GeneralDotLucene is better Pin
Jan Seda3-Feb-07 3:09
memberJan Seda3-Feb-07 3:09 
GeneralRe: DotLucene is better Pin
dfgdfd24-Jun-07 6:59
memberdfgdfd24-Jun-07 6:59 
Questionwhy Full-Text search not support thai language? Pin
kookai29-Aug-06 21:12
memberkookai29-Aug-06 21:12 
AnswerRe: why Full-Text search not support thai language? [modified] Pin
boci30-Aug-06 5:04
memberboci30-Aug-06 5:04 
GeneralRe: why Full-Text search not support thai language? Pin
kookai30-Aug-06 16:46
memberkookai30-Aug-06 16:46 
AnswerRe: why Full-Text search not support thai language? Pin
volkan.ozcelik8-Sep-06 21:34
membervolkan.ozcelik8-Sep-06 21:34 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.170624.1 | Last Updated 3 Feb 2007
Article Copyright 2006 by István Kovács (HU)
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid