Data Archive






3.60/5 (2 votes)
Jul 1, 2004
3 min read

51677

2197
This article describes creating indexed search systems using relational databases
Introduction
Before a time ago my CD collection was a little, but it quickly grew and became unmanaged. I have been spending much time to find some document. Therefore I decided to index the CD collection. First I tried several existing programs for organizing CD catalogs, but they haven't good search functionality and intuitive user interface. That is why I decided to create my own search program. This article describes an approach of creating indexing search systems.
Background
I recommend you to preview MS SQL Server 2000 Books Online for "FullText search overview" and "CONTAINSTABLE" articles.
Project overview
The application functionality can be divided on two groups: scanning the data source and document searching. The first one can be easily implemented using recursive directories' browsing. The second one requires some indexing mechanism, i.e. the search data should be indexed. It is possible to create an indexing mechanism manually [2], but I have decided use more less way, using integrated Microsoft SQL Server Full Text Search functionality.
Using the code
Classes
RegistryHelper |
The RegistryHelper class manages a Windows Registry key(s)/value(s)
and provides a convenient interface to set/get application settings. |
SqlHelper |
Implements the data access methods. |
FullTextQuery |
The FullTextQuery class transforms a user search query to Full-Text
query. |
DirectoryScaner |
Scans the file system hierarchy and gets directories and files descriptions. |
ScanForm |
Displays the scan form. |
SqlConnStringForm |
Parses and forms the Microsoft SQL Server connection string. |
SearchForm |
Provides the GUI for searching archive documents. |
MainForm |
Displays a main application form. |
Stored procedures
[sp_BuildIndexes] |
Starts incremental indexes build. |
[sp_CreateDirectory] |
Creates a directory record. |
[sp_CreateFile] |
Creates the file record. |
[sp_CreateRootDirectory] |
Creates a directory record. |
[sp_DeleteDirectory] |
Deletes the directory record. |
[sp_DeleteFile] |
Deletes the file record. |
[sp_GetDirectory] |
Gets the directory information. |
[sp_GetDirectoryFiles] |
Gets all files of specified directory. |
[sp_GetDirectoryPath] |
Get all sub-directories of the directory. |
[sp_GetFile] |
Gets the file information. |
[sp_GetFilePath] |
Get the full path to the file. |
[sp_GetRootDirectories] |
Gets all root directories. |
[sp_GetSubDirectories] |
Get all sub-directories of the directory. |
[sp_Search] |
Search for a file or directory. |
[sp_UpdateDirectory] |
Updates the directory record. |
[sp_UpdateFile] |
Updates the file record. |
User defined functions
[fn_GetDirectoryPath] |
Constructs and gets the full directory path. |
[fn_GetFilePath] |
Constructs and gets the full file path. |
[sp_Search]
-- Search for a file or directory.
CREATE PROCEDURE [dbo].[sp_Search]
@Text NVARCHAR(256),
@Sort INT = 1, -- Rank
@Created DATETIME = NULL,
@Modified DATETIME = NULL
AS
--
DECLARE @DirectoryCategory INT
SET @DirectoryCategory = 1
DECLARE @FileCategory INT
SET @FileCategory = 2
--
DECLARE @results TABLE
(
CategoryID INT NOT NULL,
ResultID INT NOT NULL,
Result NVARCHAR(260) NOT NULL,
Path NVARCHAR(2048) NOT NULL,
Rank INT NOT NULL DEFAULT 0
)
-- Search in directories.
INSERT
@results
SELECT
@DirectoryCategory AS CategoryID,
DataTbl.DirectoryID AS ResultID,
DataTbl.Name AS Result,
[dbo].[fn_GetDirectoryPath] (DataTbl.DirectoryID) AS Path,
KeyTbl.[RANK] AS Rank
FROM
[dbo].[Directories] AS DataTbl
JOIN
CONTAINSTABLE([dbo].[Directories], *, @Text) AS KeyTbl
ON
DataTbl.DirectoryID = KeyTbl.[KEY]
-- Search in files.
INSERT
@results
SELECT
@FileCategory AS CategoryID,
DataTbl.FileID AS ResultID,
DataTbl.Name AS Result,
[dbo].[fn_GetFilePath] (DataTbl.FileID) AS Path,
KeyTbl.[RANK] AS Rank
FROM
[dbo].[Files] AS DataTbl
JOIN
CONTAINSTABLE([dbo].[Files], *, @Text) AS KeyTbl
ON
DataTbl.FileID = KeyTbl.[KEY]
IF (@Sort = 1) -- Relevance
BEGIN
SELECT
CategoryID, ResultID, Result, Path, Rank
FROM
@results
ORDER BY
Rank DESC
END
ELSE IF (@Sort = 2) -- Category
BEGIN
SELECT
CategoryID, ResultID, Result, Path, Rank
FROM
@results
ORDER BY
CategoryID ASC
END
ELSE IF (@Sort = 3) -- Name
BEGIN
SELECT
CategoryID, ResultID, Result, Path, Rank
FROM
@results
ORDER BY
Result ASC
END
ELSE -- undefined sort parameter - does not sort results
BEGIN
SELECT
CategoryID, ResultID, Result, Path, Rank
FROM
@results
END
GO
I think that this stored procedure should be explained more detailed. The [dbo].[sp_Search]
stored procedure performs full-text search in directories ([dbo].[Directories]
)
and files ([dbo].[Files]
) and combines found results. The found
results could be sorted by relevance, category and name as well. A found result
consists of CategoryID
, ResultID
, Result
, Path
and
Rank
values. The CategoryID
is used to determine type of found result: directory or file. The ResultID
represents the
founded result identifier. The Result
is exactly the founded string. The Path
is the recursively built the founded resource path.
Note: The @Text input parameter data should be in full-text search
format. I created a special class FullTextQuery
, which transforms
a user query to full-text search query.
How to use the application
Download and install the project. Download and execute the DataArchive.sql script. Start the application from the Start menu or Desktop shortcut. If the application could not connect to the database, it will prompt the database connection dialog. Configure the database connection.
History
- [07.05.2004] - Data Archive 1.0