Click here to Skip to main content
Click here to Skip to main content

Fulltext Search for Firebird SQL

By , 9 Mar 2005
 

Introduction

In this article, we will talk about searching the data in a Firebird database using DotLucene full-text search engine. We will focus on storing the index directly in the database (the source code of the solution is attached).

Firebird SQL

Firebird SQL (excellent embedded database with a good .NET support) has no built-in fulltext search support so far. Instead, you need to rely on third party tools. Fortunately, there is a great search engine library available: DotLucene. It is an open-source .NET library (ported from Java) that can index any data (structured or unstructured) that you are able to convert to raw text. Using an additional library for fulltext doesn't look too elegant at first sight. However, it has some advantages. Let's compare it quickly with MySQL integrated fulltext search:

MySQL Fulltext Search

MySQL fulltext search has these drawbacks (compared to DotLucene):

  • You can use it only in MyISAM tables (i.e. no transactions)
  • You can't browse the index (see Luke)
  • You need to store transformed text in the DB (i.e. for indexing HTML, you need to store another copy of the text with stripped HTML tags)
  • It doesn't support highlighting of the query words in the result
  • You will hardly modify the sources to do custom changes
  • The license doesn't allow to use it in commercial application for free
  • It is reported to be slow on large data sets

How to Index the Data?

For basics about using DotLucene to index your data, I recommend reading:

The following applies for indexing the database:

  • You are using a different source of data (obviously ;-). Instead of reading from the disk, you need to load it from the database.
  • When indexing texts, you don't need to it in the index in full, just keep them in the database.
  • Create an additional Field that will contain the primary key of the indexed document (so you can later load it from the database).
  • When indexing HTML, you need to strip the HTML tags (you need to supply raw text to DotLucene).

Where to Store the Index

On a server, it's no problem to store the index in a separate directory (you can also load it to RAM to make your searches super fast - if you have enough RAM, of course). In a desktop application, it might be useful to store the index in a Firebird database.

DotLucene supports a mechanism for adding custom index storages. All storage types (file system and RAM are built-in) are implemented as a class derived from Lucene.Net.Store.Directory abstract class. I have created a Directory implementation that stores the index directly in a Firebird SQL database.

FbDirectory class

All index reading/writing operations in DotLucene are done using a Directory class. The new FbDirectory class is based on FSDirectory. The filesystem operations are replaced with database operations. Here you can see what we need to implement:

using System;

namespace Lucene.Net.Store
{
    public abstract class Directory 
    {
        public abstract String[] List();
        public abstract bool FileExists(String name);
        public abstract long FileModified(String name);
        public abstract void TouchFile(String name);
        public abstract void DeleteFile(String name);
        public abstract void RenameFile(String from, String to);
        public abstract long FileLength(String name);
        public abstract OutputStream CreateFile(String name);
        public abstract InputStream OpenFile(String name);
        public abstract Lock MakeLock(String name);
        public abstract void Close();
    }
}

Performance Tips

  1. If performance is your main concern, use the standard FSDirectory instead to store the index on disk. My tests show that database storage is twice slower than filesystem. Use the database storage only when you have no other choice.
  2. Use compound index format (IndexWriter.SetUseCompoundFile(true);). This is default in DotLucene 1.4 but in 1.3 you have to do it manually.
  3. Create the index in memory, optimize, then save it on disk using FbDirectory.Copy(); This will only help you if you are rebuilding the whole index from scratch.
  4. If you are adding a document to the index from a desktop application, do it in background (in a separate thread). You are still able to search while you are adding a new document.

Useful Resources

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Dan Letecky
Czech Republic Czech Republic
Member
My open-source AJAX controls:
 
DayPilot
DayPilot MVC
DayPilot Java
Outlook-Like Calendar/Scheduling Controls

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralLucene.Net - 2.0memberalisson_abreu3 Nov '08 - 7:17 
When I try to build, I got the following error messages:
 
Error 1 The type or namespace name 'InputStream' could not be found (are you missing a using directive or an assembly reference?) ... Lucene.Net-2.0.0
 
Error 2 The type or namespace name 'OutputStream' could not be found (are you missing a using directive or an assembly reference?) ... Lucene.Net-2.0.0
 
Error 3 The type or namespace name 'OutputStream' could not be found (are you missing a using directive or an assembly reference? ... Lucene.Net-2.0.0
 
Error 4 The type or namespace name 'InputStream' could not be found (are you missing a using directive or an assembly reference? ... Lucene.Net-2.0.0
 
Could anyone help me?
QuestionThe code is released for businesses too?memberAdriano Alves de Lima23 Oct '08 - 7:36 
Guys, I wonder if this code is released for commercial purposes, in the end the article author does not explain whether you can use in business?
 
Grateful now ...
Questionnew IndexWriter syntax ?membervaclav13543 Aug '06 - 3:02 
Hi!
 
I´m trying to run LUCENE with FireBird, and have small problem.
 
IndexWriter writer = new IndexWriter(???, new StandardAnalyzer(), true);
 
What should be instead of ??? if the IndexWriter should use Firebird (FbDirectory) as a storage place. I didnt find any syntax example. Everywhere there is a sample with file storage.
 
Thanx for posting small piece of code.
 
Vaclav
QuestionHow to index data stored in a database using a fulltext index?memberMichael Freidgeim18 Jul '06 - 14:54 
I didn't found in the article, how to index data stored in a database using a fulltext index?
The article and download describes how to use Firebird database to store index (in LCN_FLS and LCN_BLCK tables).
But if I have an existing database (e.g MS Northdwind) and want to full-text index tables (e.g Customers CompanyName and ContactName columns,text columns in Orders and Products tables).
How can I submit these text columns for Lucene.Net indexing?
 
Michael Freidgeim.
Blog: http://geekswithblogs.net/mnf/
AnswerRe: How to index data stored in a database using a fulltext index?memberMichael Freidgeim8 Aug '06 - 21:20 
I've found the article that answered my question Indexing Database Content with dotLucene [^]
 
Michael Freidgeim.
Blog: http://geekswithblogs.net/mnf/

GeneralProblem .NET 2.0 Lucene 1.4.3 [modified]memberJonas Ljunggren26 May '06 - 11:04 
Anyone who has solved the problems that shows then the source is compiled?
 

The name 'Date' does not exist in the current context
[return Date.GetTime(Convert.ToDateTime(cmd.ExecuteScalar()));]
 

'Lucene.Net.Store.InputStream' does not contain a definition for 'BUFFER_SIZE'
[int bufferNumber = start / InputStream.BUFFER_SIZE;]
 

 
-- modified at 17:05 Friday 26th May, 2006
GeneralRe: Problem .NET 2.0 Lucene 1.4.3memberArkar Myo30 Oct '06 - 1:11 
Is it possible?
 
return Convert.ToDateTime(cmd.ExecuteScalar()).Ticks;
(or)
return DateTime.Parse(Convert.ToString(cmd.ExecuteScalar())).Ticks;Smile | :)

 
i am arkar myo.

GeneralNice Articlemembercomputerguru923821 Mar '06 - 14:27 

Very nice article done for a cool database. Keep up the good work Smile | :)
 
PJC
GeneralIncrease performancememberSamuel Chen22 Mar '05 - 20:41 
hi Dan Letecky :
 
Nice to meet you again.
 
I have more than million records indexed. They are stored as the following structure:
id : 0 ~ 1000000 (keyword)
company_id : 0 ~ 1000000 (keyword)
content : Title(some text) ||(splited by double '|') product description || company description || ... (UnStored)
full_content : (same as content, Text, just for test)
 
For example, It takes more than 10 seconds when It's searching with "+48100 +milk".
 
How can I increase the performance ?
 
Should I store all splited parts as keyword fields ?
Or is there any other way ?
 
thx!
ps. I am using file system to store the index.
 

 

Best regards
Samuel
www.alphatom.com


GeneralRe: Increase performancememberSamuel Chen22 Mar '05 - 22:39 
funny result Smile | :)
 
I have downloaded luke which is recommended on your dotlucene.net site.
And then searched with the same keywords. The result surprised me!
It just took 31ms (and the computer was still indexing at the same time).
 
In this search i selected org.apache.lucene.analysis.KeywordAnalyzer as "Analyzer to use to query parsing" and org.apache.lucene.analysis.cn.ChineseAnalyzer as "SnowballAnalyzer".
The "default field" was also set as company_id.(blank will get nothing)
 
Does it mean that in my case I should take SnowballAnalyzer and Default fields to increase the performance?
 

Best regards
Samuel
www.alphatom.com


GeneralSome questions...memberPaul Selormey9 Mar '05 - 13:35 
First I really appreciate the great information you are providing with these articles.
 
Just some few questions:
  • UDT: Any way to define/implement user data types in .NET?
  • .NET Runtime hosting: Any plan to implement a .NET runtime hosting similar to the new SQL Server Express 2005?
  • .NET Assembly: Any plan to compile the Firebird with MC++ to create a .NET assembly that could be used in on-touch deployments?
 
Best regards,
Paul.
 
Jesus Christ is LOVE! Please tell somebody.
GeneralRe: Some questions...memberDan Letecky11 Mar '05 - 5:40 
1) Yes, user defined data types are supported (see "domains").
2) Not supported.
3) Not supported. Anyway, it doesn't matter. The embedded Firebird DLL doesn't have to be registered, just copy it with your application.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web03 | 2.6.130523.1 | Last Updated 9 Mar 2005
Article Copyright 2005 by Dan Letecky
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid