
What is Lucene.Net?

Lucene.Net is a high-performance Information Retrieval (IR) library, also known as a search engine library. It contains powerful APIs for creating full-text indexes and for building advanced, precise search features into your programs. Some people confuse Lucene.Net with a ready-to-use application such as a web search engine/crawler or a file search tool, but it is not an application; it is a library that provides a framework for implementing these difficult technologies yourself. Lucene.Net places no restrictions on what you can index and search, which gives you a lot more power than other full-text indexing/searching implementations: you can index anything that can be represented as text. There are also ways to get Lucene.Net to index HTML, Office documents, PDF files, and much more.

Lucene.Net is an API-per-API port of the original Lucene project, which is written in Java; even the unit tests were ported to guarantee the quality. A Lucene.Net index is also fully compatible with a Lucene index, and both libraries can be used on the same index together with no problems. A number of products have used Lucene and Lucene.Net to build their searches; some well-known websites include Wikipedia, CNET, Monster.com, Mayo Clinic, FedEx, and many more. But it’s not just websites that have used Lucene; there is also a product built on Lucene.Net called Lookout, a search tool for Microsoft Outlook that made Outlook’s integrated search look painfully slow and inaccurate.

Lucene.Net is currently undergoing incubation at the Apache Software Foundation. Its source code is held in a Subversion repository and can be found here. If you need help downloading the source, you can use a free Subversion client such as TortoiseSVN or RapidSVN. The Lucene.Net project always welcomes new contributors, and remember, there are many ways to contribute to an open source project other than writing code.

Creating a search solution

There are roughly two main parts to a search solution: indexing the content you wish to search, and actually searching that content. It is pretty much as simple as that. We will create an index first, and then perform a search against it.
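To make the two parts concrete before bringing in Lucene.Net, here is a minimal sketch of a toy index. It is only an illustration of the idea: `ToyIndex`, `Add`, and `Search` are hypothetical names made up for this example, and nothing here reflects how Lucene.Net stores or analyzes text internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy class for illustration only; not part of Lucene.Net.
public class ToyIndex
{
    // Maps each lower-cased word to the ids of the documents containing it.
    private readonly Dictionary<string, List<int>> postings =
        new Dictionary<string, List<int>>();
    private readonly List<string> documents = new List<string>();

    // Part one: indexing. Split the text into words and record where each occurs.
    public int Add(string text)
    {
        int docId = documents.Count;
        documents.Add(text);
        string[] words = text.ToLowerInvariant().Split(
            new[] { ' ', ',', '.', ';', '!', '?' },
            StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            List<int> ids;
            if (!postings.TryGetValue(word, out ids))
            {
                ids = new List<int>();
                postings[word] = ids;
            }
            if (!ids.Contains(docId))
            {
                ids.Add(docId);
            }
        }
        return docId;
    }

    // Part two: searching. Look the word up instead of scanning every document.
    public List<string> Search(string word)
    {
        List<string> results = new List<string>();
        List<int> ids;
        if (postings.TryGetValue(word.ToLowerInvariant(), out ids))
        {
            foreach (int id in ids)
            {
                results.Add(documents[id]);
            }
        }
        return results;
    }
}

public static class ToyIndexDemo
{
    public static void Main()
    {
        ToyIndex index = new ToyIndex();
        index.Add("The quick brown fox jumps over the lazy dog");
        index.Add("A slow green turtle");

        // Only the first document contains "fox", so only it is printed.
        foreach (string match in index.Search("fox"))
        {
            Console.WriteLine(match);
        }
    }
}
```

The point of the sketch is the split in responsibilities: the work of breaking text into words happens once, at indexing time, so that a search becomes a cheap lookup. Lucene.Net follows the same two-step shape, with far more capable text analysis and storage.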

What you need to create an index

Let’s see an example of what it takes to create an index and to populate it.

//state the file location of the index
string indexFileLocation = @"C:\Index"; 
Lucene.Net.Store.Directory dir =
    Lucene.Net.Store.FSDirectory.GetDirectory(indexFileLocation, true);

//create an analyzer to process the text
Lucene.Net.Analysis.Analyzer analyzer = new
Lucene.Net.Analysis.Standard.StandardAnalyzer(); 

//create the index writer with the directory and analyzer defined.
Lucene.Net.Index.IndexWriter indexWriter = new
Lucene.Net.Index.IndexWriter(dir, analyzer, 
           /*true to create a new index*/ true); 

//create a document, add in a single field
Lucene.Net.Documents.Document doc = new
Lucene.Net.Documents.Document();

Lucene.Net.Documents.Field fldContent = 
  new Lucene.Net.Documents.Field("content", 
  "The quick brown fox jumps over the lazy dog",
  Lucene.Net.Documents.Field.Store.YES, 
  Lucene.Net.Documents.Field.Index.TOKENIZED, 
  Lucene.Net.Documents.Field.TermVector.YES);

doc.Add(fldContent);

//write the document to the index
indexWriter.AddDocument(doc);

//optimize and close the writer
indexWriter.Optimize(); 
indexWriter.Close();

Alright, not bad. Let’s take a look at what we just did. There are five main classes in use here: Directory, Analyzer, IndexWriter, Document, and Field. We create a Directory that lets Lucene know where we want to store the index. The Analyzer is used to analyze the text. We have an IndexWriter that uses the Directory and Analyzer to create and write out the index. Then, we create a new Document object, and create a Field that has its field name set to “content” and its value set to “The quick brown fox jumps over the lazy dog”. We add the Field to the Document, and now we can index the newly created Document with the IndexWriter. Then, we call Optimize, which merges the index’s segments so that searches run faster, and call Close to close the writer when we are done. We have successfully created a full-text index that’s ready to be searched.
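A real index usually holds more than one document. As a rough sketch, the same five classes can be reused to add one Document per string with a single IndexWriter. This assumes the same Lucene.Net 2.x-era API used in the listing above; `IndexingSketch` and `IndexAll` are hypothetical names invented for this example.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Hypothetical helper for illustration; assumes the Lucene.Net 2.x-era API
// shown in the article's listing.
public static class IndexingSketch
{
    // Creates a new index in dir and adds one Document per string.
    public static void IndexAll(Lucene.Net.Store.Directory dir, IEnumerable<string> texts)
    {
        Analyzer analyzer = new StandardAnalyzer();

        // true creates a new index, replacing whatever was stored there before
        IndexWriter indexWriter = new IndexWriter(dir, analyzer, true);
        try
        {
            foreach (string text in texts)
            {
                Document doc = new Document();
                doc.Add(new Field("content", text,
                    Field.Store.YES,
                    Field.Index.TOKENIZED,
                    Field.TermVector.YES));
                indexWriter.AddDocument(doc);
            }

            // optimize once, after all documents have been added
            indexWriter.Optimize();
        }
        finally
        {
            // always release the writer, even if adding a document fails
            indexWriter.Close();
        }
    }
}
```

The design choice worth noting is that the writer is opened once and closed in a finally block, rather than being created per document, and that Optimize is called a single time at the end.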

Now that we have indexed a document, we are ready for the fun part.

What you need to search an index

Let’s take a look at an example of what you need to perform a simple search.

//state the file location of the index
string indexFileLocation = @"C:\Index";
//open the existing index; pass false so that it is not erased and recreated
Lucene.Net.Store.Directory dir =
    Lucene.Net.Store.FSDirectory.GetDirectory(indexFileLocation, false);

//create an index searcher that will perform the search
Lucene.Net.Search.IndexSearcher searcher = new
Lucene.Net.Search.IndexSearcher(dir);

//build a query object
Lucene.Net.Index.Term searchTerm = 
  new Lucene.Net.Index.Term("content", "fox");
Lucene.Net.Search.Query query = new Lucene.Net.Search.TermQuery(searchTerm);

//execute the query
Lucene.Net.Search.Hits hits = searcher.Search(query);

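//iterate over the results.
for (int i = 0; i < hits.Length(); i++)
{
    Lucene.Net.Documents.Document doc = hits.Doc(i);

    //read back the stored value of the "content" field
    string contentValue = doc.Get("content");

    Console.WriteLine(contentValue);
}

//close the searcher when we are done with it
searcher.Close();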

With this small bit of code, we defined where our index is stored, again through the use of a Directory class. But now, we have an IndexSearcher object, which does all the heavy lifting of the actual search. To use the IndexSearcher, we have to pass it a Query object: you call the Search method on the IndexSearcher, passing in the Query, and it returns a Hits object. Finally, by iterating through the Hits, we are able to pull out the Documents that match the query. Once we have a document, we can pull out a field's value that was stored with the document when it was indexed. Let's look at the classes a little more closely!

There are many implementations of the Query class, and each has its place in building queries. Mostly, you wouldn’t create a Query object yourself, but would let a powerful parser build a complex query for you from some simple syntax, much like how you search Google. This is where I introduce you to the QueryParser. A QueryParser instance has a method called Parse(string query). Here is a small example of using the QueryParser:

//create an analyzer to process the text
Lucene.Net.Analysis.Analyzer analyzer = new
Lucene.Net.Analysis.Standard.StandardAnalyzer();

//create the query parser, with the default search field set to "content"
Lucene.Net.QueryParsers.QueryParser queryParser = new
    Lucene.Net.QueryParsers.QueryParser("content", analyzer);

//parse the query string into a Query object
Lucene.Net.Search.Query query = queryParser.Parse("fox");
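To give a feel for the "simple syntax" the parser accepts, the following sketch parses a few query strings and prints which kind of Query object each one produces. It assumes the same Lucene.Net 2.x-era API as the listing above; `QueryParserSketch` and `ParseContentQuery` are hypothetical names, and the sample strings are just illustrations of the standard Lucene query syntax (a single term, AND/OR operators, a quoted phrase, and a trailing wildcard).

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical helper for illustration; assumes the Lucene.Net 2.x-era API
// shown in the article's listing.
public static class QueryParserSketch
{
    // Parses queryText against the "content" field, as in the article.
    public static Query ParseContentQuery(string queryText)
    {
        // In general, use the same kind of analyzer that was used for indexing.
        Analyzer analyzer = new StandardAnalyzer();
        QueryParser queryParser = new QueryParser("content", analyzer);
        return queryParser.Parse(queryText);
    }

    public static void Main()
    {
        string[] samples =
        {
            "fox",              // a single term
            "quick AND fox",    // both terms required
            "fox OR cat",       // either term
            "\"lazy dog\"",     // an exact phrase
            "fo*"               // a prefix (trailing wildcard)
        };

        foreach (string sample in samples)
        {
            Query query = ParseContentQuery(sample);

            // Show which Query implementation the parser chose for this text.
            Console.WriteLine("{0} -> {1}", sample, query.GetType().Name);
        }
    }
}
```

Whatever Query the parser returns can be passed to the IndexSearcher's Search method in exactly the same way as the hand-built TermQuery in the earlier search example.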

And, if you think all this stuff is neat, we have barely even scratched the surface. But, this will be all of the article for now. If you want to find out some more, let me know, and I’ll work on another article about Lucene.Net.

Last Updated 28 Sep 2008 | Copyright © CodeProject, 1999-2009