DotLucene: Full-Text Search for Your Intranet or Website using 37 Lines of Code
By Dan Letecky
An introduction to DotLucene, an open source full-text search engine.
C#, Windows, .NET1.1, ASP.NET, VS.NET2003, Dev

Can a full-text search be coded in 37 lines? Well, I am going to cheat a bit and use DotLucene for the dirty work. DotLucene is a .NET port of the Jakarta Lucene search engine, maintained by George Aroush et al. Here is a quick list of its features:
Don't take the line count too seriously. I will show you that the core functionality doesn't take more than 37 lines of code, but to make it a real application you will need to spend some more time on it...
We will build a simple demo project that shows how to index a set of HTML files, search the index, and highlight the query words in the results.
But DotLucene has more potential. In a real-world application, you would probably want to:
If you are happy with the Indexing Server, no problem. However, DotLucene has many advantages:
The following line of code creates a new index stored on disk. directory is a path to the directory where the index will be stored.
IndexWriter writer =
    new IndexWriter(directory, new StandardAnalyzer(), true);
In this example, we create the index from scratch. This is not necessary: you can also open an existing index and add documents to it. You can update an existing document by deleting it and adding a new version.
For each HTML document, we will add two fields into the index:
text field that contains the text of the HTML file (with stripped tags). The text itself won't be stored in the index.
path field that contains the file path. It will be indexed and stored in full in the index.
public void AddHtmlDocument(string path)
{
    Document doc = new Document();
    string rawText;
    // read the file and strip the HTML tags
    using (StreamReader sr =
        new StreamReader(path, System.Text.Encoding.Default))
    {
        rawText = parseHtml(sr.ReadToEnd());
    }
    // "text" is indexed but not stored
    doc.Add(Field.UnStored("text", rawText));
    // "path" is stored as-is, without tokenizing
    doc.Add(Field.Keyword("path", path));
    writer.AddDocument(doc);
}
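The parseHtml helper used above is not listed in this article. A minimal sketch of what such a helper could look like follows; it is a hypothetical stand-in based on regular expressions, not the implementation from the demo project, and it will not cope with every malformed page. The class name HtmlText is made up for the example.

```csharp
using System.Text.RegularExpressions;
using System.Web;

public class HtmlText
{
    // Hypothetical stand-in for parseHtml: returns the visible text of an HTML page.
    public static string parseHtml(string html)
    {
        // Drop script and style blocks together with their contents.
        string text = Regex.Replace(html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Drop HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);
        // Replace every remaining tag with a space so adjacent words stay separated.
        text = Regex.Replace(text, @"<[^>]+>", " ");
        // Turn entities such as &amp; back into characters.
        text = HttpUtility.HtmlDecode(text);
        // Collapse runs of whitespace into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

For a real site, a proper HTML parser is likely to be more robust than regular expressions, but this is enough to feed plain text to the indexer.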
After adding the documents, you need to close the index writer. Optimizing the index before closing it will improve search performance.
writer.Optimize();
writer.Close();
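Updating a document, as mentioned earlier, means deleting the old version and adding a new one. The sketch below assumes the DotLucene 1.4-era API, where IndexReader exposes Delete(Term); later Lucene.Net releases may name this method differently. The IndexUpdater class and the UpdateDocument method are hypothetical names used only for illustration.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public class IndexUpdater
{
    // Replaces the indexed version of one file.
    // directory is the index location, path identifies the document.
    public static void UpdateDocument(string directory, string path, string newText)
    {
        // Step 1: delete every document whose "path" keyword matches exactly.
        IndexReader reader = IndexReader.Open(directory);
        try
        {
            reader.Delete(new Term("path", path));
        }
        finally
        {
            reader.Close();
        }

        // Step 2: open the existing index (create = false) and add the new version.
        IndexWriter writer =
            new IndexWriter(directory, new StandardAnalyzer(), false);
        try
        {
            Document doc = new Document();
            doc.Add(Field.UnStored("text", newText));
            doc.Add(Field.Keyword("path", path));
            writer.AddDocument(doc);
        }
        finally
        {
            writer.Close();
        }
    }
}
```

The delete works here because the path field was added with Field.Keyword, so the whole path is indexed as a single term and can be matched exactly.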
Before doing any search, you need to open the index. directory is the path to the directory where the index was stored.
IndexSearcher searcher = new IndexSearcher(directory);
Now we can parse the query. q is the query string entered by the user, and text is the default field to search.
Query query =
    QueryParser.Parse(q, "text", new StandardAnalyzer());
Hits hits = searcher.Search(query);
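Query strings typed by users are not always well-formed (an unbalanced quote is enough to break parsing), so it may be worth guarding the parse step. The following sketch assumes that the parser reports syntax errors with a ParseException from the Lucene.Net.QueryParsers namespace; the SafeSearch class is a hypothetical wrapper, and returning null is just one possible way to signal an invalid query.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public class SafeSearch
{
    // Returns the hits for q, or null when q cannot be parsed.
    public static Hits Search(IndexSearcher searcher, string q)
    {
        try
        {
            Query query =
                QueryParser.Parse(q, "text", new StandardAnalyzer());
            return searcher.Search(query);
        }
        catch (ParseException)
        {
            // Assumption: syntax errors surface as ParseException.
            return null;
        }
    }
}
```

The calling page can then show a "please check your query" message instead of an error page when the result is null.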
The hits variable is a collection of the matching documents. We will go through it and store the results in a DataTable.
DataTable dt = new DataTable();
dt.Columns.Add("path", typeof(string));
dt.Columns.Add("sample", typeof(string));
for (int i = 0; i < hits.Length(); i++)
{
    // get the document from the index
    Document doc = hits.Doc(i);
    // get the document filename;
    // we can't get the text from the index
    // because we didn't store it there
    DataRow row = dt.NewRow();
    row["path"] = doc.Get("path");
    dt.Rows.Add(row);
}
Let's create a highlighter. We will use bold font for highlighting (<B>phrase</B>).
QueryHighlightExtractor highlighter =
    new QueryHighlightExtractor(query, new StandardAnalyzer(),
        "<B>", "</B>");
During the result fetching, we will load the relevant part of the original text.
for (int i = 0; i < hits.Length(); i++)
{
    // ...
    string plainText;
    // the file path was stored in the "path" field (see AddHtmlDocument)
    using (StreamReader sr =
        new StreamReader(doc.Get("path"),
            System.Text.Encoding.Default))
    {
        plainText = parseHtml(sr.ReadToEnd());
    }
    row["sample"] =
        highlighter.GetBestFragments(plainText, 80, 2, "...");
    // ...
}
Last Updated: 30 Mar 2005. Editor: Rinish Biju
Copyright 2005 by Dan Letecky. Everything else Copyright © CodeProject, 1999-2010