Small Lucene.NET Demo App

Sacha Barber, 21 Jun 2013, CPOL
A small demo app that shows how to store/search using Lucene.NET

Demo project: Lucene_VS2012_Demo_App.zip

Introduction

I have just left a job and am about to start a new one, but just before I left, one of the other guys on my team was tasked with writing something using the Lucene search engine for .NET. We had to search across 300,000 or so objects, and it appeared to be pretty quick. We ran the search in response to each character the user typed, and there was no noticeable delay, even though the request went through loads of Rx and loads of different application layers before finally hitting Lucene for results.

This piqued my interest a bit, and I decided to give Lucene a try and see if I could come up with a simple demo that I could share.

So that is what I did, and this is the result.

Lucene .NET Basics

Lucene.Net is a port of the Lucene search engine library, written in C# and  targeted at .NET runtime users.

The general idea is that you build an index of .NET objects, each stored within a specialized Lucene Document with searchable fields. You are then able to run queries against these stored Documents and rehydrate them back into .NET objects.

Building An Index (Document)

Index building is the phase where you take your .NET objects, create a Document for each one, and add certain fields (you do not have to store/add fields for all of the .NET object's properties; that is up to you to decide). You then save these Document(s) to a physical directory on disk, which will later be searched.

Querying Data

Querying data is obviously one of the main reasons that you would want to use Lucene.NET, and it should come as no surprise that it has good querying facilities.

A good reference for the query syntax that Lucene.NET uses can be found here:

http://www.lucenetutorial.com/lucene-query-syntax.html

Some simple examples might be:

  • title:foo : Search for the word "foo" in the title field.
  • title:"foo bar" : Search for the phrase "foo bar" in the title field.
  • title:"foo bar" AND body:"quick fox" : Search for the phrase "foo bar" in the title field AND the phrase "quick fox" in the body field.
  • (title:"foo bar" AND body:"quick fox") OR title:fox : Search for either the phrase "foo bar" in the title field AND the phrase "quick fox" in the body field, or the word "fox" in the title field.
  • title:foo -title:bar : Search for the word "foo" and not "bar" in the title field.
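
To see how query strings like these turn into Lucene Query objects, here is a minimal sketch. It assumes the Lucene.Net 3.0.3 package, whose API differs a little from the older one used in the demo code later on, and the title and body field names are just the hypothetical ones from the examples above.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class QuerySyntaxSketch
{
    public static void Main()
    {
        // Assumption: Lucene.Net 3.0.3. "title" is the default field,
        // used for any term that carries no "field:" prefix.
        var parser = new QueryParser(
            Lucene.Net.Util.Version.LUCENE_30,
            "title",
            new WhitespaceAnalyzer());

        string[] queryStrings =
        {
            "title:foo",
            "title:\"foo bar\"",
            "title:\"foo bar\" AND body:\"quick fox\"",
            "(title:\"foo bar\" AND body:\"quick fox\") OR title:fox",
            "title:foo -title:bar"
        };

        foreach (string text in queryStrings)
        {
            // Parse throws a ParseException if the query string is malformed
            Query query = parser.Parse(text);

            // Query.ToString() shows how Lucene interpreted the input
            Console.WriteLine("{0}  =>  {1}", text, query);
        }
    }
}
```

Printing the parsed Query is a handy way to check that a query string means what you think it means before running it against a real index.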

 

Types Of Analyzer

There are actually lots of different Analyzer types in Lucene.NET. Here are just a few (there are many more than this):

  • SimpleAnalyzer
  • StandardAnalyzer
  • StopAnalyzer
  • WhitespaceAnalyzer

Choosing the correct one depends on what you are trying to achieve and what your requirements dictate.
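
To get a feel for why the choice matters, here is a rough plain C# sketch that imitates how two of these analyzers split text. It is not Lucene's actual implementation, and the class and method names are made up for illustration. WhitespaceAnalyzer splits on whitespace only and keeps case and punctuation, whereas SimpleAnalyzer splits on anything that is not a letter and lowercases each token.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper that only imitates analyzer behaviour; it does not call Lucene
public static class AnalyzerSketch
{
    // Roughly what WhitespaceAnalyzer does: split on whitespace, change nothing else
    public static List<string> WhitespaceTokens(string text)
    {
        // A null separator array tells Split to use whitespace characters
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).ToList();
    }

    // Roughly what SimpleAnalyzer does: keep runs of letters, lowercased
    public static List<string> SimpleTokens(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();

        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                // A non-letter ends the current token
                tokens.Add(current.ToString());
                current.Clear();
            }
        }

        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }

        return tokens;
    }
}
```

For the made-up input "When I was young, when I was old", WhitespaceTokens keeps "When" and "young," exactly as typed, while SimpleTokens produces "when" and "young". With a whitespace-style analyzer, a search for "when" would therefore not match a line that only contains "When".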

 

The Demo App

This section will talk about the attached demo app, and should give you  enough information to start building your own Lucene.NET powered search should  you wish to use it in your own applications.

What Does The Demo App Do?

The demo app is pretty simple really. Here is what it does:

  • There is a static text file (in my case a poem) that is available to index
  • On startup the text file is indexed and added to the overall Lucene index directory (which in my case is hardcoded to C:\Temp\LuceneIndex)
  • There is a UI (I used WPF, but that is irrelevant) which:
    • Allows a user to enter a search keyword that is used to search the indexed Lucene data
    • Will show all the lines from the text file that was originally used to create the Lucene index data
    • Will show the matching lines in the poem when the user conducts a search

I think the best bet is to see an example. This is what the UI looks like when it first loads:

[Screenshot: the UI on first load]

Then we type in a search term, say the word "when", and we would see this:

[Screenshot: the UI after searching for "when"]

And that is all the demo does, but I think that is enough to demonstrate how Lucene works.

What Gets Stored And How

So what gets stored? That is pretty simple. Recall that we have a static text file (a poem). We start by reading that file, using the simple utility class shown below, into SampleDataFileRow objects that can be added to the Lucene index:

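using System.Collections.Generic;
using System.IO;
using System.Reflection;

public class SampleDataFileReader : ISampleDataFileReader
{
    // Lazily yields one SampleDataFileRow per line of the sample text file
    public IEnumerable<SampleDataFileRow> ReadAllRows()
    {
        // The sample file sits in a "Lucene" folder next to the executing assembly
        FileInfo assemblyFile = new FileInfo(Assembly.GetExecutingAssembly().Location);
        string file = Path.Combine(assemblyFile.Directory.FullName, "Lucene", "SampleDataFile.txt");
        string[] lines = File.ReadAllLines(file);
        for (int i = 0; i < lines.Length; i++)
        {
            yield return new SampleDataFileRow
            {
                LineNumber = i + 1, // line numbers are 1-based
                LineText = lines[i]
            };
        }
    }
}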

Where the SampleDataFileRow objects look like this:

public class SampleDataFileRow
{
    public int LineNumber { get; set; }
    public string LineText { get; set; }
    public float Score { get; set; }
}

And then from there we build the Lucene Index, which is done as follows:

public class LuceneService : ILuceneService
{
    // Note there are many different types of Analyzer that may be used with Lucene, the exact one you use
    // will depend on your requirements
    private Analyzer analyzer = new WhitespaceAnalyzer(); 
    private Directory luceneIndexDirectory;
    private IndexWriter writer;
    private string indexPath = @"c:\temp\LuceneIndex";

    public LuceneService()
    {
        InitialiseLucene();
    }

    private void InitialiseLucene()
    {
        if(System.IO.Directory.Exists(indexPath))
        {
            System.IO.Directory.Delete(indexPath,true);
        }

        luceneIndexDirectory = FSDirectory.GetDirectory(indexPath);
        writer = new IndexWriter(luceneIndexDirectory, analyzer, true);
    }

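    public void BuildIndex(IEnumerable<SampleDataFileRow> dataToIndex)
    {
        foreach (var sampleDataFileRow in dataToIndex)
        {
            // One Lucene Document per line of the source file
            Document doc = new Document();

            // Stored, and indexed as a single un-analyzed term
            doc.Add(new Field("LineNumber",
                sampleDataFileRow.LineNumber.ToString(),
                Field.Store.YES,
                Field.Index.UN_TOKENIZED));

            // Stored, and run through the analyzer so its words are searchable
            doc.Add(new Field("LineText",
                sampleDataFileRow.LineText,
                Field.Store.YES,
                Field.Index.TOKENIZED));

            writer.AddDocument(doc);
        }

        // Optimize the index, then flush and release the writer and directory
        writer.Optimize();
        writer.Flush();
        writer.Close();
        luceneIndexDirectory.Close();
    }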


    ....
    ....
    ....
    ....
    ....
}

I think that code is fairly simple and easy to follow. We essentially just do this:

  1. Create a new Lucene index directory
  2. Create a Lucene writer
  3. Create a new Lucene Document for each source object
  4. Add the fields to the Lucene Document
  5. Write the Lucene Document to disk

One thing that may be of interest: if you are dealing with vast quantities of data, you may want to create static Field fields and reuse them rather than creating new ones each time you rebuild the index. For this demo the Lucene index is only created once per application run, but a production application may rebuild the index every 5 minutes or so, in which case I would recommend reusing the Field objects.
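
A rough sketch of what reusing Field objects could look like is shown below. It assumes the Lucene.Net 3.0.3 API (Field.SetValue, and Field.Index.ANALYZED / NOT_ANALYZED in place of the older TOKENIZED / UN_TOKENIZED names), and the FieldReuseSketch class and AddRows method are hypothetical names. Note that it reuses local instances for the duration of one indexing run rather than static fields, which avoids sharing the same Field objects between threads.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Same shape as the article's SampleDataFileRow, repeated so the sketch is self-contained
public class SampleDataFileRow
{
    public int LineNumber { get; set; }
    public string LineText { get; set; }
    public float Score { get; set; }
}

public static class FieldReuseSketch
{
    // Assumption: the caller creates the IndexWriter and disposes it afterwards
    public static void AddRows(IndexWriter writer, IEnumerable<SampleDataFileRow> rows)
    {
        // Create the Field and Document instances once...
        var lineNumberField = new Field("LineNumber", string.Empty,
            Field.Store.YES, Field.Index.NOT_ANALYZED);
        var lineTextField = new Field("LineText", string.Empty,
            Field.Store.YES, Field.Index.ANALYZED);

        var doc = new Document();
        doc.Add(lineNumberField);
        doc.Add(lineTextField);

        foreach (var row in rows)
        {
            // ...and only swap the values in for each row
            lineNumberField.SetValue(row.LineNumber.ToString());
            lineTextField.SetValue(row.LineText);
            writer.AddDocument(doc);
        }
    }
}
```

Whether this is worth doing depends on how much data you index; for a small file like the poem in the demo it makes no practical difference.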

 

What Gets Searched And How

Searching the indexed data is really easy. All you need to do is something like this:

public class LuceneService : ILuceneService
{
    // Note there are many different types of Analyzer that may be used with Lucene, the exact one you use
    // will depend on your requirements
    private Analyzer analyzer = new WhitespaceAnalyzer(); 
    private Directory luceneIndexDirectory;
    private IndexWriter writer;
    private string indexPath = @"c:\temp\LuceneIndex";

    public LuceneService()
    {
        InitialiseLucene();
    }

    ....
    ....


    public IEnumerable<SampleDataFileRow> Search(string searchTerm)
    {
        IndexSearcher searcher = new IndexSearcher(luceneIndexDirectory);
        QueryParser parser = new QueryParser("LineText", analyzer);

        Query query = parser.Parse(searchTerm);
        Hits hitsFound = searcher.Search(query);
        List<SampleDataFileRow> results = new List<SampleDataFileRow>();
        SampleDataFileRow sampleDataFileRow = null;

        for (int i = 0; i < hitsFound.Length(); i++)
        {
            sampleDataFileRow = new SampleDataFileRow();
            Document doc = hitsFound.Doc(i);
            sampleDataFileRow.LineNumber = int.Parse(doc.Get("LineNumber"));
            sampleDataFileRow.LineText = doc.Get("LineText");
            float score = hitsFound.Score(i);
            sampleDataFileRow.Score = score;
            results.Add(sampleDataFileRow);
        }
           
        return results.OrderByDescending(x => x.Score).ToList();
    }
}

There is not much to that to be honest, and I think the code explains all you need to know.
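
The Hits class used above belongs to the older Lucene API and is no longer available from version 3.0 onwards. If you are on a newer version, the same search could be sketched roughly as follows. This assumes Lucene.Net 3.0.3; TopDocsSearchSketch and the maxHits parameter are hypothetical names that are not part of the demo app.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Same shape as the article's SampleDataFileRow, repeated so the sketch is self-contained
public class SampleDataFileRow
{
    public int LineNumber { get; set; }
    public string LineText { get; set; }
    public float Score { get; set; }
}

public static class TopDocsSearchSketch
{
    public static List<SampleDataFileRow> Search(string indexPath, string searchTerm, int maxHits)
    {
        var results = new List<SampleDataFileRow>();

        // Open the existing index read-only; both objects are disposed when done
        using (var directory = FSDirectory.Open(new System.IO.DirectoryInfo(indexPath)))
        using (var searcher = new IndexSearcher(directory, true))
        {
            // Use the same kind of analyzer that was used to build the index
            var parser = new QueryParser(
                Lucene.Net.Util.Version.LUCENE_30, "LineText", new WhitespaceAnalyzer());
            Query query = parser.Parse(searchTerm);

            // TopDocs holds at most maxHits results, best score first
            TopDocs topDocs = searcher.Search(query, maxHits);

            foreach (ScoreDoc scoreDoc in topDocs.ScoreDocs)
            {
                // Rehydrate the stored fields back into a .NET object
                Document doc = searcher.Doc(scoreDoc.Doc);
                results.Add(new SampleDataFileRow
                {
                    LineNumber = int.Parse(doc.Get("LineNumber")),
                    LineText = doc.Get("LineText"),
                    Score = scoreDoc.Score
                });
            }
        }

        return results;
    }
}
```

Because TopDocs already comes back ordered by descending score, the explicit OrderByDescending call is not needed in this version.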

Lucene GUI

There is also a pretty cool GUI for examining your stored Lucene data called "Luke.NET", which is freely available from CodePlex at the following link:

http://luke.codeplex.com/releases/view/82033

When you run this tool you will need to enter the path to the index directory for the Lucene index that was created. For this demo app that is:

C:\Temp\LuceneIndex

Once you enter that, you click "Ok" and you will be presented with a UI that allows you to examine all the indexed data that Lucene stored, and also run searches should you wish to.

It's a nice tool and worth a look.

That's It

Anyway, that is all I have to say for now. I do have a few articles done, but they just need writing up and I am struggling to find time of late. I'll get there when I get there, I guess. As always, if you enjoyed this, a vote/comment is most welcome.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Sacha Barber
Software Developer (Senior)
United Kingdom
I currently hold the following qualifications (amongst others, I also studied Music Technology and Electronics, for my sins)

- MSc (Passed with distinctions), in Information Technology for E-Commerce
- BSc Hons (1st class) in Computer Science & Artificial Intelligence

Both of these at Sussex University UK.

Award(s)

I am lucky enough to have won a few awards for Zany Crazy code articles over the years

  • Microsoft C# MVP 2015
  • Codeproject MVP 2015
  • Microsoft C# MVP 2014
  • Codeproject MVP 2014
  • Microsoft C# MVP 2013
  • Codeproject MVP 2013
  • Microsoft C# MVP 2012
  • Codeproject MVP 2012
  • Microsoft C# MVP 2011
  • Codeproject MVP 2011
  • Microsoft C# MVP 2010
  • Codeproject MVP 2010
  • Microsoft C# MVP 2009
  • Codeproject MVP 2009
  • Microsoft C# MVP 2008
  • Codeproject MVP 2008
  • And numerous codeproject awards which you can see over at my blog

Article Copyright 2013 by Sacha Barber