
DotLucene Indexer

Abi Bellamkonda, 22 Feb 2006
DotLucene Indexer is a handy tool for automatically generating full-text indexes.

Introduction

DotLucene has been getting quite a bit of attention recently. It is a full-text search library that indexes text so it can be searched later. The approach typically has two parts: a process first indexes the documents, and a search is then run against the index to retrieve the results. More information about this can be found here and here.

Scenario / Concept

After using DotLucene for a while, you will notice that the indexing code you write tends to look the same each time. In our case, we typically had to build indexes from a simple SQL SELECT and add some of its columns to the index as fields. This article discusses automating the indexing process to reduce that development effort.

Approach

We will build an XML file that stores all our configuration. We need to support multiple indexes; each index has a name and is written to its own target index folder.

<indexConfiguration>
  <!-- Multiple indexes -->
  <index name="TaskA" indexFolderUrl="C:\Documents and Settings\abi\Desktop\LuceneIndexer\LuceneIndexer\TestIndex">
    
  </index>
</indexConfiguration>

Each index has a set of fields. Each field has a name and flags that say whether it is stored, indexed and tokenised.

<fields>
	<field name="ID" isStored="true" isIndexed="true" isTokenised="false" />
	<field name="Title" isStored="true" isIndexed="true" isTokenised="false" />
	<field name="ContentText" isStored="true" 
			isIndexed="true" isTokenised="true" />
</fields>
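
To make the meaning of the three flags concrete, here is a minimal sketch of reading the <fields> element into plain objects before anything is handed to the index writer. FieldSpec and FieldConfigReader are hypothetical names introduced only for this illustration; they are not part of the article's download or of DotLucene, and the "neither stored nor indexed" check is an assumed sanity rule rather than documented library behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper type, not part of the article's code or of DotLucene.
public sealed class FieldSpec
{
    public string Name;
    public bool IsStored;
    public bool IsIndexed;
    public bool IsTokenised;
}

public static class FieldConfigReader
{
    // Reads every <field> child of the given <fields> element into a FieldSpec.
    public static List<FieldSpec> Read(XmlNode fieldsNode)
    {
        List<FieldSpec> specs = new List<FieldSpec>();
        foreach (XmlNode node in fieldsNode.SelectNodes("field"))
        {
            FieldSpec spec = new FieldSpec();
            spec.Name = RequiredAttribute(node, "name");
            spec.IsStored = bool.Parse(RequiredAttribute(node, "isStored"));
            spec.IsIndexed = bool.Parse(RequiredAttribute(node, "isIndexed"));
            spec.IsTokenised = bool.Parse(RequiredAttribute(node, "isTokenised"));

            // Assumed sanity rule: a field that is neither stored nor indexed
            // could never be searched or retrieved, so reject it early.
            if (!spec.IsStored && !spec.IsIndexed)
            {
                throw new FormatException(
                    "Field '" + spec.Name + "' is neither stored nor indexed.");
            }

            specs.Add(spec);
        }
        return specs;
    }

    // Returns the attribute value, or fails with a readable message if it is missing.
    private static string RequiredAttribute(XmlNode node, string attributeName)
    {
        XmlAttribute attribute = node.Attributes[attributeName];
        if (attribute == null)
        {
            throw new FormatException(
                "Missing attribute '" + attributeName + "' on <" + node.Name + ">.");
        }
        return attribute.Value;
    }
}
```

Reading the configuration up front like this means a misspelt or missing attribute fails with a clear message before any rows are fetched, instead of partway through an indexing run.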

The data can come from anywhere (SQL Server, Oracle, the file system). For our scenario, let's say we get the data from SQL Server (this is the only source I have written code for; you can extend it for others). SQL Server needs a SELECT statement and a connection string, which translates to the XML below:

<sqlClient connectionString="Data Source=(local);
	Initial Catalog=Junk;Trusted_Connection=True">
	SELECT * FROM Content</sqlClient>

Putting it all together, this is what we get.

<?xml version="1.0" encoding="utf-8"?>
<indexConfiguration>
  <index name="TaskA" indexFolderUrl="C:\Documents and Settings\abi\Desktop\LuceneIndexer\LuceneIndexer\TestIndex">
    <!-- Your connection string & Select statement goes here -->
    <sqlClient connectionString="Data Source=(local);Initial Catalog=Junk;Trusted_Connection=True">SELECT * FROM Content</sqlClient>
    <fields>
      <field name="ID" isStored="true" isIndexed="true" isTokenised="false" />
      <field name="Title" isStored="true" isIndexed="true" isTokenised="false" />
      <field name="ContentText" isStored="true" isIndexed="true" isTokenised="true" />
    </fields>
  </index>
  <index name="TaskB" indexFolderUrl="\\MyPC\TestIndex">
    <!-- Your connection string & Select statement goes here -->
    <sqlClient connectionString="Data Source=(local);Initial Catalog=Junk;Trusted_Connection=True">SELECT * FROM Content</sqlClient>
    <fields>
      <field name="ID" isStored="true" isIndexed="true" isTokenised="false" />
      <field name="Title" isStored="true" isIndexed="true" isTokenised="false" />
      <field name="ContentText" isStored="true" isIndexed="true" isTokenised="true" />
    </fields>
  </index>
</indexConfiguration>

Because we want this to be extensible in future, we introduce an abstract class called Indexer, and SqlClientIndexer inherits from it. This is the Indexer class:

Sample Image - DotLucene_Indexer.jpg

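using System.Xml;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;

public abstract class Indexer
{
    protected XmlNode XmlNode;

    public Indexer(XmlNode xmlNode)
    {
        this.XmlNode = xmlNode;
    }

    public void Generate()
    {
        // Create the index; 'true' recreates it from scratch on every run
        string indexFolderUrl = XmlNode.Attributes["indexFolderUrl"].Value;
        IndexWriter writer = new IndexWriter(indexFolderUrl,
            new StandardAnalyzer(), true);
        try
        {
            IndexRecords(writer);
            writer.Optimize();
        }
        finally
        {
            // Always close the writer, even if indexing fails part-way
            writer.Close();
        }
    }

    // Implemented by each data source to add its documents to the index
    protected abstract void IndexRecords(IndexWriter writer);
}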

This is the code for SqlClientIndexer:

using System.Data;
using System.Data.SqlClient;
using System.Xml;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public class SqlClientIndexer : Indexer
{
    public SqlClientIndexer(XmlNode xmlNode) : base(xmlNode)
    {
    }

    protected override void IndexRecords(IndexWriter writer)
    {
        DataTable dt = GetData;
        // Index all records: one document per row, one field per configured <field>
        XmlNodeList fields = this.XmlNode.SelectNodes("fields/field");
        for (int i = 0; i < dt.Rows.Count; i++)
        {
            Document doc = new Document();
            for (int j = 0; j < fields.Count; j++)
            {
                // The field name must match a column returned by the SELECT
                string name = fields[j].Attributes["name"].Value;
                doc.Add(new Field(name, dt.Rows[i][name].ToString(),
                    bool.Parse(fields[j].Attributes["isStored"].Value),
                    bool.Parse(fields[j].Attributes["isIndexed"].Value),
                    bool.Parse(fields[j].Attributes["isTokenised"].Value)));
            }
            writer.AddDocument(doc);
        }
    }

    private DataTable GetData
    {
        get
        {
            // The SELECT statement is the element text; the connection string is an attribute
            string selectCommandText =
                XmlNode.SelectSingleNode("sqlClient").InnerText.Trim();
            string connectionString =
                XmlNode.SelectSingleNode("sqlClient/@connectionString").Value;
            using (SqlConnection connection = new SqlConnection(connectionString))
            using (SqlDataAdapter da = new SqlDataAdapter(selectCommandText, connection))
            {
                DataTable dt = new DataTable();
                da.Fill(dt); // Fill opens and closes the connection itself
                return dt;
            }
        }
    }
}

Automate

Now we write a console application that reads our index configuration file and builds the indexes it describes. We also want to be able to build only some of the indexes on a given run, so we use the command-line argument to build the XPath predicate that selects the index nodes.

using System;
using System.Xml;

static class Program
{
    /// <summary>
    /// The main entry point for the application.
    /// </summary>
    [STAThread]
    static void Main(string[] args)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load("Index.config");
        // An optional first argument becomes an XPath predicate on the index nodes
        string xpath = (args.Length > 0 ? "[" + args[0] + "]" : "");
        XmlNodeList nodes = doc.SelectNodes("/indexConfiguration/*" + xpath);
        for (int i = 0; i < nodes.Count; i++)
        {
            Indexer ind = new SqlClientIndexer(nodes[i]);
            ind.Generate();
        }
    }
}

Finally, I want 'TaskA' to be indexed every hour and 'TaskB' every 10 minutes. This can be achieved using Windows Task Scheduler, DTS or any scheduling tool. On the command line, we pass "@name='TaskA'" for TaskA and "@name='TaskB'" for TaskB (this is simply an XPath condition; name is an attribute, so it needs the @ prefix). We can select a set of tasks with a condition like "@name='Task1' or @name='Task2'".
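
The way the command-line argument selects index nodes can be tried in isolation. The sketch below repeats the XPath construction from Main against a small in-memory configuration; IndexSelector and SampleConfig are hypothetical names used only for this illustration. Because name is an attribute of <index>, XPath addresses it as @name, so the condition is written @name='TaskA'.

```csharp
using System;
using System.Xml;

// Hypothetical helper for illustration; it mirrors the XPath logic in Main.
public static class IndexSelector
{
    // Builds the same XPath as Main: an optional predicate from the first argument.
    public static string BuildXPath(string[] args)
    {
        string predicate = args.Length > 0 ? "[" + args[0] + "]" : "";
        return "/indexConfiguration/*" + predicate;
    }

    // Returns the name attribute of every index node the XPath selects.
    public static string[] SelectIndexNames(XmlDocument config, string[] args)
    {
        XmlNodeList nodes = config.SelectNodes(BuildXPath(args));
        string[] names = new string[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
        {
            names[i] = nodes[i].Attributes["name"].Value;
        }
        return names;
    }

    // A cut-down configuration with two indexes and placeholder folders.
    public static XmlDocument SampleConfig()
    {
        XmlDocument config = new XmlDocument();
        config.LoadXml(
            "<indexConfiguration>" +
            "<index name='TaskA' indexFolderUrl='A' />" +
            "<index name='TaskB' indexFolderUrl='B' />" +
            "</indexConfiguration>");
        return config;
    }
}
```

With no argument, both indexes are selected. Passing "@name='TaskA'" selects only TaskA, and "@name='TaskA' or @name='TaskB'" selects both. A condition written without the @, such as "name='TaskA'", selects nothing here, because XPath then looks for a child element called name rather than the attribute.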

Extensions

This can be extended for folder indexing, Oracle, MySQL and so on.
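
One possible way to support several sources is to let the configuration decide which Indexer subclass to create, since Main currently always creates a SqlClientIndexer. The sketch below only shows the detection step, under the assumption that each <index> holds exactly one source element (such as <sqlClient>) next to <fields>. SourceDetector and the <folder> element mentioned in the comments are hypothetical and do not exist in the article's code.

```csharp
using System;
using System.Xml;

// Hypothetical helper for illustration; not part of the article's download.
public static class SourceDetector
{
    // Returns the name of the first child element of <index> that is not <fields>,
    // for example "sqlClient" for the configuration shown in this article.
    public static string DetectSource(XmlNode indexNode)
    {
        foreach (XmlNode child in indexNode.ChildNodes)
        {
            // Skip comments and whitespace; only elements can describe a source.
            if (child.NodeType == XmlNodeType.Element && child.Name != "fields")
            {
                return child.Name;
            }
        }
        throw new FormatException("No data source element found under <index>.");
    }
}

// Main could then pick the indexer from the detected name, e.g.:
//   "sqlClient" -> new SqlClientIndexer(node)
//   "folder"    -> new FolderIndexer(node)   (FolderIndexer is hypothetical)
```

Each new source then only needs its own Indexer subclass that overrides IndexRecords, plus one more case in the dispatch.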

History

  • 22nd February, 2006: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Abi Bellamkonda
Architect
Australia
"Impossible" + "'" + " " = "I'm Possible"
 
Started programming as a kid in 1986, using BASIC on 286 computers and the Spectrum. A series of languages followed: Pascal, C, C++, Ada, Algol, Prolog, assembly, Java, C#, VB.NET and so on. Over the past 5 years my interest shifted to architecture, with Rational Suite and UML. Wrote some articles, was member of the month on some sites and top poster (I only answer) of the week (actually weeks), won some books as prizes, and was rated 2nd in ASP.NET and ADO.NET in Australia.
 
There is simplicity in complexity

Last Updated 23 Feb 2006
Article Copyright 2006 by Abi Bellamkonda