Sample ASP.NET client

Sample Windows Forms client

Introduction
It has never been easier to create applications with search capabilities: the open-source DotLucene [dotlucene.net] library lets you build powerful, very fast full-text search applications, and it is easy to use. Let's demonstrate it by exploring Seekafile Server [seekafile.org], a flexible indexing server with capabilities similar to those of the Windows Indexing Service [microsoft.com].
This article is a follow-up to Desktop Search Application: Part 1. In that article, I discussed indexing and searching Office documents using DotLucene [dotlucene.net]. This time we will build a more serious application that can be used either directly or as study material for the practical use of DotLucene.
In this article, you will learn:
- How to perform indexing in the background.
- How to update documents in a DotLucene index.
- How to create queries for DotLucene programmatically.
- How to use IFilter to parse Office documents, Adobe PDF and other file types correctly (it includes the updated parsing code from Desktop Search Application: Part 1).
Features
Seekafile Server [seekafile.org] is a Windows service that indexes documents in the specified directories and watches them for changes.
- Background indexing
  - The indexer runs as a Windows service.
  - You specify the directories to be watched for changes in the configuration file.
  - The indexer works in the background (it doesn't slow down other operations).
  - It recognizes any change within a second.
- Powered by DotLucene [dotlucene.net]
  - Super-fast searching.
  - The index is stored in a DotLucene/Lucene 1.3+ compatible format.
  - The index can be accessed directly from other applications (you can search even while indexing is in progress).
  - Access the index from any custom application (ASP.NET, Windows Forms application, Java application).
- Built-in support for common file formats:
  - Microsoft PowerPoint (PPT)
  - Microsoft Word (DOC)
  - Microsoft Excel (XLS)
  - HTML (HTM/HTML)
  - Text files (TXT)
  - Rich Text Format (RTF)
- Supports custom plug-ins written in C# or VB.NET.
- Supports IFilter for searching other extensions:
  - Adobe Acrobat (PDF)
  - Microsoft Visio (VSD)
  - XML
  - and others...
- Runs on Windows 2000/XP/2003
How it works
Architecture
This is an overview of the architecture:

The architecture is index-centric. It uses the index to communicate with the client search applications. The index is flexible enough to allow this:
- It is possible to search the index while the Seekafile Server is modifying it.
- There can be multiple clients accessing the index simultaneously.
- The changes are visible immediately to all the clients.
- The only information clients need to know is the index location and the available DotLucene document fields [dotlucene.net].
- The index is compatible with the Java version - you can access it from a Java client as well.
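To illustrate how little a client needs, here is a minimal sketch of a console client that lists every indexed file with a given extension. It uses only the index location and the extension and fullname fields described later in this article. The class name and the command-line arguments are made up for the example, and it assumes that the DotLucene 1.4 assembly is referenced.

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical sample client; not part of the Seekafile Server download.
class ExtensionSearch
{
    // Usage (assumed): ExtensionSearch <index path> <extension, e.g. .doc>
    static void Main(string[] args)
    {
        string indexPath = args[0];
        string extension = args[1];

        IndexSearcher searcher = new IndexSearcher(indexPath);
        try
        {
            // "extension" is stored as an untokenized keyword field,
            // so an exact TermQuery is enough to match it.
            Query query = new TermQuery(new Term("extension", extension));
            Hits hits = searcher.Search(query);

            for (int i = 0; i < hits.Length(); i++)
            {
                // "fullname" is a stored field holding the full path.
                Console.WriteLine(hits.Doc(i).Get("fullname"));
            }
        }
        finally
        {
            searcher.Close();
        }
    }
}
```

Because the extension is indexed exactly as it appears on disk, this sketch would not find report.DOC when asked for .doc; a real client may want to try both spellings.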
Watching changes
This is an overview of the indexing process:
- When the service is started, it checks whether the index already exists at the specified location; if not, it creates a new one:
    if (!IndexReader.IndexExists(cfg.IndexPath))
    {
        Log.Echo("Creating a new index");
        // The third argument (true) creates a new, empty index.
        IndexWriter writer = new IndexWriter(cfg.IndexPath,
            new StandardAnalyzer(), true);
        writer.Close();
    }
- It goes through all the indexed directories and adds all the files to the IndexerQueue (to ensure that everything is indexed properly):
    foreach (string folder in cfg.Items)
    {
        IndexerQueue.Add(folder);
        startWatcher(folder);
    }
- It starts the FileSystemWatcher to watch all file changes in the indexed directories:
    private void startWatcher(string directory)
    {
        watcher = new FileSystemWatcher();
        watcher.Path = directory;
        watcher.NotifyFilter = NotifyFilters.LastWrite |
            NotifyFilters.FileName |
            NotifyFilters.DirectoryName;
        watcher.IncludeSubdirectories = true;
        watcher.Filter = "";
        watcher.Changed += new FileSystemEventHandler(OnChanged);
        watcher.Created += new FileSystemEventHandler(OnChanged);
        watcher.Deleted += new FileSystemEventHandler(OnChanged);
        watcher.Renamed += new RenamedEventHandler(OnRenamed);
        watcher.EnableRaisingEvents = true;
    }
- If there is a change event, it adds the file to the IndexerQueue:
    private void OnChanged(object source, FileSystemEventArgs e)
    {
        // Ignore "changed" events raised for directories themselves.
        if (Directory.Exists(e.FullPath) &&
            e.ChangeType == WatcherChangeTypes.Changed)
            return;
        IndexerQueue.Add(e.FullPath);
    }
    private void OnRenamed(object source, RenamedEventArgs e)
    {
        // Queue both names: the old one gets removed, the new one indexed.
        IndexerQueue.Add(e.OldFullPath);
        IndexerQueue.Add(e.FullPath);
    }
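FileSystemWatcher often raises several events for a single save, so the same path can arrive at the queue more than once. The article does not show how IndexerQueue stores its items; the following is only a sketch, under my own assumptions, of a queue that ignores a path that is already waiting, wired to a watcher the same way OnChanged and OnRenamed are. PendingPaths and WatcherWiring are hypothetical names, and the generic collections require a newer .NET Framework than the one Seekafile Server targets.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the storage behind IndexerQueue: a FIFO of paths
// that ignores a path which is already waiting to be processed.
public sealed class PendingPaths
{
    private readonly object sync = new object();
    private readonly Queue<string> order = new Queue<string>();
    // Windows treats paths case-insensitively by default, so compare that way.
    private readonly HashSet<string> waiting =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Returns true if the path was queued, false if it was already waiting.
    public bool Add(string path)
    {
        lock (sync)
        {
            if (!waiting.Add(path))
                return false;
            order.Enqueue(path);
            return true;
        }
    }

    // Hands out the oldest waiting path; returns false when nothing is queued.
    public bool TryTake(out string path)
    {
        lock (sync)
        {
            if (order.Count == 0)
            {
                path = null;
                return false;
            }
            path = order.Dequeue();
            waiting.Remove(path);
            return true;
        }
    }
}

public static class WatcherWiring
{
    // Feeds watcher events into the queue, mirroring OnChanged and OnRenamed.
    public static FileSystemWatcher Watch(string directory, PendingPaths queue)
    {
        FileSystemWatcher watcher = new FileSystemWatcher(directory);
        watcher.IncludeSubdirectories = true;
        watcher.NotifyFilter = NotifyFilters.LastWrite
            | NotifyFilters.FileName | NotifyFilters.DirectoryName;

        FileSystemEventHandler onChange = delegate(object s, FileSystemEventArgs e)
        {
            // A "changed" event on a directory carries nothing to index.
            if (e.ChangeType == WatcherChangeTypes.Changed
                && Directory.Exists(e.FullPath))
                return;
            queue.Add(e.FullPath);
        };
        watcher.Changed += onChange;
        watcher.Created += onChange;
        watcher.Deleted += onChange;
        watcher.Renamed += delegate(object s, RenamedEventArgs e)
        {
            queue.Add(e.OldFullPath); // the old name has to leave the index
            queue.Add(e.FullPath);    // the new name has to be indexed
        };
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

Once a path has been taken from the queue it can be added again, so a file that changes while it is being indexed is not lost.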
IndexerQueue
The IndexerQueue works this way:
- It works in a separate thread. There is only a single thread processing a single queue at any moment:
    public static void Start()
    {
        if (instanceDirectory == null)
            throw new ApplicationException("You must " +
                "initialize the queue first by calling Init().");
        lock (runningLock)
        {
            if (!isRunning)
            {
                indexerThread = new Thread(new ThreadStart(Run));
                indexerThread.Name = "Indexer";
                indexerThread.Start();
            }
        }
    }
- It processes the items from the queue. It waits if there is nothing in the queue:
    while (!shouldStop)
    {
        if (nextPath != null)
        {
            lock (items.SyncRoot)
            {
                items.Remove(nextPath);
            }
        }
        else
        {
            Thread.Sleep(100);
        }
        nextPath = next();
    }
- If the path is a directory, it goes through it and adds all its content to the queue:
    private static void parseDirectory(DirectoryInfo di)
    {
        foreach (FileInfo f in di.GetFiles())
        {
            Add(f.FullName, false);
        }
        foreach (DirectoryInfo d in di.GetDirectories())
        {
            parseDirectory(d);
        }
    }
- If the path does not exist, it deletes it from the index (deleteDocuments), including all subfiles if there are any (deleteDirectory):
    private static void deleteDocuments(string fullName)
    {
        IndexReader r = IndexReader.Open(instanceDirectory);
        int deletedCount = r.Delete(new Term("fullname", fullName));
        r.Close();
    }
    private static void deleteDirectory(string fullName)
    {
        // Every document lists all its parent directories in "parent" fields.
        IndexReader r = IndexReader.Open(instanceDirectory);
        int deletedCount = r.Delete(new Term("parent", fullName));
        r.Close();
    }
- If the path is already in the index, it checks whether there is any change in file length, creation time, or last write time. To check whether the document is in the index, we create a query programmatically using the BooleanQuery and TermQuery classes:
    private static bool isInIndex(FileInfo fi)
    {
        IndexSearcher searcher = new IndexSearcher(instanceDirectory);
        BooleanQuery bq = new BooleanQuery();
        // Add(query, required, prohibited): every clause is required.
        bq.Add(new TermQuery(new Term("fullname",
            fi.FullName)), true, false);
        bq.Add(new TermQuery(new Term("length",
            fi.Length.ToString())), true, false);
        bq.Add(new TermQuery(new Term("created",
            DateField.DateToString(fi.CreationTime))), true, false);
        bq.Add(new TermQuery(new Term("modified",
            DateField.DateToString(fi.LastWriteTime))), true, false);
        Hits hits = searcher.Search(bq);
        int count = hits.Length();
        searcher.Close();
        return count == 1;
    }
- If there are changes, it updates the document in the index. Updating requires deleting the old document and adding a new one:
    if (isInIndex(fi))
        return;
    deleteDocuments(fi.FullName);
    addDocument(fi);
- When adding a document, we record the following metadata:
  - name: file name, e.g. document.doc,
  - fullname: full path, e.g. c:\storage\marketing\document.doc,
  - parent: all parent directories, inserted as multiple fields, e.g. c:\; c:\storage; c:\storage\marketing,
  - created: creation time,
  - modified: last write time,
  - length: file length in bytes,
  - extension: file extension, e.g. .doc.
    Document doc = new Document();
    // Field(name, value, store, index, tokenize)
    doc.Add(new Field("name", fi.Name, true, true, true));
    doc.Add(new Field("fullname", fi.FullName, true,
        true, false));
    // One "parent" field per ancestor directory, up to the drive root.
    DirectoryInfo di = fi.Directory;
    while (di != null)
    {
        doc.Add(new Field("parent", di.FullName, true,
            true, false));
        di = di.Parent;
    }
    doc.Add(Field.Keyword("created",
        DateField.DateToString(fi.CreationTime)));
    doc.Add(Field.Keyword("modified",
        DateField.DateToString(fi.LastWriteTime)));
    doc.Add(Field.Keyword("length", fi.Length.ToString()));
    doc.Add(Field.Keyword("extension", fi.Extension));
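The rules above boil down to one decision per queued path. As a rough restatement (not the code used by Seekafile Server, which asks the index with a query instead of comparing values directly), the decision can be written as a pure function. IndexAction, FileStamp and IndexPlanner are names invented for this sketch.

```csharp
using System;

public enum IndexAction { EnqueueChildren, RemoveFromIndex, Skip, Reindex }

// The three values compared by isInIndex:
// length, creation time and last write time.
public struct FileStamp
{
    public readonly long Length;
    public readonly DateTime Created;
    public readonly DateTime Modified;

    public FileStamp(long length, DateTime created, DateTime modified)
    {
        Length = length;
        Created = created;
        Modified = modified;
    }

    public bool Matches(FileStamp other)
    {
        return Length == other.Length
            && Created == other.Created
            && Modified == other.Modified;
    }
}

public static class IndexPlanner
{
    // onDisk is null when the path no longer exists;
    // inIndex is null when the index holds no document for the path.
    public static IndexAction Decide(bool isDirectory,
        FileStamp? onDisk, FileStamp? inIndex)
    {
        if (isDirectory)
            return IndexAction.EnqueueChildren;  // walk it, queue every file
        if (!onDisk.HasValue)
            return IndexAction.RemoveFromIndex;  // deleted file or directory
        if (inIndex.HasValue && inIndex.Value.Matches(onDisk.Value))
            return IndexAction.Skip;             // unchanged since last run
        return IndexAction.Reindex;              // new or modified: delete + add
    }
}
```

Keeping the decision free of file system and index calls makes it easy to check each branch in isolation.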
Parsing the files
DotLucene is able to index only plain text. Therefore, we need to extract the plain text from rich file formats like Microsoft Word DOC, RTF, or Adobe PDF. The parsing can be done either by a .NET plug-in found in the plugins subdirectory of Seekafile Server or through the IFilter interface (which is available in all Windows 2000/XP/2003 installations).
Read more about IFilter:
Plug-ins
Generally, there are two ways of extending the parsing system: installing an IFilter for the file type, or writing a custom .NET plug-in.
Read more about custom plug-ins:
There is also a sample plug-in included in Seekafile Server download [seekafile.org].
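The choice between a dedicated plug-in and the IFilter fallback can be pictured as a lookup by file extension. The sketch below is not the Seekafile plug-in API (its interface is not reproduced in this article); ITextExtractor, PlainTextExtractor and ExtractorRegistry are hypothetical names, and the fallback merely stands in for an IFilter-based extractor.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical contract: turn a file into the plain text that gets indexed.
public interface ITextExtractor
{
    string Extract(string path);
}

// Trivial extractor for files that already are plain text.
public sealed class PlainTextExtractor : ITextExtractor
{
    public string Extract(string path)
    {
        return File.ReadAllText(path);
    }
}

public sealed class ExtractorRegistry
{
    // Extensions are matched case-insensitively (.TXT and .txt are the same).
    private readonly Dictionary<string, ITextExtractor> byExtension =
        new Dictionary<string, ITextExtractor>(StringComparer.OrdinalIgnoreCase);
    private readonly ITextExtractor fallback;

    // fallback stands in for an IFilter-based extractor (not implemented here).
    public ExtractorRegistry(ITextExtractor fallback)
    {
        this.fallback = fallback;
    }

    public void Register(string extension, ITextExtractor extractor)
    {
        byExtension[extension] = extractor;
    }

    // Returns the extractor registered for the file's extension,
    // or the fallback when no plug-in claims it.
    public ITextExtractor Resolve(string path)
    {
        ITextExtractor found;
        return byExtension.TryGetValue(Path.GetExtension(path), out found)
            ? found
            : fallback;
    }
}
```

A caller would register one extractor per extension at startup and call Resolve(path).Extract(path) for each file taken from the queue.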
Sample ASP.NET client search application

This ASP.NET application accesses the index directly to search it. It searches the file content only (file and directory names are ignored). It shows a relevant snippet from the document.
Read more about building an ASP.NET client search application [seekafile.org].
Download [seekafile.org] this sample as a part of the Seekafile Server from seekafile.org.
Sample Windows Forms client search application

This Windows Forms application accesses the index directly to search it. It searches the file content only (file and directory names are ignored).
Read more about building a Windows Forms client search application [seekafile.org].
Download [seekafile.org] this sample as a part of the Seekafile Server from seekafile.org.
Features planned for next versions
- Exclude filters.
- Multiple indexes per service.
- Windows Forms client search application.
- Simple GUI management.
- Convenient installer.
- Indexing status and notification support.
- Multi-user desktop search.
Acknowledgements
Message board (messages 1 to 25 of 49)

Hi,
i tried to index my local site or my web site but i found that Seekafile Server 1.0 indexes a fixed link i need to know how to change this link with my own ?
please help

Hello, thanks for help i tried to change ServerConfig to my own local host : port
ServerConfig xmlns:xsd="http://localhost:42015/" xmlns:xsi="http://localhost:42015/ is it true ?
i tried to index documents with seekafile web site sample but it stills index a not exists folder
c:/temp/seekafile/data/circs/circ64.pdf !!!!!!!!!!!?????????
how to implement this [Seekafile Server] to my web site ? !!! should i add these files to my web site directory or change my web.config?
please give me more details

I downloaded the project. The Seakafile.Desktop is set as startup project. I run the project and the form for a search is open, but the index isn't created yet. I'd like to know how the index is created. Sorry for my dumb questions, I'm a beginner in C# yet. Thanks everybody.

hello friends, I am new in the use of Lucene.Net, I am trying to create a consultation boolean, somebody would have a clue of I can do. thank you antonio

I m using Lucene service to search pdf docs bt this service is unable to search Scanned Image PDF documents. plz nybdy assist me to do so...
Thanks and Regards Manpreet Singh (Web Developer) V2web Info. pvt. ltd. Daryaganj, Delhi.
Plz Assist

for each document indexed, we need to extract the parent, fullname etc to add to the index. as we know, the parent and fullname can be quite long, is there any way to improve this problem?
i do not think google's desktop search tool use such method employed here.
It's quite a good website to visit ,and i can learn a lot of things about programming,just like the website as www.vckbase.com in china.i want to be a member of this big family.

Hi, I have downloaded various filters from the citeknet homepage as suggested. I have installed them. But, my seekafile server doesn't seem to index the specified file formats with the newly installed IFilters, but it is indexing with the default FileInfo plugin. I guess I need to do some configuration of my windows indexing service before the seekafile service indexes properly Can anyone tell me what should I do, so that my seekafile server indexes all the file formats properly Thanks in advance
Sairaj Sunil MTech

Hello, I have successfully configured seekafile server 1.5 to index the local folder that contains our website. To test the search, I am using the Sample ASP.NET client search application, using our index folder. It successfully finds documents but returns the filesystem path (ie d:\inetpub\wwwroot....) rather then the web path. How do I configure seekafile server or the ASP.NET client to return the web path? Thanks
Dabbo

Can you do a Server.mappath call on the local directory to give you a virtual directory?
just a thought but not tested...

Hello, I want to index a folder present in the shared docs of remote computer. I added that folder to that list of indexed folders. But indexing is not happening. The error that I get is: Error in the config.xml:The folder doesn't exist. But the folder does exist and I am able to access that folder from my own windows application and list the files in that folder. Please help me
Sairaj Sunil MTech

Yes. Make sure that the shared folder is accessible to the account that is used for running Seekafile service.
-- My sites for smart .NET developers: DayPilot - Open-source Outlook-like calendar control for ASP.NET DotLucene - The fastest open source fulltext search engine for .NET Seekafile Server - Flexible open-source search server DotNetFirebird - Using Firebird SQL in .NET

Hello, I am running the service under Local system account. Is it the problem? Do I need to run the application as a Network service.
-- modified at 11:04 Tuesday 26th September, 2006
Sairaj Sunil MTech

The Local system account doesn't have rights to access network resources. You should use an account that has rights to read on the remote share folder (i.e. probably a domain account).

Hello, I understood that I need to run the service under NT AUTHORITY\NetworkService.I passed the same string as the parameter value for lpServiceStartName and the password is empty. But still, the service is installed under LocalSystem account. Do I need to make some other changes? Thanks once again
-- modified at 0:40 Thursday 5th October, 2006
Sairaj Sunil MTech

Hi Dan, Fisrtly brilliant article.Absolutely loved it !!!! Now to my question is it possible to do what i mention in the subject ? For eg. If my file name is helloworld.pdf , and when i search for helloworld , helloworld.pdf should be displayed in the results ?
Thanks in advance.

Lucene.Net supports trailing asterisk. I.e. you can search for helloworld*. Unfortunately you can't search for *world* or *world.pdf.

In order to index XML:
1. You need to install the XML Filter from Microsoft - http://www.microsoft.com/sharepoint/server/techinfo/reskit/XML_Filter.asp.
2. Configure your index processing rules (http://www.seekafile.org/processing-rules.html) so that ".*" or just ".xml" extension is processed by IFilterPlugin.
But note:
1. I haven't tested Microsoft XML IFilter with Seekafile.
2. Remember that just the content of the elements is indexed.
3. Since it's trivial to write a custom Seekafile XML plugin that uses .NET Framework classes to work with XML I would recommend writing a custom plugin - that way you get rid of COM interop (which is difficult to debug).

Hello, I created a plugin named PDFPlugin that parses the pdf files using PDFbox. But I get the following errors: Error 1:'PDFPlugin.PdfPlugin' does not implement interface member 'Seekafile.Plugin.IDocumentPlugin.Document(string)'. 'PDFPlugin.PdfPlugin.Document(string)' is either static, not public, or has the wrong return type. Error 2:'PDFPlugin.PdfPlugin' does not implement interface member 'Seekafile.Plugin.IDocumentPlugin.Analyzer(string)'. 'PDFPlugin.PdfPlugin.Analyzer(string)' is either static, not public, or has the wrong return type.
I have implemented both these methods in my class Pdfplugin(in the namespace PDFPlugin) and they are public, not static and they have the same return type as the methods in the interface
Please help me
Sairaj Sunil MTech

Please send the source of the plugin to dan at annpoint.com. I will try to look at it.

Hi I have just downloaded Seekafile Server 1.5 beta2. The indexing doesn't seem to work for pdf documents, bcoz my client search application does not return any pdf files for the search query. Please tell me how to index pdf documents using Seekafile Server. As of now,I am using Seekfile manager to create the index and letting my client access the index. Any help is greatly appreciated
Sairaj Sunil MTech