Introduction
Want an effective search routine for a website whose content lives in both
static HTML/ASPX files and SQL Server or other databases?
Background
After having been the programmer on, well, A LOT of websites, I got fed up
with the existing search engines on the market, both freeware and commercial.
Writing a custom search routine for our database content every single time was
also getting tedious. So I decided to write my own search engine, based on the
same concepts as a normal spider/web crawler.
But the idea came with a few headaches, which turned into four requirements:
- Speed: crawling through hundreds of pages is slow, even on a fast server
- No custom software on the server: developers using 3rd party hosting can't
always persuade the hosting company to run scheduled tasks on their server
- No SQL Server dependency
- EASY IMPLEMENTATION!
Speed: we all want it, but crawling through a whole website in real time
can't deliver it, so I decided to build a caching search engine. And with
the AWESOME threading capabilities of .NET, the 2nd requirement became quite
easy to solve: the indexer runs on its own thread inside the web application,
so no scheduled task is needed. The 3rd requirement was solved in about 2
seconds: DataSets/DataTables. Learn to use/love 'em :).
Number 4... well, I'm lazy :)
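To make the DataSet idea a bit more concrete, here is a minimal sketch of caching an index as XML with no database involved. It is not taken from yass.cs. The class name, the "Pages" table, and the "Url"/"Text" columns are all made up for illustration. The only hint about YASS itself is the YASSXmlDir setting, which suggests the index is kept as XML files.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical sketch, NOT the YASS source: cache a crawl result as XML.
public static class IndexCacheSketch
{
    // Builds a tiny in-memory index. Table and column names are assumptions.
    public static DataSet BuildSampleIndex()
    {
        DataTable pages = new DataTable("Pages");
        pages.Columns.Add("Url", typeof(string));
        pages.Columns.Add("Text", typeof(string));
        pages.Rows.Add("/default.aspx", "welcome to the site");
        pages.Rows.Add("/about.aspx", "about the site author");

        DataSet index = new DataSet("Index");
        index.Tables.Add(pages);
        return index;
    }

    // Writes the index, including its schema, to an XML file.
    public static void Save(DataSet index, string path)
    {
        index.WriteXml(path, XmlWriteMode.WriteSchema);
    }

    // Reads the cached index back, so a search never has to crawl.
    public static DataSet Load(string path)
    {
        DataSet index = new DataSet();
        index.ReadXml(path, XmlReadMode.ReadSchema);
        return index;
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "index-sketch.xml");
        Save(BuildSampleIndex(), path);
        DataSet cached = Load(path);
        Console.WriteLine(cached.Tables["Pages"].Rows.Count);
    }
}
```

The point of the design is that a search only ever reads the cached file, while the slow crawl happens elsewhere, on a background thread.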
Using the code
YASS is VERY easy to implement on your website. Here's an example of how to
search the site.
DataTable result = SiteSearch.Search("Search words");
That's it! You've now got a DataTable containing the URLs and their
rankings. Calling the indexing service itself is also quite basic.
IndexerSchedule.Install(0);
IndexerSchedule.Install(60*60);
Pretty easy eh? :)
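Going back to the search call for a moment: once you have the DataTable, showing the hits is ordinary ADO.NET work. The sketch below uses a stand-in table, because the real column names are not listed here. "Url" and "Rank" are assumptions, and so is the idea that a higher rank means a better match. Check the columns of the table you actually get back before copying this.

```csharp
using System;
using System.Data;

// Hypothetical sketch: consuming a result table shaped like the one
// SiteSearch.Search is described as returning.
public static class SearchResultSketch
{
    // Stand-in for the DataTable returned by SiteSearch.Search.
    // The column names "Url" and "Rank" are assumptions.
    public static DataTable FakeResult()
    {
        DataTable result = new DataTable("Result");
        result.Columns.Add("Url", typeof(string));
        result.Columns.Add("Rank", typeof(int));
        result.Rows.Add("/about.aspx", 3);
        result.Rows.Add("/default.aspx", 7);
        return result;
    }

    // Returns all rows ordered by rank, highest first
    // (assuming a higher rank is a better match).
    public static DataRow[] ByRank(DataTable result)
    {
        return result.Select(string.Empty, "Rank DESC");
    }

    public static void Main()
    {
        foreach (DataRow row in ByRank(FakeResult()))
        {
            Console.WriteLine("{0} ({1})", row["Url"], row["Rank"]);
        }
    }
}
```

On a real page you would more likely bind the table to a Repeater or DataGrid than write to the console.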
The downside is that every time the ASP.NET worker process on the server
gets restarted or killed, the indexer thread disappears. There is an easy
solution, though: put the call in the Application_Start handler of your
Global.asax.
protected void Application_Start(Object sender, EventArgs e)
{
    IndexerSchedule.Install(60*60);
}
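To see why the indexer dies with the worker process, it helps to picture an in-process scheduler. The sketch below is a hypothetical stand-in built on System.Threading.Timer, not the actual IndexerSchedule code. It assumes the argument is an interval in seconds and that 0 means "run once". Neither assumption is confirmed by the source.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a scheduler like IndexerSchedule; NOT the YASS source.
public static class SchedulerSketch
{
    // Kept in a static field so the timer is not garbage collected.
    private static Timer timer;
    private static int runs;

    // Number of times the (fake) indexing job has run so far.
    public static int Runs
    {
        get { return Interlocked.CompareExchange(ref runs, 0, 0); }
    }

    // Starts the job immediately on a thread-pool thread, then repeats it.
    // Assumption: intervalSeconds is in seconds, and 0 means "run once".
    public static void Install(int intervalSeconds)
    {
        if (timer != null) timer.Dispose();

        TimeSpan period = intervalSeconds > 0
            ? TimeSpan.FromSeconds(intervalSeconds)
            : Timeout.InfiniteTimeSpan;

        // A real indexer would crawl the site here; this one only counts.
        timer = new Timer(state => Interlocked.Increment(ref runs),
                          null, TimeSpan.Zero, period);
    }

    public static void Main()
    {
        Install(0);
        Thread.Sleep(500); // give the thread-pool callback time to fire
        Console.WriteLine("Index runs so far: " + Runs);
    }
}
```

Because the timer lives inside the web application's process, it is gone as soon as that process is recycled, which is why the call needs to be made again in Application_Start.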
I've included a very simple example, as well as my yass.cs source.
But please, don't hit me: it's VERY messy and will be cleaned up later.
Requirements
The only things you have to do are to put yass.dll in your bin folder and
add these three keys to the appSettings section of your Web.config.
<add key="YASSHost" value="http://localhost" />
<add key="YASSEntrypoint" value="/default.aspx" />
<add key="YASSXmlDir" value="c:\\inetpub\\wwwroot\\yass\\xml\\" />
Make sure that the folder specified in YASSXmlDir is readable and writable by
the web application; otherwise this will fail.
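For context, this is roughly where the three keys sit in a Web.config. The values are the ones shown above. Only the surrounding configuration and appSettings elements are added, and any other sections your Web.config already has stay where they are.

```xml
<configuration>
  <appSettings>
    <!-- The three YASS keys, with the same values as shown above -->
    <add key="YASSHost" value="http://localhost" />
    <add key="YASSEntrypoint" value="/default.aspx" />
    <add key="YASSXmlDir" value="c:\\inetpub\\wwwroot\\yass\\xml\\" />
  </appSettings>
</configuration>
```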
Future
So far my future plans are:
- Make it faster; it seems to slow down at around 1200 pages
- Support multiple entry points in Web.config
- Make a SQL Server plug-in
- Make the DataTable return "teaser" text under each URL
- Support excluding URLs/file types in Web.config
- Clean up my yass.cs code and make it readable for people who can't read Danish
History
1.0 - First hack, done in 3 days.