YASS - Yet Another Site Searcher






Feb 27, 2004
A small single-site web crawler with a built-in scheduler.
Introduction
Want an effective search routine for a website whose content lives both in static HTML/ASPX files and in SQL Server or other databases?
Background
After having been the programmer on about... well, A LOT of websites, I got fed up with the existing search engines on the market, both freeware and commercial, and writing a custom search engine for every site that searched through our database content was getting a bit tedious. So I decided to write my own search engine, based on the same concepts as a normal spider/web crawler.
But the idea came with a few requirements that caused me some headaches:
- Speed: crawling through hundreds of pages is kinda slow, even on a fast server
- No custom software on the server: developers using 3rd party hosting can't always persuade the hosting company to run scheduled tasks on their server
- No SQL Server dependency
- EASY IMPLEMENTATION!
Speed: we all want it, but crawling through a whole website in real time for every search doesn't go well with it, so I decided to build a caching search engine. And with the AWESOME threading capabilities of .NET, the 2nd requirement became quite easy to solve. The 3rd requirement was solved in like 2 seconds: DataSets/DataTables... learn to use/love 'em :).
Number 4... well, I'm lazy :)
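To make the DataSet idea a bit more concrete, here is a minimal sketch of keeping an index in a DataSet and persisting it as XML, so no database is needed. It is not taken from yass.cs: the class name XmlIndexCache, the "Pages" table and its "Url" and "Text" columns are all made up for illustration, and the real index layout may look quite different.

```csharp
using System;
using System.Data;
using System.IO;

public static class XmlIndexCache
{
    // Hypothetical schema (an assumption, not YASS's real layout):
    // one row per crawled page.
    public static DataSet CreateEmptyIndex()
    {
        DataSet index = new DataSet("Index");
        DataTable pages = index.Tables.Add("Pages");
        pages.Columns.Add("Url", typeof(string));
        pages.Columns.Add("Text", typeof(string));
        return index;
    }

    // Write data and schema to one XML file so it can be reloaded later
    // without any database server.
    public static void Save(DataSet index, string path)
    {
        index.WriteXml(path, XmlWriteMode.WriteSchema);
    }

    // Read back a file written by Save, using the inline schema.
    public static DataSet Load(string path)
    {
        DataSet index = new DataSet();
        index.ReadXml(path, XmlReadMode.ReadSchema);
        return index;
    }
}
```

A crawler built this way would add a row per page with index.Tables["Pages"].Rows.Add(url, text), call Save when a crawl finishes, and call Load when answering searches, so queries never have to touch the live site.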
Using the code
YASS is VERY easy to implement on your website. Here's an example of how to search the site.
DataTable result = SiteSearch.Search("Search words");
That's it! You now have a DataTable containing the URLs and their rankings. Calling the indexing service itself is also quite basic.
// this will run the indexer in a background thread once..
IndexerSchedule.Install(0);
// this will run the indexer in a background thread every hour
IndexerSchedule.Install(60*60); // takes seconds as argument
Pretty easy eh? :)
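The article does not list the column names of the DataTable that Search returns, so the sketch below avoids guessing them: it simply walks whatever columns the table has. ResultDump and Describe are hypothetical helper names, not part of YASS.

```csharp
using System;
using System.Data;
using System.Text;

public static class ResultDump
{
    // Formats every row as "Column=value; Column=value; " using whatever
    // columns the table actually contains, one row per line.
    public static string Describe(DataTable table)
    {
        StringBuilder sb = new StringBuilder();
        foreach (DataRow row in table.Rows)
        {
            foreach (DataColumn column in table.Columns)
            {
                sb.Append(column.ColumnName).Append('=').Append(row[column]).Append("; ");
            }
            sb.AppendLine();
        }
        return sb.ToString();
    }
}
```

Calling ResultDump.Describe(SiteSearch.Search("Search words")) once is a quick way to see which columns you have to work with before binding the table to a Repeater or DataGrid.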
The downside to this is that every time the ASP.NET worker process on the server gets restarted or killed, the indexer thread disappears. There is an easy solution, though: put the call in your Global.asax.
protected void Application_Start(Object sender, EventArgs e)
{
    // (re)start the hourly indexer whenever the application starts
    IndexerSchedule.Install(60*60);
}
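For readers wondering how a scheduler like this can live inside the web application itself, here is a minimal sketch of the general technique: a background thread that runs a job once, or in a loop with a sleep in between. It is not the code from yass.cs. BackgroundSchedule and its work parameter are hypothetical, and only the "0 means run once, otherwise seconds between runs" convention is borrowed from the comments on IndexerSchedule.Install above.

```csharp
using System;
using System.Threading;

public static class BackgroundSchedule
{
    // Runs 'work' once when intervalSeconds is 0; otherwise runs it
    // repeatedly, sleeping intervalSeconds between runs.
    // Returns the thread that was started.
    public static Thread Install(int intervalSeconds, Action work)
    {
        Thread worker = new Thread(() =>
        {
            do
            {
                work();
                if (intervalSeconds > 0)
                {
                    Thread.Sleep(TimeSpan.FromSeconds(intervalSeconds));
                }
            } while (intervalSeconds > 0);
        });
        // A background thread does not keep the process alive, which is
        // also why it vanishes when the worker process is recycled.
        worker.IsBackground = true;
        worker.Start();
        return worker;
    }
}
```

In .NET 2.0 and later, an unhandled exception on such a thread takes down the whole process, so a real indexer would want to catch and log errors inside the loop.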
I've included a very simple example, as well as my yass.cs source, but please don't hit me: it's VERY messy and will be cleaned up later.
Requirements
The only things you have to do are to put yass.dll in your bin folder and add these three keys to the appSettings section of your Web.config.
<add key="YASSHost" value="http://localhost" />
<add key="YASSEntrypoint" value="/default.aspx" />
<add key="YASSXmlDir" value="c:\inetpub\wwwroot\yass\xml\" />
Make sure that the web application has read/write rights on the folder specified in YASSXmlDir, otherwise this will fail.
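If you are not sure whether the rights are set correctly, a quick probe from inside the web application can tell you. The sketch below is a hypothetical helper (YassXmlDirCheck is not part of YASS) that tries to create and delete a small file in the given folder.

```csharp
using System;
using System.IO;

public static class YassXmlDirCheck
{
    // Returns true if a file can be created and deleted in 'dir'
    // under the account the calling code runs as.
    public static bool CanWrite(string dir)
    {
        try
        {
            string probe = Path.Combine(dir, Path.GetRandomFileName());
            File.WriteAllText(probe, "probe");
            File.Delete(probe);
            return true;
        }
        catch (Exception)
        {
            // Missing folder, denied access, invalid path, null, etc.
            return false;
        }
    }
}
```

From a page or from Application_Start you could call YassXmlDirCheck.CanWrite(System.Configuration.ConfigurationManager.AppSettings["YASSXmlDir"]) and log a warning when it returns false. On .NET 1.x the equivalent lookup is ConfigurationSettings.AppSettings, and File.WriteAllText would need replacing with a FileStream since it arrived in .NET 2.0.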
Future
So far my future plans are:
- Make it faster... it seems to slow down at approx. 1200 pages
- Add support for more than one entry point in web.config
- Make a SQL Server plug-in
- Make the DataTable return "teaser" text under each URL
- Add support for excluding URLs/file types in web.config
- Clean up my yass.cs code and make it readable for people who can't read Danish
History
1.0 - First hack.. done in 3 days.