YASS - Yet Another Site Searcher

Kenneth "fessor" Christensen, 26 Feb 2004
A small single-site web crawler with a built-in scheduler.

Introduction

Want an effective search routine for a website whose content lives both in static HTML/ASPX files and in SQL Server or other databases?

Background

After having been the programmer on, well, A LOT of websites, I got fed up with the existing search engines on the market, both freeware and commercial. Writing a custom search routine for every site, just to search through our database content, was getting tedious too. So I decided to write my own search engine, based on the same concepts as a normal spider/web crawler.

But the idea came with a few headaches, which turned into four requirements:

  • Speed. Crawling through hundreds of pages is slow, even on a fast server.
  • No custom software on the server. Developers using 3rd-party hosting can't always persuade the hosting company to run scheduled tasks on their server.
  • No SQL Server dependency.
  • EASY IMPLEMENTATION!

We all want speed, but crawling a whole website in real time for every query can't deliver it, so I decided to build a caching search engine: the site is indexed ahead of time and searches run against the cached index. With the AWESOME threading capabilities of .NET, the 2nd requirement became quite easy to solve, since the indexer can run in a background thread inside the web application itself. The 3rd requirement was solved in about 2 seconds: DataSets/DataTables. Learn to use and love 'em :) Number 4? Well, I'm lazy :)
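
To illustrate the caching idea, here is a minimal sketch of an index kept in a DataSet and persisted as XML, so a search never has to crawl the site. It is not taken from yass.cs: the class name, the "Page" table, its "Url" and "Text" columns, and the file name are all assumptions made for the example.

```csharp
using System;
using System.Data;
using System.IO;

public static class IndexCacheSketch
{
    // Hypothetical schema: one row per crawled page.
    public static DataSet BuildIndex()
    {
        DataTable pages = new DataTable("Page");
        pages.Columns.Add("Url", typeof(string));
        pages.Columns.Add("Text", typeof(string));
        pages.Rows.Add("/default.aspx", "welcome to the demo site");
        pages.Rows.Add("/about.aspx", "about the demo site authors");

        DataSet index = new DataSet("Index");
        index.Tables.Add(pages);
        return index;
    }

    public static void Save(DataSet index, string path)
    {
        // WriteSchema stores the column definitions alongside the data.
        index.WriteXml(path, XmlWriteMode.WriteSchema);
    }

    public static DataSet Load(string path)
    {
        DataSet index = new DataSet();
        index.ReadXml(path, XmlReadMode.ReadSchema);
        return index;
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "yass-index-sketch.xml");
        Save(BuildIndex(), path);

        // Search the cached copy instead of crawling the site again.
        DataSet cached = Load(path);
        DataRow[] hits = cached.Tables["Page"].Select("Text LIKE '%authors%'");
        foreach (DataRow hit in hits)
        {
            Console.WriteLine(hit["Url"]);
        }
    }
}
```

Because the index is a plain XML file on disk, this approach needs no database server, which fits the 3rd requirement.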

Using the code

YASS is VERY easy to implement on your website. Here's how to search the site:

DataTable result = SiteSearch.Search("Search words");

That's it! You now have a DataTable containing the matching URLs and their rankings. Starting the indexer is just as simple.
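
The article does not list the column names of the returned DataTable, so the sketch below prints whatever columns are present rather than guessing. The ResultPrinter class and the stand-in table with its "Url" and "Rank" columns are assumptions for illustration; in a real page you would pass the table returned by SiteSearch.Search instead.

```csharp
using System;
using System.Data;
using System.Text;

public static class ResultPrinter
{
    // Formats every row as "Column=value" pairs, one row per line, so the
    // code does not depend on the real column names used by YASS.
    public static string Format(DataTable result)
    {
        StringBuilder text = new StringBuilder();
        foreach (DataRow row in result.Rows)
        {
            foreach (DataColumn column in result.Columns)
            {
                text.Append(column.ColumnName).Append('=').Append(row[column]).Append("  ");
            }
            text.AppendLine();
        }
        return text.ToString();
    }

    public static void Main()
    {
        // Stand-in for: DataTable result = SiteSearch.Search("Search words");
        DataTable result = new DataTable("Result");
        result.Columns.Add("Url", typeof(string)); // assumed column name
        result.Columns.Add("Rank", typeof(int));   // assumed column name
        result.Rows.Add("/default.aspx", 3);
        result.Rows.Add("/about.aspx", 1);

        Console.Write(Format(result));
    }
}
```

Since the result is an ordinary DataTable, it can also be bound directly to a DataGrid or Repeater on an ASPX page.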

// this will run the indexer in a background thread once.. 
IndexerSchedule.Install(0);

// this will run the indexer in a background thread every hour
IndexerSchedule.Install(60*60); // takes seconds as argument

Pretty easy, eh? :)

The downside is that every time the ASP.NET worker process on the server is restarted or killed, the indexer thread disappears. The easy fix is to put the call in the Application_Start handler of your Global.asax, so the indexer is installed again whenever the application starts:

protected void Application_Start(Object sender, EventArgs e)
{
    IndexerSchedule.Install(60*60);
}
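
For readers curious how an in-process scheduler of this kind can work, here is a rough sketch built on a background thread. It is not the code in yass.cs: the SchedulerSketch class and its Install signature (which takes the work to run as a delegate) are assumptions. Only the convention that 0 means "run once" and a positive value means "repeat every N seconds" is borrowed from the calls shown above.

```csharp
using System;
using System.Threading;

public static class SchedulerSketch
{
    // Starts 'work' on a background thread. intervalSeconds == 0 runs it once;
    // a positive value repeats it with that pause between runs.
    public static Thread Install(int intervalSeconds, Action work)
    {
        Thread worker = new Thread(() =>
        {
            do
            {
                work();
                if (intervalSeconds > 0)
                {
                    Thread.Sleep(TimeSpan.FromSeconds(intervalSeconds));
                }
            } while (intervalSeconds > 0);
        });
        // A background thread does not keep its host process alive, and it
        // dies with that process, which is why a recycled worker process
        // loses the indexer until Application_Start installs it again.
        worker.IsBackground = true;
        worker.Start();
        return worker;
    }

    public static void Main()
    {
        int runs = 0;
        Thread once = Install(0, () => Interlocked.Increment(ref runs));
        once.Join(); // wait for the single run to finish
        Console.WriteLine(runs);
    }
}
```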

I've included a very simple example, as well as my yass.cs source. Please don't hit me: it's VERY messy and will be cleaned up later.

Requirements

All you have to do is put yass.dll in your bin folder and add these three keys to the appSettings section of your Web.config:

<add key="YASSHost" value="http://localhost" />
<add key="YASSEntrypoint" value="/default.aspx" />
<add key="YASSXmlDir" value="c:\\inetpub\\wwwroot\\yass\\xml\\" />

Make sure that the account your web application runs under has read/write permission on the folder specified in YASSXmlDir; otherwise YASS will fail.
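
One way to verify the folder permissions up front is to try creating and deleting a small file there. The sketch below assumes nothing about YASS itself: XmlDirCheck and CanWrite are hypothetical names, and the temp folder in Main stands in for the value of YASSXmlDir.

```csharp
using System;
using System.IO;

public static class XmlDirCheck
{
    // Returns true if a file can be created and deleted in 'dir'.
    public static bool CanWrite(string dir)
    {
        if (!Directory.Exists(dir))
        {
            return false;
        }
        string probe = Path.Combine(dir, Guid.NewGuid().ToString("N") + ".tmp");
        try
        {
            File.WriteAllText(probe, "probe");
            File.Delete(probe);
            return true;
        }
        catch (UnauthorizedAccessException)
        {
            return false; // the account lacks write permission
        }
        catch (IOException)
        {
            return false; // e.g. disk full or path unavailable
        }
    }

    public static void Main()
    {
        // Stand-in for the folder configured in YASSXmlDir.
        Console.WriteLine(CanWrite(Path.GetTempPath()));
    }
}
```

Run the check from inside the web application rather than from a console, because the web application usually runs under a different account than your own login.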

Future

So far, my future plans are:

  • Make it faster. It seems to slow down at around 1200 pages
  • Support multiple entry points in Web.config
  • Make a SQL Server plug-in
  • Make the DataTable return "teaser" text for each URL
  • Support excluded URLs/file types in Web.config
  • Clean up my yass.cs code and make it readable for people who can't read Danish

History

1.0 - First hack, done in 3 days.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Kenneth "fessor" Christensen
Web Developer
Denmark
Web developer based in Holstebro, Denmark.
I've been developing various web solutions for the past 6 years, and I'm a democoder in my spare time.


Last Updated 27 Feb 2004
Article Copyright 2004 by Kenneth "fessor" Christensen