
SharePoint Crawl Rules Problem and Workaround

14 Oct 2008
SharePoint crawl rules do not work in some cases, especially when they are used to include data.

Introduction

You may run into issues with SharePoint Server crawl rules: they do not work properly in some cases, especially when you define rules to include data. I will explain this with an example.

Using the code

You can define rules that include or exclude data. Excluding specific data works as expected. For instance, suppose you have the following data tree:

http://moss:8080/News/
http://moss:8080/Products/
http://moss:8080/Products/Computers/
and many others…

image001.png

In this case MOSS will not crawl any URL that starts with http://moss:8080/Products/Computers/, and that works fine. But what about the opposite? My portal has thousands of lists, documents, libraries and sites, and I want MOSS to crawl only the lists I choose. In my scenario, I want MOSS to crawl only the files under Pages folders:


http://moss:8080/Pages/*
http://moss:8080/News/Pages/*
http://moss:8080/Companies/Asia/Pages/*



But I do not want other resources to be crawled, such as:
http://moss:8080/Lists/*
http://moss:8080/companies/lists/*
and other lists and document libraries.
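To make the intended behavior concrete, here is a small, hypothetical C# helper (not part of MOSS or of the attached project) that turns a crawl-rule style wildcard pattern into a regular expression and tests URLs against it. It assumes that `*` matches any run of characters and that matching is case-insensitive; MOSS's own matching rules may differ, so treat this only as an illustration of what a rule such as `*/Pages/*` is meant to cover.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating wildcard URL rules; not the MOSS implementation.
public static class WildcardRule
{
    // Converts a pattern such as "*/Pages/*" into an anchored regular expression.
    // Regex.Escape makes every character literal (it turns '*' into "\*"),
    // then each escaped "\*" is turned back into ".*" (any characters).
    public static Regex ToRegex(string pattern)
    {
        string body = Regex.Escape(pattern).Replace(@"\*", ".*");
        return new Regex("^" + body + "$",
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // True when the whole URL matches the wildcard pattern.
    public static bool Matches(string pattern, string url)
    {
        return ToRegex(pattern).IsMatch(url);
    }
}
```

With this helper, `*/Pages/*` matches http://moss:8080/News/Pages/default.aspx but not http://moss:8080/Lists/Announcements/AllItems.aspx, which is exactly the split the scenario above asks for.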

As you may know, these kinds of rules cannot be defined in Content Sources, so let’s try Crawl Rules instead.

image003.png

With these rules we expect URLs that contain “/Pages/” to be crawled and all others to be skipped, but it does not work. I talked to Microsoft Support, and they said the behavior is a consequence of the MOSS search architecture. I do not know whether this behavior is by design, but there is a workaround.

Each site, list and document library has its own search visibility setting.

image005.png

You can set the search visibility by hand for each object, but with more than 1000 sites that quickly becomes impractical, so some custom development is needed to set it automatically.

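Both SPList and SPWeb expose a NoCrawl property that you can set. Below is part of the Windows Forms application; focus on the line that assigns web.NoCrawl. Setting NoCrawl to true hides the object from search, so the value assigned is the inverse of the visible argument.

// Member of the Windows Forms class; requires "using System;" and "using Microsoft.SharePoint;".
// this.Log(message, isError) is the form's own logging helper.
private void SetCrawlVisibility(SPWeb web, bool visible, string url)
{
    try
    {
        // NoCrawl = true excludes the site from search, so it is the inverse of "visible".
        web.NoCrawl = !visible;
        // Persist the change to the content database.
        web.Update();
        if (visible)
        {
            this.Log("Include object SPWeb " + url + " in search results", false);
        }
        else
        {
            this.Log("Exclude object SPWeb " + url + " from search results", false);
        }
    }
    catch (Exception exception)
    {
        // Log the error instead of aborting the whole run.
        this.Log(exception.Message, true);
    }
}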
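Because setting visibility by hand does not scale, the tool has to walk the whole hierarchy. The sketch below shows one way to do that traversal. So that it runs without SharePoint, it uses hypothetical `FakeWeb` and `FakeList` classes in place of SPWeb and SPList; in the real object model you would iterate `SPWeb.Webs` and `SPWeb.Lists`, call `Update()` after changing `NoCrawl`, and dispose every SPWeb you open. As a design assumption not stated in the article, only lists and libraries are toggled and webs are left crawlable, on the assumption that hiding a web could also hide the Pages library inside it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for SPList: only the members the sketch needs.
public class FakeList
{
    public string Url { get; }
    public bool NoCrawl { get; set; }
    public FakeList(string url) { Url = url; }
}

// Hypothetical in-memory stand-in for SPWeb: its lists and its sub-sites.
public class FakeWeb
{
    public string Url { get; }
    public List<FakeList> Lists { get; } = new List<FakeList>();
    public List<FakeWeb> Webs { get; } = new List<FakeWeb>();
    public FakeWeb(string url) { Url = url; }
}

public static class CrawlVisibilityWalker
{
    // Visits the root web and every sub-web, setting NoCrawl on each list
    // according to the isVisible predicate. Webs themselves are not changed.
    // Returns how many lists were hidden from search.
    public static int HideNonMatchingLists(FakeWeb root, Func<string, bool> isVisible)
    {
        int hidden = 0;
        var pending = new Stack<FakeWeb>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            FakeWeb web = pending.Pop();
            foreach (FakeList list in web.Lists)
            {
                list.NoCrawl = !isVisible(list.Url);
                if (list.NoCrawl) hidden++;
            }
            // Queue sub-sites instead of recursing.
            foreach (FakeWeb child in web.Webs)
                pending.Push(child);
        }
        return hidden;
    }
}
```

An explicit stack is used instead of recursion so that a very deep site hierarchy cannot overflow the call stack; the visiting order does not matter because each list is handled independently.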

I have attached a Windows Forms project developed in Visual Studio. All you need to do is type your include or exclude rules as follows:

image007.png
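The rules typed into the tool have to be evaluated somehow before SetCrawlVisibility is called. The article does not document the tool's rule format or precedence, so the following is a hypothetical C# sketch under these assumptions: the first matching rule wins, `*` matches any characters, matching is case-insensitive, and a URL that matches no rule is hidden, since the goal is to crawl only what is explicitly included. Its result could feed the visible argument of SetCrawlVisibility.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical rule model: an include/exclude flag plus a wildcard URL pattern.
public sealed class VisibilityRule
{
    public bool Include { get; }
    private readonly Regex _regex;

    public VisibilityRule(bool include, string wildcardPattern)
    {
        Include = include;
        // '*' matches any sequence of characters; everything else is literal.
        string body = Regex.Escape(wildcardPattern).Replace(@"\*", ".*");
        _regex = new Regex("^" + body + "$",
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    public bool IsMatch(string url) => _regex.IsMatch(url);
}

public static class RuleEvaluator
{
    // First matching rule decides; URLs that match no rule are hidden.
    public static bool IsVisible(IEnumerable<VisibilityRule> rules, string url)
    {
        foreach (VisibilityRule rule in rules)
        {
            if (rule.IsMatch(url))
                return rule.Include;
        }
        return false;
    }
}
```

Putting exclude rules before broader include rules lets a specific exclusion override a general inclusion, which is the usual reason for choosing first-match-wins ordering.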

Also clear all crawl rules in Search Administration. After the tool has run, start a crawl and check the results.


Have a nice MOSS!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Team Leader BELBIM INC.
Turkey
