
SharePoint Crawl Rules Problem and Workaround

14 Oct 2008, CPOL
SharePoint crawl rules don't work in some cases, especially when you define rules to include data.

Introduction

You can face some issues with SharePoint Server crawl rules. They don't work properly in some cases, especially when you define rules to include data. I will explain this with an example.

Using the code

You can define rules for including or excluding data. Excluding specific data works fine. For instance, suppose you have a data tree as follows:

http://moss:8080/News/
http://moss:8080/Products/
http://moss:8080/Products/Computers/
and many others…

image001.png

In this case MOSS will not crawl URLs that start with http://moss:8080/Products/Computers/. No problem there. But what about the opposite direction? My portal contains thousands of lists, documents, libraries and sites, and I want MOSS to crawl only certain lists. In my scenario I want MOSS to crawl only the files under Pages folders:


http://moss:8080/Pages/*
http://moss:8080/News/Pages/*
http://moss:8080/Companies/Asia/Pages/*



But I don’t need other resources such as

http://moss:8080/Lists/*
http://moss:8080/companies/lists/*

and other lists and document libraries.

As you know, it is not possible to define these kinds of rules in Content Sources, so let’s try Crawl Rules.

image003.png

In this case we expect that URLs containing “/Pages/” will be crawled and all others will not. But it doesn’t work. I talked to Microsoft Support, and they told me it is a consequence of the MOSS search architecture. I don’t know whether this behavior is by design, but there is a workaround.
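The include rule we wanted MOSS to apply is simple enough to express in code. Here is a minimal C# sketch of that match; the CrawlRuleHelper class and ShouldCrawl method are my own illustration, not part of the SharePoint API (crawl rules are case-insensitive, so the comparison is too):

```csharp
using System;

// Hypothetical helper mimicking the intended crawl rule:
// a URL is crawled only if its path contains a "/Pages/" segment.
static class CrawlRuleHelper
{
    public static bool ShouldCrawl(string url)
    {
        // Case-insensitive match, like MOSS crawl rules.
        return url.IndexOf("/Pages/", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    static void Main()
    {
        Console.WriteLine(ShouldCrawl("http://moss:8080/News/Pages/default.aspx")); // True
        Console.WriteLine(ShouldCrawl("http://moss:8080/Lists/Announcements/"));    // False
    }
}
```

This is exactly the filter the “http://\*/\*Pages\*” style crawl rule in the screenshot is supposed to express, which is why it is frustrating that the rule itself does not behave this way.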

Each site, list and document library has its own search visibility property.

image005.png

You can set the search visibility as you want, but with more than 1000 sites doing it by hand is impractical. You need some custom development to set search visibility automatically.

SPList and SPWeb each expose a NoCrawl property that you can set. Below is part of a Windows Forms application; for now, focus on the line that sets web.NoCrawl.

private void SetCrawlVisibility(SPWeb web, bool visible, string url)
        {
            try
            {
                // NoCrawl is the inverse of search visibility:
                // true means "exclude this web from crawling".
                web.NoCrawl = !visible;
                web.Update();
                if (visible)
                {
                    this.Log("Include object SPWeb " + url + " in search results", false);
                }
                else
                {
                    this.Log("Exclude object SPWeb " + url + " from search results", false);
                }
            }
            catch (Exception exception)
            {
                this.Log(exception.Message, true);
            }
        }        
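To cover thousands of sites you also have to walk the site hierarchy and visit every list. The sketch below shows one way this could look with the SharePoint object model; it assumes the SetCrawlVisibility method above, a reference to Microsoft.SharePoint.dll, and a running MOSS farm, and the "/Pages" rule hard-coded here stands in for whatever rules the user typed:

```csharp
// Sketch only: requires a MOSS farm and Microsoft.SharePoint.dll.
private void ApplyCrawlRules(SPWeb web)
{
    // Decide visibility for the web itself from the user-defined rules.
    SetCrawlVisibility(web, true, web.Url);

    // Lists and document libraries carry their own NoCrawl flag.
    foreach (SPList list in web.Lists)
    {
        bool visible = list.RootFolder.ServerRelativeUrl
            .IndexOf("/Pages", StringComparison.OrdinalIgnoreCase) >= 0;
        list.NoCrawl = !visible;
        list.Update();
    }

    // Recurse into subsites.
    foreach (SPWeb child in web.Webs)
    {
        try { ApplyCrawlRules(child); }
        finally { child.Dispose(); } // SPWeb objects from Webs must be disposed
    }
}
```

Disposing each child SPWeb matters here: enumerating web.Webs allocates new SPWeb objects, and leaking them across a thousand-site portal will exhaust memory.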

I have attached a Windows Forms project built with Visual Studio; all you need to do is type your include and exclude rules as follows:

image007.png

Before running it, clear all crawl rules in Search Administration. After the operation completes, start a crawl and check the results.


Have a nice MOSS!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Mahmut SARIHAN
Team Leader BELBIM INC.
Turkey Turkey
No Biography provided

Article Copyright 2008 by Mahmut SARIHAN
Everything else Copyright © CodeProject, 1999-2014