Licence CPOL
First Posted 14 Oct 2008

SharePoint Crawl Rules Problem and Workaround

By Mahmut SARIHAN | 14 Oct 2008 | Article
SharePoint crawl rules don't work in some cases, especially when including data.

Introduction

You may run into issues with SharePoint Server crawl rules: they don't work properly in some cases, especially when you define rules to include data. I will explain this with an example.

Using the code

You can define crawl rules to include or exclude data. Excluding specific data works fine. For instance, suppose you have the following data tree:

http://moss:8080/News/
http://moss:8080/Products/
http://moss:8080/Products/Computers/
and many others…
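To make the exclude case concrete, here is a minimal C# sketch of how a wildcard pattern such as http://moss:8080/Products/Computers/* can be matched against crawled URLs. It is an approximation, not the MOSS rule engine: it assumes `*` matches any run of characters and that matching ignores case. `CrawlRuleMatcher` and `Matches` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper that approximates crawl-rule wildcard matching.
// Assumptions: '*' matches any run of characters and case is ignored.
// The real MOSS rule engine is internal and may behave differently.
public static class CrawlRuleMatcher
{
    public static bool Matches(string pattern, string url)
    {
        // Escape regex metacharacters, then turn each escaped '*' back into ".*".
        string regex = "^" + Regex.Escape(pattern).Replace("\\*", ".*") + "$";
        return Regex.IsMatch(url, regex, RegexOptions.IgnoreCase);
    }
}
```

In this model an exclude rule removes only the matching subtree: http://moss:8080/Products/Computers/Laptops/default.aspx matches, while http://moss:8080/Products/default.aspx and http://moss:8080/News/ stay crawlable.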

image001.png

In this case, MOSS does not crawl any URL that starts with http://moss:8080/Products/Computers/, which is exactly what we want. What about the opposite case? My portal has thousands of sites, lists, and document libraries, and I want MOSS to crawl only a few lists of my choosing. In my scenario, I want MOSS to crawl only the files under Pages folders:


http://moss:8080/Pages/*
http://moss:8080/News/Pages/*
http://moss:8080/Companies/Asia/Pages/*

But I don't need other resources, such as:

http://moss:8080/Lists/*
http://moss:8080/companies/lists/*
and other lists and document libraries.
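The desired policy can be stated as a per-URL check. The sketch below is a hypothetical helper (`PagesOnlyPolicy.ShouldCrawl`), not something MOSS exposes: it assumes a URL should be crawled only when one of its path segments is exactly `Pages`, compared case-insensitively, so anything under /Lists/ or /companies/lists/ is rejected.

```csharp
using System;
using System.Linq;

// Hypothetical helper for the include-only scenario: a URL is crawlable only if
// one of its path segments is a "Pages" folder. Everything else (lists,
// document libraries, ...) is treated as excluded.
public static class PagesOnlyPolicy
{
    public static bool ShouldCrawl(string url)
    {
        var uri = new Uri(url);
        // Uri.Segments yields items like "/", "News/", "Pages/", "default.aspx";
        // trim the slashes so whole segments can be compared.
        return uri.Segments
                  .Select(s => s.Trim('/'))
                  .Any(s => string.Equals(s, "Pages", StringComparison.OrdinalIgnoreCase));
    }
}
```

Because the check compares whole path segments, a folder named, say, MyPages does not qualify.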

As you may know, content sources cannot express rules like these. Let's try crawl rules instead.

image003.png

Here we expect URLs that contain "/Pages/" to be crawled and all other URLs to be skipped, but it doesn't work. I talked to Microsoft Support, and they said this behavior comes from the MOSS search architecture. I don't know whether it is intended to work this way, but there is a workaround.

Each site, list, and document library has its own search visibility setting.

image005.png

You can set search visibility by hand, but with more than 1,000 sites that becomes impractical, so you need some custom development to set search visibility automatically.

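Both SPWeb and SPList expose a NoCrawl property that you can set from code. Below is part of the Windows Forms application; the key line is "web.NoCrawl = !visible". Setting NoCrawl to true excludes the object from crawling, and setting it to false includes it.

// Method of the Windows Forms class; needs "using System;" and
// "using Microsoft.SharePoint;". this.Log is the form's own logging helper.
private void SetCrawlVisibility(SPWeb web, bool visible, string url)
{
    try
    {
        // NoCrawl = true hides the site from search, so invert the flag.
        web.NoCrawl = !visible;
        web.Update();
        if (visible)
        {
            this.Log("Include object SPWeb " + url + " in search results", false);
        }
        else
        {
            this.Log("Exclude object SPWeb " + url + " from search results", false);
        }
    }
    catch (Exception exception)
    {
        this.Log(exception.Message, true);
    }
}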
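The custom development described above can be sketched as a depth-first walk over the site tree. To keep the example runnable without a SharePoint server, `WebNode` and `ListNode` are hypothetical stand-ins for SPWeb and SPList; in the real object model you would iterate SPWeb.Webs and SPWeb.Lists, set NoCrawl on each object, call Update(), and dispose each subweb you open. The sketch also assumes that hiding a site would hide everything in it, so it keeps every site crawlable and sets visibility on lists only. `CrawlVisibilityWalker` and the `isIncluded` rule delegate are illustrative names, not part of the attached project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SPList (only the members this sketch needs).
public class ListNode
{
    public string Url { get; set; } = "";
    public bool NoCrawl { get; set; }
}

// Hypothetical stand-in for SPWeb: child sites live in Webs, lists in Lists.
public class WebNode
{
    public string Url { get; set; } = "";
    public bool NoCrawl { get; set; }
    public List<WebNode> Webs { get; } = new List<WebNode>();
    public List<ListNode> Lists { get; } = new List<ListNode>();
}

public static class CrawlVisibilityWalker
{
    // Depth-first walk: keeps every site crawlable (assumption: hiding a site
    // would also hide its Pages library) and sets NoCrawl on each list from the rule.
    // With real SharePoint objects, each change would be followed by Update().
    public static void Apply(WebNode web, Func<string, bool> isIncluded)
    {
        web.NoCrawl = false;
        foreach (ListNode list in web.Lists)
        {
            // NoCrawl = true excludes the list, so invert the rule result.
            list.NoCrawl = !isIncluded(list.Url);
        }
        foreach (WebNode child in web.Webs)
        {
            Apply(child, isIncluded);
        }
    }
}
```

For the Pages-only scenario, the rule could be `url => url.EndsWith("/Pages", StringComparison.OrdinalIgnoreCase)`, applied to each list's URL; a Tasks list under /Lists/ then ends up with NoCrawl = true while every Pages library stays crawlable.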

I have attached a Windows Forms project built with Visual Studio. All you need to do is type your include or exclude rules as follows:

image007.png

Also, clear all crawl rules in Search Administration. After the tool has run, start a crawl and check the results.


Have a nice MOSS!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Mahmut SARIHAN

Team Leader
BELBIM INC.
Turkey

Member

Last Updated 14 Oct 2008
Article Copyright 2008 by Mahmut SARIHAN
Everything else Copyright © CodeProject, 1999-2012