You may run into issues with SharePoint Server Crawl Rules. They do not work properly in some cases, especially when you define rules to include data. I will explain this with an example.
Using the code
You can define rules that include or exclude data. Excluding specific data works fine. For instance, suppose you have a data tree as follows:
and many others…
In this case MOSS will not crawl anything whose URL starts with http://moss:8080/Products/computers/. No problem there. But what about the opposite case? My portal has thousands of lists, documents, libraries and sites, and I want MOSS to crawl only certain lists. In my scenario I want MOSS to crawl only the files under the Pages folder.
I don’t need other resources, such as
http://moss:8080/companies/lists/* and other lists and document libraries.
As you know, Content Sources cannot express this kind of rule, so let’s try Crawl Rules.
In this case we expect URLs that contain “/Pages/” to be crawled and all others to be skipped. But it doesn’t work. I talked to people at Microsoft Support, and they said this is due to the MOSS Search architecture. I don’t know whether this behavior is intended or not, but there is a workaround.
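To make the expected behavior concrete, here is a minimal sketch of the include test we hoped the crawl rule would perform. It only illustrates the intent and is not how MOSS evaluates crawl rules internally. The class name CrawlPattern, the method Matches and the pattern http://moss:8080/*/Pages/* are assumptions chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: illustrates the intended include test only.
public static class CrawlPattern
{
    // Treats '*' in the pattern as "any run of characters" and
    // compares the whole URL against it, ignoring case.
    public static bool Matches(string pattern, string url)
    {
        // Regex.Escape turns '*' into "\*", which is then swapped for ".*".
        string regex = "^" + Regex.Escape(pattern).Replace("\\*", ".*") + "$";
        return Regex.IsMatch(url, regex, RegexOptions.IgnoreCase);
    }
}
```

With the assumed pattern above, a page under a Pages folder would match, while a URL under http://moss:8080/companies/lists/ would not.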
Each site, list and document library has its own search visibility property.
You can set the search visibility by hand, but with more than 1000 sites that is impractical, so you need some custom development to set it automatically.
Both SPList and SPWeb expose a NoCrawl property that you can set. Here is part of a Windows Forms application; for now, just focus on the "web.NoCrawl = false" line.
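private void SetCrawlVisibility(SPWeb web, bool visible, string url)
{
    try
    {
        // The else branch is inferred; the original listing shows only the include assignment.
        if (visible) { web.NoCrawl = false; this.Log("Include object SPWeb " + url + " in search results", false); }
        else { web.NoCrawl = true; this.Log("Exclude object SPWeb " + url + " from search results", false); }
    }
    catch (Exception exception) { /* error handling is not shown in the original listing */ }
}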
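The automatic part of the workaround can be sketched as a recursive walk that sets the flag on every object. The sketch below deliberately avoids the SharePoint object model: CrawlNode is a hypothetical stand-in for an SPWeb or SPList, and VisibilitySetter.Apply is an illustrative name, not part of the attached project. It also assumes that a container should stay visible whenever something beneath it should be crawled; check that assumption against your own portal.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an SPWeb or SPList: a URL, a NoCrawl flag and children.
public class CrawlNode
{
    public string Url;
    public bool NoCrawl;
    public List<CrawlNode> Children = new List<CrawlNode>();
    public CrawlNode(string url) { Url = url; }
}

public static class VisibilitySetter
{
    // Sets NoCrawl on the node and all descendants.
    // Returns true if the node or any descendant URL contains the marker.
    public static bool Apply(CrawlNode node, string marker)
    {
        bool keep = node.Url.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0;
        foreach (CrawlNode child in node.Children)
        {
            // Visit every child (no short-circuit) so each one gets its flag set.
            if (Apply(child, marker)) keep = true;
        }
        node.NoCrawl = !keep;
        return keep;
    }
}
```

Calling VisibilitySetter.Apply(root, "/Pages/") on such a tree leaves the Pages nodes and their ancestors crawlable and marks everything else as NoCrawl. In the real tool, the equivalent step is the SetCrawlVisibility call on each SPWeb.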
I have attached a Windows Forms project developed with Visual Studio. All you need to do is type your include or exclude rules as follows:
Also, clear all crawl rules in Search Administration. Once the tool has finished, start a crawl and check the results.
Have a nice MOSS !