
Microsoft Content Management Server Crawl Page for Search

By Stephen Huen

Provides a start page for search engines to crawl a Content Management Server (MCMS) web site.
SQL, C#, VB, Windows, .NET, .NET 1.1, .NET 2.0, Win2K, Win2003, ASP.NET, SQL 2000, IIS 5, IIS 5.1, IIS 6, VS.NET 2003, VS2005, SQL Server, IIS, Visual Studio, DBA, Dev

Posted: 5 Sep 2004
Updated: 17 May 2006


Introduction

This Crawl Results user control provides a start page for a search engine to crawl a Content Management Server (MCMS) web site.

When a site is indexed for search, it is recommended to hide the site's navigation menus whenever the browser user agent identifies the request as coming from a robot (search crawler). Otherwise, the words in the navigation menus are indexed on every page, so a search for any of those words returns every page on the site. With the navigation menus hidden, however, the crawler has no links to follow to reach the rest of the site. This user control fills that gap: it generates links to channels and postings so that a search engine can recursively crawl through all the pages.
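How the robot check is done is left to the hosting site. ASP.NET exposes `Request.Browser.Crawler`, but its answer depends on the browser definition data installed on the server, which may not recognize every crawler. The sketch below is a hypothetical fallback, assuming that a case-insensitive substring match on the user agent is good enough for the crawler that indexes your site. The class name `CrawlerDetection` and its token list are illustrative only and are not part of the CrawlResults control.

```csharp
using System;

// Hypothetical helper (not part of CrawlResults): a rough check for
// search engine robots based on the request's user agent string.
public static class CrawlerDetection
{
    // Assumed tokens; adjust them for the crawler that indexes your site.
    // "bot" also matches user agents containing "Robot".
    private static readonly string[] RobotTokens =
        { "bot", "crawler", "spider", "slurp" };

    public static bool IsLikelyCrawler(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent))
            return false;

        // Lower-case once so the token match is case-insensitive.
        string ua = userAgent.ToLowerInvariant();
        foreach (string token in RobotTokens)
        {
            if (ua.IndexOf(token, StringComparison.Ordinal) >= 0)
                return true;
        }
        return false;
    }
}
```

In a page or template, a line such as `NavMenu.Visible = !CrawlerDetection.IsLikelyCrawler(Request.UserAgent);` would hide the menus for robots, where `NavMenu` is a hypothetical navigation control. Keep the token list specific: a token that matches an ordinary browser would strip that visitor's navigation.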

Because the user control does not generate links to every channel and posting in a single page, it scales to large sites. Include a robots meta tag of NOINDEX,FOLLOW in the crawl page that hosts the user control, so that the crawler follows the channel and posting links without indexing the lists themselves.
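The scalability point can be illustrated with a small sketch. It assumes a crawl page that renders one channel per request: links to that channel's postings, plus a link back to the crawl page for each child channel, so the crawler reaches deeper levels by following those links instead of receiving the whole site at once. `ChannelNode` is a hypothetical stand-in for an MCMS channel so the code compiles without the MCMS Publishing API, and the `channel` query-string parameter is an assumed name. This is not the control's actual source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an MCMS channel, so the sketch compiles
// without the MCMS Publishing API.
public class ChannelNode
{
    public readonly string Path;
    public readonly List<ChannelNode> Channels = new List<ChannelNode>();
    public readonly List<string> PostingUrls = new List<string>();

    public ChannelNode(string path)
    {
        Path = path;
    }
}

public static class CrawlLinkBuilder
{
    // Returns the links for ONE level of the channel tree: every posting
    // in the channel, plus a crawl-page link for each child channel.
    // Deeper levels are reached only when the crawler follows those
    // crawl-page links, so no single request walks the whole site.
    public static List<string> BuildLinks(ChannelNode channel, string crawlPageUrl)
    {
        List<string> links = new List<string>();

        foreach (string postingUrl in channel.PostingUrls)
            links.Add(postingUrl);

        // "channel" is an assumed query-string parameter name.
        foreach (ChannelNode child in channel.Channels)
            links.Add(crawlPageUrl + "?channel=" + Uri.EscapeDataString(child.Path));

        return links;
    }
}
```

In such a design, the crawl page would read the hypothetical `channel` parameter and fall back to `StartChannelPath` when it is absent. Because the page carries NOINDEX,FOLLOW, the crawler follows every link but indexes only the postings it reaches, not the crawl page itself.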

Both C# and VB.NET versions are included.

Installation

  1. Copy CrawlResults.ascx and its code-behind file to the user control directory in your MCMS site.
  2. Create an ASPX page (for example, crawlpage.aspx) and add CrawlResults.ascx to it. Set the StartChannelPath property to the path of your MCMS site's root channel. For example:
    <%@ Register TagPrefix="uc1" TagName="CrawlResults" 
                Src="~/UserControls/CrawlResults.ascx" %>
    . . .
    . . .
    <uc1:CrawlResults id="CrawlResults" 
      StartChannelPath="/Channels/WoodgroveNet" runat="server">
    </uc1:CrawlResults>
  3. Add the following meta tag to the HTML header in the ASPX page:
    <meta name="ROBOTS" CONTENT="NOINDEX,FOLLOW">
  4. Set the start page of the content source in your search engine to the URL of your ASPX page. For example, http://<server name>/<site name>/crawlpage.aspx.

History

  • V1.0 - 2004.09.05 - Initial release.
  • V1.1 - 2005.02.12 - Converted page to user control. Added option to specify the starting channel path.
  • V1.2 - 2006.05.14 - ASP.NET 2.0 version added.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

Stephen Huen



Occupation: Web Developer
Company: Questech Systems
Location: Canada

Editor: Smitha Vijayan
Copyright 2004 by Stephen Huen
Everything else Copyright © CodeProject, 1999-2008