ASP.NET Google Site Map Provider

Bruce Chapman, 10 Oct 2006 (CPOL)
Using the ASP.NET Provider model to generate a dynamic Google Sitemap.

Introduction

Google Sitemaps are an important tool for website developers, webmasters, and pretty much anyone with a website. If you don't know what a Google Sitemap is, take a look at http://www.google.com/webmasters. A Google Sitemap is an XML file that tells the Google crawler which URLs in your site to visit. It also lets you tell the crawler how often pages are updated and place a relative priority on the pages within the site. Google's instructions are to code the XML file by hand, but of course, with a dynamic website, you don't want to do that.

What you need is a dynamic Google Sitemap generator that gives real-time data about your website. There are many of these around. My problem was that I had several websites, each with its own requirements, but 90% of the requirements were exactly the same. So, I decided to utilise the Provider model and develop a base provider for delivering Sitemaps, which could then be expanded with new providers for each specific situation that arose in subsequent websites. As I hadn't done any work with either Google Sitemaps or Providers, it was a good learning opportunity as well.

What does a Google Sitemap look like?

A Google Sitemap is just an XML file telling the Google crawler which URLs to look up in the website.

For each URL in the site to be indexed, there should be one <url> entry in the XML file. There is a limit of 50,000 URLs per file, but several sitemaps per site can be submitted. Within each entry, the lastmod element tells Google when the page was last modified, the changefreq element tells Google how often the page changes, and the priority element is a relative measure for the page against all other pages in your site. As Google clearly explains, there isn't anything you can do with Sitemaps to increase your site ranking; they are only a tool for helping Google crawl all the parts of your site that you want crawled. In this way, it's kind of a super-robots.txt file.
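
For illustration, a minimal sitemap with a single entry might look like the sketch below. The URL, date, and values are made up, and the namespace shown is the sitemaps.org 0.9 one; the provider in the download may well emit a different (older) namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: the URL, date, and values are invented -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- Full URL of the page to crawl -->
    <loc>http://www.example.com/default.aspx</loc>
    <!-- When the page was last modified -->
    <lastmod>2006-10-10</lastmod>
    <!-- How often the page is expected to change -->
    <changefreq>daily</changefreq>
    <!-- Priority relative to other pages on the same site, 0.0 to 1.0 -->
    <priority>0.5</priority>
  </url>
</urlset>
```

Each additional page gets its own <url> block inside the same <urlset> element.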

Background

I have worked with the ASP.NET Provider model in the past, but I had never actually developed my own custom provider for anything. I utilised the resources of MSDN and downloaded the source code for a Custom Provider. I even read the instructions!

The Provider model allows the plugging in of different code to do the same job without having to recompile. Changing the provider for a specific task in an application is done in the web/app config file. The config file tells the application at run-time which code to execute for a specific task.

The most common use of Providers is for data access. For example, an application can change quickly between a SQL Server provider and an Oracle provider by switching providers in the web/app.config file.

The relevant entry in the web.config file for the Google Site Map Provider looks like this:

<googlesitemaps defaultProvider="BaseGoogleSiteMapProvider">
  <providers>
    <add name="BaseGoogleSiteMapProvider" 
             type="GoogleSiteMap.GoogleSiteMapProvider" />
    <add name="SpecialisedGoogleSiteMapProvider" 
             type="Specialised.GoogleSiteMapProvider" />
  </providers>
</googlesitemaps>

This example shows the use of two providers: the default provider and, if necessary, a Specialised Provider. (Specialised.GoogleSiteMapProvider isn't in the demo project; it is shown here only as an example.)

Requirements

The requirements I generated for my own project were to:

  • Be instantly useable with the majority of ASP.NET applications.
  • Be a full 'binary' solution - no integration of code or compiling - just drop in a binary, modify the web.config, and go.
  • Be extendable so that more complicated ASP.NET applications could redefine the provider without restriction.

Solution

The solution was to have a single assembly with three main types:

  1. An HTTP Handler which would return the XML on request (called GoogleSiteMapHandler)
  2. A Provider type (called GoogleSiteMapProvider)
  3. A Controller class to glue the Handler and Provider together

Why do it this way?

In effect, I could have had a separate Handler file (.ashx) which could be dropped into the destination ASP.NET website. But to keep to requirement (1), I wanted the whole project to be a simple drop-in to the \bin directory. This is why the Handler and the Provider are in the same assembly.

By doing it this way, I can also create new assemblies which inherit from the base provider, controller, and handler classes and create whole new Providers for specific types of websites which use HTTP redirection and URLs that don't actually map to physical files on the server.

Using the code

To install and try out the demo project, simply download the zip file and unpack it. The file 'iFinity.GoogleSiteMapProvider.dll' should be copied into the \bin directory of your target website.

Then, open up your web.config (remember to take a backup first) and insert the following lines:

In the <configuration> section, under <configSections>, put in the following entries:

<configuration>
   <configSections>
      <section name="googlesitemaps" 
       type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapSection, 
             iFinity.GoogleSiteMapProvider" />
   </configSections>
</configuration>

Remember you will probably already have the <configuration> and <configSections> entries in the web.config, but create them if you do not.

The entry in <configSections> tells ASP.NET to look for a section in the app/web.config file called 'googlesitemaps'. The type attribute is in the format type="typeName, assemblyName", and tells ASP.NET that there is a type called 'GoogleSiteMapSection' in the assembly 'iFinity.GoogleSiteMapProvider'. The GoogleSiteMapSection type derives from System.Configuration.ConfigurationSection and provides the run-time type to represent the Providers section in the config file. The section is loaded into this type at run time by the .NET configuration system.

The next entry to make in the web.config file is the actual 'googlesitemaps' section that was named in the <configSections> entry. This should be placed after the closing tag of the <system.web> section, but before the closing </configuration> tag.

<googlesitemaps defaultProvider="BaseGoogleSiteMapProvider">
  <providers>
     <add name="BaseGoogleSiteMapProvider" 
          type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapProvider" 
          defaultPagePriority="0.5" defaultPageUpdateFrequency="daily" 
          sitePageTypes="aspx,html,htm" />
  </providers>
</googlesitemaps>

This entry tells ASP.NET which providers are available to use at runtime. To use a provider other than the default from code, the calling code would have to be modified. To change which provider is the default, however, the defaultProvider attribute just needs to match the name of a provider in the list.

The final change to make to the web.config is the addition of the HTTP Handler to actually produce the Sitemap. This is done in the web.config within the system.web section, under the httpHandlers section.

<httpHandlers>
   <add verb="*" path="GoogleSiteMapHandler.axd" 
        type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapHandler,
              iFinity.GoogleSiteMapProvider"/>
</httpHandlers>

This entry tells ASP.NET to handle any incoming request for 'GoogleSiteMapHandler.axd' by loading the iFinity.GoogleSiteMapProvider assembly and calling the type 'iFinity.Providers.GoogleSiteMap.GoogleSiteMapHandler'. ASP.NET does this automatically for you, as long as the specified type implements the IHttpHandler interface (which this one does).
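
Putting the three entries together, a bare-bones web.config might look roughly like the sketch below. It assumes a site with no other configuration; in a real web.config, these entries would be merged into the sections that already exist.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- configSections must be the first child of <configuration> -->
  <configSections>
    <section name="googlesitemaps"
             type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapSection, iFinity.GoogleSiteMapProvider" />
  </configSections>

  <system.web>
    <!-- Routes requests for GoogleSiteMapHandler.axd to the handler type -->
    <httpHandlers>
      <add verb="*" path="GoogleSiteMapHandler.axd"
           type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapHandler, iFinity.GoogleSiteMapProvider" />
    </httpHandlers>
  </system.web>

  <!-- The custom section declared in configSections above -->
  <googlesitemaps defaultProvider="BaseGoogleSiteMapProvider">
    <providers>
      <add name="BaseGoogleSiteMapProvider"
           type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapProvider"
           defaultPagePriority="0.5" defaultPageUpdateFrequency="daily"
           sitePageTypes="aspx,html,htm" />
    </providers>
  </googlesitemaps>
</configuration>
```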

Please note that the Handler doesn't need to be in the same assembly as the provider, and in a way, including the Handler type within the Provider model pollutes it slightly. By rights, the Handler should call the ASP.NET ProvidersHelper class to give it back the correct Provider for that configuration. To be completely correct, the Handler type and the GoogleSiteMapService type should be in a separate namespace and assembly. But as I intend to create separate assemblies for providers down the track, I'm happy to live with my model. Others may claim it is incorrect, and they have a valid point.

Program flow

When an HTTP request is made for GoogleSiteMapHandler.axd (either by the Google crawler, or by typing 'yoursite.com/GoogleSiteMapHandler.axd' into a browser), ASP.NET loads the type and assembly named in the httpHandlers section of the web.config. In this instance, it is the same DLL as the Provider, though it doesn't need to be, as discussed previously. ASP.NET calls ProcessRequest(HttpContext context), which any type implementing IHttpHandler must provide. This then calls the GoogleSiteMapService.GetGoogleSiteMap() method, which asks ASP.NET for the default provider as named in the googlesitemaps configuration section.

ASP.NET reads in the providers, and instantiates an object of the type named as the default provider. This provider object is then asked for the XML that makes up the site map. As the assembly also includes a basic implementation of the default provider, it is this provider that is called. The base implementation in the demo project simply iterates the directories and reads in all of the files whose extensions match those named in the sitePageTypes attribute. The resulting XML is then passed back up through the call stack and returned through the HTTP Handler, so that it is output either to the browser or to the Google crawler.

Expansion possibilities

As mentioned before, this project was made with the intent of developing a better understanding of the provider model, and providing a base implementation that can be expanded to better handle more complicated ASP.NET application models.

To expand this code, there are two possible directions. The first, and simplest, is to just modify the code in the GoogleSiteMapProvider IteratePages() procedure. This can be modified in order to better provide a site map for a particular site - the possibilities are quite open in this respect.

The second, conceptually better but slightly more complicated, is to reference the provided assembly and create your own provider by inheriting from the GoogleSiteMapProvider type. You will need to redefine IteratePages() in the derived class to index the pages in the site in a way that suits it better, but everything else can be left as is. The new provider would be compiled into a separate assembly and then named as the default provider in the googlesitemaps configuration section.

For instance, let's say you create a new provider class called 'MyNewGoogleSiteMapProviderType' and compile it into an assembly called 'MyNewGoogleSiteMapProviderAssembly.dll'. The config entry would be:

<add name="BaseGoogleSiteMapProvider" type="MyNewGoogleSiteMapProviderType, 
           MyNewGoogleSiteMapProviderAssembly" 
     defaultPagePriority="0.5" defaultPageUpdateFrequency="daily"/>

This would mean that your new type would be called to provide the list of pages for the website. The Base provider would take care of formatting it into the Sitemap format and outputting the XML. You can leave all the other web.config entries as is - the built in HttpHandler would take care of calling your provider for the list of pages in your site. How you provide that list is up to you!
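
As a variation on the entry above, the base provider could stay registered while the new provider is added under its own name, with defaultProvider switched to point at it. This is only a sketch: the name 'MyNewGoogleSiteMapProvider' is made up for illustration, and the type and assembly names are the hypothetical ones from the example above.

```xml
<!-- Sketch only: "MyNewGoogleSiteMapProvider" is an illustrative name -->
<googlesitemaps defaultProvider="MyNewGoogleSiteMapProvider">
  <providers>
    <!-- The base provider stays available but is no longer the default -->
    <add name="BaseGoogleSiteMapProvider"
         type="iFinity.Providers.GoogleSiteMap.GoogleSiteMapProvider"
         defaultPagePriority="0.5" defaultPageUpdateFrequency="daily"
         sitePageTypes="aspx,html,htm" />
    <!-- The derived provider, selected by the defaultProvider attribute -->
    <add name="MyNewGoogleSiteMapProvider"
         type="MyNewGoogleSiteMapProviderType, MyNewGoogleSiteMapProviderAssembly"
         defaultPagePriority="0.5" defaultPageUpdateFrequency="daily" />
  </providers>
</googlesitemaps>
```

Switching back to the base provider is then a matter of changing the defaultProvider attribute, with no recompilation.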

What's next

I will be developing a new implementation of the provider to suit DotNetNuke, as this is the platform I do a lot of development in. DotNetNuke uses an HttpRedirection method to serve many URLs from a single default.aspx page, so a Sitemap for it can't be generated from the physical files on the server.

I will then create different providers for each of the separate specialised modules that I use in DotNetNuke websites. Some modules provide a wide range of different content for one URL, depending on database-driven content. With conventional Google indexing, much of the content may not be found and indexed correctly.

Please note that the XML examples on this page have had line breaks placed in them to get them to fit. There is no need to do this in your web.config file.

Copyright notice

You are free to use, modify, and extend the supplied code provided that you do not remove the copyright messages in the source, or attempt to pass either the code or this article off as your own. Obviously with free demo code, there's no warranty that it will actually work and there may be bugs in the provided download.

If you use this code and find it useful, I appreciate links back to my website, http://www.ifinity.com.au/.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Bruce Chapman DNN
Product Manager DNN Corp
Australia Australia
Bruce Chapman is the Product Manager for Cloud Services at DNN. He’s been an active member of the DNN Community since 2006 as a contributor, vendor and now employee of DNN Corp.
 
You can read his blog at http://dnnsoftware.com/blog or follow him on Twitter @brucerchapman

Article Copyright 2006 by Bruce Chapman DNN