Sitemap and Sitemap Index Generator

Andrew_Thomas, 25 Mar 2009
Base classes for generating sitemaps and sitemap indexes for Google, Yahoo! and MSN.

Introduction

This article presents a clean, reusable approach to generating sitemaps and sitemap indexes for SEO (Search Engine Optimization), as used by Google, Yahoo!, MSN and other search engines.

Background

Search engines use sitemaps to make sure they find all the pages on your site that you want them to find, and large sites list their sitemaps in a sitemap index. A sitemap has to be in a specific XML format and follow certain rules, such as a maximum of 50,000 URLs per sitemap and, at the time of writing, a maximum size of 10MB per sitemap file (the sitemaps.org protocol has since raised the size limit to 50MB uncompressed). See http://www.sitemaps.org/ for the full protocol.
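Both limits can be checked before a sitemap is saved. The sketch below is a hypothetical helper (`SitemapLimits` and `IsWithinLimits` are not part of the attached code). It counts the UTF-8 bytes of the generated XML, because the size limit applies to the uncompressed file, and it takes the byte limit as a parameter so the 10MB figure quoted above can be swapped for the current 50MB.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the attached code: checks a generated
// sitemap against the sitemaps.org limits before it is written out.
public static class SitemapLimits
{
    // Maximum number of <url> entries allowed in a single sitemap.
    public const int MaxUrls = 50000;

    // The 10MB limit quoted in this article (10,485,760 bytes). Pass
    // 50L * 1024 * 1024 to check against the current protocol limit.
    public const long TenMegabytes = 10L * 1024 * 1024;

    public static bool IsWithinLimits(string sitemapXml, int urlCount, long maxBytes = TenMegabytes)
    {
        if (sitemapXml == null) throw new ArgumentNullException(nameof(sitemapXml));

        // The size limit applies to the uncompressed file, so measure the
        // bytes the XML occupies once encoded as UTF-8.
        long byteCount = Encoding.UTF8.GetByteCount(sitemapXml);
        return urlCount <= MaxUrls && byteCount <= maxBytes;
    }
}
```

A site that fails either check needs to be split across several sitemaps listed in an index.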

Using the code

The attached code contains two important classes: BaseSitemapGenerator and BaseSitemapIndexGenerator. Use the former when you know your sitemap will be small (fewer than 50,000 URLs and under 10MB); use the latter for large sites, whose URLs have to be split across several sitemaps listed in an index.

BaseSitemapGenerator

To use BaseSitemapGenerator, derive a class from it and override GenerateUrlNodes(). In this method, call WriteUrlLocation once for each page, passing the page path without the domain, its update frequency and its last-modified date.

using System;

public class SitemapGenerator : BaseSitemapGenerator
{
    #region Protected Members

    /// <summary>
    /// Writes a URL node for each page to include in the sitemap.
    /// </summary>
    protected override void GenerateUrlNodes()
    {
        WriteUrlLocation("sitemap.aspx", UpdateFrequency.Weekly, DateTime.Now);
        WriteUrlLocation("blog.aspx", UpdateFrequency.Daily, DateTime.Now);
    }

    #endregion
}
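The base class itself is only in the attached download. To give a rough idea of how such a class can work, here is a minimal sketch built on `XmlWriter`. The `BaseUrl` property, the `Utf8StringWriter` helper, the `ExampleSitemapGenerator` class and the exact member signatures are assumptions inferred from the example above, not the author's code.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Xml;

// Values allowed in <changefreq>; the enum name matches the example above.
public enum UpdateFrequency { Always, Hourly, Daily, Weekly, Monthly, Yearly, Never }

// StringWriter reports UTF-16 by default; report UTF-8 instead so the
// XML declaration says encoding="utf-8".
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

// Sketch of a base class (an assumption, not the attached implementation).
public abstract class BaseSitemapGenerator
{
    private const string SitemapNamespace = "http://www.sitemaps.org/schemas/sitemap/0.9";
    private XmlWriter _writer;

    // Assumed property: the domain that relative page paths are resolved against.
    public string BaseUrl { get; set; } = "http://www.domain.com/";

    // Derived classes call WriteUrlLocation once for each page.
    protected abstract void GenerateUrlNodes();

    protected void WriteUrlLocation(string relativeUrl, UpdateFrequency frequency, DateTime lastModified)
    {
        string absoluteUrl = new Uri(new Uri(BaseUrl), relativeUrl).ToString();

        _writer.WriteStartElement("url", SitemapNamespace);
        // WriteElementString escapes characters such as '&' automatically.
        _writer.WriteElementString("loc", SitemapNamespace, absoluteUrl);
        _writer.WriteElementString("lastmod", SitemapNamespace,
            lastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        _writer.WriteElementString("changefreq", SitemapNamespace,
            frequency.ToString().ToLowerInvariant());
        _writer.WriteEndElement();
    }

    // Returns the complete <urlset> document as a string.
    public string Generate()
    {
        var output = new Utf8StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, new XmlWriterSettings { Indent = true }))
        {
            _writer = writer;
            writer.WriteStartDocument();
            writer.WriteStartElement("urlset", SitemapNamespace);
            GenerateUrlNodes();
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
        _writer = null;
        return output.ToString();
    }
}

// Usage: the same two pages as the example above, with a fixed date.
public class ExampleSitemapGenerator : BaseSitemapGenerator
{
    protected override void GenerateUrlNodes()
    {
        WriteUrlLocation("sitemap.aspx", UpdateFrequency.Weekly, new DateTime(2009, 3, 25));
        WriteUrlLocation("blog.aspx", UpdateFrequency.Daily, new DateTime(2009, 3, 25));
    }
}
```

Writing through `XmlWriter` rather than concatenating strings keeps the output well-formed and escapes URLs containing characters such as `&`.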

Then it is simply a matter of calling the appropriate Generate() method to get the sitemap XML back as a string.

I normally map sitemap.xml to code that generates the sitemap on the fly (if generation is quick). See this link for more information on this.

BaseSitemapIndexGenerator

This works in a similar way to BaseSitemapGenerator, but it has a few extra properties that you can set:

  • SitemapIndexFileName - the file name of the index itself (normally sitemap.xml).
  • SitemapFileNameFormat - the format string used to name each sitemap file listed in the index (default is "sitemap{0}.xml").
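To illustrate how a format such as SitemapFileNameFormat can be applied, the sketch below works out how many sitemap files an index needs for a given number of URLs and names each one with the format string. The `SitemapFileNames.For` helper and the choice to number files from 1 are assumptions; the article only gives the default format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the attached code.
public static class SitemapFileNames
{
    public const int MaxUrlsPerSitemap = 50000;

    // Returns one file name per sitemap needed to hold urlCount URLs,
    // e.g. sitemap1.xml, sitemap2.xml, ... with the default format.
    public static List<string> For(int urlCount, string fileNameFormat = "sitemap{0}.xml")
    {
        if (urlCount < 0) throw new ArgumentOutOfRangeException(nameof(urlCount));

        // Ceiling division (done in long to avoid overflow): 50,001 URLs need two files.
        int fileCount = (int)((urlCount + (long)MaxUrlsPerSitemap - 1) / MaxUrlsPerSitemap);

        var names = new List<string>(fileCount);
        for (int i = 1; i <= fileCount; i++)
        {
            names.Add(string.Format(CultureInfo.InvariantCulture, fileNameFormat, i));
        }
        return names;
    }
}
```

For example, 120,000 URLs give sitemap1.xml, sitemap2.xml and sitemap3.xml, which matches the numbering in the sitemap index sample in this article.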

Because generating a large index can take a long time, this will normally need to be run by a scheduler rather than on demand.

What is the XML format?

For the sitemaps, the XML format is:

<?xml version="1.0" encoding="utf-8" ?> 
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 
         http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://www.domain.com/</loc>
    <lastmod>2009-03-25</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
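The same document, including the optional xsi:schemaLocation hint, can also be built with LINQ to XML. This is an illustrative sketch rather than the attached code; `UrlsetExample.BuildUrlset` is a hypothetical name, and it writes a single url entry.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class UrlsetExample
{
    static readonly XNamespace Sm = "http://www.sitemaps.org/schemas/sitemap/0.9";
    static readonly XNamespace Xsi = "http://www.w3.org/2001/XMLSchema-instance";

    // Builds a <urlset> containing one <url> entry, in the layout shown above.
    public static XDocument BuildUrlset(string loc, DateTime lastModified, string changeFrequency)
    {
        return new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement(Sm + "urlset",
                // Declare the xsi prefix, then point at the sitemap schema.
                new XAttribute(XNamespace.Xmlns + "xsi", Xsi),
                new XAttribute(Xsi + "schemaLocation",
                    "http://www.sitemaps.org/schemas/sitemap/0.9 " +
                    "http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"),
                new XElement(Sm + "url",
                    new XElement(Sm + "loc", loc),
                    new XElement(Sm + "lastmod",
                        lastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)),
                    new XElement(Sm + "changefreq", changeFrequency))));
    }
}
```

Because every element is created in the sitemap namespace, the serializer emits the default xmlns declaration on urlset without it being added by hand.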

For the sitemap indexes, the XML format is:

<?xml version="1.0" encoding="utf-8" ?> 
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
       xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 
             http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
  <sitemap>
    <loc>http://www.domain.com/sitemap1.xml.gz</loc> 
    <lastmod>2009-03-24</lastmod> 
  </sitemap>
  <sitemap>
    <loc>http://www.domain.com/sitemap2.xml.gz</loc> 
    <lastmod>2009-03-24</lastmod> 
  </sitemap>
</sitemapindex>
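The index can be produced in the same way. A minimal sketch, assuming each sitemap file is described by its absolute URL and last-modified date; `SitemapIndexExample.BuildIndex` is a hypothetical name and not the API of the attached BaseSitemapIndexGenerator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class SitemapIndexExample
{
    static readonly XNamespace Sm = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Builds a <sitemapindex> with one <sitemap> entry per sitemap file.
    public static XDocument BuildIndex(IEnumerable<(string Loc, DateTime LastModified)> sitemaps)
    {
        return new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement(Sm + "sitemapindex",
                // XElement flattens the IEnumerable<XElement> into child nodes.
                sitemaps.Select(s => new XElement(Sm + "sitemap",
                    new XElement(Sm + "loc", s.Loc),
                    new XElement(Sm + "lastmod",
                        s.LastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))))));
    }
}
```

Passing the two entries from the sample above reproduces its sitemap elements, apart from the optional schemaLocation attribute.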

History

  • 1.0 - Initial version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Andrew_Thomas
Chief Technology Officer, Intuitive Search Technologies
United Kingdom
Originally from New Zealand, currently working as Development Director at a software company in the UK specialising in online marketing and advertising.
I have a blog at http://andrew.thomas.net.nz, which is all about development in Microsoft .NET, focusing on C#, ASP.NET, SQL Server and SEO. Check it out...

Article Copyright 2009 by Andrew_Thomas