Sitemap and Sitemap Index Generator

Mar 25, 2009

CPOL

Base class for generating sitemaps and sitemap indexes for Google, Yahoo!, and MSN.

Introduction

This article presents a clean, reusable approach to generating sitemaps and sitemap indexes, which Google, Yahoo!, MSN and other search engines use for SEO (Search Engine Optimization).

Background

Search engines use a sitemap to make sure they find all the pages on your site that you want them to find; large sites provide a sitemap index that points to several sitemaps. The sitemap has to be in a specific XML format and has to follow certain rules, such as a limit of 50,000 URLs per sitemap and a maximum size of 10MB per sitemap file. See http://www.sitemaps.org/ for details.
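These two limits are why a large site needs more than one sitemap file. The following hypothetical C# sketch is not part of the attached code; it splits a URL list into batches that respect the 50,000-URL limit and checks a rendered sitemap against the 10MB limit as stated above. The class SitemapLimits and its members are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the article's download.
public static class SitemapLimits
{
    // Limits as stated in this article. (Later revisions of the
    // sitemaps.org protocol raised the size limit to 50MB.)
    public const int MaxUrlsPerSitemap = 50000;
    public const long MaxBytesPerSitemap = 10L * 1024 * 1024; // 10,485,760 bytes

    // Splits the URL list into consecutive batches of at most maxPerFile
    // URLs; each batch would become one sitemap file in the index.
    public static List<List<string>> SplitIntoBatches(
        IEnumerable<string> urls, int maxPerFile = MaxUrlsPerSitemap)
    {
        if (urls == null) throw new ArgumentNullException(nameof(urls));
        if (maxPerFile <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerFile));

        var batches = new List<List<string>>();
        var current = new List<string>();
        foreach (var url in urls)
        {
            if (current.Count == maxPerFile)
            {
                batches.Add(current);          // current batch is full
                current = new List<string>();
            }
            current.Add(url);
        }
        if (current.Count > 0) batches.Add(current);
        return batches;
    }

    // True if a rendered sitemap satisfies both limits
    // (size measured as UTF-8 bytes).
    public static bool FitsInOneSitemap(int urlCount, string xml) =>
        urlCount <= MaxUrlsPerSitemap &&
        Encoding.UTF8.GetByteCount(xml) <= MaxBytesPerSitemap;
}
```

For example, 120,001 URLs produce three batches of 50,000, 50,000 and 20,001 URLs, so such a site would need a sitemap index pointing at three sitemap files.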

Using the code

The attached code contains two important classes: BaseSitemapGenerator and BaseSitemapIndexGenerator. Use the former when you know your sitemap will be small (i.e., fewer than 50,000 URLs and smaller than 10MB); use the latter for large sites.

BaseSitemapGenerator

To use BaseSitemapGenerator, simply derive a class from it and override GenerateUrlNodes(). In that method, call WriteUrlLocation once for each page, passing the page's address without the domain information.

using System;

public class SitemapGenerator : BaseSitemapGenerator
{
    #region Protected Members

    /// <summary>
    /// Writes one url node for each page in the sitemap.
    /// </summary>
    protected override void GenerateUrlNodes()
    {
        // Each call adds one page; the path is relative to the site root (no domain).
        WriteUrlLocation("sitemap.aspx", UpdateFrequency.Weekly, DateTime.Now);
        WriteUrlLocation("blog.aspx", UpdateFrequency.Daily, DateTime.Now);
    }

    #endregion
}

Then call the appropriate Generate() method to get the sitemap XML back as a string.
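The attached code is not reproduced in this article, so the following is only a minimal sketch, under stated assumptions, of how a base class like BaseSitemapGenerator could implement WriteUrlLocation() and Generate() with System.Xml.XmlWriter. SketchSitemapGenerator, DemoSitemapGenerator, the siteRoot constructor parameter and this UpdateFrequency enum are hypothetical; the real classes in the download may differ.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical stand-in for the download's UpdateFrequency enum; the
// member names follow the sitemaps.org <changefreq> values.
public enum UpdateFrequency { Always, Hourly, Daily, Weekly, Monthly, Yearly, Never }

// Minimal sketch in the spirit of BaseSitemapGenerator.
// NOT the article's implementation; the siteRoot parameter is an assumption.
public abstract class SketchSitemapGenerator
{
    private const string SitemapNs = "http://www.sitemaps.org/schemas/sitemap/0.9";
    private readonly Uri _siteRoot;
    private XmlWriter _writer; // only set while Generate() is running

    protected SketchSitemapGenerator(string siteRoot)
    {
        _siteRoot = new Uri(siteRoot);
    }

    // Subclasses call WriteUrlLocation once per page.
    protected abstract void GenerateUrlNodes();

    // Writes one <url> element; relativeUrl carries no domain information.
    protected void WriteUrlLocation(string relativeUrl, UpdateFrequency frequency, DateTime lastModified)
    {
        _writer.WriteStartElement("url", SitemapNs);
        _writer.WriteElementString("loc", SitemapNs, new Uri(_siteRoot, relativeUrl).AbsoluteUri);
        _writer.WriteElementString("lastmod", SitemapNs,
            lastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        _writer.WriteElementString("changefreq", SitemapNs, frequency.ToString().ToLowerInvariant());
        _writer.WriteEndElement();
    }

    // Returns the complete <urlset> document as a UTF-8 XML string.
    public string Generate()
    {
        var settings = new XmlWriterSettings { Indent = true, Encoding = new UTF8Encoding(false) };
        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                _writer = writer;
                try
                {
                    writer.WriteStartDocument();
                    writer.WriteStartElement("urlset", SitemapNs);
                    GenerateUrlNodes();
                    writer.WriteEndElement();
                    writer.WriteEndDocument();
                }
                finally
                {
                    _writer = null;
                }
            }
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}

// Usage, mirroring the article's subclass (fixed dates keep output predictable).
public class DemoSitemapGenerator : SketchSitemapGenerator
{
    public DemoSitemapGenerator() : base("http://www.domain.com/") { }

    protected override void GenerateUrlNodes()
    {
        WriteUrlLocation("sitemap.aspx", UpdateFrequency.Weekly, new DateTime(2009, 3, 25));
        WriteUrlLocation("blog.aspx", UpdateFrequency.Daily, new DateTime(2009, 3, 25));
    }
}
```

Calling new DemoSitemapGenerator().Generate() returns a urlset document with entries for http://www.domain.com/sitemap.aspx and http://www.domain.com/blog.aspx. Combining the site root with each relative path through System.Uri is what lets subclasses pass pages without domain information.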

I normally map sitemap.xml to code that generates the sitemap on the fly (if generation is quick). See this link for more information.

BaseSitemapIndexGenerator

This is used in the same way as BaseSitemapGenerator; however, there are a couple of properties that you can set:

  • SitemapIndexFileName - the file name of the sitemap index itself (normally sitemap.xml).
  • SitemapFileNameFormat - the format string used to name each sitemap file generated within the index (default is "sitemap{0}.xml").

Because generating the sitemaps for a large site takes a long time, the index generator normally needs to be run by a scheduler rather than on each request.
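The following hypothetical sketch (not the download's code) shows how a file name format such as "sitemap{0}.xml" could be expanded with string.Format into one sitemap entry per generated file, using LINQ to XML. It assumes files are numbered from 1, as in the sample index in the next section; SitemapIndexSketch, BuildIndex and its parameters are illustrative names.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not the article's BaseSitemapIndexGenerator.
public static class SitemapIndexSketch
{
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Builds a <sitemapindex> with one <sitemap> entry per generated file.
    // fileNameFormat plays the role of SitemapFileNameFormat; numbering
    // starts at 1 (an assumption matching the sample index).
    public static XDocument BuildIndex(string siteRoot, string fileNameFormat, int fileCount, DateTime lastModified)
    {
        var root = new Uri(siteRoot);
        string lastmod = lastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

        var entries = Enumerable.Range(1, fileCount).Select(i =>
            new XElement(Ns + "sitemap",
                new XElement(Ns + "loc",
                    new Uri(root, string.Format(CultureInfo.InvariantCulture, fileNameFormat, i)).AbsoluteUri),
                new XElement(Ns + "lastmod", lastmod)));

        return new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement(Ns + "sitemapindex", entries));
    }
}
```

Note that XDocument.ToString() omits the XML declaration, whereas XDocument.Save(path) writes it, so saving to disk is the natural fit for a scheduled job.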

What is the XML format?

For the sitemaps, the XML format is:

<?xml version="1.0" encoding="utf-8" ?> 
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 
         http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://www.domain.com/</loc> 
    <lastmod>2009-03-25</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
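One detail of this format that is easy to miss: every element is in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace, so code that reads a sitemap back (for example, to check generated output) has to qualify element names with it. A minimal LINQ to XML sketch; SitemapReader and SitemapUrl are hypothetical names, not part of the attached code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical result type for one <url> entry.
public sealed class SitemapUrl
{
    public string Loc { get; set; }
    public string LastMod { get; set; }     // kept as text, e.g. "2009-03-25"
    public string ChangeFreq { get; set; }  // null when the element is absent
}

public static class SitemapReader
{
    // Every sitemap element lives in this namespace, so names must be
    // qualified with it; Elements("url") on its own finds nothing.
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static List<SitemapUrl> ReadUrls(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root.Elements(Ns + "url")
            .Select(u => new SitemapUrl
            {
                Loc = (string)u.Element(Ns + "loc"),
                LastMod = (string)u.Element(Ns + "lastmod"),
                ChangeFreq = (string)u.Element(Ns + "changefreq"),
            })
            .ToList();
    }
}
```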

For the sitemap indexes, the XML format is:

<?xml version="1.0" encoding="utf-8" ?> 
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
       xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 
             http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
  <sitemap>
    <loc>http://www.domain.com/sitemap1.xml.gz</loc> 
    <lastmod>2009-03-24</lastmod> 
  </sitemap>
  <sitemap>
    <loc>http://www.domain.com/sitemap2.xml.gz</loc> 
    <lastmod>2009-03-24</lastmod> 
  </sitemap>
</sitemapindex>
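The sample index refers to gzip-compressed sitemap files (.xml.gz), which the sitemaps.org protocol allows. The attached code may or may not do this; the following is a hypothetical sketch using System.IO.Compression.GZipStream to produce and verify such files. SitemapCompression and its methods are illustrative names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helpers for producing .xml.gz sitemap files;
// not part of the article's download.
public static class SitemapCompression
{
    // Gzip-compresses sitemap XML (encoded as UTF-8 without a BOM).
    public static byte[] Compress(string xml)
    {
        byte[] raw = new UTF8Encoding(false).GetBytes(xml);
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps 'output' usable after the gzip stream is
            // disposed; disposing it flushes the remaining compressed data.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Reverses Compress; useful for checking a generated file.
    public static string Decompress(byte[] gz)
    {
        using (var input = new GZipStream(new MemoryStream(gz), CompressionMode.Decompress))
        using (var reader = new StreamReader(input, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Writes e.g. "sitemap1.xml.gz" into the given directory and returns its path.
    public static string WriteGzipFile(string directory, string fileName, string xml)
    {
        string path = Path.Combine(directory, fileName);
        File.WriteAllBytes(path, Compress(xml));
        return path;
    }
}
```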

History

  • 1.0 - Initial version.