
Sitemap and Sitemap Index Generator

By Andrew_Thomas, 25 Mar 2009
 

Introduction

This article explains a clean, reusable approach to generating sitemaps and sitemap indexes for SEO (Search Engine Optimization), as consumed by Google, Yahoo!, MSN and others.

Background

Search engines often ask for a sitemap or sitemap index so that they can find all the pages in your site that you want them to find. The file has to be in a specific XML format, and certain rules need to be followed, such as a maximum of 50,000 URLs per sitemap and a maximum size of 10MB per sitemap file. See http://www.sitemaps.org/ for the details.
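To make the 50,000 item rule concrete, here is a minimal sketch of how a large list of URLs could be split into batches that each fit in one sitemap. SitemapLimits and SplitIntoSitemaps are hypothetical names used only for illustration; they are not part of the attached code, and the 10MB size limit is not checked here.

```csharp
using System;
using System.Collections.Generic;

public static class SitemapLimits
{
    // Limit quoted above: at most 50,000 URLs per sitemap file.
    public const int MaxUrlsPerSitemap = 50000;

    // Hypothetical helper: splits the URLs into consecutive batches,
    // each small enough to be written to a single sitemap file.
    public static List<List<string>> SplitIntoSitemaps(
        IList<string> urls, int maxPerSitemap = MaxUrlsPerSitemap)
    {
        if (urls == null) throw new ArgumentNullException("urls");
        if (maxPerSitemap <= 0) throw new ArgumentOutOfRangeException("maxPerSitemap");

        var batches = new List<List<string>>();
        for (int start = 0; start < urls.Count; start += maxPerSitemap)
        {
            // The last batch may be smaller than the limit.
            int count = Math.Min(maxPerSitemap, urls.Count - start);
            var batch = new List<string>(count);
            for (int i = start; i < start + count; i++)
            {
                batch.Add(urls[i]);
            }
            batches.Add(batch);
        }
        return batches;
    }
}
```

A site with 120,001 pages would need three sitemaps under this rule, plus an index that points to them.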

Using the code

The attached code contains two classes of importance: BaseSitemapGenerator and BaseSitemapIndexGenerator. The former is used if you know that your sitemap is going to be small (i.e., fewer than 50,000 URLs and less than 10MB); the latter is for large sites.

BaseSitemapGenerator

To use BaseSitemapGenerator, we simply inherit a class from it and override GenerateUrlNodes(). In this method, you call WriteUrlLocation once for each page (without the domain information).

using System;

public class SitemapGenerator : BaseSitemapGenerator
{
    #region Protected Members

    /// <summary>
    /// Writes one URL node for each page that should appear in the sitemap.
    /// </summary>
    protected override void GenerateUrlNodes()
    {
        // Page paths are relative; the domain information is not included here.
        WriteUrlLocation("sitemap.aspx", UpdateFrequency.Weekly, DateTime.Now);
        WriteUrlLocation("blog.aspx", UpdateFrequency.Daily, DateTime.Now);
    }

    #endregion
}

Then it is a matter of calling the appropriate Generate() method to get the sitemap XML back as a string.

I normally map requests for sitemap.xml to code that generates the sitemap on the fly (if generation is quick). See this link for more information on this.

BaseSitemapIndexGenerator

This is similar to the above; however, there are a few properties that you can set:

  • SitemapIndexFileName - this is the base index filename (will normally be sitemap.xml).
  • SitemapFileNameFormat - this is the format to use for each sitemap file generated within the index (default is "sitemap{0}.xml").
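As a rough illustration of how a format such as "sitemap{0}.xml" expands into individual file names, consider the sketch below. SitemapFileNames.Build is a hypothetical helper, not the attached implementation, and numbering from 1 is an assumption based on the sitemap1 and sitemap2 names used in the index example in this article.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class SitemapFileNames
{
    // Hypothetical helper: expands a format string such as "sitemap{0}.xml"
    // into one file name per sitemap. Numbering from 1 is an assumption.
    public static List<string> Build(string fileNameFormat, int sitemapCount)
    {
        if (fileNameFormat == null) throw new ArgumentNullException("fileNameFormat");
        if (sitemapCount < 0) throw new ArgumentOutOfRangeException("sitemapCount");

        var names = new List<string>(sitemapCount);
        for (int i = 1; i <= sitemapCount; i++)
        {
            // Invariant culture keeps the number formatting the same on every server.
            names.Add(string.Format(CultureInfo.InvariantCulture, fileNameFormat, i));
        }
        return names;
    }
}
```

Calling SitemapFileNames.Build("sitemap{0}.xml", 3) gives sitemap1.xml, sitemap2.xml and sitemap3.xml.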

Normally, the index generator will need to be run by a scheduler, as a large site can take a long time to generate.

What is the XML format?

For the sitemaps, the XML format is:

<?xml version="1.0" encoding="utf-8" ?> 
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 
         http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://www.domain.com/</loc>
    <lastmod>2009-03-25</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
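For readers who want to see how a document of this shape can be produced, here is a minimal sketch using LINQ to XML. It is not the attached implementation: UrlEntry and SitemapXml.BuildUrlSet are hypothetical names, the xsi:schemaLocation attribute is left out for brevity, and the child elements are written in the order the sitemaps.org schema lists them (loc, lastmod, changefreq).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical entry type, used only for this sketch.
public sealed class UrlEntry
{
    public string Location { get; set; }        // absolute URL of the page
    public string ChangeFrequency { get; set; } // e.g. "daily" or "weekly"
    public DateTime LastModified { get; set; }
}

public static class SitemapXml
{
    private static readonly XNamespace Ns =
        "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Builds a <urlset> document with one <url> element per entry.
    public static XDocument BuildUrlSet(IEnumerable<UrlEntry> entries)
    {
        var urlset = new XElement(Ns + "urlset");
        foreach (var entry in entries)
        {
            urlset.Add(new XElement(Ns + "url",
                new XElement(Ns + "loc", entry.Location),
                new XElement(Ns + "lastmod",
                    entry.LastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)),
                new XElement(Ns + "changefreq", entry.ChangeFrequency)));
        }
        return new XDocument(new XDeclaration("1.0", "utf-8", null), urlset);
    }
}
```

XDocument.Save writes the document, including the XML declaration, to a file or stream; ToString() returns the markup without the declaration.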

For the sitemap indexes, the XML format is:

<?xml version="1.0" encoding="utf-8" ?> 
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
       xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 
             http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
  <sitemap>
    <loc>http://www.domain.com/sitemap1.xml.gz</loc> 
    <lastmod>2009-03-24</lastmod> 
  </sitemap>
  <sitemap>
    <loc>http://www.domain.com/sitemap2.xml.gz</loc> 
    <lastmod>2009-03-24</lastmod> 
  </sitemap>
</sitemapindex>
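The index follows the same pattern with different element names. The sketch below builds one from a base address and a list of sitemap file names; SitemapIndexXml.BuildIndex is a hypothetical helper rather than the attached code, a single last-modified date is assumed for all files, and xsi:schemaLocation is again omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml.Linq;

public static class SitemapIndexXml
{
    private static readonly XNamespace Ns =
        "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Builds a <sitemapindex> document with one <sitemap> element per file.
    // Assumption: every sitemap file shares the same last-modified date.
    public static XDocument BuildIndex(
        Uri baseUri, IEnumerable<string> sitemapFileNames, DateTime lastModified)
    {
        var index = new XElement(Ns + "sitemapindex");
        foreach (var fileName in sitemapFileNames)
        {
            // Combine the site address with the file name to get an absolute URL.
            var location = new Uri(baseUri, fileName).AbsoluteUri;
            index.Add(new XElement(Ns + "sitemap",
                new XElement(Ns + "loc", location),
                new XElement(Ns + "lastmod",
                    lastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))));
        }
        return new XDocument(new XDeclaration("1.0", "utf-8", null), index);
    }
}
```

Passing new Uri("http://www.domain.com/") with the names sitemap1.xml.gz and sitemap2.xml.gz yields the two entries shown in the example above.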

History

  • 1.0 - Initial version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

Andrew_Thomas
Chief Technology Officer, Intuitive Search Technologies
United Kingdom
Originally from New Zealand, I currently work as Development Director at a software company in the UK specialising in online marketing and advertising.
I have a blog located at: http://andrew.thomas.net.nz, which is all about development in Microsoft .NET, focused on C#, ASP.NET, SQL Server and SEO. Check it out...

Comments and Discussions

 
GeneralMy vote of 1memberbrsecu17 Jan '12 - 7:51 
GeneralEnumHelpermemberMember 86279131 Jan '11 - 15:52 
The name 'EnumHelper' does not exist in the current context
GeneralCore.Abstractionsmemberjornj7924 Jun '09 - 23:11 

Last Updated 25 Mar 2009
Article Copyright 2009 by Andrew_Thomas
Everything else Copyright © CodeProject, 1999-2013