License: The Code Project Open License (CPOL)
Google Site Map Crawler
By Summer_son
Console application that checks all URLs listed in a sitemap.xml file
C# (C#1.0, C#2.0, C#3.0), Windows (Win2K, WinXP, Win2003, Vista, TabletPC, Embedded), Win32, CEO
Have you ever thought of validating each URL listed in your sitemap file?
I have a site with dynamically generated page links. The links are generated from the page title, which can be any combination of letters, numbers and symbols. Of course, the site removes all forbidden characters from the title before generating the URL, then trims and shortens it a bit. However, errors still occur from time to time. For example, because of the specifics of my URL conversion, a page with the title ''...IS_BROKEN'' ''' gets the following URL: /.IS_BROKEN+
There are thousands of pages, so it's clear that I cannot verify by hand each separate page that the site's database contains.
Based on the list of dynamically generated URLs, I generate a sitemap.xml file that contains all of the site's pages. So each time the map file is generated, I need to ensure that there are no repeated items (this may happen if different pages have the same title) and that each URL is accessible, i.e. does not produce a bad request, a 404, or any similar error.
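The duplicate check described above can be sketched as follows. This is only an illustration, not the code from the application itself: the class name `SitemapDuplicates` and the method `FindDuplicates` are hypothetical, and the sketch assumes that URLs are compared as exact, case-sensitive strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original application.
public static class SitemapDuplicates
{
    // Returns every URL that occurs more than once. Each duplicate is
    // reported a single time, in the order its first repeat is found.
    public static List<string> FindDuplicates(IEnumerable<string> urls)
    {
        // Assumption: exact (ordinal, case-sensitive) comparison.
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var reported = new HashSet<string>(StringComparer.Ordinal);
        var duplicates = new List<string>();

        foreach (string url in urls)
        {
            // HashSet.Add returns false when the item is already present.
            if (!seen.Add(url) && reported.Add(url))
            {
                duplicates.Add(url);
            }
        }
        return duplicates;
    }
}
```

Two hash sets keep the check linear in the number of URLs, which matters when the sitemap lists thousands of pages.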
I use the XmlDocument class to load sitemap.xml, and the WebRequest and WebResponse classes to determine whether a URL exists.
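A minimal sketch of how these classes might fit together is shown below. It is not the application's actual source. The names `SitemapChecker`, `ReadUrls` and `GetStatus` are hypothetical, the sitemap is assumed to use the sitemaps.org 0.9 namespace, and the HEAD method and 10-second timeout are illustrative choices. On current .NET versions `WebRequest.Create` is marked obsolete and only produces a compiler warning.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Xml;

// Hypothetical helper, not part of the original application.
public static class SitemapChecker
{
    // Assumption: the file uses the sitemaps.org 0.9 namespace.
    const string SitemapNs = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Extracts the text of every <loc> element from the sitemap XML.
    public static List<string> ReadUrls(string sitemapXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(sitemapXml);

        // The elements are namespaced, so XPath needs a prefix mapping.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("sm", SitemapNs);

        var urls = new List<string>();
        foreach (XmlNode loc in doc.SelectNodes("/sm:urlset/sm:url/sm:loc", ns))
        {
            urls.Add(loc.InnerText.Trim());
        }
        return urls;
    }

    // Returns the HTTP status code for a URL, or null when the URL is
    // malformed or no HTTP response was received (DNS failure, timeout).
    public static HttpStatusCode? GetStatus(string url)
    {
        try
        {
            // Assumption: every URL in the sitemap is http or https.
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "HEAD";   // headers only, no page body
            request.Timeout = 10000;   // milliseconds

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode;
            }
        }
        catch (UriFormatException)
        {
            return null;               // not a valid absolute URL
        }
        catch (WebException ex)
        {
            // 4xx and 5xx answers arrive as exceptions that carry the response.
            var response = ex.Response as HttpWebResponse;
            if (response == null) return null;
            using (response)
            {
                return response.StatusCode;
            }
        }
    }
}
```

A caller could pass the contents of sitemap.xml to `ReadUrls`, then call `GetStatus` for each entry and report every URL whose result is null or anything other than `HttpStatusCode.OK`. Some servers reject HEAD requests, in which case switching `Method` to "GET" is a reasonable fallback.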
Last Updated: 13 Dec 2007
Copyright 2007 by Summer_son. Everything else Copyright © CodeProject, 1999-2010