In this article, I show how to create an HttpModule that handles requests and redirects them (with HTTP status code 301 or 302) to another URL, so that URLs are normalized and duplicate content is avoided.
This article is not directly related to, and does not cover, URL rewriting or ASP.NET Routing.
URL normalization or URL canonicalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized URL so it is possible to determine if two syntactically different URLs are equivalent.
For Search Engine Optimization, it is very important to have canonical URLs in order to avoid duplicate content. You can read further about this subject on this SEO advice from Google's Matt Cutts: mattcutts.com/blog/seo-advice-Url-canonicalization/.
There are several types of normalization that may be performed. Here is a list of the most commonly used in web projects:
- Removing "www" as the first domain label
http://www.example.com/ to http://example.com/
- Adding or removing a trailing slash
http://example.com/display/ to http://example.com/display
http://example.com/display to http://example.com/display/
- Removing the directory index file name
http://example.com/display/index.html to http://example.com/display/
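Each of these normalizations can be expressed as a single regular expression replacement. The sketch below is only an illustration: the class name UrlNormalizer and its method names are my own and are not part of the sample project, and the patterns assume an absolute http or https URL without a query string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the sample project.
public static class UrlNormalizer
{
    // http://www.example.com/ -> http://example.com/
    public static string RemoveWww(string url)
    {
        return Regex.Replace(url, @"^(https?://)www\.", "$1", RegexOptions.IgnoreCase);
    }

    // http://example.com/display/ -> http://example.com/display
    // The site root (http://example.com/) is left unchanged.
    public static string RemoveTrailingSlash(string url)
    {
        return Regex.Replace(url, @"^(https?://[^/]+/.+?)/+$", "$1");
    }

    // http://example.com/display/index.html -> http://example.com/display/
    public static string RemoveIndexFile(string url)
    {
        return Regex.Replace(url, @"/index\.html$", "/", RegexOptions.IgnoreCase);
    }
}
```

In the module itself, the same kind of pattern and replacement pair is read from configuration instead of being hard-coded.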
To inspect every request and decide whether it matches a redirect rule, you have to subscribe to the BeginRequest event of the application.
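A minimal sketch of that subscription is shown here. The class name RedirectorModule is taken from the web.config registration later in this article; the handler name OnBeginRequest is an assumption of mine, and how the configuration object is obtained is deliberately left out. HttpContextWrapper lives in System.Web (in .NET 3.5 it is in the System.Web.Abstractions assembly).

```csharp
using System;
using System.Web;

public class RedirectorModule : IHttpModule
{
    // Called once by ASP.NET when the module is loaded.
    public void Init(HttpApplication application)
    {
        application.BeginRequest += OnBeginRequest;
    }

    // Handler name is an assumption; it runs at the start of every request.
    private void OnBeginRequest(object sender, EventArgs e)
    {
        HttpApplication application = (HttpApplication)sender;
        HttpContextBase context = new HttpContextWrapper(application.Context);

        // At this point the module would call RedirectRequest(context, config),
        // passing the redirector configuration loaded from web.config.
    }

    public void Dispose()
    {
        // Nothing to release in this sketch.
    }
}
```

Wrapping the context in HttpContextWrapper is what allows RedirectRequest to accept an HttpContextBase, which is easier to replace with a fake in unit tests.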
Shown below is the code for RedirectRequest. Because this method is called on every request, it is important not to evaluate every URL pattern each time, as that would hurt performance.
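public void RedirectRequest(HttpContextBase context,
    RedirectorConfiguration config)
{
    HttpRequestBase request = context.Request;
    HttpResponseBase response = context.Response;
    // Absolute URL of the request, including scheme, host, path, and query string.
    string rawUrl = request.Url.AbsoluteUri;

    // Only GET and HEAD requests are candidates for redirection.
    if (request.HttpMethod.ToUpper() != "GET" &&
        request.HttpMethod.ToUpper() != "HEAD")
    {
        return;
    }

    // Skip URLs that match the global ignore pattern (static files, for example).
    if (!String.IsNullOrEmpty(config.IgnoreRegex))
    {
        if (Regex.IsMatch(rawUrl, config.IgnoreRegex))
        {
            return;
        }
    }

    foreach (RedirectorUrlGroup group in config.UrlGroups)
    {
        // Evaluate the individual patterns only when the group pattern matches.
        if (Regex.IsMatch(rawUrl, group.Regex))
        {
            foreach (RedirectorUrl url in group.Urls)
            {
                if (Regex.IsMatch(rawUrl, url.Regex))
                {
                    // Build the target URL and send the redirect (301 or 302).
                    string urlResult = Regex.Replace(rawUrl, url.Regex, url.Replacement);
                    response.StatusCode = url.ResponseStatus;
                    response.AddHeader("Location", urlResult);
                    response.End();
                    break;
                }
            }
            // Only the first matching group is evaluated.
            break;
        }
    }
}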
Let me summarize the above code.
- Ignore requests that are not GET or HEAD, such as form POSTs.
- Ignore URLs that match the IgnoreRegex pattern.
- Group URL patterns so that as few if statements as possible are evaluated for each URL.
- Check every URL group; only if a group matches, evaluate its individual URL patterns.
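To make the group-first evaluation easier to try outside ASP.NET, here is a self-contained sketch of the same matching logic. The types UrlRule and UrlGroup and the class RedirectResolver are hypothetical stand-ins for the configuration classes of the sample project; the method returns the redirect target instead of writing to the response.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for one URL pattern and its replacement.
public sealed class UrlRule
{
    public string Pattern;
    public string Replacement;

    public UrlRule(string pattern, string replacement)
    {
        Pattern = pattern;
        Replacement = replacement;
    }
}

// Hypothetical stand-in for a group of URL patterns.
public sealed class UrlGroup
{
    public string Pattern;
    public List<UrlRule> Rules = new List<UrlRule>();

    public UrlGroup(string pattern, params UrlRule[] rules)
    {
        Pattern = pattern;
        Rules.AddRange(rules);
    }
}

public static class RedirectResolver
{
    // Returns the redirect target, or null when no rule applies.
    public static string Resolve(string url, IEnumerable<UrlGroup> groups)
    {
        foreach (UrlGroup group in groups)
        {
            if (!Regex.IsMatch(url, group.Pattern))
            {
                continue; // group check failed: none of its rules are evaluated
            }

            foreach (UrlRule rule in group.Rules)
            {
                if (Regex.IsMatch(url, rule.Pattern))
                {
                    return Regex.Replace(url, rule.Pattern, rule.Replacement);
                }
            }

            return null; // only the first matching group is consulted
        }

        return null;
    }
}
```

With a group whose pattern is /press-releases.* and a rule that replaces /press-releases(.*) with /press$1, resolving http://contoso.com/press-releases/2010 yields http://contoso.com/press/2010, while a URL that matches no group costs only one regular expression check per group.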
Use the configuration to determine the URL patterns and redirections.
Here is a sample:
<redirector ignoreRegex=".*(\.css|\.txt|\.js|\.gif|\.jpg|\.png)">
  <UrlGroups>
    <add regex="/news-items/.+">
      <Urls>
        <add regex="/news-items/index.html" replacement="/news-items/" />
        <add regex="/news-items/article(\d+).html"
             replacement="/news-items/interesting-article$1.aspx" />
      </Urls>
    </add>
    <add regex="/press-releases.*">
      <Urls>
        <add regex="/press-releases(.*)" replacement="/press$1" />
      </Urls>
    </add>
    <add regex="^http://www.contoso.com.*">
      <Urls>
        <add regex="^http://www.contoso.com(.*)$"
             replacement="http://contoso.com$1" />
      </Urls>
    </add>
  </UrlGroups>
</redirector>
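It is worth testing the ignore pattern against a few URLs before deploying it. The sketch below uses the ignoreRegex value from the sample; the class IgnoreRegexDemo and the stricter AnchoredPattern are my own additions for illustration. Because the sample pattern is not anchored, it also matches URLs that merely contain one of the extensions, such as a .jsp page.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for trying out ignore patterns.
public static class IgnoreRegexDemo
{
    // Copied from the ignoreRegex attribute of the sample configuration.
    public const string SamplePattern = @".*(\.css|\.txt|\.js|\.gif|\.jpg|\.png)";

    // Assumed stricter variant: the extension must end the path,
    // optionally followed by a query string.
    public const string AnchoredPattern = @"\.(css|txt|js|gif|jpg|png)(\?.*)?$";

    public static bool IsIgnored(string url, string pattern)
    {
        return Regex.IsMatch(url, pattern);
    }
}
```

For example, IsIgnored("http://contoso.com/report.jsp", SamplePattern) is true because ".js" occurs inside ".jsp", whereas the anchored variant returns false for that URL and still returns true for http://contoso.com/site.css?v=2.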
To use it in your project, you need to:
- Register the HttpModule in the web.config file.
<httpModules>
  <add name="RedirectorModule"
       type="SampleRedirector.Modules.RedirectorModule, SampleRedirector"/>
</httpModules>
- Add a configuration file, redirector.config, and reference it in web.config.
<configSections>
  <section name="redirector"
           type="SampleRedirector.Configuration.RedirectorConfiguration, SampleRedirector" />
</configSections>
<redirector configSource="Redirector.config"/>
- Adapt redirector.config to your needs.
Hope you enjoy it!
History
- January 19th, 2010 - Article submitted.