When creating a website, you must be careful to avoid duplicate content, as this can result in search engines penalizing your ranking. If a search engine sees duplicated content, it will give that page a very low relevancy score - the idea being that only the original version of any duplicated content is relevant. There are two kinds of duplicated content: intentionally duplicated - i.e., copied from another website, and unintentionally duplicated - i.e., appears in multiple places in the same website when it was only meant to appear once. If your content is intentionally duplicated, then the obvious solution is to remove it, or at least rewrite it in your own words. DotNetNuke does, however, allow you to unintentionally duplicate your content - you may have duplicated content and not even be aware of it. We will look at what causes unintentionally duplicated content, and what to do about it.
The login and register controls
As part of your DotNetNuke website's skin, you probably have the login and register controls, which render a hyperlink like this: http://www.mywebsite.com/default.aspx?ctl=Login. Clicking on this link reloads the page and shows only the login or register module instead of the modules you have placed on the page. The problem with this is that the search engine will see the normal page and the page with the login or register control on it as being two separate physical pages. The search engine will see your ctl=Login page as an approximately 80% duplicate of the normal page, and as an approximately 90% duplicate of the ctl=Register page. Since these two controls will probably be on nearly every single page in a DotNetNuke website, the number of duplicates will be the number of pages times 4. Since there is no need at all to have the search engine index the login or register pages anyway, we will just exclude them using robots.txt, the technique for which is explained below.
The privacy and terms controls
Most DotNetNuke skins will also contain the terms control and the privacy control, which render a hyperlink like this: http://www.mywebsite.com/default.aspx?ctl=Terms. Clicking on this link reloads the page and shows only the terms module or privacy module. These two modules are the worst offenders for duplicate content in DotNetNuke websites. Why is that? Take a look at your own DotNetNuke website's terms page and privacy page. These pages contain a very large amount of text which is a 99.99% duplicate of every single other DotNetNuke website out there, the only difference being the website administrator email. By having these two pages, a website has almost exactly duplicated two large pieces of text that can also be found on thousands of other websites. This terms and privacy content text can also be found on the main DotNetNuke website, www.dotnetnuke.com, a site with a very high page ranking of 8 - so will the search engine think that you copied DotNetNuke, or that DotNetNuke copied you? Obviously, the search engine will think that you did the copying. It should then be no surprise that your terms and privacy pages are given a very low ranking, but what effect will it have on the rest of your website's ranking? It is hard to tell, since all search engines are black boxes that won't tell you how they calculate rankings. It definitely will not help a website's ranking, and should be fixed immediately.
Fix duplicated content with robots.txt
Now that we have identified the problem, how do we fix it? The fix used here is a robots.txt file that excludes every URL that loads one of these controls: ctl=Login, ctl=Register, ctl=Privacy, and ctl=Terms. Unfortunately, the original robots.txt standard does not support wildcard URLs, and not every crawler honors the wildcard extensions that some search engines have added, so we must manually enter the URL for each and every one of them. If we are using the correct search engine friendly URLs, robots.txt needs one Disallow line per page for each of the four controls, written in that URL format.
If using the built-in DotNetNuke friendly URLs, robots.txt needs the same set of entries, written in the built-in friendly URL format.
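The per-page entries can be generated rather than typed by hand. The following Python sketch is only an illustration: the function name `disallow_rules`, the example page URLs, and the assumed shape of the built-in friendly URLs (`/Home/tabid/36/ctl/Login/Default.aspx`) are assumptions made for this example, so check them against the links your own skin actually renders before using the output.

```python
# Hypothetical sketch: build robots.txt rules for the four DotNetNuke skin
# controls discussed in this article. The page list and both URL shapes
# are assumptions; compare them with the links on your own site.

CONTROLS = ["Login", "Register", "Privacy", "Terms"]


def disallow_rules(page_urls, friendly=False):
    """Return robots.txt lines that block the ctl variants of each page.

    page_urls: site-relative page URLs, for example "/default.aspx" or
               "/default.aspx?tabid=36"; with friendly=True, something like
               "/Home/tabid/36/Default.aspx".
    """
    lines = ["User-agent: *"]
    for url in page_urls:
        for ctl in CONTROLS:
            if friendly:
                # Assumed friendly form: insert /ctl/<Name>/ before the
                # final path segment (usually Default.aspx).
                head, _, tail = url.rpartition("/")
                lines.append(f"Disallow: {head}/ctl/{ctl}/{tail}")
            else:
                # Query-string form: use '&' if the URL already has a query.
                sep = "&" if "?" in url else "?"
                lines.append(f"Disallow: {url}{sep}ctl={ctl}")
    return lines


if __name__ == "__main__":
    # Example pages only; replace with the real pages of your site.
    print("\n".join(disallow_rules(["/default.aspx", "/default.aspx?tabid=36"])))
    print("\n".join(disallow_rules(["/Home/tabid/36/Default.aspx"], friendly=True)))
```

For a single page `/default.aspx`, the function returns a `User-agent: *` line followed by four `Disallow:` lines, the first being `Disallow: /default.aspx?ctl=Login`. Every page that carries the controls has to be in the list, because each `Disallow` line only matches URLs that begin with exactly that text.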
Every single page that has one of these controls in its skin must be included in robots.txt. Once these URLs have been excluded, the search engines will not look at them anymore, and your duplicated content will eventually be removed from the search engine's index. Be patient: search engines can be slow to update. Note that the robots.txt file can only be placed at www.yourwebsite.com/robots.txt; the search engines will not look for it anywhere else.
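Before waiting for the search engines to catch up, it can be worth checking that the rules block what you expect. The sketch below uses Python's standard `urllib.robotparser` module for a local check; the rules in `ROBOTS_TXT`, the helper name `is_blocked`, and the example URLs are assumptions for illustration. Real crawlers may interpret robots.txt slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Example rules only (an assumption); paste in your own robots.txt content.
ROBOTS_TXT = """\
User-agent: *
Disallow: /default.aspx?ctl=Login
Disallow: /default.aspx?ctl=Register
Disallow: /default.aspx?ctl=Privacy
Disallow: /default.aspx?ctl=Terms
"""


def is_blocked(url, robots_txt=ROBOTS_TXT):
    """Return True if a generic crawler ('*') may not fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", url)


if __name__ == "__main__":
    for url in (
        "http://www.mywebsite.com/default.aspx",
        "http://www.mywebsite.com/default.aspx?ctl=Login",
        "http://www.mywebsite.com/default.aspx?ctl=Terms",
    ):
        print(url, "blocked" if is_blocked(url) else "allowed")
```

Rule matching is case-sensitive, so a rule written for `/default.aspx` does not cover a link rendered as `/Default.aspx`; use the same casing that your pages actually emit.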
We have seen that it is possible to unintentionally break the search engine rules about duplicated content and suffer a loss in ranking as a result. We have also shown that the solution is a simple set of rules that must be added to your robots.txt file. Some of this duplicated content is worse than the rest, and since it is quite an easy fix, there really is no excuse not to do it immediately.
Points of interest
This article submission came about as part of a series of DotNetNuke blog posts. The original blog post, "Remove Duplicated DotNetNuke Content to improve PR", can be seen at my website, www.bestwebsites.co.nz. The previous article in this series, DotNetNuke Search Engine Optimization, is also available on CodeProject. We have now also released a DotNetNuke module that automatically generates the contents of your robots.txt file, saving you time and effort: SEO Robots.txt Generator Module for DotNetNuke 4.x.x.
- 28-March-07: Added link to the previous article, DotNetNuke Search Engine Optimization pt1.
- 31-March-07: "DotNetNuke Newsletter Vol. III, Number 3" mentions this article.