Hiding Email Address and URLs from Crawlers

May 21, 2018

Public Domain

A method of preventing crawlers from seeing email addresses and URLs while still showing links to the user

Introduction / Background

Email addresses in plain text on web sites (whether links or not) are often harvested by crawlers to be used for spamming.

To avoid this, addresses are often obfuscated: written in a form that a human reader can convert back to an email address (e.g. "user at domain dot com"), shown as images, or revealed only after the user solves a CAPTCHA. All of these methods are inconvenient for the reader, who can no longer click the address.

The same applies to URLs in contexts where those maintaining the web site do not want them to be visible to search engines (for example, to discourage spam in user-submitted content).

A Solution

A simple solution is to embed a client-side script (in the HTML page) that produces what the legitimate user should see (when executed by a web browser), without including the actual value as a single string in the script.

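Many crawlers, address harvesters in particular, do not run scripts, since the results would not usually be useful to them. Some search-engine crawlers (Googlebot, for example) do render JavaScript, so treat this method as a deterrent rather than a guarantee. If you know of harvesters that run scripts, please mention them in a comment.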

For example:

Email address:
<script>
  // Build the link from fragments so the address never appears
  // as a single string in the page source (protects against crawlers).
  var at = String.fromCharCode(64);     // "@"
  var colon = String.fromCharCode(58);  // ":"
  var address = 'user' + at + 'doma' + 'in.com';
  document.write('<a href="mai' + 'lto' + colon + address + '">' + address + '</a>');
</script>
<noscript>user at domain dot com</noscript>

This produces (HTML):

Email address: <a href="mailto:user@domain.com">user@domain.com</a>

Or, if JavaScript is disabled or not supported by the client:

user at domain dot com

You can write this code by hand for a single address, or have a web application generate it for every address it outputs.
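
The web-application option could look something like the generator below. This is only a minimal sketch, assuming a JavaScript back end such as Node.js; it is not the author's implementation. The function names (escapeHtml, toSplitExpression, obfuscatedLink, obfuscatedMailLink) and the fragment size of 3 are hypothetical choices made for illustration. It assumes the address or URL only needs HTML escaping and does not attempt to validate it.

```javascript
// Sketch only: all function names here are illustrative, not from the article.

// Escape characters that are special in HTML text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn a string into a JavaScript expression that rebuilds it at run time.
// Letters and digits are emitted in short quoted fragments; every other
// character is emitted as String.fromCharCode(n), so "@", ":" and "."
// never appear literally in the generated script.
function toSplitExpression(text, chunkSize = 3) {
  const parts = [];
  let chunk = "";
  const flush = () => {
    if (chunk !== "") {
      parts.push("'" + chunk + "'");
      chunk = "";
    }
  };
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (/[A-Za-z0-9]/.test(ch)) {
      chunk += ch;
      if (chunk.length === chunkSize) flush();
    } else {
      flush();
      parts.push("String.fromCharCode(" + text.charCodeAt(i) + ")");
    }
  }
  flush();
  return parts.length > 0 ? parts.join(" + ") : "''";
}

// Build the HTML fragment for one link: a script that writes the <a> element,
// followed by a <noscript> fallback supplied by the caller.
function obfuscatedLink(href, label, fallback) {
  const html = '<a href="' + escapeHtml(href) + '">' + escapeHtml(label) + "</a>";
  return (
    "<script>document.write(" + toSplitExpression(html) + ");</script>" +
    "<noscript>" + escapeHtml(fallback) + "</noscript>"
  );
}

// Convenience wrapper for email addresses, with a fallback in
// "user at domain dot com" form.
function obfuscatedMailLink(address) {
  const spoken = address.replace(/@/g, " at ").replace(/\./g, " dot ");
  return obfuscatedLink("mailto:" + address, address, spoken);
}

console.log(obfuscatedMailLink("user@domain.com"));
console.log(obfuscatedLink("http://www.example.com/", "Example site", "www dot example dot com"));
```

The second call shows the same helper applied to a URL. Because only letters and digits are written as literal fragments, neither the address nor the URL appears as a single string in the generated markup.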