
Introduction
This is the last in a series of six articles, following the design, development and practical use of a fully functional ASP.NET custom control.
The full list of articles is as follows:
These articles are not intended to be a comprehensive look at custom control development (there are 700+ page books that barely cover it), but they do cover a significant number of fundamentals, some of which are poorly documented elsewhere.
The intent is to do so in the context of a single fully reusable and customizable control (as opposed to many contrived examples), with some awareness that few readers will need every part of the series but many will need a few parts of it.
This article looks at techniques for customizing the NoSpamEmailHyperlink so that it is unique to any given site, without downloading and editing the source code directly, which would make it difficult to incorporate future improvements to the base control.
It assumes at least a basic knowledge of C# and class inheritance.
Customizing the NoSpamEmailHyperlink
With a few simple overrides, it is possible to completely change the nature of the NoSpamEmailHyperlink so that each site implementing the control uses it in a slightly different way, making it nearly impossible for email harvesters to detect and decode the email addresses. It is even possible to use numerous derived controls on the same page (as seen in the screenshot above), but this is not recommended.
The aim of this control is to force harvesting software to handle JavaScript as effectively as any browser and thus push up the price of such software, in turn pulling down the profit margins of the email spammers. If all the harvesters have to do is detect the link array or the code key, and then either use it or move on, the control will not cause them many problems.
If, on the other hand, the control behaves slightly differently on some sites and very differently on others, the email harvesters are going to have to work a lot harder.
The NoSpamEmailHyperlink offers six properties and three methods which can be overridden to create a completely different control based on the same principles.
Custom Coding Key
Replacing the encoding string is easy as long as you follow a couple of simple rules. Create a new control, derived from NoSpamEmailHyperlink, and override the protected .CodeKey property.
public class NoSpamEmailHyperlinkEx : NoSpamEmailHyperlink
{
    protected override string CodeKey
    {
        get
        {
            return "QpWlEmR5ToY4UkInO0PiAjS19DbFuGhHvJ2KyLgZc3XtCxVf6BNrdMeszawq78";
        }
    }
}
It is essential that you never include the same character twice. A duplicate will confuse the decoding algorithm: when it finds the duplicated character, it cannot possibly know which character it was translated from.
It is also essential that you only include alphanumeric characters, unless you are also overriding the Encode/Decode functionality to handle it. Any other characters may translate a valid email address to an invalid one (for example, if the first character becomes a hyphen).
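Both rules can be checked mechanically before a new key goes live. The following JavaScript is only a sketch: validateCodeKey is a hypothetical helper, not part of the control, and it assumes that "alphanumeric" means ASCII letters and digits.

```javascript
// Hypothetical helper (not part of NoSpamEmailHyperlink): checks a candidate
// key against the two rules described above and returns a list of problems.
function validateCodeKey(key) {
  const problems = [];
  const seen = new Set();
  for (const ch of key) {
    // Assumption: "alphanumeric" means ASCII letters and digits only.
    if (!/^[A-Za-z0-9]$/.test(ch)) {
      problems.push("non-alphanumeric character: " + ch);
    }
    // A repeated character makes decoding ambiguous.
    if (seen.has(ch)) {
      problems.push("duplicate character: " + ch);
    }
    seen.add(ch);
  }
  return problems;
}

// The key from the example above passes both checks.
const sampleKey = "QpWlEmR5ToY4UkInO0PiAjS19DbFuGhHvJ2KyLgZc3XtCxVf6BNrdMeszawq78";
console.log(validateCodeKey(sampleKey)); // prints []

// A key with a repeated "a" and a hyphen fails both.
console.log(validateCodeKey("abca-"));
```

An empty result means the key satisfies both rules; it says nothing about how hard the key is to guess.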
It is not essential to include every alphanumeric character in the key. Leaving out one or two characters can actually make the result more difficult to decode. For example, if you leave the "a" and "A" characters out of the key string, all other characters will be substituted except for the As. Once you realize that a string is encoded using substitution, the last thing you expect is for some characters not to be substituted at all. And yet, the decoding algorithm will handle the missing characters correctly.
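To see why missing characters are harmless, consider the sketch below. It is a deliberately simplified fixed-offset substitution, not the control's actual seed-driven algorithm, and the names keyWithoutA and shiftText are invented for the illustration. Any character that is not in the key is copied through unchanged in both directions.

```javascript
// Simplified illustration only: a fixed-offset substitution, NOT the
// control's real algorithm. It shows how characters missing from the key
// pass through untouched when encoding and when decoding.
const fullKey = "QpWlEmR5ToY4UkInO0PiAjS19DbFuGhHvJ2KyLgZc3XtCxVf6BNrdMeszawq78";
const keyWithoutA = fullKey.replace(/a/gi, ""); // drop "a" and "A"

function shiftText(text, key, offset) {
  const n = key.length;
  let out = "";
  for (const ch of text) {
    const idx = key.indexOf(ch);
    if (idx < 0) {
      out += ch; // not in the key: copy the character directly
      continue;
    }
    out += key[(((idx + offset) % n) + n) % n];
  }
  return out;
}

const encodedWord = shiftText("banana", keyWithoutA, 5);
const decodedWord = shiftText(encodedWord, keyWithoutA, -5);
console.log(encodedWord); // the "a" characters survive in place
console.log(decodedWord); // prints banana
```

Because "a" is not in the key, no other character can ever be translated into it, so the decoder never mistakes a pass-through character for an encoded one.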
Custom Variable Names
Should the NoSpamEmailHyperlink become excessively popular, there are a number of ways in which a harvester could identify the encoded hyperlinks and discount them, or even decode them without using JavaScript.
Because the NoSpamEmailHyperlink uses GetType().Name to build the array names, function names and global-level variable names, any control derived from it will automatically use different names to avoid clashes.
However, a harvester could easily look for arrays with names ending _LinkArray and discount any links with IDs found in those arrays. Without too much more effort, it could find the _SeedArray and the ky variable and attempt to decode them.
But if we change the names of those variables on just a few pages, the process of detecting them becomes a lot more difficult.
public class NoSpamEmailHyperlinkEx : NoSpamEmailHyperlink
{
    protected override string CallScriptName {
        get {
            return GetType().Name + "Elephant";
        }
    }

    protected override string FuncScriptName {
        get {
            return GetType().Name + "SilverFish";
        }
    }

    protected override string SeedArrayName {
        get {
            return GetType().Name + "TexasHoldem";
        }
    }

    protected override string LinkArrayName {
        get {
            return GetType().Name + "Leichtenstein";
        }
    }

    protected override string CodeKeyName {
        get {
            return "ck";
        }
    }
}
As you can see, it is not entirely necessary for these strings to be related in any way to their function. For example, the above code changes the name of the seed array definition so that it resembles the following:
var NoSpamEmailHyperlinkExTexasHoldem = new Array("23");
You may know what this means, and the calling script will adjust itself to find the new array name, but the harvester will no longer find the array simply by hunting for _SeedArray.
Note that all of the above properties, except for CodeKeyName, are used in the JavaScript at a global level. It is always advisable to use GetType().Name somewhere in the definition, so that the names stay unique if a further control derives from yours and fails to override these properties.
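The effect on a naive harvester can be sketched as follows. Both the pattern and the default array name are assumptions made for illustration: the article only says that the default names end in _SeedArray, and a real harvester may use quite different heuristics.

```javascript
// Hypothetical harvester heuristic: find a seed array by its default suffix.
const seedArrayPattern = /var\s+(\w+_SeedArray)\s*=\s*new Array\(([^)]*)\)/;

// Assumed shape of the default output (type name + "_SeedArray").
const defaultScript = 'var NoSpamEmailHyperlink_SeedArray = new Array("23");';
// Output of the derived control shown above.
const renamedScript = 'var NoSpamEmailHyperlinkExTexasHoldem = new Array("23");';

console.log(seedArrayPattern.test(defaultScript)); // prints true
console.log(seedArrayPattern.test(renamedScript)); // prints false
```

The renamed array carries exactly the same data, but a search keyed on the suffix no longer finds it.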
Custom Coding Algorithm
For the more adventurous tinkerer, it is also possible to override the .Encode() and .GetFuncScript() methods to provide a completely new algorithm for encoding and decoding the email address.
The new algorithm may be as simple or as complex as you like. Just because your favorite algorithm is simple, do not assume that this is a bad thing. As long as it is different, it is more confusing for the harvesters.
Maybe you want to make a simple change, such as accelerating the rate of change in the base number (initially the seed). Simply copy the code as described in NoSpamEmailHyperlink: 3. Email Encoding and Decoding into your derived control and amend it however you please.
For example:
public class NoSpamEmailHyperlinkEx : NoSpamEmailHyperlink
{
    protected override string GetFuncScript()
    {
#if DEBUG
        JavaScriptBuilder jsb = new JavaScriptBuilder(true);
#else
        JavaScriptBuilder jsb = new JavaScriptBuilder();
#endif
        jsb.AddLine("function ", FuncScriptName, "(link, seed)");
        jsb.OpenBlock();
        jsb.AddCommentLine("This is the decoding key for all ",
            LinkArrayName, " objects");
        jsb.AddLine("var ", CodeKeyName, " = \"", CodeKey, "\";");
        jsb.AddLine();
        jsb.AddCommentLine("Store the innerText so that it doesn't get");
        jsb.AddCommentLine("distorted when updating the href later");
        jsb.AddLine("var storeText = link.innerText;");
        jsb.AddLine();
        jsb.AddCommentLine("Initialize variables");
        jsb.AddLine("var baseNum = parseInt(seed);");
        jsb.AddLine("var atSym = link.href.indexOf(\"@\");");
        jsb.AddLine("if (atSym == -1) atSym = 0;");
        jsb.AddLine("var dotidx = link.href.indexOf(\".\", atSym);");
        jsb.AddLine("if (dotidx == -1) dotidx = link.href.length;");
        jsb.AddLine("var scramble = link.href.substring(7, dotidx);");
        jsb.AddLine("var unscramble = \"\";");
        jsb.AddLine("var su = true;");
        jsb.AddLine();
        jsb.AddCommentLine("Go through the scrambled section of the address");
        // Declare the loop counter with var so it does not leak as a global
        jsb.AddLine("for (var i=0; i < scramble.length; i++)");
        jsb.OpenBlock();
        jsb.AddCommentLine("Find each character in the scramble key string");
        jsb.AddLine("var ch = scramble.substring(i,i + 1);");
        jsb.AddLine("var idx = ", CodeKeyName, ".indexOf(ch);");
        jsb.AddLine();
        jsb.AddCommentLine("If it isn't there then add the character");
        jsb.AddCommentLine("directly to the unscrambled email address");
        jsb.AddLine("if (idx < 0)");
        jsb.OpenBlock();
        jsb.AddLine("unscramble = unscramble + ch;");
        jsb.AddLine("continue;");
        jsb.CloseBlock();
        jsb.AddLine();
        jsb.AddCommentLine("Decode the character");
        jsb.AddLine("idx -= (su ? -baseNum : baseNum);");
        jsb.AddLine("baseNum -= (su ? -Math.pow(i, 2) : Math.pow(i, 2));");
        jsb.AddLine("while (idx < 0) idx += ", CodeKeyName, ".length;");
        jsb.AddLine("idx %= ", CodeKeyName, ".length;");
        jsb.AddLine();
        jsb.AddCommentLine("... and add it to the unscrambled email address");
        jsb.AddLine("unscramble = unscramble + ",
            CodeKeyName, ".substring(idx,idx + 1);");
        jsb.AddLine("su = !su;");
        jsb.CloseBlock();
        jsb.AddLine();
        jsb.AddCommentLine("Adjust the href property of the link");
        jsb.AddLine("var emAdd = unscramble + ",
            "link.href.substring(dotidx, link.href.length + 1);");
        jsb.AddLine("link.href = \"mailto:\" + emAdd;");
        jsb.AddLine();
        jsb.AddCommentLine("If the scrambled email address is also in the text");
        jsb.AddCommentLine("of the hyperlink, replace it");
        jsb.AddLine("var findEm = storeText.indexOf(scramble);");
        jsb.AddLine("while (findEm > -1)");
        jsb.OpenBlock();
        jsb.AddLine("storeText = storeText.substring(0, findEm) + emAdd ",
            "+ storeText.substring(findEm + emAdd.length, storeText.length);");
        jsb.AddLine("findEm = storeText.indexOf(scramble);");
        jsb.CloseBlock();
        jsb.AddLine();
        jsb.AddLine("link.innerText = storeText;");
        jsb.CloseBlock();
        return jsb.ToString();
    }

    protected override string Encode (string Unencoded)
    {
        char[] scramble = Email.ToCharArray();
        int baseNum = ScrambleSeed;
        bool subtract = true;
        int atSymbol = Array.IndexOf(scramble, '@');
        if (atSymbol == -1) atSymbol = 0;
        int stopAt = Array.IndexOf(scramble, '.', atSymbol);
        if (stopAt == -1) stopAt = scramble.Length;
        for (int i=0; i < stopAt; i++)
        {
            char ch = scramble[i];
            int idx = CodeKey.IndexOf(ch);
            if (idx < 0) continue;
            idx += (subtract ? -baseNum : baseNum);
            baseNum -= (subtract
                ? -(int)Math.Pow(i, 2) : (int)Math.Pow(i, 2));
            while (idx < 0) idx += CodeKey.Length;
            idx %= CodeKey.Length;
            scramble[i] = CodeKey[idx];
            subtract = !subtract;
        }
        return new string(scramble);
    }
}
Only the lines that update baseNum (the Math.pow call in the JavaScript and the matching Math.Pow call in the C#) have been changed, but this is a massive change to the coding algorithm and an extra JavaScript command for the harvesters to understand.
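When changing the algorithm, it is worth confirming that the decoder still reverses the encoder. The sketch below ports the modified algorithm above to plain JavaScript string functions so that a round trip can be checked outside the browser. The names encodeAddress and decodeAddress are invented for the illustration, the seed is passed in as a parameter rather than read from ScrambleSeed, and the decoder works on the bare address rather than on a link's href.

```javascript
const codeKey = "QpWlEmR5ToY4UkInO0PiAjS19DbFuGhHvJ2KyLgZc3XtCxVf6BNrdMeszawq78";

// Shared loop: direction is +1 to encode and -1 to decode.
function transform(address, seed, direction) {
  const chars = address.split("");
  let baseNum = seed;
  let subtract = true;
  let atSymbol = address.indexOf("@");
  if (atSymbol === -1) atSymbol = 0;
  // Stop at the first dot after the "@", as the C# code does.
  let stopAt = address.indexOf(".", atSymbol);
  if (stopAt === -1) stopAt = chars.length;

  for (let i = 0; i < stopAt; i++) {
    let idx = codeKey.indexOf(chars[i]);
    if (idx < 0) continue; // not in the key: leave as-is
    idx += direction * (subtract ? -baseNum : baseNum);
    // The accelerated change in the base number
    baseNum -= subtract ? -Math.pow(i, 2) : Math.pow(i, 2);
    while (idx < 0) idx += codeKey.length;
    idx %= codeKey.length;
    chars[i] = codeKey[idx];
    subtract = !subtract;
  }
  return chars.join("");
}

function encodeAddress(address, seed) { return transform(address, seed, 1); }
function decodeAddress(address, seed) { return transform(address, seed, -1); }

const scrambled = encodeAddress("pdriley@santt.com", 23);
console.log(scrambled);
console.log(decodeAddress(scrambled, 23)); // prints pdriley@santt.com
```

The round trip works because the sequence of baseNum values depends only on the seed and on which positions hold characters from the key, and both are the same before and after encoding.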
Conclusion
The variations on this theme are limited only by your imagination. You could use multiple keys, perhaps one upper-case and one lower-case key. Perhaps you want to substitute underscores and hyphens, prefixing with a random letter to keep the address valid.
You could simulate the World War II "one-time pad" system, by "adding" the first letter of the email address to the first letter of the key, the second letter of the email address to the second letter of the key, and so on.
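One way to read "adding" letters is modular addition of their positions in a fixed alphabet. The sketch below takes that reading; the alphabet order, the padCombine name and the use of the earlier key string as a stand-in pad are all assumptions made for the illustration. A genuine one-time pad is random and never reused, which a pad embedded in a page cannot be.

```javascript
// Assumed alphabet: positions in this string are the "values" of the letters.
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// sign = +1 adds the pad (encode); sign = -1 subtracts it (decode).
function padCombine(text, pad, sign) {
  if (pad.length < text.length) {
    throw new Error("pad must be at least as long as the text");
  }
  const n = alphabet.length;
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const t = alphabet.indexOf(text[i]);
    const p = alphabet.indexOf(pad[i]);
    if (t < 0 || p < 0) {
      out += text[i]; // characters outside the alphabet pass through
      continue;
    }
    out += alphabet[(((t + sign * p) % n) + n) % n];
  }
  return out;
}

// Stand-in pad for demonstration only.
const padKey = "QpWlEmR5ToY4UkInO0PiAjS19DbFuGhHvJ2KyLgZc3XtCxVf6BNrdMeszawq78";
const padded = padCombine("pdriley@santt.com", padKey, 1);
console.log(padded);
console.log(padCombine(padded, padKey, -1)); // prints pdriley@santt.com
```

Unlike the control itself, this sketch transforms every alphanumeric character it is given, including the part after the final dot.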
You do not have to limit yourself to substitution algorithms. You could reverse the characters in both the user and domain segments of the email address (e.g. pdriley@santt.com becomes yelirdp@ttnas.com) or use a more complex transposition algorithm.
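The reversal described above is short enough to sketch in full. The function name reverseSegments is invented for the illustration, and the sketch assumes that the domain segment ends at the first dot after the "@", as in the rest of the control.

```javascript
// Reverse the user segment and the domain segment (up to the first dot
// after the "@"), leaving the remainder of the address untouched.
function reverseSegments(email) {
  const at = email.indexOf("@");
  if (at < 0) return email; // nothing to split on
  let dot = email.indexOf(".", at);
  if (dot < 0) dot = email.length;
  const reverse = (s) => s.split("").reverse().join("");
  return reverse(email.slice(0, at)) + "@" +
         reverse(email.slice(at + 1, dot)) +
         email.slice(dot);
}

console.log(reverseSegments("pdriley@santt.com")); // prints yelirdp@ttnas.com
console.log(reverseSegments("yelirdp@ttnas.com")); // prints pdriley@santt.com
```

Because reversing twice restores the original, the same function serves as both encoder and decoder.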
It really makes no difference what approach you take; the more people who add their own personal touch to the NoSpamEmailHyperlink, the more painful it becomes for the email harvesters.
Let your imagination run wild.