Introduction
I'm sure everyone reading this is familiar with spam. There are two schools of thought when it comes to fighting spam:
- Bayesian filtering, such as POPFile.
- Challenge/response human verification, such as SpamArrest.
Both of these approaches have their pros and cons, of course. This article will only deal with the second technique: verifying that the data you are receiving is coming from an actual human being and not a robot or script. A CAPTCHA is a way of testing input to ensure that you're dealing with a human. Now, there are a lot of ways to build a CAPTCHA, as documented in this MSDN article on the subject, but I will be focusing on a visual data entry CAPTCHA.
There's already a great ASP.NET article on a CAPTCHA control here on CodeProject, so you may be wondering what this article is for. I wanted to rebuild that solution for the following reasons:
- more control settings and flexibility
- conversion to my preferred VB.NET language
- abstracted into a full-blown ASP.NET server control.
So, this article will document how to turn a set of existing ASP.NET web pages into a simple, drag-and-drop ASP.NET server control -- with a number of significant enhancements along the way.
Implementation
The first thing I had to deal with was the image generated by the CAPTCHA class. This was originally done with a dedicated .aspx form -- something that won't exist for a server control. How could I generate an image on the fly? After some research, I was introduced to the world of HttpModules and HttpHandlers. They are extremely powerful -- and a single HttpHandler solves this problem neatly.
All we need is a small Web.config modification in the <system.web> section:
<httpHandlers>
    <add verb="GET" path="CaptchaImage.aspx"
         type="WebControlCaptcha.CaptchaImageHandler, WebControlCaptcha" />
</httpHandlers>
This handler defines a special page named CaptchaImage.aspx. Now, this "page" doesn't actually exist. When a request for CaptchaImage.aspx occurs, it will be intercepted and handled by a class that implements the IHttpHandler interface: CaptchaImageHandler. Here's the relevant code section:
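Public Sub ProcessRequest(ByVal context As System.Web.HttpContext) _
    Implements System.Web.IHttpHandler.ProcessRequest
    Dim app As HttpApplication = context.ApplicationInstance
    ' The GUID identifies the CaptchaImage that the control stored in the Cache
    Dim strGuid As String = Convert.ToString(app.Request.QueryString("guid"))
    Dim ci As CaptchaImage
    If strGuid = "" Then
        ' No key supplied: render a throwaway CAPTCHA
        ci = New CaptchaImage
    Else
        ' Fetch the stored CAPTCHA, then remove it so it is served only once
        ci = CType(app.Context.Cache(strGuid), CaptchaImage)
        app.Context.Cache.Remove(strGuid)
    End If
    If ci Is Nothing Then
        ' The Cache entry expired or was already served
        app.Response.StatusCode = 404
        app.Response.End()
        Return
    End If
    app.Response.ContentType = "image/jpeg"
    app.Response.StatusCode = 200
    ' Stream the image to the browser straight from memory
    ci.Image.Save(app.Context.Response.OutputStream, _
        Drawing.Imaging.ImageFormat.Jpeg)
    app.Response.End()
End Sub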
A new CAPTCHA image will be generated, and the image streamed directly to the browser from memory. Problem solved!
However, there's another problem. There has to be communication between the HttpHandler responsible for displaying the image, and the web page hosting the control -- otherwise, how would the calling control know what the randomly generated CAPTCHA text was? If you view source on the rendered control, you'll see that a GUID is passed in through the querystring:
<img src="CaptchaImage.aspx?guid=99fecb18-ba00-4b60-9783-37225179a704"
border='0'>
This GUID (globally unique identifier) is a key used to access a CAPTCHA object that was originally stored in the ASP.NET Cache by the control. Take a look at the CaptchaControl.GenerateNewCaptcha method:
Private Sub GenerateNewCaptcha()
    ' The GUID is the only value shared with the browser
    LocalGuid = Guid.NewGuid.ToString
    If Not IsDesignMode Then
        ' Session.Timeout is expressed in minutes, so the entry expires with the session
        HttpContext.Current.Cache.Add(LocalGuid, _captcha, Nothing, _
            DateTime.Now.AddMinutes(HttpContext.Current.Session.Timeout), _
            TimeSpan.Zero, Caching.CacheItemPriority.NotRemovable, Nothing)
    End If
    Me.CaptchaText = _captcha.Text
    Me.GeneratedAt = Now
End Sub
It may seem a little strange, but it works great! The sequence of ASP.NET events is as follows:
- Page is rendered.
- Page calls CaptchaControl1.OnPreRender. This generates a new GUID and a new CAPTCHA object reflecting the control properties. The resulting CAPTCHA object is stored in the Cache by GUID.
- Page calls CaptchaControl1.Render; the special <img> tag URL is written to the browser.
- Browser attempts to retrieve the special <img> tag URL. CaptchaImageHandler.ProcessRequest fires. It retrieves the GUID from the querystring, the CAPTCHA object from the Cache, and renders the CAPTCHA image. It then removes the Cache object.
Note that there is a little cleanup involved at the end. If, for some reason, the control renders but the image URL is never retrieved, there would be an orphan CAPTCHA object in the Cache. This can happen, but should be rare in practice -- and our Cache entry only has a 20-minute lifetime anyway.
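The store-then-fetch-once handshake described above can be sketched outside ASP.NET. This is a minimal illustration, not the control's actual code: the module name, the Register and Consume functions, and the Dictionary standing in for the ASP.NET Cache are all hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for the ASP.NET Cache: maps a GUID string to CAPTCHA text.
Public Module CaptchaHandshakeSketch
    Private ReadOnly store As New Dictionary(Of String, String)()

    ' Control side: store the text under a fresh GUID and return the GUID,
    ' which is the only value that ever reaches the browser.
    Public Function Register(ByVal captchaText As String) As String
        Dim id As String = Guid.NewGuid().ToString()
        store(id) = captchaText
        Return id
    End Function

    ' Handler side: fetch the entry and remove it, so each GUID works once.
    ' Returns Nothing when the GUID is unknown or was already consumed.
    Public Function Consume(ByVal id As String) As String
        Dim value As String = Nothing
        If store.TryGetValue(id, value) Then
            store.Remove(id)
        End If
        Return value
    End Function

    Public Sub Main()
        Dim id As String = Register("X7KQP")
        Console.WriteLine("<img src=""CaptchaImage.aspx?guid=" & id & """>")
        Console.WriteLine(Consume(id))             ' first request finds the entry
        Console.WriteLine(Consume(id) Is Nothing)  ' second request finds nothing
    End Sub
End Module
```

Unlike the real Cache, this dictionary never expires entries, so an orphaned entry would stay forever; the absolute expiration passed to Cache.Add is what bounds that in the control.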
One mistake I made early on was storing the actual CAPTCHA text in the ViewState. The ViewState is not encrypted by default and can be easily decoded! I've switched to ControlState, and now store only the GUID there. The GUID is essential for retrieving the shared CAPTCHA object from the Cache -- but by itself, it is useless.
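To see why an encoded hidden field is not a safe place for the answer, here is a small sketch. It is not the real ViewState format (which is a serialized object graph), only the Base64 layer that wraps it; the module name and the sample string are made up for illustration.

```vbnet
Imports System
Imports System.Text

Public Module Base64Sketch
    ' Encode text the way a hidden form field might carry it (Base64 only).
    Public Function Encode(ByVal plain As String) As String
        Return Convert.ToBase64String(Encoding.UTF8.GetBytes(plain))
    End Function

    ' Anyone can reverse the encoding; no key or secret is involved.
    Public Function Decode(ByVal encoded As String) As String
        Return Encoding.UTF8.GetString(Convert.FromBase64String(encoded))
    End Function

    Public Sub Main()
        Dim hidden As String = Encode("CaptchaText=X7KQP")
        Console.WriteLine(hidden)          ' looks opaque at a glance
        Console.WriteLine(Decode(hidden))  ' but decodes back to the original text
    End Sub
End Module
```

A random GUID carries no such information: decoding it tells a script nothing about the CAPTCHA text, which stays on the server.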
CaptchaControl Properties
The CaptchaControl is a good ASP.NET citizen, and properly implements all the default ASP.NET Server Control properties. It also has a few properties of its own:
Property | Default | Description |
CacheStrategy | HttpRuntime | For security reasons, the CAPTCHA text is never sent to the client; it is only stored on the server. It can be stored in Session (web-farm friendly) or HttpRuntime (very fast, but local to one webserver). |
CaptchaBackgroundNoise | Low | Amount of background noise to add to the CAPTCHA image. Ranges from None to Extreme. |
CaptchaChars | A-Z, 1-9 | A whitelist of characters to use when building CAPTCHA text. A character will be picked randomly from this string. By default, I omit some characters likely to be confused, such as O, 0, I, 1, 8, B, etcetera. |
CaptchaFont | "" | Font family to use for the CAPTCHA text. If not provided, a random installed font will be chosen for each character. A font whitelist is maintained internally so only known legible fonts will be used (e.g., not WingDings). |
CaptchaFontWarping | Low | Level of warping used on each character of the CAPTCHA text. Ranges from None to Extreme. |
CaptchaHeight | 50 | Default height of the CAPTCHA image, in pixels. |
CaptchaLength | 5 | Number of characters used in the randomly generated CAPTCHA text. |
CaptchaLineNoise | None | Amount of "scribble" line noise to add to the CAPTCHA image. Ranges from None to Extreme. |
CaptchaMaxTimeout | 90 | Number of seconds that the CAPTCHA will remain valid and stored in the cache after it is generated. |
CaptchaMinTimeout | 3 | Minimum number of seconds the user must wait before entering a CAPTCHA. |
CaptchaWidth | 180 | Default width of the CAPTCHA image, in pixels. |
UserValidated | False | After postback, returns True if the user entered text that matches the randomly generated CAPTCHA text. Note that the standard IValidator interface is implemented as well. |
LayoutStyle | Horizontal | Determines if the text and input box are to the right, or below, the image. Allows greater layout flexibility. |
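As a rough sketch of how CaptchaChars and CaptchaLength could interact, the function below picks characters at random from a whitelist string. It is an illustration only, not the control's implementation: the module, the GenerateText function, and the sample whitelist are assumptions.

```vbnet
Imports System
Imports System.Text

Public Module CaptchaTextSketch
    ' Picks `length` characters at random from the whitelist `chars`.
    Public Function GenerateText(ByVal chars As String, ByVal length As Integer, _
                                 ByVal rng As Random) As String
        Dim sb As New StringBuilder(length)
        For i As Integer = 1 To length
            ' rng.Next(n) returns an index from 0 to n - 1
            sb.Append(chars(rng.Next(chars.Length)))
        Next
        Return sb.ToString()
    End Function

    Public Sub Main()
        ' Hypothetical whitelist that leaves out O, 0, I, 1, 8 and B.
        Dim whitelist As String = "ACDEFGHJKLMNPQRSTUVWXYZ2345679"
        Console.WriteLine(GenerateText(whitelist, 5, New Random()))
    End Sub
End Module
```

System.Random is predictable if an attacker can guess the seed, so a hardened version might draw from a cryptographic random number generator instead.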
Many of these properties have to do with the inherent tradeoff between human readability and machine readability. The harder a CAPTCHA is for OCR software to read, the harder it will be for us human beings, too! For illustration, compare these two CAPTCHA images:
The CAPTCHA on the left is generated with all "medium" settings, which are a reasonable tradeoff between human readability and OCR machine readability. The CAPTCHA on the right uses a lower CaptchaFontWarping, and a smaller CaptchaLength. If the risk of someone writing OCR scripts to defeat your CAPTCHA is low, I strongly urge you to use the easier-to-read CAPTCHA settings. Remember, just having a CAPTCHA at all raises the bar quite high.
The CaptchaTimeout property (CaptchaMaxTimeout in the table above) was added later to alleviate concerns about CAPTCHA farming. It is possible to "pay" humans to solve harvested CAPTCHAs by re-displaying a CAPTCHA and giving the user free MP3s or access to pornography if they solve it. However, this technique takes time, and it doesn't work if the CAPTCHA has a time-limited expiration.
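A time-limited check of this kind might look like the sketch below, which combines the minimum and maximum timeouts from the property table with the text comparison. The module, the enum, and the Check function are hypothetical, and the case-insensitive comparison is an assumption rather than something the article specifies.

```vbnet
Imports System

Public Module CaptchaTimingSketch
    Public Enum CaptchaResult
        Ok
        TooFast
        TimedOut
        BadEntry
    End Enum

    ' Checks the elapsed time first, then the text (case-insensitive by assumption).
    Public Function Check(ByVal expected As String, ByVal entered As String, _
                          ByVal generatedAt As DateTime, ByVal submittedAt As DateTime, _
                          ByVal minSeconds As Integer, ByVal maxSeconds As Integer) As CaptchaResult
        Dim elapsed As Double = (submittedAt - generatedAt).TotalSeconds
        If elapsed < minSeconds Then Return CaptchaResult.TooFast
        If elapsed > maxSeconds Then Return CaptchaResult.TimedOut
        If Not String.Equals(expected, entered, StringComparison.OrdinalIgnoreCase) Then
            Return CaptchaResult.BadEntry
        End If
        Return CaptchaResult.Ok
    End Function

    Public Sub Main()
        Dim t0 As New DateTime(2007, 1, 29, 12, 0, 0)
        ' Defaults from the table: 3 seconds minimum, 90 seconds maximum.
        Console.WriteLine(Check("X7KQP", "x7kqp", t0, t0.AddSeconds(10), 3, 90))   ' Ok
        Console.WriteLine(Check("X7KQP", "X7KQP", t0, t0.AddSeconds(1), 3, 90))    ' TooFast
        Console.WriteLine(Check("X7KQP", "X7KQP", t0, t0.AddSeconds(120), 3, 90))  ' TimedOut
    End Sub
End Module
```

Returning a distinct result for each failure makes it possible to tell the user exactly why the CAPTCHA was rejected: too fast, timed out, or a bad entry.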
Conclusion
Many thanks to BrainJar for creating his simple yet effective CAPTCHA image class. Now that I've wrapped it up into an ASP.NET server control, it should be easier than ever to simply drop on a web form, set a few properties, and start defeating spammers at their own game!
There are many more details and comments in the demonstration solution provided at the top of the article, so check it out. And please don't hesitate to provide feedback, good or bad! I hope you enjoyed this article. If you did, you may like my other articles as well.
History
- Monday, November 8, 2004 - Published.
- Friday, December 17, 2004 - Version 1.1
  - added UserValidationEvent
  - changed defaults to be less aggressive (more user friendly)
  - added LayoutStyle property for choice of horizontal or vertical layout
  - changed random font approach from blacklist to whitelist
  - corrected intermittent order-of-retrieval bug reported by Robert Sindall
  - converted to VB.NET 2005 compatible XML comments
- Sunday, October 29, 2006 - Version 2.0
  - major rewrite for .NET 2.0
  - removed dependency on Session
  - removed dependency on ViewState
  - uses ControlState to store GUID
  - implemented standard IValidator
  - complete rewrite of the renderer for more secure CAPTCHAs
  - added more tweakable properties
  - switched to HttpRuntime caching
  - changed cache priority to Caching.CacheItemPriority.NotRemovable (this was a bug, fixed in the old and new versions)
- Monday, January 29, 2007 - Version 2.1
  - Correct length bug
  - Correct caching bug (units were set to minutes, not seconds!)
  - Add option to store CAPTCHA text in Session for web farms
  - Add minimum time to prevent aggressive robots
  - Improved response messages to display exactly why the CAPTCHA was rejected (timed out, bad entry, too fast)