Regular Expressions are often used to handle all sorts of validation:
- E-mail Addresses
- Postal Codes
- Telephone Numbers
- Dates and Times
- Social Security Numbers
This blog post focuses on validating the last of those, Social Security Numbers, and explains why they can be deceptively difficult to validate. I’ll provide a few different methods for handling the actual validation through Regular Expressions and let you decide which one best suits your needs.
Validating Social Security Numbers may not always be just about the format of the number itself.
Background Information: Social Security Numbers
A Social Security number (SSN) is a nine-digit number issued to U.S. citizens, permanent residents and temporary (working) residents under section 205(c)(2) of the Social Security Act. The number is issued to an individual by the Social Security Administration, and its primary purpose is to track individuals for the U.S. Social Security program so that benefits can be allocated. It has also come to be used as a unique identifier for individuals within the United States for a variety of federal purposes. (The United States is not the only country to use Social Security Numbers, but for demonstration purposes, we will focus on validating U.S. Social Security Numbers.)
The Number Itself
An example demonstrating the different components of the number (which are no longer used for validation)
The number itself can be broken into three major components and appears in the following format:
AAA-GG-SSSS
or (without dashes):
AAAGGSSSS
These sections can be broken down as follows:
- The Area Number (A) – This is a three-digit number that was used to assign a Social Security number based on geographic location. It was based either on the specific office that issued the card or on the ZIP code in which the applicant lived.
- The Group Number (G) – This is a two-digit number that ranges from 01-99 and was assigned following a specific set of rules that ran through the values within each area in a non-consecutive order.
- The Serial Number (S) – This is a straightforward numeric sequence that was issued consecutively from 0001-9999.
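The three components can be pulled apart mechanically. The JavaScript sketch below assumes input in either of the two layouts shown above; the `splitSsn` name is hypothetical and not part of any library. Since the 2011 change described next, the parts no longer carry geographic meaning, so splitting is mainly useful for display or for per-field checks.

```javascript
// Hypothetical helper: splits an SSN written as AAA-GG-SSSS or AAAGGSSSS
// into its Area, Group and Serial parts. The (-?) capture plus the \2
// back-reference forces both dashes to be present or both to be absent,
// so a mixed form such as "219-099999" is rejected.
function splitSsn(input) {
  const match = /^(\d{3})(-?)(\d{2})\2(\d{4})$/.exec(input);
  if (match === null) {
    return null; // not nine digits in either layout
  }
  return { area: match[1], group: match[3], serial: match[4] };
}

const parts = splitSsn("219-09-9999");
console.log(parts.area, parts.group, parts.serial); // 219 09 9999
```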
You may notice a great deal of past tense in the previous descriptions. This is because on June 25th, 2011, the Social Security Administration revised its assignment process to use a system of randomization that made the following changes:
- Eliminated the geographical significance of the Area Number - The first three digits are no longer related in any way to the individual’s geographic location or state.
- Eliminated the Group Number method of assigning values – In practice this changes little, as the Group Number is now simply assigned randomly as well.
- Allowed previously unassigned Area Numbers to be assigned – Any previously unused area numbers (with a few exceptions) can now be used in new Social Security Numbers.
Validating the Number
An advertising mishap ultimately resulted in a real Social Security Number being printed and distributed in wallets sold by Woolworth.
The randomization change described above made it significantly easier to use a mechanism such as a Regular Expression to validate Social Security Numbers, since you no longer have to worry about the highest group number and area number that had been assigned.
However, you still can’t just throw any value in there and expect it to be valid, as the Social Security Administration still treats a few Social Security Numbers as explicitly “off limits,” as listed below.
A Social Security number CANNOT:
- Contain all zeroes in any specific group (i.e., 000-##-####, ###-00-####, or ###-##-0000)
- Begin with '666'.
- Begin with any value from '900' to '999'
- Be '078-05-1120' (due to the Woolworth’s Wallet Fiasco)
- Be '219-09-9999' (appeared in an advertisement for the Social Security Administration)
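The rules above are simple enough to express without a regular expression at all, which makes a useful point of comparison for the patterns that follow. This is a minimal JavaScript sketch that assumes the input has already been normalized to nine digits with no dashes; `passesSsaRules` is a hypothetical name.

```javascript
// Sketch only: the SSA exclusions listed above as plain string checks.
// Assumes `digits` is already normalized to nine characters with no dashes.
const PUBLICIZED_NUMBERS = new Set(["078051120", "219099999"]);

function passesSsaRules(digits) {
  if (!/^\d{9}$/.test(digits)) {
    return false; // must be exactly nine digits
  }
  const area = digits.slice(0, 3);
  const group = digits.slice(3, 5);
  const serial = digits.slice(5);

  if (area === "000" || group === "00" || serial === "0000") {
    return false; // no field may be all zeroes
  }
  if (area === "666" || area.startsWith("9")) {
    return false; // cannot begin with 666 or with 900-999
  }
  // The Woolworth wallet number and the SSA advertisement number
  return !PUBLICIZED_NUMBERS.has(digits);
}

console.log(passesSsaRules("123456789")); // true
console.log(passesSsaRules("078051120")); // false
console.log(passesSsaRules("900123456")); // false
```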
Given those rules, let’s go over a few different methods for validation using Regular Expressions that range in severity and logic.
Basic Format Checking (“I don’t care” Validation)
^\d{3}-\d{2}-\d{4}$
or (without dashes):
^\d{9}$
This method is about as straightforward as possible, and it will accept every single valid Social Security Number out there. The problem is that it will also accept many values that can never be valid. Let’s look at a simple break-down of this expression:
^ #Beginning of expression
\d{3} #Exactly three digits (\d matches a digit and {3} specifies how many)
- #An explicit 'dash'
\d{2} #Exactly two digits
- #Another explicit 'dash'
\d{4} #Exactly four digits
$ #End of expression
This doesn’t take into consideration any of the previously suggested rules and exceptions that were mentioned above, but it does function to ensure that the number is in the proper format.
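In JavaScript, the two basic patterns can be tried directly. The sketch below wraps them in a hypothetical `hasSsnFormat` function and shows that a format-only check happily accepts a number such as 000-00-0000.

```javascript
// The "I don't care" patterns: they only check the shape of the input,
// not whether the number could ever have been issued.
const BASIC_DASHED = /^\d{3}-\d{2}-\d{4}$/;
const BASIC_PLAIN = /^\d{9}$/;

function hasSsnFormat(input) {
  return BASIC_DASHED.test(input) || BASIC_PLAIN.test(input);
}

console.log(hasSsnFormat("123-45-6789")); // true
console.log(hasSsnFormat("000-00-0000")); // true, even though it can never be valid
console.log(hasSsnFormat("123-456-789")); // false: wrong grouping
```

One portability note: in JavaScript `\d` matches only the ASCII digits 0-9, whereas in .NET `\d` matches any Unicode decimal digit unless `RegexOptions.ECMAScript` is set, so `[0-9]` is the safer choice when porting these patterns.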
Actual Validation (takes into consideration all of the previous rules)
^(?!219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$
or (without dashes):
^(?!219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}$
While this may not look very fun at all, it will validate any valid Social Security Number as per the constraints (and exceptions) listed by the Social Security Administration. Let’s break this one down to see what is going on:
^ #Start of expression
(?!219-09-9999|078-05-1120) #Don't allow "219-09-9999" or "078-05-1120" explicitly
(?!666|000|9\d{2})\d{3} #Don't allow the SSN to begin with 666, 000 or anything between 900-999
- #Explicit dash (separating Area and Group numbers)
(?!00)\d{2} #Don't allow the Group Number to be "00"
- #Another dash (separating Group and Serial numbers)
(?!0{4})\d{4} #Don't allow last four digits to be "0000"
$ #End of expression
This should handle just about every case you would want to check with a Regular Expression alone.
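Here is one way to apply both versions of the expression above from JavaScript, assuming you want to accept either layout. `isValidSsn` is a hypothetical wrapper name, not part of any library; the patterns themselves are copied unchanged.

```javascript
// The "Actual Validation" patterns, one for each layout.
const SSN_DASHED =
  /^(?!219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$/;
const SSN_PLAIN =
  /^(?!219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}$/;

// Hypothetical convenience wrapper: accepts either layout.
function isValidSsn(input) {
  return SSN_DASHED.test(input) || SSN_PLAIN.test(input);
}

console.log(isValidSsn("123-45-6789")); // true
console.log(isValidSsn("123456789"));   // true
console.log(isValidSsn("078-05-1120")); // false: explicitly excluded
console.log(isValidSsn("666-12-3456")); // false: area 666
console.log(isValidSsn("901-23-4567")); // false: area 900-999
console.log(isValidSsn("123-00-4567")); // false: group 00
```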
Over-The-Top Validation (checks for commonly faked / forged values as well)
^(?!\b(\d)\1+-\1+-\1+\b)(?!123-45-6789|219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$
or (without dashes):
^(?!\b(\d)\1+\b)(?!123456789|219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}$
This is an expression that I wrote to build on the previous example. It also handles some commonly encountered “fake” values, or ones that might easily be forged, in places where a user is required to enter an SSN.
Let’s break this one down:
^ #Start of expression
(?!\b(\d)\1+-\1+-\1+\b) #Don't allow every digit to be the same (each field refers back to the first captured digit, \1)
(?!123-45-6789|219-09-9999|078-05-1120) #Don't allow "123-45-6789", "219-09-9999" or "078-05-1120"
(?!666|000|9\d{2})\d{3} #Don't allow the SSN to begin with 666, 000 or anything between 900-999
- #A dash (separating Area and Group numbers)
(?!00)\d{2} #Don't allow the Group Number to be "00"
- #Another dash (separating Group and Serial numbers)
(?!0{4})\d{4} #Don't allow last four digits to be "0000"
$ #End of expression
This expression will not allow a Social Security Number that:
- Contains all zeroes in any specific group (i.e., 000-##-####, ###-00-####, or ###-##-0000)
- Begins with '666'
- Begins with any value from '900' to '999'
- Is equal to '078-05-1120' (due to the Woolworth’s Wallet Fiasco)
- Is equal to '219-09-9999' (appeared in an advertisement for the Social Security Administration)
and:
- Contains all matching digits (e.g., 000-00-0000, 111-11-1111, 222-22-2222, etc.)
- Is the incrementing sequence '123-45-6789'
It’s important to keep in mind that although this expression will help foil possible fraudulent attempts, it is a bit “over-the-top” and will actually consider some valid values (such as 123-45-6789) to be invalid. Use your best judgement to determine if this is a viable option for you and your business needs.
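A JavaScript sketch of the over-the-top check follows. It assumes the repeated-digit lookahead in the dashed pattern back-references the first captured digit (`\1`) in every field, so only numbers made of a single repeated digit are rejected; a value such as 111-21-1111 still passes. The `isPlausibleSsn` name is hypothetical.

```javascript
// The "Over-The-Top" patterns. In the dashed version every field refers back
// to the first captured digit (\1), so only 111-11-1111, 222-22-2222, ...
// are rejected by the first lookahead.
const SSN_STRICT_DASHED =
  /^(?!\b(\d)\1+-\1+-\1+\b)(?!123-45-6789|219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$/;
const SSN_STRICT_PLAIN =
  /^(?!\b(\d)\1+\b)(?!123456789|219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}$/;

function isPlausibleSsn(input) {
  return SSN_STRICT_DASHED.test(input) || SSN_STRICT_PLAIN.test(input);
}

console.log(isPlausibleSsn("111-11-1111")); // false: one repeated digit
console.log(isPlausibleSsn("111-21-1111")); // true: digits are not all the same
console.log(isPlausibleSsn("123-45-6789")); // false: blocked sequence
console.log(isPlausibleSsn("222222222"));   // false: one repeated digit
```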
Using the Social Security Administration’s Validation Service
If you truly need a reliable method of handling Social Security Number validation, the Social Security Administration offers a service that will verify a number for you; however, it may not always be free or very “quick.” You can visit the following page for more information on using the services they provide:
Additional Examples
You can find an interactive JSBin page with the three examples mentioned in this post below:
You can also access a .NET-specific version of these examples (with implementations in both C# and JavaScript) from the following repository on GitHub:
An experienced Software Developer and Graphic Designer with an extensive knowledge of object-oriented programming, software architecture, design methodologies and database design principles. Specializing in Microsoft Technologies and focused on leveraging a strong technical background and a creative skill-set to create meaningful and successful applications.
Well versed in all aspects of the software development life-cycle and passionate about embracing emerging development technologies and standards, building intuitive interfaces and providing clean, maintainable solutions for even the most complex of problems.