Error Detection Based on Check Digit Schemes

Jeffrey Walton

Rate me:

4.72/5 (27 votes)

27 Nov 2007CPOL12 min read

91.2K

A Survey of Popular Check Digit Schemes

Introduction

A Check Digit is an alphanumeric character added to a number to detect errors. Check Digits are a simple and easy way to nuetralize the errors introduced by the human element of keying data. Check Digits are not Error Correcting Codes. Should the reader want to investigate robust Error Correcting Codes, he or she is referred to Error-Correcting Codes by Peterson and Weldon.

It is also notebale that CRC codes are not a check digit scheme, since more than one alphanumeric digit is generated to detect errors. If the reader desires a system for very long datasets or binary data sets, consider using a 32 bit CRC, Adler checksum, or a hash-based checksum. An Adler checksum is similar to a redundancy check, trading speed for reliability. A sample hashing application using CRC32, MD4, MD5, RIPEMD-160, SHA-1, Whirlpool, and SHA-2 family of hashes is presented in A File Checksum Shell Menu Extension Dll.

This article will discuss the following Check Digit Schemes:

Mod 9 Scheme
Mod 7 Scheme
Mod 11 Scheme
3-Weight Method
IBM Scheme
UPC Scheme
Verhoeff Algorithm
ISO 7064 Mod N/N+1

In addition, a superficial treatment of Group Theory is presented following the algorithms.

Common Errors

According to Richard Hamming, the two most common errors are:

12 becomes 21 (adjacent characters are transposed)
112 becomes 122 (incorrect doubling of triples)

Jacobus Verhoeff presents the follow stastics in Error Detecting Decimal Codes:

Error	Approximate Percentage	Comment
a → b	60% - 90%	Single Error
ab → aab	10% - 20%	Adding a Digit
aab → ab	10% - 20%	Omitting a Digit
ab → ba	10% - 20%	Transposition Error
aa → bb	0.5% - 1.5%	Twin Errors
acb → bca	> 1%	Jump Twin Error
13 → 30	0.5% - 1.5%	Phonetic Error (similar pronunciations)

Downloads

There are 5 downloads available with this article. They are presented at the end of the article.

Mod 9 Scheme

The mod 9 system is used by the United States Postal Service on money orders. If a money order number is 123456789 and c = 0, the number imprinted on the money order is 1234567890. The system is c = n % 9. This system is also known as "casting out nines". Casting Out Nines suffers from the fact that 0 → 9 and 9 → 0 are not detected; and no transposition errors are detected (unless it invlolves the check digit).

Mod 7 Scheme

A more robust system than Casting Out Nines is a mod 7 system. Bressoud and Wagon prove the following in Computationl Number Theory:

Single Error Detection Rate is 93.81%
Transposition Error Detection Rate is 93.87%

Similar to the Mod 9 Scheme, c = n % 7. However, only 3 transposition errors are undetectable: 70 ↔ 07, 81 ↔ 18, and 92 ↔ 29.

Mod 11 Scheme

The mod 11 scheme is a weighted scheme also known as International Standard Book Number (ISBN). The first 9 digits represent the unique identifier, the 10^th digit is the check digit. Since the scheme is mod 11, the check digit can be 0-9 or X.

This mod 11 based system operates on a dot product (i.e., the individual digits of the number n). n · (10, 9, ..., 2, 1) mod 10. Stated another way, let a_i be the i^th digit of n (most significant to least significant). Then

c = 10a₁ + 9a₂ + 8a₃ + 7a₄ + 6a₅ + 5a₆ + 4a₇ + 3a₈ + 2a₉ mod 11

The 10^th (a₁₀) digit would be -c so the following equation (called a check equation) holds true.

0 = 10a₁ + 9a₂ + 8a₃ + 7a₄ + 6a₅ + 5a₆ + 4a₇ + 3a₈ + 2a₉ + 1a₁₀ mod 11

This scheme detects all single error and the transposition of any two digits at any distance (assuming the overall number is 10 or fewer digits long).

// Check Digit Sample 3

_tstring number = _T("073560753"); // c should be 2
_tcout << _T("ISBN: ") << number << endl;

int i = 10, c = 0;
for( _tstring::const_iterator it  = number.begin();
                              it != number.end(); i--, it++ )
{
    if( _T('X') == *it )
        { c += 10 * i; }
    else
        { c += CharToNumber( *it ) * i; }

    c %= 11;        
}
    
// a_10 = -c
c = 11 - c;

if( 10 == c )
    { _tcout << _T("Check digit is: ") << _T("X") << endl; }
else
    { _tcout << _T("Check digit is: ") << c << endl; }

3-Weight Method

The 3-Weight Methods is also known as the Electronic Funds Transfer Routing Number Scheme used by the banking industry. It is a mod 10 based system, but rather than operating on n as a whole number, it operates on a dot product (i.e., the individual digits of the number n).

So, given an 8 digit routing number, the check digit c = n · (7, 3, 9, 7, 3, 9, 7, 3), mod 10. Stated another way, let a_i be the i^th digit of n (most significant to least significant). Then

c = 7a₁ + 3a₂ + 9a₃ + 7a₄ + 3a₅ + 9a₆ + 7a₇ + 3a₈ mod 10

c is then concatenated to the 8 digit route, forming a 9 digit routing number

This scheme is based on the fact that multiplication modulo 10 yields a permutation of the 10 decimal digits if the multiplication factor is a digit of the set { 1, 3, 7, 9 }; but only a subset of the decimal digits if the factor is 5 or an even digit. This system cannot detect adjacent transpositions of digits that differ by 5.

The 3-Weight method does not use simple modular reductions, so c = 10 - c is not needed to hold the equality.

The reader should find the following check equation holds true (where a₉is -c):

0 = 7a₁ + 3a₂ + 9a₃ + 7a₄ + 3a₅ + 9a₆ + 7a₇ + 3a₈ + a₉ mod 10

The 3-Weight Scheme is presented below. As the reader can see, it generalizes to a arbitrary length of digits easily. However, if the reader desires a system for very long datasets, or binary data sets consider using a CRC or Alder checksums, or a hash-based checksum.

// Check Digit Sample 2

_tstring number = "12345678";

int i = 0, c = 0;
for( _tstring::const_iterator it  = number.begin();
                              it != number.end(); i++, it++ )
{
    switch( i % 3 + 1 )
    {
        case 1:
            c += 7 * CharToNumber( *it );
            break;
        case 2:
            c += 3 * CharToNumber( *it );
            break;
        case 3:
            c += 9 * CharToNumber( *it );
            break;
        default:
            assert( false );
            break;
    }

    c %= 10;
}

IBM Check Digit Scheme

The IBM Check Digit Scheme is as close to a perfect system as can be obtained while working over the Field of Integers (Z). This scheme is used by companies such as Mastercard and Visa, and Canadian Social Insurance Numbers. The system is also known as Luhn's Algorithm. The scheme detects all Single Errors, and all Transposition Errors except 0 ↔ 9.

The check digit c is defined as the dot product:

-(a₁, a₂, a₃, a₄, ..., a_n-1, a_n) · (2, 1, 2, 1, ..., 1, 2) - r (mod 10)

r is the number of odd indexed digits greater than 5. This means r = r + 1 if and only if a_i> 5, where i is odd.

In practice, the algorithm is implemented quite differently.

// Check Digit Sample 4

int i = 1, c = 0;
for( _tstring::reverse_iterator it  = number.rbegin();
                                it != number.rend(); it++, i++)
{
    if( 0 == i % 2 )
    { 
        // Add even indexes
        c += CharToNumber( *it );
    }
    else
    {
        // Cast Out Nines on the odd indexed digits
        //   after multiplying by 2
        int t = 2 * CharToNumber( *it );
        if( t > 9 ) t -= 9;  // mod 9

        c += t;
    }        
}

UPC Scheme

The Universal Product Code system is similar to IBM's system, except a weight of 3 is given for the even indexed digits. The implementation is left as an exercise to the reader.

Verhoeff Algorithm

Unlike previously exmamined methods (which operate over Z), Verhoeff's Algorithm operates over the dihedral group D₅.

The algorithm is in use by companies such as ESB Networks of the Republic of Ireland. The algorithm is perfect in that it detects all Single Errors and all Transposition Errors. Additionally, it detects most (but not all) Twin Errors, Jump Twin Errors, Jump Transpositions Errors (abc → cba), and Phonetic Errors.

Verhorff's observation was as follows: one can use the 10 symmetries to encode the 10 digits by using the identity for 0; the rotations (in order) for 1, 2, 3, 4; and the reflections for 5, 6, 7, 8, 9.

The final step in preparation for the implementation is defining the permutation of digits.

Let δ be the permutation of digits as follows:

δ performs no transformation on 0
δ switches 1 and 4
δ switches 2 and 3
δ cyclically permutes 5, 8, 6, 9, 7

δ(0) = 0
δ(1) = 4
δ(4) = 1
δ(2) = 3
δ(3) = 2
δ(5) = 8
δ(8) = 6
δ(6) = 9
δ(9) = 7
δ(7) = 5

Unlike Z, D₅ is not communative: 8 ° 7 ≠ 7 ° 8.

°	0	1	2	3	4	5	6	7	8	9
0	0	1	2	3	4	5	6	7	8	9
1	1	2	3	4	0	6	7	8	9	5
2	2	3	4	0	1	7	8	9	5	6
3	3	4	0	1	2	8	9	5	6	7
4	4	0	1	2	3	9	5	6	7	8
5	5	9	8	7	6	0	4	3	2	1
6	6	5	9	8	7	1	0	4	3	2
7	7	6	5	9	8	2	1	0	4	3
8	8	7	6	5	9	3	2	1	0	4
9	9	8	7	6	5	4	3	2	1	0

The reader should realize that D₅ is a complex entity (dysfunctional) whose elements are somewhat arbitrarily mapped to the integers from 0 to 9, and the group operation is not at all like addition or multiplication. Also, there is no concept of order for D₅: < and > have no meaning. The red shading of the table denotes the loss of the Communative Property when multiplying elements.

A small digression: should one desire to perform 6 ° 1, first locate 6^th row, and then the 1^st column. 6 ° 1 = 5.

c = δⁿ(a₁) ° δ^n-1(a₂) ° δ^n-2(a₂) ° ... ° δ²(a_n-1) ° δ(a_n)

So, to encode 1793, the reader would:

c = δ⁴(1) ° δ³(7) ° δ²(9) ° δ¹(3)

δ⁴(1) = δ³(4) = δ²(1) = δ¹(4) = 1
δ³(7) = δ²(5) = δ¹(8) = 6
δ²(9) = δ¹(7) = 5
δ¹(3) = 2

1 ° 6 ° 5 ° 2
(1 ° (6 ° (5 ° 2))) = 4

As with other examples, the inverse of 4 is required so that the check equation hold true. Therefore, the check digit equals 1 (1 ° 4 = 0).

Below is the result of running checkdigit5 against 1000372996. Due to Multiplicative, Permutation, and Inverse Tables, the code is rather easy. Note that c is no longer a sum from a previous stage - it is used as a lookup, and then effectively discarded.

// Check Digit 5

// skip i = 0 - it is the checkdigit
//  (which is being calculated)
int i = 1, c = 0;
for( _tstring::reverse_iterator it  = number.rbegin();
                                it != number.rend(); it++, i++)
{
    // c = M[ c ][ P[ i % 8 ][ CharToNumber( *it ) ]];
    unsigned int n = CharToNumber( *it );
    unsigned int p = P[ i % 8 ][ n ];
                 c = M[ c ][ p ];        
     
}

c = Inv[ c ];

ISO 7064 Mod N/N+1

This section presents an overview of a family of check digits. Implemetations are presented in Downloads.

The ISO specifications are designed for the following alphabets:

numeric (10 digits: 0 to 9)
alphabetic (26 letters: A to Z)
alphanumeric (letters and digits)

The ISO specifications are designed to detect the following (taking from the ISO website):

all single substitution errors (the substitution of a single character for another, for example 4234 for 1234);
all or nearly all single (local) transposition errors (the transposition of two single characters, either adjacent or with one character between them, for example 12354 or 12543 for 12345);
all or nearly all shift errors (shifts of the whole string to the left or right);
a high proportion of double substitution errors (two separate single substitution errors in the same string, for example 7234587 for 1234567);
a high proportion of all other errors.

	Comment
Mod 10/11	European Blood Banks, Unique Identifier for People (U.S. NIH)
Mod 16/17	European Bank Routing
Mod 36/37	U.S. Livestock Identification

Group Theory

A Group is a set of objects and an associated operation. Groups are used everyday. For example, counting people in a room. The Group is over the Integers, and the operation is addition, denoted +. There are other operations that can be performed, such a as multiplication, denoted x.

Under addition, there is an identity element: 0. This provides the expected result: 1 + 0 = 1, 0 + 1 = 1, etc. Under multiplication, the identity element is 1. The same is true with multiplication: 1 x 1 = 1, 5 x 1 = 5, and so forth.

Just as one can operate over the Integers, operations can be performed over D₅. The operations will be defined as follows.

The first operation that will be defined is ρ. This is the Greek letter rho, and it denotes rotate in D₅. ρ⁰ is the identity element under rotate, as it leaves the element unchanged. ρ¹ rotates the dihedral 72°. ρ² rotates the dihedral 144°. And finally, ρ⁵ = ρ⁰.

The next operation that will be defined is σ. This is the Greek letter sigma. It follows rho in the alphabet. It will be used to denote the operation reflection. σ⁰ leaves the element unchanged, while σ¹ 'flips' the element over it's horizontal axis. So σ² = σ⁰. The perceptive reader should realize ρ⁵ = σ².

To visualize the observation, number the partitions of the dihedral as follows.

Now, perform operations on the element. For example, ρ⁷σ⁵. ρ⁷ = ρ², so:

And next, σ⁵ = σ¹:

The final visual representation is pictured below. Note that this was presented to the user as a visual demonstration. It is not meant to visually demonstrate Verhoeff 's Algorithm. Green denotes the new position of partition 2.

Summary

Though a perfect check digit system was presented, academia has provided much more. In an extraordinary 1978 paper, A. S. Sethi, V. Rajaraman, and P. S. Kenjale described a check system which can be used to correct any Single Error or Transposition Error. It is noteworthy that the system requires two check digits rather than one, and is only effective for certain bases.

The reader is encouraged to visit Error Detecting Decimal Digits (included with this article) for Wagner and Putter's experience in developing a check digit system for a commercial mail order house.

Downloads

Revisions

07.27.2007 Fixed Grammatical Errors
06.19.2007 Removed Sample 1 Fodder
06.02.2007 Reworked Introduction
06.02.2007 Added 'Common Error' Section
04.17.2007 Cleaned Formatting
12.19.2006 Added Reference to A File Checksum Shell Menu Extension Dll
12.10.2006 Added ISO 7064 Material
12.09.2006 Updated Example of Verhoeff (expanded hand calculation)
11.26.2006 Initial Release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Jeffrey Walton

Systems / Hardware Administrator

United States

This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

°	0	1	2	3	4	5	6	7	8	9
0	0	1	2	3	4	5	6	7	8	9
1	1	2	3	4	0	6	7	8	9	5
2	2	3	4	0	1	7	8	9	5	6
3	3	4	0	1	2	8	9	5	6	7
4	4	0	1	2	3	9	5	6	7	8
5	5	9	8	7	6	0	4	3	2	1
6	6	5	9	8	7	1	0	4	3	2
7	7	6	5	9	8	2	1	0	4	3
8	8	7	6	5	9	3	2	1	0	4
9	9	8	7	6	5	4	3	2	1	0

°	0	1	2	3	4	5	6	7	8	9
0	0	1	2	3	4	5	6	7	8	9
1	1	2	3	4	0	6	7	8	9	5
2	2	3	4	0	1	7	8	9	5	6
3	3	4	0	1	2	8	9	5	6	7
4	4	0	1	2	3	9	5	6	7	8
5	5	9	8	7	6	0	4	3	2	1
6	6	5	9	8	7	1	0	4	3	2
7	7	6	5	9	8	2	1	0	4	3
8	8	7	6	5	9	3	2	1	0	4
9	9	8	7	6	5	4	3	2	1	0

Error Detection Based on Check Digit Schemes

Introduction

Common Errors

Downloads

Mod 9 Scheme

Mod 7 Scheme

Mod 11 Scheme

3-Weight Method

IBM Check Digit Scheme

UPC Scheme

Verhoeff Algorithm

ISO 7064 Mod N/N+1

Group Theory

Summary

Downloads

Revisions

License

Comments and Discussions

°	0	1	2	3	4	5	6	7	8	9
0	0	1	2	3	4	5	6	7	8	9
1	1	2	3	4	0	6	7	8	9	5
2	2	3	4	0	1	7	8	9	5	6
3	3	4	0	1	2	8	9	5	6	7
4	4	0	1	2	3	9	5	6	7	8
5	5	9	8	7	6	0	4	3	2	1
6	6	5	9	8	7	1	0	4	3	2
7	7	6	5	9	8	2	1	0	4	3
8	8	7	6	5	9	3	2	1	0	4
9	9	8	7	6	5	4	3	2	1	0