Soundex Implementation in C# and VB.NET

3 Mar 2006
A simple soundex implementation in C# and VB.NET to recognize phonetically similar words based on basic soundex algorithms.

Introduction

While working on adding an English dictionary to a company website, I ran into the problem of misspelling a word while testing the application. As this is likely to be a common user error, I decided to read up on basic phonetic matching. While SQL Server implements the Soundex function, Microsoft Access (the format in which the dictionary is stored) does not.

So the task was simple: find an algorithm on the internet that could be used to populate a Soundex field within the database, for use in phonetic comparisons.

Unfortunately, when I went looking for sample code on the internet, most of it was terribly outdated. Most of the code, written for VBScript or for Visual Basic 6 or earlier, made heavy use of expensive functions such as MID and LEFT. These functions, to put it mildly, are not efficient when compared to accessing characters directly via a character array.
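
To illustrate the difference described above, here is a minimal sketch in C#. The class and method names (CharAccessDemo, CountVowelsWithSubstring, CountVowelsWithCharArray) are hypothetical and not part of the article's code. The first method mimics the MID/LEFT style by extracting a one-character string on every pass, which allocates a new string each time; the second reads each character in place from a character array.

```csharp
using System;

public static class CharAccessDemo
{
    // MID/LEFT style: every Substring call creates a new one-character string.
    public static int CountVowelsWithSubstring(string word)
    {
        int count = 0;
        for (int i = 0; i < word.Length; i++)
        {
            string letter = word.Substring(i, 1);
            if ("AEIOU".Contains(letter)) count++;
        }
        return count;
    }

    // Character-array style: each element is read directly, with no per-character allocation.
    public static int CountVowelsWithCharArray(string word)
    {
        char[] chars = word.ToCharArray();
        int count = 0;
        for (int i = 0; i < chars.Length; i++)
        {
            if ("AEIOU".IndexOf(chars[i]) >= 0) count++;
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountVowelsWithSubstring("SOUNDEX"));  // 3
        Console.WriteLine(CountVowelsWithCharArray("SOUNDEX"));  // 3
    }
}
```

Both methods return the same result; only the way each character is reached differs, and that difference is what adds up over a large word list.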

Since I was going to be processing well over 100,000 articles, I decided to write my own Soundex functions based on standardized algorithms, using a tighter, more efficient loop. The resulting code is included below.

VISUAL BASIC CODE SAMPLE

Public Shared Function Compute(ByVal Word As String) As String
    Return Compute(Word, 4)
End Function

Public Shared Function Compute(ByVal Word As String, ByVal Length As Integer) As String
    ' Value to return
    Dim Value As String = ""
    ' Size of the word to process
    Dim Size As Integer = Word.Length
    ' Make sure the word is at least two characters in length
    If (Size > 1) Then
        ' Convert the word to all uppercase
        Word = Word.ToUpper()
        ' Convert the word to a character array for faster processing
        Dim Chars() As Char = Word.ToCharArray()
        ' Buffer to build up with character codes
        Dim Buffer As New System.Text.StringBuilder
        ' The current and previous character codes
        Dim PrevCode As Integer = 0
        Dim CurrCode As Integer = 0
        ' Append the first character to the buffer
        Buffer.Append(Chars(0))
        ' Prepare variables for loop
        Dim i As Integer
        Dim LoopLimit As Integer = Size - 1
        ' Loop through the remaining characters and convert them to the proper character code
        For i = 1 To LoopLimit
            ' Any character not listed below leaves CurrCode unchanged, so it is skipped
            Select Case Chars(i)
                Case "A"c, "E"c, "I"c, "O"c, "U"c, "H"c, "W"c, "Y"c
                    CurrCode = 0
                Case "B"c, "F"c, "P"c, "V"c
                    CurrCode = 1
                Case "C"c, "G"c, "J"c, "K"c, "Q"c, "S"c, "X"c, "Z"c
                    CurrCode = 2
                Case "D"c, "T"c
                    CurrCode = 3
                Case "L"c
                    CurrCode = 4
                Case "M"c, "N"c
                    CurrCode = 5
                Case "R"c
                    CurrCode = 6
            End Select
            ' Check to see if the current code is the same as the last one
            If (CurrCode <> PrevCode) Then
                ' Check to see if the current code is 0 (a vowel); do not process vowels
                If (CurrCode <> 0) Then
                    Buffer.Append(CurrCode)
                End If
            End If
            ' Set the new previous character code
            PrevCode = CurrCode
            ' If the buffer size meets the length limit, then exit the loop
            If (Buffer.Length = Length) Then
                Exit For
            End If
        Next
        ' Pad the buffer with zeros, if required
        Size = Buffer.Length
        If (Size < Length) Then
            Buffer.Append("0"c, (Length - Size))
        End If
        ' Set the return value
        Value = Buffer.ToString()
    End If
    ' Return the computed soundex
    Return Value
End Function


C SHARP CODE SAMPLE

// Requires: using System.Text;
public static string Compute(string word)
{
    return Compute(word, 4);
}

public static string Compute(string word, int length)
{
    // Value to return
    string value = "";
    // Size of the word to process
    int size = word.Length;
    // Make sure the word is at least two characters in length
    if (size > 1)
    {
        // Convert the word to all uppercase
        word = word.ToUpper();
        // Convert the word to character array for faster processing
        char[] chars = word.ToCharArray();
        // Buffer to build up with character codes
        StringBuilder buffer = new StringBuilder();
        // The current and previous character codes
        int prevCode = 0;
        int currCode = 0;
        // Append the first character to the buffer
        buffer.Append(chars[0]);
        // Loop through the remaining characters and convert them to the proper character code
        for (int i = 1; i < size; i++)
        {
            // Any character not listed below leaves currCode unchanged, so it is skipped
            switch (chars[i])
            {
                case 'A': case 'E': case 'I': case 'O':
                case 'U': case 'H': case 'W': case 'Y':
                    currCode = 0;
                    break;
                case 'B': case 'F': case 'P': case 'V':
                    currCode = 1;
                    break;
                case 'C': case 'G': case 'J': case 'K':
                case 'Q': case 'S': case 'X': case 'Z':
                    currCode = 2;
                    break;
                case 'D': case 'T':
                    currCode = 3;
                    break;
                case 'L':
                    currCode = 4;
                    break;
                case 'M': case 'N':
                    currCode = 5;
                    break;
                case 'R':
                    currCode = 6;
                    break;
            }
            // Check to see if the current code is the same as the last one
            if (currCode != prevCode)
            {
                // Check to see if the current code is 0 (a vowel); do not process vowels
                if (currCode != 0)
                    buffer.Append(currCode);
            }
            // Set the new previous character code
            prevCode = currCode;
            // If the buffer size meets the length limit, then exit the loop
            if (buffer.Length == length)
                break;
        }
        // Pad the buffer, if required
        size = buffer.Length;
        if (size < length)
            buffer.Append('0', (length - size));
        // Set the value to return
        value = buffer.ToString();
    }
    // Return the value
    return value;
}
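
As a rough illustration of how the computed codes might be used for phonetic comparisons, the sketch below groups dictionary words by code and looks up a misspelled query. It is not the article's code: SoundexDemo, the Codes lookup string, and BuildIndex are hypothetical names, and Compute here is a compact table-driven restatement that assumes the same letter groupings as the listings above (vowels plus H, W and Y map to 0, and the first letter is kept as-is).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SoundexDemo
{
    // Code for each letter A..Z, using the same groupings as the listings above.
    private const string Codes = "01230120022455012623010202";

    public static string Compute(string word, int length = 4)
    {
        // Words shorter than two characters yield an empty code, as in the article.
        if (string.IsNullOrEmpty(word) || word.Length < 2) return "";
        word = word.ToUpperInvariant();
        var buffer = new StringBuilder();
        buffer.Append(word[0]);
        int prevCode = 0;
        for (int i = 1; i < word.Length && buffer.Length < length; i++)
        {
            char c = word[i];
            if (c < 'A' || c > 'Z') continue;            // skip non-letters
            int currCode = Codes[c - 'A'] - '0';
            // Append only non-zero codes that differ from the previous code.
            if (currCode != 0 && currCode != prevCode) buffer.Append(currCode);
            prevCode = currCode;
        }
        return buffer.ToString().PadRight(length, '0');
    }

    // Groups words by their code so that similar-sounding words share a bucket.
    public static Dictionary<string, List<string>> BuildIndex(IEnumerable<string> words)
    {
        var index = new Dictionary<string, List<string>>();
        foreach (string w in words)
        {
            string code = Compute(w);
            if (!index.TryGetValue(code, out List<string> bucket))
            {
                bucket = new List<string>();
                index[code] = bucket;
            }
            bucket.Add(w);
        }
        return index;
    }

    public static void Main()
    {
        Console.WriteLine(Compute("Robert"));   // R163
        Console.WriteLine(Compute("Rupert"));   // R163

        var index = BuildIndex(new[] { "misspelling", "efficient", "dictionary" });
        // A misspelled query shares its code with the correctly spelled entry.
        foreach (string match in index[Compute("mispelling")])
            Console.WriteLine(match);           // misspelling
    }
}
```

In a database setting, the same idea would amount to storing the code in its own column and matching on it, rather than holding the index in memory.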

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Web Developer
United States

Comments and Discussions

 
Question: Thanks. This works great!
yahoo6614-Dec-16 10:44
Question: Alternative for Soundex Algorithm
shawnablu28-Oct-14 5:23
Answer: Re: Alternative for Soundex Algorithm
Member 1110501820-Mar-15 20:40
Question: License
KerenYaari26-Jun-14 5:51
Bug: Please correct the logic
Dr.Alrawi28-Nov-11 10:34
General: A note about VB.NET Chars
Dimitri Troncquo22-Feb-11 3:39
General: Thanks!
iskandar700718-Aug-10 22:20
General: Slightly Optimized and Compact version
Member 31105728-Dec-09 4:02
<code>
            public static string Soundex(string word, int length)
            {
                  // Value to return
                  string value = string.Empty;
                  // Make sure the word is at least two characters in length
                  if (!string.IsNullOrEmpty(word) && word.Length > 1)
                  {
                        // Convert the word to all uppercase
                        word = word.ToUpper();
                        // The current and previous character codes
                        int prevCode = 0;
                        int currCode = 0;
                        // Append the first character
                        value += word[0];
                        // Loop through all the characters and convert them to the proper character code
                        for (int i = 1; i < word.Length; i++)
                        {
                              switch (word[i])
                              {
                                    case 'A':
                                    case 'E':
                                    case 'I':
                                    case 'O':
                                    case 'U':
                                    case 'H':
                                    case 'W':
                                    case 'Y':
                                          currCode = 0;
                                          break;
                                    case 'B':
                                    case 'F':
                                    case 'P':
                                    case 'V':
                                          currCode = 1;
                                          break;
                                    case 'C':
                                    case 'G':
                                    case 'J':
                                    case 'K':
                                    case 'Q':
                                    case 'S':
                                    case 'X':
                                    case 'Z':
                                          currCode = 2;
                                          break;
                                    case 'D':
                                    case 'T':
                                          currCode = 3;
                                          break;
                                    case 'L':
                                          currCode = 4;
                                          break;
                                    case 'M':
                                    case 'N':
                                          currCode = 5;
                                          break;
                                    case 'R':
                                          currCode = 6;
                                          break;
                              }
                              // Add only if the current code is not the same as the last one and the current code is not 0 (a vowel)
                              if (currCode != prevCode && currCode != 0)
                                    value += currCode;
                              // Set the new previous character code
                              prevCode = currCode;
                              // If the buffer size meets the length limit, then exit the loop
                              if (value.Length == length)
                                    break;
                        }
                        // Pad the buffer, if required
                        value = value.PadRight(length, '0');
                  }
                  // Return the value
                  return value;
            }
</code>

P.S.: I have removed the StringBuilder on purpose, as it has a negligible effect on a small number of string concatenations (usually 4 in this case).
General: Useful Example
CodeMasterMP15-Sep-07 13:57
General: Oracle Soundex
MonkeyMafia27-Jun-07 23:15
General: A Note About Soundex
Mike C#2-Feb-07 19:28
General: Cleaner C#
Furty4-Mar-06 16:35
General: Re: Cleaner C#
renzea4-Mar-06 16:58
General: Re: Cleaner C#
RK KL28-Mar-06 10:19
