Implement Phonetic ("Sounds-like") Name Searches with Double Metaphone Part V: .NET Implementation

19 Mar 2007
Presents a C# implementation of Double Metaphone, for use with any of the .NET languages.

Abstract

Simple information searches (name lookups, word searches, and so on) are often implemented as exact-match comparisons. However, given the diversity of homophonic (pronounced the same) words and names, and the tendency of people to misspell surnames, this simplistic criterion often yields poor results: reduced result sets that miss records differing only by a misplaced letter or a different national spelling.

This article series discusses Lawrence Phillips' Double Metaphone phonetic matching algorithm, and provides several useful implementations, which can be employed in a variety of solutions to create more useful, effective searches of proper names in databases and other collections.

Introduction

This article series discusses the practical use of the Double Metaphone algorithm to phonetically search name data, using the author's implementations written for C++, COM (Visual Basic, etc.), scripting clients (VBScript, JScript, ASP), SQL, and .NET (C#, VB.NET, and any other .NET language). For a discussion of the Double Metaphone algorithm itself, and Phillips' original code, see Phillips' article in the June 2000 issue of C/C++ Users Journal (CUJ).

Part I introduces Double Metaphone and describes the author's C++ implementation and its use. Part II discusses the use of the author's COM implementation from within Visual Basic. Part III demonstrates use of the COM implementation from ASP and with VBScript. Part IV shows how to perform phonetic matching within SQL Server using the author's extended stored procedure. Part V demonstrates the author's .NET implementation. Finally, Part VI closes with a survey of phonetic matching alternatives, and pointers to other resources.

Background

Part I of this article series discussed the Double Metaphone algorithm, its origin and use, and the author's C++ implementation. While this section summarizes the key information from that article, readers are encouraged to review the entire article, even if the reader has no C++ experience.

The Double Metaphone algorithm, developed by Lawrence Phillips and published in the June 2000 issue of C/C++ Users Journal, is part of a class of algorithms known as "phonetic matching" or "phonetic encoding" algorithms. These algorithms attempt to detect phonetic ("sounds-like") relationships between words. For example, a phonetic matching algorithm should detect a strong phonetic relationship between "Nelson" and "Nilsen", and no phonetic relationship between "Adam" and "Nelson."

Double Metaphone works by producing one or possibly two phonetic keys, given a word. These keys represent the "sound" of the word. A typical Double Metaphone key is four characters long, as this tends to produce the ideal balance between specificity and generality of results.

The first, or primary, Double Metaphone key represents the American pronunciation of the source word. All words have a primary Double Metaphone key.

The second, or alternate, Double Metaphone key represents an alternate, national pronunciation. For example, many Polish surnames are "Americanized", yielding two possible pronunciations, the original Polish, and the American. For this reason, Double Metaphone computes alternate keys for some words. Note that the vast majority (very roughly, 90%) of words will not yield an alternate key, but when an alternate is computed, it can be pivotal in matching the word.

To compare two words for phonetic similarity, one computes their respective Double Metaphone keys, and then compares each combination:

  • Word 1 Primary - Word 2 Primary
  • Word 1 Primary - Word 2 Alternate
  • Word 1 Alternate - Word 2 Primary
  • Word 1 Alternate - Word 2 Alternate

If a key in any of these comparisons was not produced for a given word (for example, a word with no alternate key), the comparisons involving that key are skipped.

Depending upon which of the above comparisons matches, a match strength is computed. If the first comparison matches, the two words have a strong phonetic similarity. If the second or third comparison matches, the two words have a medium phonetic similarity. If the fourth comparison matches, the two words have a minimal phonetic similarity. Depending upon the particular application requirements, one or more match levels may be excluded from match results.
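The comparison order described above can be sketched in a few lines of C#. This is a minimal illustration only, not code from the author's library: the MatchStrength enum and PhoneticComparer class are hypothetical names, and the sketch assumes that a key which was not produced is passed as null or an empty string.

```csharp
using System;

// Hypothetical names: MatchStrength and PhoneticComparer are illustrative
// and are not part of the author's Metaphone.NET library.
public enum MatchStrength { None, Minimal, Medium, Strong }

public static class PhoneticComparer
{
    // Assumption: an alternate key that was not produced is passed as
    // null or an empty string.
    public static MatchStrength Compare(string primary1, string alternate1,
                                        string primary2, string alternate2)
    {
        // Primary - Primary: strong match
        if (KeysMatch(primary1, primary2))
            return MatchStrength.Strong;

        // Primary - Alternate or Alternate - Primary: medium match
        if (KeysMatch(primary1, alternate2) || KeysMatch(alternate1, primary2))
            return MatchStrength.Medium;

        // Alternate - Alternate: minimal match
        if (KeysMatch(alternate1, alternate2))
            return MatchStrength.Minimal;

        return MatchStrength.None;
    }

    // A comparison involving a missing key is skipped, so it never matches.
    private static bool KeysMatch(string a, string b)
    {
        return !string.IsNullOrEmpty(a) && !string.IsNullOrEmpty(b)
            && string.Equals(a, b, StringComparison.Ordinal);
    }
}
```

An application that wants only strong and medium matches would keep results where Compare returns at least MatchStrength.Medium, since the enum values are declared in increasing order of strength.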

.NET implementation

The .NET implementation of Double Metaphone is very similar in design and use to the C++ implementation presented in Part I. To use the .NET implementation, add the Metaphone.NET.dll assembly to your project's references in Visual Studio .NET, import the nullpointer.Metaphone namespace into your source files, and instantiate the DoubleMetaphone or ShortDoubleMetaphone class, which produce string and unsigned short Metaphone keys, respectively.

For example, to compute the Metaphone keys for the name "Nelson", code similar to that listed below may be used (C# code listed; the .NET implementation is callable from VB.NET, J#, and all other .NET languages):

C#
using nullpointer.Metaphone;

// Compute the keys for "Nelson" in the constructor, then print both keys
DoubleMetaphone mphone = new DoubleMetaphone("Nelson");
System.Console.WriteLine("{0} {1}",
                         mphone.PrimaryKey,
                         mphone.AlternateKey);

Note that the Metaphone keys are obtained via the PrimaryKey and AlternateKey properties.

As with the C++ implementation, an existing instance of a DoubleMetaphone or ShortDoubleMetaphone class can be used to compute the Metaphone keys for a new word, by calling the computeKeys method:

C#
using nullpointer.Metaphone;

// Create one instance, then (re)compute its keys for each new word
DoubleMetaphone mphone = new DoubleMetaphone();
mphone.computeKeys("Nelson");
System.Console.WriteLine("{0} {1}",
                         mphone.PrimaryKey,
                         mphone.AlternateKey);

As with all of the implementations presented in this article series, a sample application written in C#, CS Word Lookup, is provided to demonstrate the use of the .NET implementation. CS Word Lookup uses a Hashtable to map each Metaphone phonetic key to an ArrayList containing the words that produce that key.
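The key-to-words mapping can be sketched as follows. This is an illustration under stated assumptions, not the sample application's actual code: PhoneticIndex is a hypothetical class, the generic Dictionary and List collections stand in for the Hashtable and ArrayList named above, and filing each word under both its primary and alternate key is an assumption about how such an index might be built. The caller supplies the keys, so the sketch does not depend on the Metaphone.NET assembly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class: a sketch of a key-to-words index, not the
// CS Word Lookup source. Dictionary/List replace Hashtable/ArrayList.
public class PhoneticIndex
{
    private readonly Dictionary<string, List<string>> wordsByKey =
        new Dictionary<string, List<string>>(StringComparer.Ordinal);

    // Files the word under its primary key and, when one was produced,
    // under its alternate key as well (an assumption of this sketch).
    public void Add(string word, string primaryKey, string alternateKey)
    {
        AddUnderKey(primaryKey, word);
        if (!string.IsNullOrEmpty(alternateKey) && alternateKey != primaryKey)
            AddUnderKey(alternateKey, word);
    }

    // Returns the words filed under the given key, or an empty list.
    public IReadOnlyList<string> Lookup(string key)
    {
        List<string> words;
        if (!string.IsNullOrEmpty(key) && wordsByKey.TryGetValue(key, out words))
            return words;
        return new List<string>();
    }

    private void AddUnderKey(string key, string word)
    {
        if (string.IsNullOrEmpty(key)) return; // nothing to file under

        List<string> words;
        if (!wordsByKey.TryGetValue(key, out words))
        {
            words = new List<string>();
            wordsByKey[key] = words;
        }
        words.Add(word);
    }
}
```

With the library referenced, a word would be added by passing the PrimaryKey and AlternateKey properties of a DoubleMetaphone instance to Add, and a search would compute the keys of the query word and call Lookup with each of them.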

Performance notes

While the .NET CLR performs reasonably well, the C++ implementation of Double Metaphone will likely perform significantly faster than the .NET version. The C++ version judiciously avoids memory allocation and buffer copies, while the .NET implementation cannot avoid them. The ambitious reader is encouraged to optimize the .NET implementation, perhaps by using the unsafe keyword to perform direct memory access, at the expense of CLR compliance.

Conclusion

This brief article introduced the author's .NET implementation of Double Metaphone, including code snippets and a brief discussion of performance issues. Continue to Part VI for a review of alternative phonetic matching techniques, and a list of phonetic matching resources, including links to other Double Metaphone implementations.

History

  • 7-22-03 Initial publication
  • 7-31-03 Added hyperlinks between articles in the series

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Written By
Web Developer
United States
My name is Adam Nelson. I've been a professional programmer since 1996, working on everything from database development, early first-generation web applications, modern n-tier distributed apps, high-performance wireless security tools, to my last job as a Senior Consultant at BearingPoint posted in Baghdad, Iraq training Iraqi developers in the wonders of C# and ASP.NET. I am currently an Engineering Director at Dell.

I have a wide range of skills and interests, including cryptography, image processing, computational linguistics, military history, 3D graphics, database optimization, and mathematics, to name a few.
