Comments and Discussions

You may consider the code to be licensed under the BSD license, which permits commercial use provided you do not represent the code as being your own, typically by giving credit in the manual or About box.

I didn't get the Metaphone.NET DLL. How do I get that DLL file?

I get the following error when I try to compile the code for x64 (VS.NET 2005) so that I can use it on 64-bit SQL Server:
#############################################################################################
Error 7 fatal error LNK1181: cannot open input file 'opends60.lib' XPMetaphone
#############################################################################################
Please let me know if I need to do anything to get rid of this error and compile without errors.
Thanks in advance,
- T

I am also getting the same error when trying to compile for 64-bit to use with SQL Server.
Can you help me compile it without errors?
Thank you,
CP

Adam, thanks for your work on these implementations. We’ve been using the extended stored procedure successfully for the past 2½ years. We’ve recently upgraded to SQL Server 2005 and will soon be moving to 64-bit hardware, which requires us to make some changes, since 32-bit DLLs aren’t supported on the new hardware. We would like to switch to a CLR implementation, since Microsoft has deprecated extended stored procedures in SQL Server 2005. I’d like to request your help with a couple of issues:
1. Converting the DoubleMetaphone and ShortDoubleMetaphone classes to .NET 2.0, with interfaces suitable for use with the new CREATE ASSEMBLY statement (requires a static method), and accessible via a SQL scalar user-defined function (requires a single output parameter that matches a native SQL data type). We can handle this conversion ourselves, but I was hoping you might take an interest since the days of xp_metaphone.dll appear to be numbered.
2. The .NET implementation you published doesn’t return the same primary and alternate keys as the COM implementation for some names. (We found 1389 differences out of 159,289 names we have indexed.) I took a quick step through in debug and couldn’t see where the problem is, but based on spot checks it appears that the .NET implementation is the one with problems. Here are some examples; I’ll be happy to send you the entire list of differences if you’d like.
AGNEW, ALLOIS: No alternate key from .NET
ALLECIA, ARCHILLA: Different alternate keys
AUTHIER: This case might represent a gap in the algorithm, since neither the COM nor the .NET implementations return the keys I expected. The anglicized pronunciation is au-thir´ (key 0R), while the French pronunciation is o-tya´ (key T).
BAUMB, BAUX: Different primary keys
BEAUBIER, ROZIER: Alternate, primary keys out of sync
Thanks again,
Mike

Mike:
Thanks for your comments, and I'm glad you've found XP Metaphone useful.
Re-packaging the metaphone impl into a static class with a scalar function shouldn't be too hard; it should take only a few minutes.
This is the first I've heard of output disparities between the COM and .NET impls. Thanks for bringing it to my attention, and with test data, no less. I'll investigate further to see about fixing the problem, though I might not get to it until the weekend.
Thanks again for your comments.
Adam

DoubleMetaphone.cs, line 139: Need 5 spaces of padding to handle the "CAESAR" case at line 219 for an input of "C". This same bug exists in the C++ version, but only raises an exception in C#.
DoubleMetaphone.cs, line 144: Need to set m_length = word.Length here, or else move the assignment statement ahead of the padding concatenation in line 139.
With these changes, the C# version returns the same values as xp_metaphone for my 158K test inputs, with the exception of "WJ" - I didn't take the time to track that one down.
I'd still be interested in your thoughts on a CLR implementation for use with SQL Server 2005. Thanks again for your work on this!
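Both fixes concern how the input is prepared before the main loop. The sketch below is a hypothetical stand-alone helper (`PaddedWord` is not a type from the article) that assumes, as the report implies, that the word is padded with trailing spaces so lookahead checks such as the "CAESAR" test can read past the last letter. It records the unpadded length first and then pads with five spaces, so a six-character lookahead from position 0 of a one-letter word stays in range.

```csharp
using System;

// Hypothetical helper illustrating the two reported fixes:
// 1) pad with enough trailing spaces for the longest lookahead ("CAESAR" = 6 chars),
// 2) record the length of the *unpadded* word so the main loop stops at the real end.
public sealed class PaddedWord
{
    private const int Padding = 5; // 1 letter + 5 spaces = room for "CAESAR"

    public string Text { get; }    // uppercased word followed by Padding spaces
    public int Length { get; }     // length of the original word, excluding padding

    public PaddedWord(string word)
    {
        if (word == null) throw new ArgumentNullException(nameof(word));
        string upper = word.ToUpperInvariant();
        Length = upper.Length;                    // take the length before padding...
        Text = upper + new string(' ', Padding);  // ...then append the padding
    }

    // Lookahead comparison that never throws: out-of-range positions simply don't match.
    public bool StringAt(int start, string candidate)
    {
        if (start < 0 || start + candidate.Length > Text.Length) return false;
        return string.CompareOrdinal(Text, start, candidate, 0, candidate.Length) == 0;
    }
}
```

With this layout, `new PaddedWord("C").StringAt(0, "CAESAR")` returns false instead of raising an exception, while `Length` still reports 1.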

Mike:
Thanks for looking into this, and my apologies for the delayed response.
I've put together a test rig that runs a list of names through Philips' original Double Metaphone impl, my C++ impl, and my C# impl. I didn't see the exception you reported for the 'CAESAR' case, but I do see several names producing different results under C# vs C++. I'm looking into this now.
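A test rig like this can be reduced to a differential comparison over a list of names. In the sketch below, the two delegates are placeholders for whichever implementations are being compared (for example, a wrapper around Philips' original code and one around the C# port); `KeyComparison` and `FindMismatches` are illustrative names, not part of the article's code.

```csharp
using System;
using System.Collections.Generic;

// Differential test rig: run every name through two implementations and
// report the names whose (primary, alternate) key pairs disagree.
public static class KeyComparison
{
    public static List<string> FindMismatches(
        IEnumerable<string> names,
        Func<string, (string Primary, string Alternate)> reference,
        Func<string, (string Primary, string Alternate)> candidate)
    {
        var mismatches = new List<string>();
        foreach (string name in names)
        {
            var expected = reference(name);
            var actual = candidate(name);
            if (expected != actual) // tuple equality compares both keys
                mismatches.Add($"{name}: expected {expected}, got {actual}");
        }
        return mismatches;
    }
}
```

An empty result over the whole corpus is the "identical results" condition described in this thread.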
Regarding SQL Server, it seems a static class with the [SqlFunction] attribute wrapping the existing DoubleMetaphone class would do the trick.
Adam

Adam,
Thanks for your response. We ran into a few glitches with the CLR implementation for SQL Server:
1. SQL Server apparently doesn't allow namespaces in CLR classes, so we had to remove this from your original source.
2. Only a single dll file can be registered via the CREATE ASSEMBLY statement, so we had to combine the source files in order to use ShortDoubleMetaphone.
3. A SQL scalar UDF can return only a single value, so there wasn't a clean way to replace xp_metaphone, with its separate output parameters for the primary and alternate metaphone keys. We opted to combine the values into a single BINARY(4) return value and then parse it back into two SMALLINTs after the UDF call, but this seems like a kludge. This is also where we ran into the glitch for the "WF" input parameter: we got x0000 back instead of the expected xFFFF for the alternate key.
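One way to carry both keys through a single return value, assuming the 16-bit keys produced by ShortDoubleMetaphone (with 0xFFFF meaning "no alternate key"), is to pack them into four bytes and split them apart again. The byte order below (primary first, each key big-endian) is an assumption chosen so that `CAST(SUBSTRING(@keys, 1, 2) AS SMALLINT)` on the T-SQL side should recover the primary key; `MetaphoneKeyPacking` is a hypothetical name.

```csharp
using System;

// Packs two 16-bit metaphone keys into one 4-byte value and back.
// Layout (assumed): bytes 0-1 = primary, bytes 2-3 = alternate, big-endian.
public static class MetaphoneKeyPacking
{
    public const ushort NoKey = 0xFFFF; // "no alternate key", as ShortDoubleMetaphone reports it

    // In a SQL CLR assembly, this is the kind of static method you would mark with [SqlFunction].
    public static byte[] Pack(ushort primary, ushort alternate)
    {
        return new byte[]
        {
            (byte)(primary >> 8), (byte)primary,
            (byte)(alternate >> 8), (byte)alternate
        };
    }

    public static (ushort Primary, ushort Alternate) Unpack(byte[] packed)
    {
        if (packed == null || packed.Length != 4)
            throw new ArgumentException("Expected exactly 4 bytes.", nameof(packed));
        ushort primary = (ushort)((packed[0] << 8) | packed[1]);
        ushort alternate = (ushort)((packed[2] << 8) | packed[3]);
        return (primary, alternate);
    }
}
```

Keeping the "no alternate" marker explicit (0xFFFF rather than 0) makes a case like the one above easy to spot when unpacking.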
What we have is working, but I would still be interested in your thoughts regarding a well-thought-out approach for SQL 2005.
Sorry if my 16:16 9 Jan '07 posting was unclear - the exception occurred at line 139 for an input of "C" rather than "CAESAR". The change to use 5 spaces of padding has corrected this.
Thanks again for your good work on this.
Mike Renno

Mike:
I've implemented the fixes you proposed, and my test rig now confirms the C# impl produces identical results for all 21k test names, including 'CAESAR' and 'WJ'. I'm going to update the article with the new code, but that's done via email and may take some time; in the meanwhile, I could send you the code if you like.
Adam

First, I just want to thank the author for this code. It works great. I'm looking at using the extended stored procedure as well as the .NET assembly.
So my question is how do we know if a particular word has no alternate key when we are using the unsigned short version of the keys?

I'm glad you found my code useful.
An alternate key of '0' should not be considered valid, so as long as you don't compare two 0 keys for equality, you should be fine. Is there some other reason you need to detect null keys?
Adam

Thanks. When I asked the question, I was trying to figure out what value represented the lack of an alternate key for a word. In the SQL XP implementation you get a NULL value, but with ShortDoubleMetaphone you get 65535. For several reasons, we wanted to compute the metaphone keys in our .NET app and compare them against a table of keys in SQL.
The only tricky part was figuring out how to translate between SQL Server's smallint values and .NET's UInt16 values. What I ended up doing is converting the results of the SQL XP to the equivalent UInt16 values and storing those in the key tables. Here is what worked well for us:
--This is the value used to represent a null or invalid metaphone key
DECLARE @maxKeyValue int
SET @maxKeyValue = 65535
EXEC master..xp_metaphone @WorkWord, @primaryMetaphoneTemp output, @alternateMetaphoneTemp output
if @alternateMetaphoneTemp is null
    set @alternateMetaphone = @maxKeyValue
else
    if @alternateMetaphoneTemp < 0 --convert this smallint value to the equivalent unsigned int value
        set @alternateMetaphone = @alternateMetaphoneTemp + @maxKeyValue + 1
    else
        set @alternateMetaphone = @alternateMetaphoneTemp
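The same mapping can be done on the .NET side without arithmetic: SMALLINT corresponds to the signed `Int16` (`short`), and reinterpreting its 16 bits as `UInt16` (`ushort`) is equivalent to adding 65536 to negative values, which is what the T-SQL above does. This is a minimal sketch; `KeyConversion` is a hypothetical helper, and NULL is mapped to 65535 to match the approach above.

```csharp
using System;

// Converts between SQL Server SMALLINT (signed 16-bit) and the unsigned
// 16-bit keys used by ShortDoubleMetaphone by reinterpreting the same bits.
public static class KeyConversion
{
    public const ushort NoKey = 65535; // stands in for a NULL / missing alternate key

    public static ushort FromSmallInt(short? value)
    {
        if (value == null) return NoKey;        // xp_metaphone returned NULL
        return unchecked((ushort)value.Value);  // -1 -> 65535, -32768 -> 32768
    }

    public static short ToSmallInt(ushort key)
    {
        return unchecked((short)key);           // 65535 -> -1, 40000 -> -25536
    }
}
```

`ToSmallInt` is the direction you would use if you preferred to store the keys as SMALLINT and convert in the application instead.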

Cool, that's a good solution.

I think I might have found a small bug in your otherwise excellent code (thanks for doing this!).
When running CSWordLookup with a dictionary file, the function nullpoint.Metaphone.DoubleMetaphone.areStringsAt(start,length,strings) failed with an index-out-of-range error. I added a simple bounds check to fix it. Modified function:
private bool areStringsAt(int start, int length, params String[] strings)
{
    if (start < 0 || start + length > m_word.Length)
    {
        //Sometimes, as a result of expressions like "current - 2" for start,
        //start ends up negative, or start + length runs past the end of the word.
        //No string can be present outside the word, so this is always false
        return false;
    }
    String target = m_word.Substring(start, length);
    for (int idx = 0; idx < strings.Length; idx++) {
        if (strings[idx] == target) {
            return true;
        }
    }
    return false;
}
-ben
http://mudabone.com

Good catch Ben. That code is probably in need of some refactoring anyway, if start can be negative. Thanks for the fix.
Adam

Hey Adam,
I have read your article "Implement Phonetic ("Sounds-like") Name Searches with Double Metaphone".
It is very interesting. Recently I found a paper (Phonetic String Matching: Lessons from Information Retrieval - Justin Zobel, Philip Dart) about approximate string matching.
I'm planning to experiment with the Editex algorithm. Do you know where I can find more information about it?
Thank you for your time,
Elvio Fernandez
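For reference, here is a rough sketch of Editex as reconstructed from Zobel and Dart's description: an edit distance in which substituting letters from the same phonetic group costs 1 instead of 2, and deleting a letter that follows H or W (often silent) costs at most 1. The letter groups and the leading-space convention for the first character are a reconstruction, not taken from this article, so verify them against the paper before relying on the scores.

```csharp
using System;

// Editex-style distance (sketch; groups and boundary handling are assumptions).
public static class Editex
{
    // Letters in the same group are treated as phonetically close.
    private static readonly string[] Groups =
        { "AEIOUY", "BP", "CKQ", "DT", "LR", "MN", "GJ", "FPV", "SXZ", "CSZ" };

    // r(a, b): 0 if identical, 1 if they share a group, 2 otherwise.
    private static int R(char a, char b)
    {
        if (a == b) return 0;
        foreach (string g in Groups)
            if (g.IndexOf(a) >= 0 && g.IndexOf(b) >= 0) return 1;
        return 2;
    }

    // d(a, b): like r, but 1 when the preceding letter a is H or W and differs from b.
    private static int D(char a, char b)
    {
        if (a != b && (a == 'H' || a == 'W')) return 1;
        return R(a, b);
    }

    public static int Distance(string s, string t)
    {
        // A leading space gives the first letter a "previous" character for d().
        s = " " + s.ToUpperInvariant();
        t = " " + t.ToUpperInvariant();
        var e = new int[s.Length, t.Length];
        for (int i = 1; i < s.Length; i++) e[i, 0] = e[i - 1, 0] + D(s[i - 1], s[i]);
        for (int j = 1; j < t.Length; j++) e[0, j] = e[0, j - 1] + D(t[j - 1], t[j]);
        for (int i = 1; i < s.Length; i++)
            for (int j = 1; j < t.Length; j++)
                e[i, j] = Math.Min(
                    Math.Min(e[i - 1, j] + D(s[i - 1], s[i]),    // delete s[i]
                             e[i, j - 1] + D(t[j - 1], t[j])),   // insert t[j]
                    e[i - 1, j - 1] + R(s[i], t[j]));            // substitute / match
        return e[s.Length - 1, t.Length - 1];
    }
}
```

For example, "DAVID" and "TAVID" come out at distance 1, because D and T share a group, whereas a plain Levenshtein distance would not distinguish that substitution from any other.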

I was almost ready to use the metaphone method when I stumbled across your articles on double metaphone. You did a VERY good job of explaining it and offering examples. The only thing I wish for was the source code in VB, but not a big deal.
The only major thing I'll need to add is looking up on multiple words.
Thanks!

tequilacollins wrote:
I was almost ready to use the metaphone method when I stumbled across your articles on double metaphone. You did a VERY good job of explaining it and offering examples.
Thanks, I'm glad you think so.
tequilacollins wrote:
The only thing I wish for was the source code in VB, but not a big deal
For what it's worth, you can use the COM component from VB6, and the C# component from VB.NET.
tequilacollins wrote:
The only major thing I'll need to add is looking up on multiple words.
Just make sure that you compute the Metaphone keys on each word individually; the algorithm is not designed to compute a key for multiple words at once.
Good luck.
Adam

For what it's worth, you can use the COM component from VB6, and the C# component from VB.NET.
I'm writing the app in ASP. It would have been nice to recreate the DLL with the additional function of multiple words, but I can just create a wrapper instead.
Just make sure that you compute the Metaphone keys on each word individually; the algorithm is not designed to compute a key for multiple words at once.
Yeah, already figured that part. I'll have to tokenize the words first.
Then I still have to figure out a scoring system. If I get one word with a strong hit and the other as a weak one, what do I call it?
I'll let you know how it turns out.
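Computing keys per word and then combining the per-word results is one way to score multi-word lookups. The sketch below tokenizes both phrases, scores each word pair on three levels (primary/primary, primary/alternate, alternate/alternate, in the spirit of the strong/normal/minimal matches usually described for Double Metaphone), and averages each query word's best match. The `keys` delegate stands in for whichever Double Metaphone implementation you use, `MultiWordMatch` is a hypothetical name, and the averaging scheme is only one possible choice.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MultiWordMatch
{
    // Split a phrase into words; keys must be computed for each word separately.
    public static string[] Tokenize(string phrase) =>
        Regex.Split(phrase.ToUpperInvariant(), @"[^\p{L}]+")
             .Where(w => w.Length > 0).ToArray();

    // 3 = primary/primary, 2 = primary/alternate, 1 = alternate/alternate, 0 = none.
    // An empty or null key means "no key" and never matches anything.
    public static int Score((string Primary, string Alternate) a,
                            (string Primary, string Alternate) b)
    {
        bool Eq(string x, string y) => !string.IsNullOrEmpty(x) && x == y;
        if (Eq(a.Primary, b.Primary)) return 3;
        if (Eq(a.Primary, b.Alternate) || Eq(a.Alternate, b.Primary)) return 2;
        if (Eq(a.Alternate, b.Alternate)) return 1;
        return 0;
    }

    // Average, over the query words, of each word's best score against any
    // candidate word, normalized to the range 0..1.
    public static double PhraseScore(string query, string candidate,
        Func<string, (string Primary, string Alternate)> keys)
    {
        var q = Tokenize(query).Select(keys).ToList();
        var c = Tokenize(candidate).Select(keys).ToList();
        if (q.Count == 0 || c.Count == 0) return 0;
        return q.Average(qk => c.Max(ck => Score(qk, ck))) / 3.0;
    }
}
```

With this scheme, one strong hit and one weak hit average out to a middling score, so a threshold on `PhraseScore` decides whether such a mixed result counts as a match.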

Does your version of the .NET implementation produce exactly the same results as the original Philips version? I need to know this because we are currently using the Philips version and want to ensure the two versions are compatible for comparisons.
Thanks for your response.
Gary Fischbach

Gary:
My .NET implementation should be completely compatible with Philips' original version; in fact, it should be algorithmically identical. In the course of development, I generated a corpus of Double Metaphone keys using Philips' original code and compared it to the same corpus processed with my implementation, and only when the entire corpus of ~14k names matched did I consider my algorithm done.
Therefore, my answer to your question is yes, my impl SHOULD produce the same Double Metaphone keys as Philips' impl, given the same input. Should you find names for which this assertion does not hold, that would constitute a bug in my implementation, and I would very much like to know about it.
Hope this helps, and good luck.
Thanks,
Adam

Is it possible to use the phonetic search with the German language? How must I change the code to do this?
Thomas Bock

Thomas:
Since different languages use different phonemes, any phonetic matching algorithm designed to support one language (say, English) will require some modification to fully support another language (say, German).
However, when Philips designed Double Metaphone, he had ethnic pronunciations in mind, including German. Several special cases exist in his algorithm to deal with German names like "Schmidt". Therefore, if you are limiting your application to phonetic matching of German surnames, Double Metaphone might work acceptably well without modification, though obviously that depends entirely on the application.
If Double Metaphone is not adequate, modifying the code will likely be tedious and error-prone. Even a quick perusal of Philips' Double Metaphone shows that the special cases are the result of exhaustive trial and error until a workable algorithm was produced. Therefore, any attempt to extend the algorithm further will likely involve a similar process.
Thus, if you find Double Metaphone does not suit your needs, I suggest you consider some of the alternative techniques described in Part VI. You might also read the referenced papers and have a look at SecondString, the Java toolkit providing a number of approximate string matching techniques.
Good luck!
Adam

Why not use a stemmer for the German language? I think there is one on this site.

Patrick:
A stemmer will solve part of the problem by effectively normalizing German words. However, the phonetic matching issue still remains, particularly if general German words, rather than surnames, are being matched.
Adam

Hope you find an alternative solution.
I am looking for a thesaurus file (simple text file) for German or any Romance language (Spanish, French, Italian, Portuguese). Any ideas where I could find one?
Simple format like:
word/part-of-speech/synonym,...,.../antonyms,...,
e.g.:
abide/(v.)/bear,stand,tolerate,put up with,endure,last,suffer/quit
one entry per line.
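Parsing the format described above is straightforward if '/' only ever separates fields and ',' only separates list items (an assumption; a word containing either character would break it). `ThesaurusEntry` is a hypothetical type used for illustration.

```csharp
using System;
using System.Linq;

// One line of the word/part-of-speech/synonyms/antonyms format.
public sealed class ThesaurusEntry
{
    public string Word { get; }
    public string PartOfSpeech { get; }
    public string[] Synonyms { get; }
    public string[] Antonyms { get; }   // empty when the line has no antonym field

    private ThesaurusEntry(string word, string pos, string[] synonyms, string[] antonyms)
    {
        Word = word; PartOfSpeech = pos; Synonyms = synonyms; Antonyms = antonyms;
    }

    // Split a comma-separated list, trimming whitespace and dropping empty items
    // (a trailing "," as in the example format would otherwise yield an empty entry).
    private static string[] SplitList(string field) =>
        field.Split(',').Select(s => s.Trim()).Where(s => s.Length > 0).ToArray();

    public static ThesaurusEntry Parse(string line)
    {
        string[] fields = line.Split('/');
        if (fields.Length < 3)
            throw new FormatException("Expected word/part-of-speech/synonyms[/antonyms]: " + line);
        string[] antonyms = fields.Length > 3 ? SplitList(fields[3]) : new string[0];
        return new ThesaurusEntry(fields[0].Trim(), fields[1].Trim(), SplitList(fields[2]), antonyms);
    }
}
```

Multi-word synonyms such as "put up with" survive intact because only commas split the list.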

Sorry, I'm afraid I don't know of such a thing for German. For English, Grady Ward's Moby project has ample word lists of all sorts, including a thesaurus. Surely such a list must exist...
Adam

Hi, could you put the URLs of the previous articles in the last article, as other writers do?
It would make it easier to read the previous ones.
Very interesting. Thank you.
ydlm

Glad you liked my articles.
I had been holding off on hyperlinking between pages, since the URLs will change after editing. However, it's now been several days since I first posted and the articles remain unedited, so I have implemented your suggestion. Each time an article refers to another article, that reference should now be a hyperlink to it.
Hopefully, when the editors post my articles to their final destinations, they will update the hyperlinks as well.
Thanks for the suggestion.
Adam
Presents a C# implementation of Double Metaphone, for use with any of the .NET languages.
| Type | Article |
| Licence | |
| First Posted | 25 Jul 2003 |
| Views | 154,731 |
| Downloads | 1,995 |
| Bookmarked | 94 times |