This simple one is suggested by Brent Hoskisson.

Create a method that will compare two names intelligently. "Intelligently" means that it needs to take into account the different forms a name may take. For example:
John Paul Smith

Would match with
John Paul Smith
Smith John Paul
John P Smith
Smith John P
J Paul Smith
Smith J Paul
John Smith
Smith John

Whether you choose to use a binary match/no match, or a score indicating a degree of certainty of the match is up to you.

Points awarded for brevity of code. No restrictions on the number of different entries per challenger.

Note: I'm away next Friday. Can someone in the peanut gallery please post a challenge next Friday (24 March)?
Updated 19-Mar-17 3:43am
Comments
Graeme_Grant 17-Mar-17 9:21am
   
I've done quite a few of these, so happy to take a rest and do it for you next week...
PIEBALDconsult 17-Mar-17 13:18pm
   
What about John Paul Smythe? :D
PIEBALDconsult 18-Mar-17 14:06pm
   
Aw crap, now I think I may have devised a technique and I'll _have_ to pursue it... Elephant! Elephant! Elephant! I don't want to do this!

Actually, I'm now reminded of a task I had a few years back in which I had to try to match addresses between a master list and several other lists. It was awful. The best I could do then was to use Levenshtein distance and then manually review anything above some threshold but less than 100%.
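
For readers who have not met it, here is a minimal Python sketch of the Levenshtein idea mentioned above. It is not the code used for that address-matching task; the function names and the scaling of the distance to a 0..1 similarity are assumptions made for illustration only.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute (or keep)
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed scaling: 1.0 means identical, lower values mean more edits per character."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

With this scaling, "Smith" against "Smythe" needs two edits over six characters, so a pair like that would land in the "review by hand" band for most thresholds.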
Graeme_Grant 24-Mar-17 3:51am
   

Solution 1

Let us first think about and specify the requirements, in this case using Specification by Example, directly coded as executable unit tests:
[TestMethod]
public void Test_NameMatches()
{
    Assert.IsTrue("John Paul Smith".Matches("Smith John"));    // 1
    Assert.IsTrue("Smith John".Matches("Smith john"));         // 2   
    Assert.IsTrue("Smith Smith".Matches("Smith Smith"));       // 3  
    Assert.IsTrue("Smith Smith".Matches(" Smith Paul Smith "));// 4
    Assert.IsTrue("Smith John".Matches("john S"));             // 5
    Assert.IsTrue("J Smith".Matches("John Smith" ));           // 6
    Assert.IsTrue(" J Paul  Smith ".Matches("John"));          // 7 
    Assert.IsTrue("Smith, J.P".Matches("John Paul Smith"));    // 8
    Assert.IsTrue("John Smith".Matches("John Sm"));            // 9

    Assert.IsFalse("John Jonsson".Matches("John Smith"));      // 10
    Assert.IsFalse("John Smith".Matches( "John Jonsson"));     // 11
}
Notice that I have included cases with duplicate names (3 and 4), abbreviations (6-8), case insensitivity (2 and 5), extra white space (4 and 7) and punctuation (8). I also chose to match incomplete names, such as case 9, to support auto-completion scenarios. For code readability I chose to go for a String extension method.

A concise solution that fulfills the above requirements is:
using System;
using System.Collections.Generic;
using System.Linq;

static class Names
{
    // Matches when every part of the name with fewer parts is found in the other name.
    public static bool Matches(this String name1, String name2)
    {
       var names1 = name1.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
       var names2 = name2.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
       return names1.Length < names2.Length ? !names1.Except(names2, Comparer).Any() : !names2.Except(names1, Comparer).Any();
    }

    public static char[] Separators = { ' ', '\t', '.', ',' };
}

For comparison it uses Comparer, which is defined as an instance of a nested class inside Names:
    private static readonly NameComparer Comparer = new NameComparer();

    private class NameComparer : IEqualityComparer<String>
    {
        // Two name parts are equal when the shorter one is a case-insensitive prefix of the longer one.
        public bool Equals(string x, string y)
        {
            return String.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length), ignoreCase: true) == 0;
        }

        // Hash on the first letter only, so that "J" and "John" end up in the same bucket.
        public int GetHashCode(string obj)
        {
            return Char.ToUpper(obj[0]).GetHashCode();
        }
    }
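
For readers who do not use C#, the same idea can be sketched in Python. This is an approximation, not a line-by-line port: it replaces Except and the custom comparer with a plain nested check, so it may differ from the C# version in corner cases where the prefix-based equality is not transitive. The helper names tokens, parts_equal and matches are made up for this sketch.

```python
import re

# Same separators as the C# Separators array: space, tab, period, comma.
_SEPARATORS = re.compile(r"[ \t.,]+")


def tokens(name):
    """Split a name into lowercase, non-empty parts."""
    return [part.lower() for part in _SEPARATORS.split(name) if part]


def parts_equal(x, y):
    """Two parts are equal when the shorter one is a prefix of the longer one."""
    n = min(len(x), len(y))
    return x[:n] == y[:n]


def matches(name1, name2):
    """True when every part of the name with fewer parts matches some part of the other."""
    a, b = tokens(name1), tokens(name2)
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    return all(any(parts_equal(s, l) for l in longer) for s in shorter)
```

Under these assumptions the sketch gives the same answers as the eleven unit-test cases listed earlier, for example matches("Smith, J.P", "John Paul Smith") is true and matches("John Jonsson", "John Smith") is false.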
To support matching of a single name I also added the following method which takes a whole collection as input and returns all matches:
    public static IEnumerable<String> GetAllMatches(this String name1, IEnumerable<String> dictionary)
    {
        // Split the search name once; each candidate is split inside the loop.
        var names1 = name1.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        foreach (var name2 in dictionary)
        {
            var names2 = name2.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
            if (names1.Length < names2.Length ? !names1.Except(names2, Comparer).Any() : !names2.Except(names1, Comparer).Any())
            {
                yield return name2;
            }
        }
    }
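
The lazy "return all matches" pattern can be sketched in Python with a generator, which plays the role of yield return. As before this is only an approximation of the C# behaviour, and the names _parts, _all_covered and get_all_matches are invented for the example.

```python
import re


def _parts(name):
    # Split on space, tab, period and comma, drop empty entries, ignore case.
    return [p.lower() for p in re.split(r"[ \t.,]+", name) if p]


def _all_covered(shorter, longer):
    # Every part of `shorter` must be a prefix of, or prefixed by, some part of `longer`.
    return all(any(s.startswith(l) or l.startswith(s) for l in longer) for s in shorter)


def get_all_matches(name, candidates):
    """Lazily yield every candidate that matches `name`."""
    query = _parts(name)  # tokenized once, outside the loop
    for candidate in candidates:
        other = _parts(candidate)
        if len(query) < len(other):
            matched = _all_covered(query, other)
        else:
            matched = _all_covered(other, query)
        if matched:
            yield candidate
```

Because the result is produced lazily, a caller that only needs the first few suggestions (the auto-completion scenario) can stop iterating early.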

Does it pass the peer review?
   
Comments
PIEBALDconsult 19-Mar-17 23:14pm
   
Oh, and good point about case-insensitivity, I had forgotten about that.

Solution 2

I also have an idea for this challenge. In my opinion the result of such a method should not be just "True" or "False"; it should be a value that indicates how good the match is ...

First, the method:
Function NameCompare(SourceName As String, NameToCompare As String) As Single
     'Rules:
     'An empty string gives 0.0 as the result
     'If both strings are identical, the result is 1.0 = 100% match
     'Other cases:
     ' - 1 point for each leading letter that is equal in the two string parts
     ' - 1 additional point if the string part is completely equal
     ' - 1 additional point if the string part is completely equal and at the same position
     ' - whenever a part of NameToCompare is longer than the compared part of SourceName, each extra letter reduces the given points by 0.25

     If SourceName.Trim = "" Or NameToCompare.Trim = "" Then Return 0.0

     Dim SourceNameArray() As String = SourceName.Split(" ")
     Dim NameToCompareArray() As String = NameToCompare.Split(" ")

     Dim maxPoints As Integer = SourceName.Replace(" ", "").Length + SourceNameArray.Length * 2
     Dim givenPoints As Single = 0.0
     Dim i, j, k As Integer

     For i = 0 To SourceNameArray.Length - 1
         For j = 0 To NameToCompareArray.Length - 1
             If SourceNameArray(i) = NameToCompareArray(j) Then
                 givenPoints += NameToCompareArray(j).Length
                 givenPoints += 1
                 If i = j Then givenPoints += 1
                 Exit For
             ElseIf SourceNameArray(i).Length >= NameToCompareArray(j).Length Then
                 For k = 1 To NameToCompareArray(j).Length
                     If SourceNameArray(i).Substring(0, k) = NameToCompareArray(j).Substring(0, k) Then givenPoints += 1
                 Next
             ElseIf SourceNameArray(i).Length < NameToCompareArray(j).Length Then
                 For k = 1 To SourceNameArray(i).Length
                     If SourceNameArray(i).Substring(0, k) = NameToCompareArray(j).Substring(0, k) Then givenPoints += 1
                 Next
                 givenPoints -= CSng(NameToCompareArray(j).Length - SourceNameArray(i).Length) * 0.25
             End If
         Next
     Next

     If givenPoints = 0 Or maxPoints = 0 Then Return 0.0
     Return givenPoints / CSng(maxPoints)
 End Function
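
To make the scoring rules easier to follow outside VB.NET, here is a Python sketch of the same function. It is intended as a direct port of the code above (same split on single spaces, same case-sensitive comparison, same 0.25 penalty); the name name_compare is simply the Python spelling of NameCompare.

```python
def name_compare(source_name, name_to_compare):
    """Return achieved points divided by the maximum points for source_name."""
    if not source_name.strip() or not name_to_compare.strip():
        return 0.0

    source = source_name.split(" ")
    other = name_to_compare.split(" ")

    # One point per letter, plus two bonus points per part (complete match, same position).
    max_points = len(source_name.replace(" ", "")) + 2 * len(source)
    points = 0.0

    for i, s in enumerate(source):
        for j, o in enumerate(other):
            if s == o:
                points += len(o) + 1       # all letters, plus the "complete match" bonus
                if i == j:
                    points += 1            # plus the "same position" bonus
                break                      # stop looking for this source part
            # Partial credit: one point per agreeing leading prefix.
            n = min(len(s), len(o))
            points += sum(1 for k in range(1, n + 1) if s[:k] == o[:k])
            if len(s) < len(o):
                points -= 0.25 * (len(o) - len(s))  # penalty per extra letter

    if points == 0 or max_points == 0:
        return 0.0
    return points / max_points
```

As a worked case, "Smith John Paul" against "John Paul Smith" collects 4.75 + 4.75 + 6 = 15.5 of 19 possible points, which is about 0.82. Note that Python prints a decimal point where a German-locale .NET runtime prints a comma.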


Now the test with several names to compare:
Dim n0, nc As String
n0 = "John Paul Smith"

nc = "John Paul Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Smith John Paul"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "John P Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Smith John P"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "J Paul Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Smith J Paul"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "John Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Smith John"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "J P Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Paul Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Smith Paul"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))

nc = "Paula Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Josephine Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Johny Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))


And here are the results:
John Paul Smith :: John Paul Smith -> 1,00
John Paul Smith :: Smith John Paul -> 0,82
John Paul Smith :: John P Smith -> 0,72
John Paul Smith :: Smith John P -> 0,61
John Paul Smith :: J Paul Smith -> 0,72
John Paul Smith :: Smith J Paul -> 0,61
John Paul Smith :: John Smith -> 0,62
John Paul Smith :: Smith John -> 0,55
John Paul Smith :: J P Smith -> 0,45
John Paul Smith :: Paul Smith -> 0,57
John Paul Smith :: Smith Paul -> 0,61
John Paul Smith :: Paula Smith -> 0,47
John Paul Smith :: Josephine Smith -> 0,21
John Paul Smith :: Johny Smith -> 0,47
   
Comments
PIEBALDconsult 19-Mar-17 15:19pm
   
I didn't read that too closely, but I'm trying something similar. I'm weighting the letters by how close they are to the start of a "word".
Ralf Meier 19-Mar-17 16:24pm
   
You are right.
I tried to create "rules" which allow to check how near a Compare-String comes to the Source-String.
But I'm also very interested in other Solutions ...

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
