See more: VB.NET
Hello all,
 
OK, I have been banging my head against the wall for a while now trying different techniques. None of them work well.
 
I have two strings. I need to compare them and get an exact percentage match, for example:

"four score and seven years ago" vs. "for scor and sevn yeres ago"
 
Well, I first compared every word with every word, counting every hit, with percentage = count / numOfWords. Nope, that didn't take misspelled words into account ("four" <> "for", even though they are close).
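A minimal sketch of the word-by-word approach just described, to make its weakness concrete. WordMatchPercent is a hypothetical name, and the sketch assumes words are separated by spaces and compared case-sensitively. Note that in VB.NET "\" is integer division, so the percentage has to be computed with "/".

```vbnet
Imports System

Module WordMatchSketch
    ' Hypothetical helper: fraction of the words in str1 that also appear,
    ' spelled exactly the same, somewhere in str2.
    Function WordMatchPercent(ByVal str1 As String, ByVal str2 As String) As Double
        Dim separators As Char() = {" "c}
        Dim words1 As String() = str1.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Dim words2 As String() = str2.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        If words1.Length = 0 Then Return 0.0
        Dim count As Integer = 0
        For Each w As String In words1
            If Array.IndexOf(words2, w) >= 0 Then count += 1
        Next
        ' "/" is floating-point division; "\" is integer division in VB.NET and
        ' would turn every partial match into 0.
        Return count / words1.Length
    End Function

    Sub Main()
        ' Only "and" and "ago" match exactly: 2 of 6 words, about 0.33.
        ' "four" vs. "for" counts as a complete miss, the weakness noted above.
        Console.WriteLine(WordMatchPercent("four score and seven years ago", "for scor and sevn yeres ago"))
    End Sub
End Module
```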
 
Then I tried comparing each character of one string with the characters of the other, advancing to the next character when there was no match (to allow for misspellings). But I got false hits, because the first string could contain every character of the second without them being in the same order. ("stuff avail" <> "stu vail", but it came back as a hit, albeit a low percentage: 9 / 11, about 82%.)
 
So I then tried comparing PAIRS of characters in each string: if string1[i] = string2[k] AND string1[i+1] = string2[k+1], increment the count, and increment k when they don't match (to track misspellings; "for" and "four" should come back as a 75% hit). That doesn't seem to work either. It is getting closer, but even an exact match only returns 94%, and the results get really skewed when something is badly misspelled. (Code at the bottom.)
 
Any ideas or directions to go?
 
Thanks,
 
Josh
 

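Dim count As Integer = 0
Dim j As Integer = 0   ' index into strTempName
Dim k As Integer = 0   ' index into strTempFile
Dim perc As Double

While j < strTempName.Length - 2 AndAlso k < strTempFile.Length - 2
    ' Ignore characters that are not letters or digits
    If Not Char.IsLetterOrDigit(strTempName(j)) Then
        j += 1
    End If

    ' Ignore characters that are not letters or digits
    If Not Char.IsLetterOrDigit(strTempFile(k)) Then
        k += 1
    End If

    ' Compare pairs of chars: advance k (treating skipped chars as
    ' misspellings) while neither char of the pair matches
    While k < strTempFile.Length - 2 AndAlso _
          strTempName(j) <> strTempFile(k) AndAlso _
          strTempName(j + 1) <> strTempFile(k + 1)
        k += 1
    End While
    count += 1   ' incremented once on every pass of the outer loop
    j += 1
    k += 1
End While

perc = count / (strTempName.Length - 1)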
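Reading the loop above, one reason an exact match can never reach 100% can be worked out directly. With n = strTempName.Length, the outer loop starts at j = 0, adds at least 1 to j on every pass and stops once j >= n - 2, so it makes at most n - 2 passes; count grows by exactly one per pass. Hence, whatever the inputs:

```latex
\text{count} \le n - 2
\quad\Longrightarrow\quad
\text{perc} = \frac{\text{count}}{n - 1} \le \frac{n - 2}{n - 1} < 1,
\qquad \text{e.g. } n = 18:\ \frac{16}{17} \approx 0.94
```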
Posted 18-Jan-11 20:48pm

Solution 1

Maybe this will help as a base. You will need to modify it.
 
Points to remember:
1) It compares character by character.
2) It skips characters in the second string until the next match.
3) It waits at the end of a word.
4) It jumps to the next word in the second string when a new word starts in the first string.
 
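Function Compare(ByVal str1 As String, ByVal str2 As String) As Double
  ' Normalise by the length of the longer string
  Dim count As Integer = Math.Max(str1.Length, str2.Length)
  If count = 0 Then Return 1.0   ' two empty strings are identical
  Dim hits As Integer = 0
  Dim j As Integer = 0            ' current position in str2
  For i As Integer = 0 To str1.Length - 1
    If str1.Chars(i) = " "c Then
      ' A new word starts in str1: count the space as a hit and jump to the
      ' start of the next word in str2 (or to its end if there is none)
      hits += 1
      Dim nextSpace As Integer = str2.IndexOf(" "c, j)
      j = If(nextSpace = -1, str2.Length, nextSpace + 1)
      Continue For
    End If
    ' Skip characters of the current str2 word until one matches str1(i)
    While j < str2.Length AndAlso str2.Chars(j) <> " "c
      If str1.Chars(i) = str2.Chars(j) Then
        hits += 1
        j += 1
        Exit While
      Else
        j += 1
      End If
    End While
    ' End of the str2 word reached: step back so j waits on its last character
    If Not (j < str2.Length AndAlso str2.Chars(j) <> " "c) AndAlso j > 0 Then
      j -= 1
    End If
  Next
  Return Math.Round(hits / count, 2)
End Function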
 
Sample Output:
"four"<->"for" = 0.75
"four stud"<->"for studs" = 0.89
Comments
losmac at 9-Mar-12 15:27pm
   
Interesting solution... My 5!
Please see my question. Would you like to join the discussion?

Solution 2

You can use the Levenshtein distance[^] algorithm. It is a very well-known algorithm and easy to implement.
This[^] page contains Java/C++/VB implementations of the algorithm.
And here[^] you can find a generic implementation of this algorithm (this time in C#, but converting it to VB.NET should not be a problem).
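For reference, a minimal VB.NET sketch of the standard dynamic-programming Levenshtein distance (not taken from the linked pages). The names Levenshtein and Similarity are hypothetical, and turning the distance into a percentage by dividing by the length of the longer string is an assumption: it is one common normalisation among several. With it, "four" vs. "for" gives a distance of 1 and a similarity of 0.75, the 75% asked for in the question.

```vbnet
Imports System

Module LevenshteinSketch
    ' Standard dynamic-programming Levenshtein distance: the minimum number of
    ' single-character insertions, deletions and substitutions that turn a into b.
    Function Levenshtein(ByVal a As String, ByVal b As String) As Integer
        ' d(i, j) = distance between the first i chars of a and the first j chars of b
        Dim d(a.Length, b.Length) As Integer
        For i As Integer = 0 To a.Length
            d(i, 0) = i
        Next
        For j As Integer = 0 To b.Length
            d(0, j) = j
        Next
        For i As Integer = 1 To a.Length
            For j As Integer = 1 To b.Length
                Dim cost As Integer = If(a(i - 1) = b(j - 1), 0, 1)
                ' Cheapest of: delete from a, insert into a, or substitute (free if equal)
                d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
            Next
        Next
        Return d(a.Length, b.Length)
    End Function

    ' Hypothetical helper: 1.0 for identical strings, lower as more edits are needed.
    ' Dividing by the longer length is an assumed normalisation, not the only choice.
    Function Similarity(ByVal a As String, ByVal b As String) As Double
        Dim longest As Integer = Math.Max(a.Length, b.Length)
        If longest = 0 Then Return 1.0   ' two empty strings are identical
        Return 1.0 - Levenshtein(a, b) / longest
    End Function

    Sub Main()
        ' Distance 1 (the "u" is deleted), longer length 4, so 1 - 1/4 = 0.75
        Console.WriteLine(Similarity("four", "for"))
        Console.WriteLine(Similarity("four score and seven years ago", "for scor and sevn yeres ago"))
    End Sub
End Module
```

The table needs (len(a) + 1) x (len(b) + 1) integers; keeping only the previous and current rows is a common optimisation when the strings are long.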
 
I hope this helps. :)
Comments
John Simmons / outlaw programmer at 19-Jan-11 15:07pm
   
Proposed as answer
Nuri Ismail at 20-Jan-11 3:02am
   
Thank you John!
losmac at 9-Mar-12 15:28pm
   
Good answer, good link. My 5!
Please see my question. Would you like to join the discussion?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



Last Updated 13 Jun 2012
Copyright © CodeProject, 1999-2014
All Rights Reserved.