Hello all,

OK, I have been banging my head against the wall for a while now, trying different techniques. None of them works well.

I have two strings. I need to compare them and get an exact percentage of match,

e.g. "four score and seven years ago" TO "for scor and sevn yeres ago"

Well, I first started by comparing every word to every word, tracking every hit, with percentage = count \ numOfWords. Nope, that didn't take misspelled words into account. ("four" <> "for" even though it is close.)

Then I tried comparing every char in one string to every char in the other, incrementing the string index when there was no match (to account for misspellings). But I would get false hits, because the first string could contain every char of the second, just not in the same order. ("stuff avail" <> "stu vail", but it would come back as a hit; a low percentage, but a hit: 9 \ 11 = 81%.)

So I then tried comparing PAIRS of chars in each string. If string1[i] = string2[k] AND string1[i+1] = string2[k+1], increment the count, and increment "k" when they don't match (to track misspellings; "for" and "four" should come back with a 75% hit). That doesn't seem to work either. It is getting closer, but even with an exact match it only returns 94%. And it really gets screwed up when something is badly misspelled. (Code at the bottom.)

Any ideas or directions to go?

Thanks,

Josh

```
count = 0
j = 0
k = 0
While j < strTempName.Length - 2 And k < strTempFile.Length - 2
    ' To ignore non letters or digits '
    If Not strTempName(j).IsLetter(strTempName(j)) Then
        j += 1
    End If

    ' To ignore non letters or digits '
    If Not strTempFile(k).IsLetter(strTempFile(k)) Then
        k += 1
    End If

    ' compare pair of chars '
    While (strTempName(j) <> strTempFile(k) And _
           strTempName(j + 1) <> strTempFile(k + 1) And _
           k < strTempFile.Length - 2)
        k += 1
    End While
    count += 1
    j += 1
    k += 1

End While

perc = count / (strTempName.Length - 1)
```
Posted 18-Jan-11 20:48pm
Edited 28-Mar-16 13:47pm


Solution 2

You can use the Levenshtein distance[^] algorithm. It is a very well-known algorithm and is easy to implement.
This[^] page contains Java/C++/VB implementations of the algorithm.
And here[^] you can find a generic implementation of the algorithm (this one is in C#, but converting it to VB.NET should not be a problem).
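
To make the idea concrete, here is a minimal VB.NET sketch of the standard dynamic-programming Levenshtein distance, turned into a match ratio. It is not taken from any of the linked pages. The names `SimilaritySketch`, `LevenshteinDistance` and `Similarity` are made up for illustration, and dividing the distance by the length of the longer string is only one assumed way to get a percentage; other normalizations are possible.

```vb
Imports System

Module SimilaritySketch
    ' Number of single-character insertions, deletions and substitutions
    ' needed to turn s into t (standard dynamic-programming table).
    Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer
        Dim n As Integer = s.Length
        Dim m As Integer = t.Length
        ' VB arrays are declared by upper bound, so this is (n + 1) x (m + 1).
        Dim d(n, m) As Integer

        ' Turning a prefix into the empty string costs its length, and vice versa.
        For i As Integer = 0 To n
            d(i, 0) = i
        Next
        For j As Integer = 0 To m
            d(0, j) = j
        Next

        For i As Integer = 1 To n
            For j As Integer = 1 To m
                Dim cost As Integer = If(s(i - 1) = t(j - 1), 0, 1)
                Dim deletion As Integer = d(i - 1, j) + 1
                Dim insertion As Integer = d(i, j - 1) + 1
                Dim substitution As Integer = d(i - 1, j - 1) + cost
                d(i, j) = Math.Min(Math.Min(deletion, insertion), substitution)
            Next
        Next
        Return d(n, m)
    End Function

    ' Assumed normalization: 1.0 means identical, 0.0 means nothing in common.
    Function Similarity(ByVal s As String, ByVal t As String) As Double
        Dim longest As Integer = Math.Max(s.Length, t.Length)
        If longest = 0 Then Return 1.0
        Return 1.0 - LevenshteinDistance(s, t) / longest
    End Function

    Sub Main()
        Console.WriteLine(Similarity("four", "for"))
        Console.WriteLine(Similarity("four score and seven years ago",
                                     "for scor and sevn yeres ago"))
    End Sub
End Module
```

With this normalization, "four" and "for" are one deletion apart over a longest length of 4, so the ratio is 0.75, which is the 75% the question expects. Identical strings have distance 0 and therefore score 1.0. The comparison is case-sensitive and counts spaces and punctuation, so you may want to normalize the strings first.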

I hope this helps.

Nuri Ismail 20-Jan-11 3:02am

Thank you John!
losmac 9-Mar-12 15:28pm

Please see my question. Would you like to join the discussion?

Solution 1

Maybe this will help as a basis. You will need to modify it.

Points to remember:
1) It compares character by character
2) It skips characters until the next match
3) It waits at the end of a word
4) It jumps to the next word when a new word starts in the first string

```
Function Compare(ByVal str1 As String, ByVal str2 As String) As Double
    ' Normalize by the length of the longer string
    Dim count As Integer = If(str1.Length > str2.Length, str1.Length, str2.Length)
    Dim hits As Integer = 0
    Dim i, j As Integer : i = 0 : j = 0
    For i = 0 To str1.Length - 1
        ' New word in str1: jump to the start of the next word in str2
        If str1.Chars(i) = " "c Then
            i += 1
            j = str2.IndexOf(" "c, j) + 1
            hits += 1
        End If
        ' Skip characters in the current word of str2 until the next match
        While j < str2.Length AndAlso str2.Chars(j) <> " "c
            If str1.Chars(i) = str2.Chars(j) Then
                hits += 1
                j += 1
                Exit While
            Else
                j += 1
            End If
        End While
        ' Wait at the end of the current word in str2
        If Not (j < str2.Length AndAlso str2.Chars(j) <> " "c) Then
            j -= 1
        End If
    Next
    Return Math.Round((hits / count), 2)
End Function
```

Sample Output:
"four"<->"for" = 0.75
"four stud"<->"for studs" = 0.89
losmac 9-Mar-12 15:27pm

Interesting solution... My 5!
Please see my question. Would you like to join the discussion?
