Hello all,

OK, I have been banging my head against the wall for a while now, trying different techniques. None of them works well.

I have two strings. I need to compare them and get an exact match percentage,

e.g. "four score and seven years ago" TO "for scor and sevn yeres ago"

Well, I first started by comparing every word to every word, tracking every hit, with percentage = count / numOfWords. Nope, that didn't take misspelled words into account. ("four" <> "for" even though it is close.)

Then I tried comparing the two strings character by character, advancing the position in the second string when a character did not match (to allow for misspellings). But I would get false hits, because the first string could contain every char of the second, just not in the exact order of the second. ("stuff avail" <> "stu vail", but it would come back as a hit; a low percentage, but a hit: 9 / 11 = 81%.)

So I then tried comparing PAIRS of chars in each string. If string1[i] = string2[k] AND string1[i+1] = string2[k+1], increment the count, and increment "k" when it doesn't match (to allow for misspellings; "for" and "four" should come back with a 75% hit). That doesn't seem to work either. It is getting closer, but even with an exact match it only returns 94%. And it really gets screwed up when something is badly misspelled. (Code at the bottom.)

Any ideas or directions to go?

Thanks,

Josh

```
count = 0
j = 0
k = 0
While j < strTempName.Length - 2 And k < strTempFile.Length - 2
    ' To ignore non letters or digits '
    If Not strTempName(j).IsLetter(strTempName(j)) Then
        j += 1
    End If

    ' To ignore non letters or digits '
    If Not strTempFile(k).IsLetter(strTempFile(k)) Then
        k += 1
    End If

    ' compare pair of chars '
    While (strTempName(j) <> strTempFile(k) And _
           strTempName(j + 1) <> strTempFile(k + 1) And _
           k < strTempFile.Length - 2)
        k += 1
    End While
    count += 1
    j += 1
    k += 1

End While

perc = count / (strTempName.Length - 1)
```
Updated 28-Mar-16 13:47pm

## Solution 2

You can use the Levenshtein distance algorithm. It is a very well-known algorithm and is easy to implement.
This page contains Java/C++/VB implementations of the algorithm.
And here you can find a generic implementation of the algorithm (this one is in C#, but converting it to VB.NET should not be a problem).
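
For readers who just want something to start from, here is a minimal VB.NET sketch of the textbook dynamic-programming Levenshtein distance. It is not the code from any of the linked pages. The `Similarity` helper is a hypothetical addition, and turning the distance into a percentage by dividing by the longer string's length is only one possible convention, assumed here for illustration.

```vb
Imports System

Module LevenshteinSketch
    ' Edit distance: the minimum number of single-character insertions,
    ' deletions, and substitutions needed to turn s into t.
    Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer
        Dim n As Integer = s.Length
        Dim m As Integer = t.Length
        ' VB array bounds are inclusive, so this is an (n+1) x (m+1) table.
        Dim d(n, m) As Integer

        For i As Integer = 0 To n
            d(i, 0) = i   ' delete the first i characters of s
        Next
        For j As Integer = 0 To m
            d(0, j) = j   ' insert the first j characters of t
        Next

        For i As Integer = 1 To n
            For j As Integer = 1 To m
                Dim cost As Integer = If(s(i - 1) = t(j - 1), 0, 1)
                Dim deletion As Integer = d(i - 1, j) + 1
                Dim insertion As Integer = d(i, j - 1) + 1
                Dim substitution As Integer = d(i - 1, j - 1) + cost
                d(i, j) = Math.Min(Math.Min(deletion, insertion), substitution)
            Next
        Next
        Return d(n, m)
    End Function

    ' Hypothetical helper (an assumption, not part of the algorithm itself):
    ' similarity = 1 - distance / length of the longer string.
    Function Similarity(ByVal s As String, ByVal t As String) As Double
        Dim longest As Integer = Math.Max(s.Length, t.Length)
        If longest = 0 Then Return 1.0   ' two empty strings are treated as identical
        Return 1.0 - LevenshteinDistance(s, t) / longest
    End Function

    Sub Main()
        Console.WriteLine(Similarity("four", "for"))   ' prints 0.75
        Console.WriteLine(Similarity("four score and seven years ago",
                                     "for scor and sevn yeres ago"))
    End Sub
End Module
```

With this convention, "four" against "for" is one deletion out of four characters, which gives the 75% the question expects. Punctuation and letter case are compared as-is, so you may want to normalise the strings first.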

I hope this helps. :)

#realJSOP 19-Jan-11 15:07pm
Nuri Ismail 20-Jan-11 3:02am
Thank you John!
Maciej Los 9-Mar-12 15:28pm
Please see my question. Would you like to join the discussion?

## Solution 1

Maybe this will help as a basis. You will need to modify it.

Points to remember:
1) It compares character by character
2) It skips characters until the next match
3) It waits at the end of a word
4) It jumps to the next word when a new word starts in the first string

VB
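```
Function Compare(ByVal str1 As String, ByVal str2 As String) As Double
    ' Normalise by the length of the longer string.
    Dim count As Integer = If(str1.Length > str2.Length, str1.Length, str2.Length)
    Dim hits As Integer = 0
    Dim i, j As Integer : i = 0 : j = 0
    For i = 0 To str1.Length - 1
        ' New word in str1: skip the space, move j to the start of the
        ' next word in str2, and count the space itself as a hit.
        If str1.Chars(i) = " "c Then
            i += 1
            j = str2.IndexOf(" "c, j) + 1
            hits += 1
        End If
        ' Scan the current word of str2 for the next matching character.
        While j < str2.Length AndAlso str2.Chars(j) <> " "c
            If str1.Chars(i) = str2.Chars(j) Then
                hits += 1
                j += 1
                Exit While
            Else
                j += 1
            End If
        End While
        ' If j ran past the end of the current word, step back and wait there.
        If Not (j < str2.Length AndAlso str2.Chars(j) <> " "c) Then
            j -= 1
        End If
    Next
    Return Math.Round((hits / count), 2)
End Function
```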

Sample Output:
"four"<->"for" = 0.75
"four stud"<->"for studs" = 0.89

Maciej Los 9-Mar-12 15:27pm
Interesting solution... My 5!
Please see my question. Would you like to join the discussion?