I'm working on a Java project where I use min-hashing to compute the Jaccard similarity between two documents. Both documents are texts given as unsorted integer arrays, e.g. arr[0] = the first word (as an int), and so on. I compute the min-hashing similarity between the two sets and then use it to compute the Jaccard coefficient.
The problem is that when I divide the min-hashing similarity by the number of elements in the union of the two arrays, the result does not match the expected quotient.

Example: arr1 = {4,5,6,7}, arr2 = {6,7}

min hashing similarity : 0.5
union array = {0,1,2,3,6,7} length = 6
Jaccard coefficient = min-hashing similarity / length = 0.5/6 = 0.0833333333333
but I get 0.096 when I compute the Jaccard coefficient.
The code is below.
Thank you for your time.

What I have tried:

@SuppressWarnings("static-access")
public double jaccard(Document doc) // returns similarity / number of elems in union array
{
    return this.minhash(doc)/(double(this.unionArrays(a,b,a.length,b.length));
}

public static int unionArrays(int[] a, int[] b, int m, int n)
{
    int counter = 0; // number of elems in union array

    if (m > n) // make sure first array is smaller
    {
        int tempp[] = a;
        a = b;
        b = tempp;

        int temp = m;
        m = n;
        n = temp;
    }

    Arrays.sort(a); // sort first array

    for (int i = 0; i < m; i++)
    {
        counter++; // number of elems of first array
    }

    for (int i = 0; i < n; i++)
    {
        if (binarySearch(a, 1, m - 1, b[i]) == -1)
        {
            counter++; // if elem of second array doesn't exist in first array increase counter
        }
    }

    return counter;
}

private static int binarySearch(int[] arr, int l, int r, int x)
{
    if (r >= l)
    {
        int mid = l + (r - l) / 2;

        // If the element is present at the middle itself
        if (arr[mid] == x)
            return mid;

        // If element is smaller than mid, then it can only
        // be present in left subarray
        if (arr[mid] > x)
            return binarySearch(arr, l, mid - 1, x);

        // Else the element can only be present in right subarray
        return binarySearch(arr, mid + 1, r, x);
    }

    // We reach here when element is not present in array
    return -1;
}
Updated 4-Jan-20 3:10am
Comments
[no name] 4-Jan-20 9:27am    
In the line return this.minhash(doc)/(double(this.unionArrays(a,b,a.length,b.length)); the brackets do not match. Does your example really compile?
Richard MacCutchan 4-Jan-20 11:51am    
Put some print statements into your code so that you can see all the values that are being calculated. That will help to find out where a calculation is incorrect.
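
As a rough illustration of that suggestion, the sketch below prints the intermediate values for the arrays given in the question. The class UnionDebug and the helper referenceUnionSize are hypothetical names, not part of the original code, and the 0.5 is simply the min-hashing similarity reported in the question. The helper counts the union with a HashSet so there is an independent number to compare against whatever unionArrays returns.

```java
import java.util.HashSet;
import java.util.Set;

class UnionDebug {
    // Hypothetical reference helper: counts distinct values across both arrays.
    static int referenceUnionSize(int[] a, int[] b) {
        Set<Integer> union = new HashSet<>();
        for (int x : a) {
            union.add(x);
        }
        for (int x : b) {
            union.add(x);
        }
        return union.size();
    }

    public static void main(String[] args) {
        int[] arr1 = {4, 5, 6, 7};
        int[] arr2 = {6, 7};
        double minhashSimilarity = 0.5; // value reported in the question, not computed here

        int unionSize = referenceUnionSize(arr1, arr2);
        System.out.println("reference union size = " + unionSize);
        System.out.println("similarity / union size = " + (minhashSimilarity / unionSize));
    }
}
```

For these two arrays the set-based count is 4, so printing the value returned by unionArrays next to it should show quickly whether the union count is where the calculation goes wrong.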

1 solution

Your code does not behave the way you expect, or you don't understand why!

There is an almost universal solution: run your code in the debugger step by step and inspect the variables.
The debugger is there to show you what your code is doing; your task is to compare that with what it should do.
There is no magic in the debugger. It doesn't know what your code is supposed to do and it doesn't find bugs; it just helps you by showing you what is going on. When the code doesn't do what you expect, you are close to a bug.
To see what your code is doing, set a breakpoint and watch the code run. The debugger lets you execute lines one at a time and inspect variables as it executes.
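
One thing that stepping through unionArrays would probably show is that the call binarySearch(a, 1, m-1, b[i]) starts at index 1, so element 0 of the sorted array is never examined. The sketch below is one possible way to count the union that searches the whole array; it is not the asker's method. The names UnionSize, unionSize and countDistinct are hypothetical, and it assumes repeated values should be counted once and that the caller's arrays should not be reordered.

```java
import java.util.Arrays;

class UnionSize {
    // Counts distinct values that appear in a, in b, or in both.
    static int unionSize(int[] a, int[] b) {
        int[] sortedA = a.clone(); // sort copies so the caller's arrays keep their order
        int[] sortedB = b.clone();
        Arrays.sort(sortedA);
        Arrays.sort(sortedB);

        int count = countDistinct(sortedA);
        for (int i = 0; i < sortedB.length; i++) {
            boolean repeat = i > 0 && sortedB[i] == sortedB[i - 1];
            // Arrays.binarySearch searches the full array and returns a negative value on a miss.
            if (!repeat && Arrays.binarySearch(sortedA, sortedB[i]) < 0) {
                count++;
            }
        }
        return count;
    }

    // Counts distinct values in an already sorted array.
    static int countDistinct(int[] sorted) {
        int count = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0 || sorted[i] != sorted[i - 1]) {
                count++;
            }
        }
        return count;
    }
}
```

With the arrays from the question, UnionSize.unionSize(new int[]{4, 5, 6, 7}, new int[]{6, 7}) returns 4.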

Debugger - Wikipedia, the free encyclopedia

Mastering Debugging in Visual Studio 2010 - A Beginner's Guide
Basic Debugging with Visual Studio 2010 - YouTube

http://docs.oracle.com/javase/7/docs/technotes/tools/windows/jdb.html
https://www.jetbrains.com/idea/help/debugging-your-first-java-application.html


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


