I have a k-nearest-neighbor (KNN) classifier in C++ that works with two text files: the first contains 1405 training vectors and the second contains 810 test vectors. KNN classifies each vector in the test file using the training file, and the code then reports an accuracy. That accuracy is wrong.
When I run the code with k=1 the accuracy is 89%; with k=3 the accuracy is 186%.
My question: how can the accuracy exceed 100%?

This is the KNN code:


int TestKNN (TRAINING_EXAMPLES_LIST *tlist, TRAINING_EXAMPLES_LIST data, 
			 bool isInstanceWeighted, MODE mode,
			 bool isBackwardElimination, bool isAttWKNN)
{
	int correctlyClassifiedInstances = 0;
	TRAINING_EXAMPLES_LIST::iterator testIter;
	TrainingExample tmpTestObj;
	uint index[K];

	for(testIter = data.begin(); testIter != data.end(); ++testIter)
	{
		tmpTestObj = *testIter;
		/* Predict the class for the query point */
		int predictedClass = PredictByKNN(tlist, tmpTestObj.Value, 
											isInstanceWeighted, 
											index, mode, isBackwardElimination, 
											isAttWKNN);
		/* Count the number of correctly classified instances */
		if(((int)(tmpTestObj.Value[NO_OF_ATT-1])) == predictedClass)
			correctlyClassifiedInstances ++;
	}	
	return correctlyClassifiedInstances;
}


int PredictByKNN (TRAINING_EXAMPLES_LIST *tlist, double *query, 
				  bool isWeightedKNN, uint *index, MODE mode, 
				  bool isBE, bool isAttWeightedKNN)
{
	double distance = 0.0;
	TRAINING_EXAMPLES_LIST::iterator iter;
	TrainingExample tmpObj;
	TRAINING_EXAMPLES_LIST elistWithD;

	if(!elistWithD.empty())
		elistWithD.clear ();

	/* If we are in for backward elimination or attribute WKNN */
	/* then Instance WKNN has to be false                      */
	if(isBE || isAttWeightedKNN)
		isWeightedKNN = false;

	/* Calculate the distance of the query */
	/* point from all training instances   */
	/* using the euclidean distance        */
	for(iter = tlist->begin(); iter != tlist->end(); ++iter)
	{
		tmpObj = *iter;
		distance = 0.0;

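		for(int j = 0; j < NO_OF_ATT - 1; j++)
		{
			/* Squaring the difference makes abs() unnecessary and  */
			/* avoids an integer abs() overload truncating doubles  */
			double diff = query[j] - tmpObj.Value[j];
			distance += (diff * diff) * (attWeights[j] * attWeights[j]);
		}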
		distance = sqrt(distance);
		/* If the distance is zero (anything below 0.001) then set it */
		/* to some high value, since it is the query point itself     */
		if((int)(distance*1000) == 0)
			distance = 999999999999999.9;
		
		tmpObj.Distance = distance; 
		elistWithD.insert (elistWithD.end(), tmpObj);
	}

	/* Sort the points on distance in ascending order */
	elistWithD.sort(compare);

	
	
	/* Simple KNN, Attribute Weighted KNN, Backward Elimination */
	int classCount[NO_OF_CLASSES];

	for(int i = 0; i < NO_OF_CLASSES; i++)
		classCount[i] = 0;

	int knn = K;
	for(iter = elistWithD.begin(); iter != elistWithD.end(); ++iter)
	{
		/* Calculate how the K nearest neighbors are classified */
		tmpObj = *iter;
		classCount[(int)tmpObj.Value[NO_OF_ATT-1]]++;
		knn--;
		if(!knn)
			break;
	}

	int maxClass = 0;
	int maxCount = 0;

	/* Find the class represented maximum number of times */
	/* among the k neighbors                              */
	for(int i = 0; i < NO_OF_CLASSES; i++)
	{
		if(classCount[i] > maxCount)
		{
			maxClass = i;
			maxCount = classCount[i];
		}
	}

	return maxClass;
}
Updated 24-Sep-14 8:11am

1 solution

Please use the debugger or logging: locate the place where you calculate the accuracy (assuming this is your code ;-)) and detect the case when the accuracy exceeds 100%. You can simply modify the code to add that check and generate a log or error message. This way, you will find the exact steps to reproduce the problem. (Most likely, you already know those steps.) Then put a breakpoint on the line that is executed only when the accuracy exceeds 100%, so the debugger stops only when the conditions for reproducing the problem are met. At that point, look at the "Call Stack" debug window; it shows exactly where the execution came from and where to continue debugging, closer to the root of the problem. You will then locate the exact cause in a small number of steps.
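
The check described above could look roughly like the sketch below. The accuracy calculation itself is not shown in the question, so `AccuracyPercent` is a hypothetical helper, not the poster's code. It assumes the caller passes the count returned by `TestKNN` and the size of the test list. As posted, `TestKNN` increments its counter at most once per test vector, so the count should never exceed the number of test vectors; if this check fires, the problem is most likely in the calling code (for example, a wrong denominator or a counter that is not reset between runs).

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>

// Hypothetical helper (not part of the posted code): converts the number of
// correctly classified instances into a percentage and refuses impossible
// inputs instead of silently reporting more than 100%.
double AccuracyPercent(int correct, std::size_t total)
{
    if (total == 0)
        throw std::invalid_argument("test set is empty");

    if (correct < 0 || static_cast<std::size_t>(correct) > total)
    {
        // A good place for a breakpoint: this branch runs only in the bad case.
        std::cerr << "Impossible count: " << correct
                  << " correct out of " << total << " test vectors\n";
        throw std::logic_error("correct count is outside [0, total]");
    }

    return 100.0 * correct / static_cast<double>(total);
}
```

Assuming the test list is called `data`, a call such as `AccuracyPercent(TestKNN(&trainList, data, ...), data.size())` would then either return a value between 0 and 100 or stop at the exact point where the numbers stop making sense (`trainList` is a placeholder name here).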

This may prove more efficient than just staring at your code sample. Even better, it will help you get by without having to ask another question next time. The skill is more important than the resolution.

—SA
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


