Click here to Skip to main content
15,884,986 members
Please Sign up or sign in to vote.
2.00/5 (1 vote)
I Get dataset of neighbours using kNN and then I want to apply k-mean on that dataset. By using this, is it possible that I get more accurate result? Is it logically correct that use kNN and then after use k-mean or vice-versa?
Posted
Comments
Sergey Alexandrovich Kryukov 18-Nov-14 15:49pm    
Not clear... More accurate? It depends on what result do you want to get... :-)
—SA

1 solution

They are different machine learning techniques and it is a long story. I will try my best to be concise.

KNN is used to:
1. Classify a new data into a known group (category); or
2. Predict a target value for a new data;
It works by comparing the similarity between features of the new data and those of a set of historical data of known categories or known target values. The "K" refers to the number of data that has the closest match to it. The final outcome may be determine by the majority of category in the K group in the case of classification or a simple average in the case of prediction.
So, for KNN you need to have historical data with known targets and it is called supervised machine learning.

K-means, on the other hand, is a clustering algorithm. It works by first grouping data points into K number of partitions (or clusters). It starts by selecting K number of data points randomly as the centers of these k clusters, then assign the rest of data points to these cluster based on the features similarity between them and the cluster centers. Once all the data points are being grouped into their clusters, each cluster will re-select the most suitable data points among its members to be its new cluster centers. Then, the whole process of re-clustering begins. This will go on until there is no sigificant changes in the clusters. Once the K-clusters are identified and settled, new data then finds its place in one of these clusters through similarity matching.
so, K-means needs historical data with no known outcomes and it is called unsupervised machine learning.

Combining Them:
You may do a K-means first to group new data into a cluster and then apply KNN using the data points in that cluster. But you would need a very large dataset. Whether it produces better result, it depends on many factors - nature of the problem, the quality and quantity of the dataset, and the value of K . You just have to explore and experiment. Good luck.

Some reading that may help: Combination of K-Nearest Neighbor and K-Means based
on Term Re-weighting for Classify Indonesian News
[^]
 
Share this answer
 
v2
Comments
rushiraj_11 19-Nov-14 3:56am    
Thanks Peter...u solved the prob completely...and that paper is also helpful...and I will come back to u with result ...thanks a lot...

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900