# K-Means Clustering Used in Intention Based Scoring Projects

Posted 3 Jan 2009 · CPOL · 5 min read
The use of K-Means clustering for data mining purposes
In this project, you will learn how to create an automated intention-based score for each programming assignment, and how to use K-Means clustering to identify which students belong to which group.

## Introduction

Intention based scoring (IBS) is a scoring system whose goal is to provide a score that more accurately assesses a student's ability to solve a composition problem, and therefore more directly measures the student's programming ability. The process is dubbed "intention based" because the programs are inspected, first, by inferring what goals the student is trying to achieve and, second, by determining what plan component is being attempted when a program statement is not syntactically correct.

The three steps necessary to perform IBS were introduced by [1]: inspection of an on-line protocol, bug identification, and scoring.

The final goal of this project is to create an automated intention-based score for each programming assignment, and to use K-Means clustering to identify which students belong to which group.

## K-Means Algorithm

The K-Means algorithm clusters n objects into k partitions, k < n, based on their attributes. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that both attempt to find the centers of natural clusters in the data. It assumes that the object attributes form a vector space. The objective it tries to achieve is to minimize the total intra-cluster variance, i.e., the squared error function:

J = Σi=1..k Σxj∈Si ||xj − µi||²

where there are k clusters Si, i = 1, 2, ..., k, and µi is the centroid or mean point of all the points xj in Si. The K-Means idea dates back to 1956. The most common form of the algorithm uses an iterative refinement heuristic known as Lloyd's algorithm. Lloyd's algorithm starts by partitioning the input points into k initial sets, either at random or using some heuristic. It then calculates the mean point, or centroid, of each set, and constructs a new partition by associating each point with its closest centroid. The centroids are then recalculated for the new clusters, and the algorithm repeats these two steps in alternation until convergence, which is reached when the points no longer switch clusters (or, equivalently, the centroids no longer change).

Lloyd's algorithm and K-Means are often used synonymously, but in reality Lloyd's algorithm is a heuristic for solving the K-Means problem: with certain combinations of starting points and centroids, it can converge to the wrong answer (i.e., a different, better solution to the minimization function above exists). Other variations exist, but Lloyd's algorithm has remained popular because it converges extremely quickly in practice; many have observed that the number of iterations is typically much smaller than the number of points. Recently, however, David Arthur and Sergei Vassilvitskii showed that there exist point sets on which K-Means takes superpolynomial time, 2^Ω(√n), to converge. Approximate K-Means algorithms have been designed that make use of coresets: small subsets of the original data.

In terms of performance, the algorithm is not guaranteed to return a global optimum. The quality of the final solution depends largely on the initial set of clusters, and may, in practice, be much poorer than the global optimum. Since the algorithm is extremely fast, a common method is to run the algorithm several times and return the best clustering found. A drawback of the K-Means algorithm is that the number of clusters k is an input parameter. An inappropriate choice of k may yield poor results. The algorithm also assumes that the variance is an appropriate measure of cluster scatter.

The K-Means clustering algorithm was developed by J. MacQueen (1967) and then by J. A. Hartigan and M. A. Wong around 1975. Simply speaking, K-Means clustering is an algorithm to classify or to group your objects based on attributes/features, into K number of groups. K is a positive integer number. The grouping is done by minimizing the sum of squares of distances between data and the corresponding cluster centroid. Thus, the purpose of K-means clustering is to classify the data.

## Pseudo Code

Initialize the algorithm either heuristically or randomly. Iterate the following steps until convergence (stopping criteria met):

1. For each data point x, compute its membership in clusters by choosing the nearest centroid.
2. For each centroid, recompute its location according to members:
JavaScript

```
// K-Means (Lloyd's algorithm) in outline; initialCentroids, stoppingCriteria,
// nearestCentroid, and average are abstract helpers.
function kMeans(x, K) {
    var m = initialCentroids(x, K);   // K starting centroids
    var N = x.length;
    while (!stoppingCriteria()) {
        // 1. membership: assign each point to its nearest centroid
        var w = [];
        for (var k = 0; k < K; k++) w[k] = [];
        for (var n = 0; n < N; n++) {
            var v = nearestCentroid(m, x[n]);  // arg min over v0 of dist(m[v0], x[n])
            w[v].push(x[n]);
        }
        // 2. update: recompute each centroid from its members
        for (var k = 0; k < K; k++) {
            m[k] = average(w[k]);
        }
    }
    return m;
}
```
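The sketch above leaves its helpers abstract. A fully self-contained JavaScript version might look like the following; the initialization strategy (seeding centroids from the first K points) and the stopping rule (assignments no longer change) are assumptions of this sketch, not part of the article's code:

JavaScript

```
// Squared Euclidean distance between two equal-length vectors.
const dist2 = (a, b) => a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0);

// Lloyd's algorithm: alternate assignment and update until assignments stabilize.
function kMeans(points, K) {
  // Naive initialization: seed the centroids from the first K points.
  let centroids = points.slice(0, K).map(p => p.slice());
  const assignment = new Array(points.length).fill(-1);
  let changed = true;
  while (changed) {
    changed = false;
    // Assignment step: attach each point to its nearest centroid.
    points.forEach((p, n) => {
      let best = 0;
      for (let k = 1; k < K; k++) {
        if (dist2(centroids[k], p) < dist2(centroids[best], p)) best = k;
      }
      if (assignment[n] !== best) { assignment[n] = best; changed = true; }
    });
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((c, k) => {
      const members = points.filter((_, n) => assignment[n] === k);
      if (members.length === 0) return c;   // leave empty clusters in place
      return c.map((_, d) =>
        members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return { centroids, assignment };
}
```

Run on the three students from the Program section below with K = 2, this sketch places StudentA in one cluster and StudentB and StudentC together in the other.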

For example, please see K-Means clustering tutorials.

## Program

To cluster students, we place a set of attributes into an array for each student. For the purposes of this article, I have used only three variables per student.

C#
`double[] Student = new double[] { Compiles, Errors, Time(Minutes), ETC…. };`

In my sample, I used the following:

C#
```double[] StudentA = new double[] { 15, 32, 35.6 };
double[] StudentB = new double[] { 19, 54, 65.1 };
double[] StudentC = new double[] { 22, 95, 45.6 };```

Every student must have the same number of attributes; if a student does not have an assignment, use zero for that value. To build the data set to be clustered, combine the rows:

C#
```
// A rectangular array cannot be initialized from the StudentA/B/C
// variables directly; list the values, one row per student:
double[,] data = { { 15, 32, 35.6 }, { 19, 54, 65.1 }, { 22, 95, 45.6 } };
```
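Because every row must have the same length, records with missing assignments can be zero-padded before clustering. A small hypothetical helper, sketched in JavaScript rather than the article's C#:

JavaScript

```
// Pad shorter attribute vectors with zeros so all rows have equal length.
function padRows(rows) {
  const len = Math.max(...rows.map(r => r.length));
  return rows.map(r => r.concat(new Array(len - r.length).fill(0)));
}

// padRows([[15, 32], [19, 54, 65.1]]) → [[15, 32, 0], [19, 54, 65.1]]
```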

To cluster these three students, set K to the number of clusters you want to partition them into. For example, K = 2:

C#
```
ClusterCollection clusters;
// K = 2: partition the three students into two clusters
clusters = KMeans.ClusterDataSet(2, data);
```

The output of the `ClusterDataSet` function is of type `CollectionBase`. The collection is serializable, so it can be converted to XML and so on as you wish. I have placed it in a `DataGrid` for viewing.

Other functions that can be used are:

C#
```//Test the Euclidean Distance calculation between two data points
double distance = KMeans.EuclideanDistance(StudentA, StudentB);
Response.Write("<br/>Euclidean Distance A and B:" + distance);```

The Euclidean distance, or simply 'distance', is the square root of the sum of the squared differences between the coordinates of a pair of objects.
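As a concrete sketch of that formula (plain JavaScript, independent of the article's C# `KMeans.EuclideanDistance` helper):

JavaScript

```
// Euclidean distance: square root of the summed squared coordinate differences.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// For StudentA and StudentB above: sqrt(4^2 + 22^2 + 29.5^2) ≈ 37.02
```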

C#
```//Test the Manhattan Distance calculation between two data points
double distances = KMeans.ManhattanDistance(StudentA, StudentB);
Response.Write("<br/>Manhattan Distance A and B:" + distances);```

The Manhattan distance (also known as boxcar or absolute-value distance) represents the distance between points in a city road grid: it sums the absolute differences between the coordinates of a pair of objects.
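Sketched in plain JavaScript, independent of the article's C# `KMeans.ManhattanDistance` helper:

JavaScript

```
// Manhattan distance: sum of absolute coordinate differences.
function manhattanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum;
}

// For StudentA and StudentB above: |15-19| + |32-54| + |35.6-65.1| ≈ 55.5
```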

C#
```
//Test the cluster mean (centroid) calculation between two data points
double[,] cluster = { { 15, 32, 35.6 }, { 19, 54, 65.1 } };
double[] centroid = KMeans.ClusterMean(cluster);
// centroid.ToString() would only print the type name; join the values instead
Response.Write("<br/>Cluster mean: " +
    string.Join(", ", Array.ConvertAll(centroid, x => x.ToString())));
```
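The centroid itself is just the per-dimension average of a cluster's members; a minimal JavaScript sketch of the same calculation:

JavaScript

```
// Centroid of a cluster: the per-dimension average of its member points.
function clusterMean(cluster) {
  const dims = cluster[0].length;
  const mean = new Array(dims).fill(0);
  for (const point of cluster) {
    for (let d = 0; d < dims; d++) mean[d] += point[d];
  }
  return mean.map(s => s / cluster.length);
}

// clusterMean([[15, 32, 35.6], [19, 54, 65.1]]) ≈ [17, 43, 50.35]
```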

## Sources

1. H. Chad Lane and Kurt VanLehn, "Intention-Based Scoring: An Approach to Measuring Success at Solving the Composition Problem," in Proceedings of the 36th SIGCSE Technical Symposium on Computer Science Education, St. Louis, Missouri, USA, 2005.
2. K-Means clustering tutorials
3. Wikipedia

Written by Aresh Saharkhiz (Philippines).