Hello

Im doing K-means clustering and am about to implement the Mahalanobis distance. I have a problem with sometimes the matrix is singular. Im not really sure what it means in this case and what to do about it? Im fairly sure that my code is ok, but here is the code for calculating the covariance matrix:

<br />public static Matrix CovarianceMatrix(List<double[]> dataset)<br /> {<br /> /*<br /> cov_xx cov_xy ...<br /> cov_yx cov_yy ...<br /> ...<br /> */<br /><br /> //Calculate mean for this cluster<br /> // cov_xx = sum[x*x]/n, cov_xy = sum[x*y]/n<br /> double[] means = new double[dataset[0].Length];<br /> Matrix cov = new Matrix(dataset[0].Length, dataset[0].Length);<br /> double sum = 0;<br /><br /> for (int i = 0; i < dataset[0].Length; i++)<br /> {<br /> for (int j = 0; j < dataset.Count; j++)<br /> {<br /> means[i] += dataset[j][i];<br /> }<br /> means[i] /= dataset.Count;<br /> }<br /><br /> double[,] subresults = new double[dataset[0].Length, dataset.Count];<br /> for (int j = 0; j < dataset.Count; j++)<br /> {<br /> for (int i = 0; i < dataset[0].Length; i++)<br /> {<br /> subresults[i, j] = dataset[j][i] - means[i];<br /> }<br /> }<br /> <br /> //fill covariance<br /> for (int i = 0; i < dataset[0].Length; i++)<br /> {<br /> for (int j = i; j < dataset[0].Length; j++)<br /> {<br /> double s = 0;<br /> for (int x = 0; x < dataset.Count; x++)<br /> {<br /> s += subresults[i, x] * subresults[j, x];<br /> }<br /> cov.SetElement(i, j, s / dataset.Count);<br /> if (i != j) cov.SetElement(j, i, s / dataset.Count);<br /> }<br /> }<br /> return cov;<br /> }<br />

And here for the distance:

<br /> public static double Mahalanobis(double[] vector1, double[] vector2, Matrix covariance)<br /> {<br /> Matrix v1 = new Matrix(vector1, vector1.Length);<br /> Matrix v2 = new Matrix(vector2, vector2.Length);<br /> Matrix m = v1.Subtract(v2);<br /><br /> return (double)(m.Transpose()).Multiply(covariance.Inverse()).Multiply(m).GetElement(0, 0);<br /> }<br />

If more information (or comments), or a working code sample is perfered, let me know. However, some times it can cluster without problem, so I think it is more about how to handle the singularity than the code it self.

Looking forward to hear from you

modified on Friday, May 22, 2009 5:32 AM