Hello
I'm doing K-means clustering and am about to implement the Mahalanobis distance. My problem is that the covariance matrix is sometimes singular. I'm not sure what that means in this case or what to do about it. I'm fairly sure that my code is OK, but here is the code for calculating the covariance matrix:

public static Matrix CovarianceMatrix(List<double[]> dataset)
        {
            /*
                cov_xx cov_xy ...
                cov_yx cov_yy ...
                ...
             */

            // Calculate the per-dimension mean for this cluster, then
            // cov_xy = sum[(x - mean_x) * (y - mean_y)] / n (cov_xx likewise)
            double[] means = new double[dataset[0].Length];
            Matrix cov = new Matrix(dataset[0].Length, dataset[0].Length);

            for (int i = 0; i < dataset[0].Length; i++)
            {
                for (int j = 0; j < dataset.Count; j++)
                {
                    means[i] += dataset[j][i];
                }
                means[i] /= dataset.Count;
            }

            double[,] subresults = new double[dataset[0].Length, dataset.Count];
            for (int j = 0; j < dataset.Count; j++)
            {
                for (int i = 0; i < dataset[0].Length; i++)
                {
                    subresults[i, j] = dataset[j][i] - means[i];
                }
            }
            
            //fill covariance
            for (int i = 0; i < dataset[0].Length; i++)
            {
                for (int j = i; j < dataset[0].Length; j++)
                {
                    double s = 0;
                    for (int x = 0; x < dataset.Count; x++)
                    {
                        s += subresults[i, x] * subresults[j, x];
                    }
                    cov.SetElement(i, j, s / dataset.Count);
                    if (i != j) cov.SetElement(j, i, s / dataset.Count);
                }
            }
            return cov;
        }


And here is the code for the distance:
        public static double Mahalanobis(double[] vector1, double[] vector2, Matrix covariance)
        {
            Matrix v1 = new Matrix(vector1, vector1.Length);
            Matrix v2 = new Matrix(vector2, vector2.Length);
            Matrix m = v1.Subtract(v2);

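            // Note: this is the squared Mahalanobis distance (no square root is taken).
            // covariance.Inverse() is the call that fails when the matrix is singular.
            return (double)(m.Transpose()).Multiply(covariance.Inverse()).Multiply(m).GetElement(0, 0);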
        }

If more information (or comments) or a working code sample is preferred, let me know. However, sometimes it clusters without problems, so I think this is more about how to handle the singularity than about the code itself.

Looking forward to hearing from you.

modified on Friday, May 22, 2009 5:32 AM
Posted 21-May-09 22:41pm

1 solution


Solution 1

A singular matrix has a determinant of zero. That means you can't invert it.

That's probably happening when you're inverting your covariance matrix: covariance.Inverse()

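A covariance matrix is always positive semi-definite, so a singular one is positive semi-definite but not positive definite: the data has zero variance in at least one direction. That happens when the points used to build the matrix lie in a lower-dimensional subspace, for example when one component is constant or is a linear combination of the others, or when the cluster contains no more points than there are dimensions.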
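
To make that concrete, here is a minimal sketch of a cluster whose covariance comes out singular. It uses plain arrays rather than the Matrix class from the question (whose API I don't know), and the class and method names are made up for illustration. It divides by n, like the code in the question.

```csharp
using System;

public static class SingularCovarianceDemo
{
    // Population covariance (divide by n), same convention as the question.
    public static double[,] Covariance(double[][] data)
    {
        int n = data.Length;
        int d = data[0].Length;

        // Per-dimension means.
        var mean = new double[d];
        foreach (var row in data)
            for (int i = 0; i < d; i++) mean[i] += row[i];
        for (int i = 0; i < d; i++) mean[i] /= n;

        // Sum of products of deviations, then divide by n.
        var cov = new double[d, d];
        foreach (var row in data)
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    cov[i, j] += (row[i] - mean[i]) * (row[j] - mean[j]);
        for (int i = 0; i < d; i++)
            for (int j = 0; j < d; j++) cov[i, j] /= n;

        return cov;
    }

    // Determinant of a 2x2 matrix; enough for this small example.
    public static double Determinant2x2(double[,] m)
    {
        return m[0, 0] * m[1, 1] - m[0, 1] * m[1, 0];
    }
}
```

For the points (1, 2), (2, 4), (3, 6), every point satisfies y = 2x, so the second component carries no new information. The covariance is [[2/3, 4/3], [4/3, 8/3]] and its determinant is (2/3)(8/3) - (4/3)(4/3) = 0, so Determinant2x2(Covariance(points)) is zero and the inverse does not exist.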

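If these are random vectors, it could be that a component of the vector is redundant. Better check the vectors in each cluster. That would also explain why it only fails sometimes: the covariance depends on which points end up in the cluster.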
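
If you can't remove the redundant component, one common workaround (not the only one) is to add a small value to the diagonal of the covariance matrix before inverting, which makes it invertible. The sketch below is only an illustration under assumptions: it uses plain arrays instead of the Matrix class from the question, the names are hypothetical, and epsilon is a value you have to tune to the scale of your data. It solves the linear system instead of forming the inverse and returns the square root of the quadratic form.

```csharp
using System;

public static class RegularizedMahalanobis
{
    // Returns a copy of cov with epsilon added to every diagonal entry.
    public static double[,] AddRidge(double[,] cov, double epsilon)
    {
        var result = (double[,])cov.Clone();
        for (int i = 0; i < cov.GetLength(0); i++) result[i, i] += epsilon;
        return result;
    }

    // Solves a * x = b by Gaussian elimination with partial pivoting.
    public static double[] Solve(double[,] a, double[] b)
    {
        int n = b.Length;
        var m = (double[,])a.Clone();
        var x = (double[])b.Clone();
        for (int col = 0; col < n; col++)
        {
            // Pick the row with the largest entry in this column as pivot.
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.Abs(m[r, col]) > Math.Abs(m[pivot, col])) pivot = r;
            if (Math.Abs(m[pivot, col]) < 1e-12)
                throw new InvalidOperationException("Matrix is singular to working precision.");
            if (pivot != col)
            {
                for (int c = 0; c < n; c++)
                {
                    double t = m[col, c]; m[col, c] = m[pivot, c]; m[pivot, c] = t;
                }
                double tb = x[col]; x[col] = x[pivot]; x[pivot] = tb;
            }
            // Eliminate this column from the rows below.
            for (int r = col + 1; r < n; r++)
            {
                double f = m[r, col] / m[col, col];
                for (int c = col; c < n; c++) m[r, c] -= f * m[col, c];
                x[r] -= f * x[col];
            }
        }
        // Back substitution.
        for (int r = n - 1; r >= 0; r--)
        {
            double s = x[r];
            for (int c = r + 1; c < n; c++) s -= m[r, c] * x[c];
            x[r] = s / m[r, r];
        }
        return x;
    }

    // sqrt((v1 - v2)^T (cov + epsilon * I)^-1 (v1 - v2))
    public static double Distance(double[] v1, double[] v2, double[,] cov, double epsilon)
    {
        int d = v1.Length;
        var diff = new double[d];
        for (int i = 0; i < d; i++) diff[i] = v1[i] - v2[i];

        double[] solved = Solve(AddRidge(cov, epsilon), diff);
        double q = 0;
        for (int i = 0; i < d; i++) q += diff[i] * solved[i];
        return Math.Sqrt(Math.Max(q, 0.0));
    }
}
```

With epsilon set to 0 this behaves like the unregularized distance and throws for a singular covariance. With a small positive epsilon it returns a finite value. Keep in mind that the result then depends on epsilon in the directions where the data has no variance, so treat it as an approximation.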

