I have a data set of 100 training vectors, each with 584 features, stored as a 584 x 100 array (rows are features, columns are samples). I am using the LIBSVM Java port. ((1) trainX is 584 x 100; (2) biny is the label array holding +1 for class one and -1 for class two; (3) LinearSVMNormVector is the resulting weight vector w of the model.) Below is my code:
<pre lang="java">// scale each feature (row) of the training data to [0, 1]
        double[][] trainX_scale = new double[trainX.length][trainX[0].length];
        for (int i = 0; i < trainX.length; i++) {
            double min = Double.MAX_VALUE;
            double max = -Double.MAX_VALUE; // Double.MIN_VALUE is the smallest POSITIVE double, not the most negative value
            for (int inner = 0; inner < trainX[i].length; inner++) {
                if (trainX[i][inner] < min)
                    min = trainX[i][inner];
                if (trainX[i][inner] > max)
                    max = trainX[i][inner];
            }
            double difference = max - min;
            for (int inner = 0; inner < trainX[i].length; inner++) {
                // guard against a constant feature (max == min would divide by zero)
                trainX_scale[i][inner] = (difference == 0.0) ? 0.0
                        : (trainX[i][inner] - min) / difference;
            }
        }

    // prepare the svm nodes: one row per training instance, one node per feature
        svm_node[][] SVM_node_Train = new svm_node[trainX[0].length][trainX.length];

        for (int p = 0; p < trainX[0].length; p++) {      // p = instance (column of trainX)
            for (int q = 0; q < trainX.length; q++) {     // q = feature (row of trainX)
                SVM_node_Train[p][q] = new svm_node();
                SVM_node_Train[p][q].index = q + 1;       // LIBSVM feature indices are conventionally 1-based
                SVM_node_Train[p][q].value = trainX_scale[q][p];
            }
        }

        double[] biny_SVM = new double[biny.length]; // svm_problem.y expects double[]
        for (int p = 0; p < biny.length; p++) {
            biny_SVM[p] = biny[p];
        }

        svm_problem SVM_Prob = new svm_problem();
        SVM_Prob.l = trainX[0].length; // number of training instances (100)
        SVM_Prob.x = SVM_node_Train;
        SVM_Prob.y = biny_SVM;

        svm_parameter SVM_Param = new svm_parameter();
        SVM_Param.svm_type = svm_parameter.C_SVC;  // 0
        SVM_Param.kernel_type = svm_parameter.RBF; // 2
        SVM_Param.cache_size = 100;
        SVM_Param.eps = 0.001; // 1e-7 is far tighter than the LIBSVM default (1e-3) and only slows training
        SVM_Param.C = 1.0;
        SVM_Param.gamma = 0.5;

        // No need to build an svm_model by hand: svm_train() below returns a
        // fully populated model, and the hand-built instance was never used.

        // Validate the parameters before training; returns null if they are OK.
        String check = svm.svm_check_parameter(SVM_Prob, SVM_Param);
        if (check != null)
            System.out.println("Parameter problem: " + check);

        // 2-fold cross-validation: target receives each instance's predicted label
        double[] target = new double[biny.length]; // Java zero-initializes, no Arrays.fill needed
        svm.svm_cross_validation(SVM_Prob, SVM_Param, 2, target);
        int correct = 0;
        for (int p = 0; p < target.length; p++)
            if (target[p] == biny_SVM[p])
                correct++;
        System.out.println("CV accuracy: " + (100.0 * correct / target.length) + "%");

        // train the classifier
        svm_model test_model = svm.svm_train(SVM_Prob, SVM_Param);

        /********** get the training results of libsvm **********/

        // The decision function is f(x) = sum_i sv_coef[i] * K(SV_i, x) - rho,
        // so the intercept is b = -rho
        double Bias = -test_model.rho[0];
        int NumberOfSupportVectors = svm.svm_get_nr_sv(test_model); // returns int, not double

        int[] SupportVectorIDs = new int[NumberOfSupportVectors]; // must be int[]; double[] does not compile
        svm.svm_get_sv_indices(test_model, SupportVectorIDs);
        svm_node[][] SV = test_model.SV;
        double[][] SupportVectors = new double[SV.length][SV[0].length];
        for (int ii = 0; ii < SV.length; ii++) {
            for (int jj = 0; jj < SV[0].length; jj++) {
                SupportVectors[ii][jj] = SV[ii][jj].value;
            }
        }
        double[] SupportVectorWeights = test_model.sv_coef[0];
        // WARNING: w = sum_i sv_coef[i] * SV_i is a valid classifier weight
        // vector only for a LINEAR kernel. With the RBF kernel chosen above,
        // this vector does not define the decision boundary, and classifying
        // test points with w.x + b will perform poorly.
        double[] LinearSVMNormVector = new double[SupportVectors[0].length];
        for (int ii = 0; ii < SupportVectors[0].length; ii++) { // the stray "msvm[0]." did not compile
            for (int jj = 0; jj < SupportVectors.length; jj++) {
                LinearSVMNormVector[ii] += SupportVectors[jj][ii] * SupportVectorWeights[jj];
            }
        }
</pre>
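One easy-to-miss point about the scaling step above: the test set must be scaled with the min and range computed on the *training* set, not its own. A minimal, dependency-free sketch of that idea (the class and method names are mine, not from the post):

```java
// Per-feature min-max scaling; each row of X is one feature.
// Keeps the training min/range so the SAME transform can be applied to test data.
class FeatureScaler {
    final double[] min;
    final double[] range;

    FeatureScaler(double[][] trainX) {
        int nFeat = trainX.length;
        min = new double[nFeat];
        range = new double[nFeat];
        for (int i = 0; i < nFeat; i++) {
            double lo = Double.MAX_VALUE;
            double hi = -Double.MAX_VALUE; // not Double.MIN_VALUE, which is a tiny positive number
            for (double v : trainX[i]) {
                if (v < lo) lo = v;
                if (v > hi) hi = v;
            }
            min[i] = lo;
            range[i] = hi - lo;
        }
    }

    // Scale any matrix (train or test) with the TRAINING min/range.
    double[][] scale(double[][] X) {
        double[][] out = new double[X.length][];
        for (int i = 0; i < X.length; i++) {
            out[i] = new double[X[i].length];
            for (int j = 0; j < X[i].length; j++) {
                out[i][j] = range[i] == 0.0 ? 0.0 : (X[i][j] - min[i]) / range[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] train = {{2.0, 4.0, 6.0}};
        FeatureScaler fs = new FeatureScaler(train);
        System.out.println(java.util.Arrays.toString(fs.scale(train)[0])); // [0.0, 0.5, 1.0]
    }
}
```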


With this model I am getting more than 90% misclassification on my test data. I am a little confused. Can someone please tell me if there is anything wrong in the classifier setup?

Thanks!
Updated 29-Oct-13 13:45pm

How did you obtain the C and gamma values? Try using a grid-search approach to find them:
http://scikit-learn.org/stable/modules/grid_search.html
The result might improve.

Also try a tool like Weka to verify that the features are good enough:
http://www.cs.waikato.ac.nz/ml/weka/
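Concretely, the usual practice for LIBSVM is an exponential grid, e.g. C = 2^-5 ... 2^15 and gamma = 2^-15 ... 2^3, scoring each pair by cross-validation accuracy. A sketch of just the grid loop (the evaluation step is left as a comment, since it depends on the poster's setup):

```java
class GridSearch {
    // Exponentially spaced values 2^from, 2^(from+step), ..., 2^to.
    static double[] powerGrid(int from, int to, int step) {
        int n = (to - from) / step + 1;
        double[] g = new double[n];
        for (int i = 0; i < n; i++) {
            g[i] = Math.pow(2, from + i * step);
        }
        return g;
    }

    public static void main(String[] args) {
        double[] cs = powerGrid(-5, 15, 2);     // C candidates
        double[] gammas = powerGrid(-15, 3, 2); // gamma candidates
        for (double c : cs) {
            for (double g : gammas) {
                // For each (c, g): set SVM_Param.C = c and SVM_Param.gamma = g,
                // run svm.svm_cross_validation(...), and keep the pair with the
                // highest cross-validation accuracy.
            }
        }
        System.out.println(cs.length + " x " + gammas.length + " candidate pairs");
    }
}
```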
 
Comments
Member 10305598 29-Oct-13 22:40pm    
Thanks for the answer! Is the LIBSVM setup code (the node and model) fine?
This is my understanding:

Each feature is a row and each column is an instance of the training samples, i.e. 584 rows and 100 columns. If trainX[0].length = 100 and trainX.length = 584, then the nodes look OK.

The model is a C-SVM with an RBF kernel, which also looks OK.
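To make the layout point concrete: svm_problem.x must be instance-major (one row per training sample), so the feature-major trainX has to be transposed while the nodes are filled. A dependency-free sketch of that reshaping, using plain doubles in place of svm_node (the helper name is mine):

```java
class LayoutCheck {
    // trainX is feature-major: trainX[feature][instance] (e.g. 584 x 100).
    // Returns instance-major data: out[instance][feature] (e.g. 100 x 584),
    // which is the orientation svm_problem.x expects.
    static double[][] toInstanceMajor(double[][] trainX) {
        int nFeatures = trainX.length;
        int nInstances = trainX[0].length;
        double[][] out = new double[nInstances][nFeatures];
        for (int p = 0; p < nInstances; p++) {
            for (int q = 0; q < nFeatures; q++) {
                out[p][q] = trainX[q][p];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2}, {3, 4}, {5, 6}}; // 3 features x 2 instances
        double[][] t = toInstanceMajor(x);
        System.out.println(t.length + " instances x " + t[0].length + " features");
    }
}
```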
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
