Crosstabs/Confusion Matrix for AI Classification Projects

18 Nov 2017 · CPOL · 3 min read
Construct a confusion matrix or crosstab for binary or multi-level classifier training or validation data in C#

Introduction

[Image 1: the crosstabs/confusion matrix displayed in a DataGridView control]

If you work in Python or with tools like TensorFlow or MATLAB, there are many ways to obtain “evaluation metrics” when training binary or multi-class classifiers such as neural nets, SVMs, logistic regressions, or decision forests. If you use tools like SPSS or SAS, obtaining crosstabulations and related statistics is also easy. But when you want to use C#, as is often the case, you need to build your own method.

For C# AI and statistical work, I’ve had good success with the free (but unthreaded) version of ALGLIB, which has fairly good and relatively simple-to-use tools for building and training neural nets and decision forests and for using those trained networks in other C# applications to classify new data. However, I wanted to be able to visualize the results in ordinary crosstabs or “confusion matrix” format. So this CodeProject article presents and describes a method I found useful and wanted to share.

The Crosstabs() method takes two lists: the “supervised” or “labelled” values used for training, and the classifications produced by applying the trained network to the same “features” data. It also accepts optional variable and value labels. It returns a “confusion matrix”: a square crosstabs plus a column of row precision values, a row of column recall values, and an overall accuracy. The result can be written as text to the console (as shown below) or displayed in a populated DataGridView control on a form (as shown above).
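
To make the three summary figures concrete, here is a minimal sketch of how they could be computed from a square table of counts. It is not the Crosstabs class itself. The class name ConfusionMetrics and its method names are made up for illustration, and it assumes that rows hold the predicted (classified) values and columns hold the labelled values, which is what "row precision" and "column recall" suggest.

```csharp
public static class ConfusionMetrics
{
    // Assumption: counts[r, c] = number of cases classified as class r
    // whose labelled (true) class is c, with the same class order on both axes.

    // Overall accuracy: cases on the diagonal divided by all cases.
    public static double Accuracy(int[,] counts)
    {
        int n = counts.GetLength(0);
        long total = 0, diagonal = 0;
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
            {
                total += counts[r, c];
                if (r == c) diagonal += counts[r, c];
            }
        return total == 0 ? double.NaN : (double)diagonal / total;
    }

    // Precision for class k: correct classifications divided by the row total.
    public static double RowPrecision(int[,] counts, int k)
    {
        long rowTotal = 0;
        for (int c = 0; c < counts.GetLength(1); c++) rowTotal += counts[k, c];
        return rowTotal == 0 ? double.NaN : (double)counts[k, k] / rowTotal;
    }

    // Recall for class k: correct classifications divided by the column total.
    public static double ColumnRecall(int[,] counts, int k)
    {
        long colTotal = 0;
        for (int r = 0; r < counts.GetLength(0); r++) colTotal += counts[r, k];
        return colTotal == 0 ? double.NaN : (double)counts[k, k] / colTotal;
    }
}
```

For the 2 x 2 table { { 8, 2 }, { 1, 9 } }, Accuracy gives 17/20 = 0.85, RowPrecision for the first class gives 8/10 = 0.8, and ColumnRecall for the first class gives 8/9.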

Using the Code

The CrosstabsDemo project runs a WinForms form that contains a DataGridView control. The Form1.cs code (partially shown below) creates a list (named LabelledData) of NofCases = 10,000 values for a variable (say, VOTING) that takes one of NofStates = 4 states, chosen uniformly at random. Pretend that this is the supervised or labelled criterion data used for training or validating a network.

C#
using System;
using System.Collections.Generic;
using System.Windows.Forms;

namespace CrosstabsDemo
{
    public partial class Form1 : Form
    {
        public static Random Rand = new Random(0); // Note: random seed is set to 0 
                                                   // for reproducibility

        public Form1()
        {
            InitializeComponent();

            int NofCases         = 10000;
            int NofStates        = 4;
            List<int> LabelledData   = new List<int>();
            List<int> ClassifiedData = new List<int>();

            // Generate some random data
            for (int i = 0; i < NofCases; i++)
            {
                int avalue = Rand.Next(1, NofStates+1);
                int pvalue = avalue + (int)RandomGausian(0, .6); // add a bit of random error
                if (pvalue > NofStates) pvalue = NofStates;
                if (pvalue < 1) pvalue = 1;
                LabelledData.Add(avalue);
                ClassifiedData.Add(pvalue);
            }

            Dictionary<int, string> RowValueLabels = new Dictionary<int, string>()
            {
                { 1,"One" }, { 2,"Two" }, { 3,"Three" },{ 4,"Four" },{ 5,"Five" }
            };

            // For example:
            Crosstabs ct    = 
                      new Crosstabs(ClassifiedData, LabelledData, tablelabel: "VOTING");
            ct.RowLabels    = RowValueLabels;
            ct.ColumnLabels = RowValueLabels; // In a confusion matrix, 
                                              // both vars have the same distinct values

            ct.WritetoConsole();
            ct.View(dataGridView1);
        }
......
    }
}

Next, it creates a second list (named ClassifiedData) that contains a bit of added-in error to simulate the results of applying the trained network to the feature data (used for training) and getting back the network’s predicted classifications. (Or, of course, these lists might simulate a validation data set.)
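
The demo's RandomGausian helper is not shown in the excerpt above. The sketch below is a stand-in, not the author's code: SampleGaussian and Perturb are hypothetical names, and the Box-Muller transform is one common way to draw normal samples, used here as an assumption. It illustrates the same perturb-and-clamp step as the loop in Form1.cs.

```csharp
using System;

public static class NoiseSketch
{
    // Hypothetical helper (not the demo's RandomGausian): draws one normal
    // sample with the given mean and standard deviation via Box-Muller.
    public static double SampleGaussian(Random rng, double mean, double stdDev)
    {
        double u1 = 1.0 - rng.NextDouble(); // in (0, 1], so Math.Log stays finite
        double u2 = rng.NextDouble();
        double z = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
        return mean + stdDev * z;
    }

    // Mirrors the demo's step: add truncated noise, then clamp to 1..nofStates.
    public static int Perturb(Random rng, int actual, int nofStates, double stdDev)
    {
        // The (int) cast truncates toward zero, so the value only changes
        // when the noise magnitude reaches 1.0 or more.
        int predicted = actual + (int)SampleGaussian(rng, 0.0, stdDev);
        return Math.Max(1, Math.Min(nofStates, predicted));
    }
}
```

Because of the truncation, most simulated classifications equal the labelled value, and the remainder are usually off by one state, which is what makes the resulting table look like a plausible confusion matrix.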

These two lists are the inputs to the Crosstabs constructor (along with, optionally, a criterion variable name like VOTING). At its core, the Crosstabs object (named ct in the demo) is a simple two-dimensional integer array (int[,]), which you can retrieve using the ct.GetTableArray property.
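
As a rough illustration of the counting step, the sketch below builds such an int[,] table from two lists. It is an assumption about how a crosstab can be tallied, not the code in crosstabs.cs, and CrosstabSketch.BuildTable is a hypothetical name. Rows and columns are ordered by their own distinct values, so the table comes out rectangular if, say, the classifier never predicts one of the labelled values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CrosstabSketch
{
    // Counts co-occurrences of (rowValues[i], columnValues[i]).
    // rowKeys and columnKeys return the sorted distinct values for each axis.
    public static int[,] BuildTable(IList<int> rowValues, IList<int> columnValues,
                                    out int[] rowKeys, out int[] columnKeys)
    {
        if (rowValues.Count != columnValues.Count)
            throw new ArgumentException("Both lists must have the same number of cases.");

        rowKeys = rowValues.Distinct().OrderBy(v => v).ToArray();
        columnKeys = columnValues.Distinct().OrderBy(v => v).ToArray();

        // Map each distinct value to its row or column index.
        var rowIndex = new Dictionary<int, int>();
        for (int i = 0; i < rowKeys.Length; i++) rowIndex[rowKeys[i]] = i;
        var columnIndex = new Dictionary<int, int>();
        for (int i = 0; i < columnKeys.Length; i++) columnIndex[columnKeys[i]] = i;

        // Tally one count per case.
        var table = new int[rowKeys.Length, columnKeys.Length];
        for (int i = 0; i < rowValues.Count; i++)
            table[rowIndex[rowValues[i]], columnIndex[columnValues[i]]]++;
        return table;
    }
}
```

Passing the classified values as rowValues and the labelled values as columnValues matches the argument order used in the demo's constructor call.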

Optionally, for more readable output, you can add row and column value labels using the ct.RowLabels and ct.ColumnLabels properties. (If you supply fewer row or column labels than there are distinct state values, the missing labels are padded with empty strings. If you supply more labels than values, the extras are ignored.)
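
The padding rule can be pictured with a small lookup like the one below. This is only a sketch of the described behaviour. LabelSketch.ResolveLabels is a hypothetical helper, not a member of the Crosstabs class.

```csharp
using System.Collections.Generic;

public static class LabelSketch
{
    // Returns one label per distinct value, in the order given.
    // Values with no dictionary entry get an empty string; dictionary
    // entries for values that never occur are simply never looked up.
    public static string[] ResolveLabels(IList<int> distinctValues,
                                         IDictionary<int, string> labels)
    {
        var resolved = new string[distinctValues.Count];
        for (int i = 0; i < distinctValues.Count; i++)
        {
            string label;
            resolved[i] = labels.TryGetValue(distinctValues[i], out label) ? label : "";
        }
        return resolved;
    }
}
```

This is why the demo can pass a dictionary with a label for 5 even though only the values 1 to 4 occur: the unused entry has no effect.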

You can then write a formatted results table as text to the console window (probably just for testing) by calling the ct.WritetoConsole() method, as shown below. Better, when using a form, pass the form's DataGridView control (dataGridView1 in the demo) to the ct.View() method to get a more attractive visualization of the crosstabs/confusion matrix results, as shown above. Either output is easy to copy and paste into other documents.

[Image 2: the crosstabs/confusion matrix written as text to the console]
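
For readers who want a similar text rendering of their own table, fixed-width padding is usually enough. The sketch below is an illustration under that assumption, not the WritetoConsole() implementation. TextTableSketch.Format is a hypothetical name, and the default column width of 8 is an arbitrary choice.

```csharp
using System.Text;

public static class TextTableSketch
{
    // Formats a count table as fixed-width text: a header line of column
    // labels, then one line per row with the row label in the first column.
    public static string Format(int[,] counts, string[] rowLabels,
                                string[] columnLabels, int width = 8)
    {
        var sb = new StringBuilder();
        sb.Append("".PadRight(width)); // blank corner cell above the row labels
        foreach (string label in columnLabels) sb.Append(label.PadLeft(width));
        sb.AppendLine();

        for (int r = 0; r < counts.GetLength(0); r++)
        {
            sb.Append(rowLabels[r].PadRight(width));
            for (int c = 0; c < counts.GetLength(1); c++)
                sb.Append(counts[r, c].ToString().PadLeft(width));
            sb.AppendLine();
        }
        return sb.ToString();
    }
}
```

The returned string can be passed to Console.Write, and it keeps its alignment when pasted into a document that uses a monospaced font.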

Points of Interest

There is little, if any, clever programming involved here, but this sort of tool has been very useful in my work for visualizing the results of AI training and validation runs using C#. I hope readers may find it useful directly or with minor modifications and/or additions for similar needs. For example, if you just want crosstabs (with a square or rectangular grid), this method will do that. If you want other statistical tests (chi-square, F-ratio, etc.), they would be simple to add.
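
As one example of such an addition, here is a minimal sketch of the Pearson chi-square statistic computed from a table of counts, such as the int[,] array returned by ct.GetTableArray. ChiSquareSketch is a hypothetical class that is not part of the download, and turning the statistic into a p-value would still need a chi-square distribution function from a statistics library.

```csharp
public static class ChiSquareSketch
{
    // Pearson chi-square for an r x c table of observed counts:
    // sum over cells of (observed - expected)^2 / expected,
    // where expected = rowTotal * columnTotal / grandTotal.
    public static double Statistic(int[,] counts)
    {
        int rows = counts.GetLength(0), cols = counts.GetLength(1);
        var rowTotals = new double[rows];
        var colTotals = new double[cols];
        double grandTotal = 0;

        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
            {
                rowTotals[r] += counts[r, c];
                colTotals[c] += counts[r, c];
                grandTotal += counts[r, c];
            }

        double chiSquare = 0;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
            {
                double expected = rowTotals[r] * colTotals[c] / grandTotal;
                if (expected > 0) // skip cells in empty rows or columns
                {
                    double diff = counts[r, c] - expected;
                    chiSquare += diff * diff / expected;
                }
            }
        return chiSquare;
    }

    // Degrees of freedom for the test of independence: (r - 1) * (c - 1).
    public static int DegreesOfFreedom(int[,] counts)
    {
        return (counts.GetLength(0) - 1) * (counts.GetLength(1) - 1);
    }
}
```

For the table { { 10, 20 }, { 30, 40 } }, the expected counts are 12, 18, 28, and 42, so the statistic is 50/63 (about 0.794) with 1 degree of freedom.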

History

  • 13th November, 2017: Original article
  • 18th November, 2017: Update
    • Redesigned crosstabs.cs to correct issues when the crosstabs or confusion matrix is rectangular, e.g., if a trained model does not classify any cases for some labelled values. Changed the type for row and column value labels from List<string> to Dictionary<int, string>.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
CEO Academic Software, Inc.
United States
My public profile: http://www.linkedin.com/in/warrenlacefield
My research profile: https://www.researchgate.net/profile/Warren-Lacefield
My company homepage: http://www.acsw.com
