Image Classification Using Neural Networks in .NET

hemanthk119

5.00/5 (8 votes)

Jun 26, 2019

CPOL

15 min read

27829

980

Image Classification implementation using Deep Belief Networks and Convolutional Neural Networks in .NET

Basic Theory

Image classification is one of the most common use cases for non-recurrent neural networks. The basic concept is that a neural network is given an input image, whose input layer has the same number of neurons as the pixels in the image (assuming the image is grayscale). Also depending on the number of classifications to be made available, this neural network should have the same number of output neurons. The neural network could use either convolutional, fully connected layers or a combination of both. Convolutional networks are faster as they squish the input image and convolute them using multiple kernels to extract important features. More details on convolution can be found here. Convolution greatly reduces the size of the fully connected networks which are used to classify the image after series of convolutions and pooling. As the neural network using appropriate activation functions can only have inputs and outputs as a double ranging from 0 to 1, to input an image to a neural network will require some pre-processing on the input end to normalize the pixels into this form. Code implementation of classification using deep belief networks and convolution neural networks can be found in later sections in this article.

Pre-processing the Images

Grey-scaling then shrinking the original image to 50x50 pixels will drastically reduce the number of input neurons to about 2500. This will also reduce the complexity and features that the neural network needs to analyse. The EmguCV library, which is a .NET wrapper around OpenCV, is very handy at doing these tasks. Equalizing the histogram could also be done but just grey scaling the image preserves the features more naturally. Any size like 75x75 or 100x100 can be used but increasing the resolution will also increase the number of input neurons and drastically increase the training and detection time when non-convolution networks are used.

public static Image<Gray, Byte> 
    ConvertOriginalImageToGrayScaleAndProcess(Image<Bgr, Byte> orginalImage)
{
    var grayScale = orginalImage.Convert<Gray, Byte>();
    return grayScale.Resize(50,50, Emgu.CV.CvEnum.Inter.Cubic, false);
}

Normalizing the Pixel Values

As the pixel values are in bytes, each greyscale pixel value is divided by 256 to convert it to value ranging from 0 to 1 so that it can be fed into a neural network. EmguCV image has a property called Bytes, which gives us an array of pixel values.

public static double[] GetNetworkFeedArray(Image<Gray, Byte> image)
{
    var imageBytes = image.Bytes;
    double[] networkFeed = new double[imageBytes.Count()];
    for (int i = 0; i < imageBytes.Length; i++)
    {
        networkFeed[i] = ((double)imageBytes[i] / 256);
    }
    return networkFeed;
}

Training Networks

Every neural network can be training using an associated algorithm that can be chosen. But regardless of the kind of neural network or training algorithm used, a dataset is provided to it with input data and corresponding output data. For classification, the expected output values will either be 0 or 1. Below is an example with three classifications of input data. Each output neuron’s output is abstract on its own, but can be associated with a classification type, which could be something like “car”, “bird”, “cat”, “human”, etc.

After training with this dataset for many iterations where the error rate keeps decreasing (provided appropriate parameters to the training algorithm and other hyperparameters), the network achieves a state where it can produce valid outputs from what sense it made of the input. All normalized image data has to be converted to a double jagged array of combined images so that a neural network can be trained with it. This step is common, regardless of which kind of neural network is selected. Combining the images into batches instead of training the network one image at a time speeds up the training process.

public (double[][], double[][]) GetBatchDataFromImages(IPagedList<LocalImage> localImages)
{
    var numberOfImages = localImages.Count;
    double[][] batchInputs = new double[numberOfImages][];
    double[][] batchOutputs = new double[numberOfImages][];
    foreach (int i in Enumerable.Range(0, numberOfImages))
    {
        var currentLocalImage = localImages[i];
        (double[] normalizedImageData, ImageType imageType) = 
             LocalImage.GetImageInformationForNeuralNetwork(currentLocalImage);
        batchInputs[i] = normalizedImageData;
        batchOutputs[i] = new double[] 
                { currentLocalImage.ImageType == ImageType.CAT ? 1 : 0, 
                  currentLocalImage.ImageType == ImageType.CAT ? 0 : 1 };
    }
    return (batchInputs, batchOutputs);
}

Generating Outputs

The neural network outputs values ranging from 0 - 1 from the output neurons after forwarding/computing the input values. For classification, the output values have to be thresholded as the generated values will mostly not be a concrete 0 or 1. These values can also be used to generate probability of valid classification. Below is the code used to classify whether an image is a “cat” or not and return a Boolean value, possible as this neural network has just two outputs. Here, the code uses an upper threshold of 0.95 and lower threshold of 0.05. These values are arbitrary and values closer to 1 and 0 can be used, depending on the amount of training the neural network has gone through and to balance with amount of overfitting. This approach can be scaled to any number of classifications.

private static bool DetectCat(INeuralNetwork neuralNetwork, Image<Gray, Byte> image)
{
    double[] networkFeed = LocalImage.GetNetworkFeedArray(image);
    var networkOutput = neuralNetwork.GenerateOutput(networkFeed);
    var outputValue = networkOutput[0];
    var complementaryOutputValue = networkOutput[1];
    return outputValue > 0.95 && complementaryOutputValue < 0.05;
}

Classification using Fully Connected Networks

Fully connected neural networks with enough number of hidden layers can classify any dataset of images, and increasing the data points in the dataset will correspond to increasing the number of neurons in the hidden layers for a proper prediction. As these are very simple feed forward networks and the number of connections increases exponentially with each neuron added to the hidden or input layers, they both forget things too easily and require a lot of time to train. This network can only be training by learning. Fine tuning the learning rate and momentum of the training class (either be backpropagation or resilient backpropagation) according to the use case and input dataset length gives us the best results with this approach. Tests have shown that about 90% accuracy rate can be achieved on training dataset using a fully connected networks, although results can vary. For increased accuracy up to 100%, Deep Belief Networks can be used.

Classification using Deep Belief Networks

Deep belief networks are a special kind of neural networks where for it to properly learn, it has to pass through the stages of unsupervised learning on each layer and then followed by supervised learning on the whole network. It is defined as a stack of Restricted Boltzmann Machines where each layer communicates with both the previous and next layers.

More information on these networks can be found here and here. But even though it seems complicated at first, this network just needs to be trained using the whole dataset on each restricted Boltzmann machines first in an unsupervised fashion and after a lower threshold level of error rates, training can move on to supervised learning phase on whole network using the same backpropogation or resilient backpropogation training classes which are also used for fully connected networks. Implementation and code for this kind of network using Accord.NET where 100% accuracy rate is achieved on the training data set of 500 images is explained in a later section in this article.

Classification using Convolutional Neural Networks

Convolution is a powerful image processing tool where a kernel is used to transform an image by iterating over it and performing calculations using this kernel. Kernels of different kinds can be used to create an edge detection, burring and many other effects. Every convolutional neural network typically consists of a convolution layer, followed by a rectified linear activation layer and then a pooling layer. Sets of these three layers can be used in series with different parameters to create a neural network that is well suited for image classification problems. The output of the convolutional layers is then fed into a fully connected network which will usually be much smaller than networks of deep belief networks and similar network types. As convolution of image using appropriate kernel extracts certain features from the image and then the rectified linear activation function eliminates negative values and then finally a pooling layer decreases the dimensions of the output while preserving important features, it is well fit to extract proper features that define a classification, however abstract it might be, like the difference between a dog and a cat.

Convolutional layers placed before the fully connected network is basically a very simple but powerful pre-processing step to decrease the dimensions of the input while preserving the important features. In the end, the fully connected network produces the same kind of classification output as with any regular neural network. In .NET, a nuget package ConvNetSharp is available with which any kind of convolutional neural network can be created using the different order of layers of various kinds. Code implementation of image classifier using this library is detailed in a later section of this article.

Image Localization using Anchor Boxes

Any classification neural network can be modified to perform localization of the objects in an image. Localization basically means location of the object is in the image can be determined and a box can be drawn around it. Anchor boxes are randomly generated areas in the image where each area can be scaled up to the input size and fed into a neural network to determine if this area can be classified into any of defined classification types. If an area is properly classified, a rectangle is drawn around it and it can be concluded that this area contains that object in question. Usually about 1000-2000 random anchor boxes are generated, whose corresponding areas are fed into the neural network in sequence and output analysed. Below are examples of randomly generated anchor boxes over an image.

After processing each anchor box through the classification network, rectangles can be drawn around positive detections. Valid rectangles which are inside each other can be combined into a final bounding rectangle. Below is the code to generate 1000 random anchor boxes.

public static AnchorBox GenerateNew()
{
    double x = (random.NextDouble()) / (double)1.1;
    double y = (random.NextDouble()) / (double)1.1;
    var smallerValue = x > y ? x : y;
    double width = random.Next(100000, 
                   (int)(((double)1 - smallerValue) * 10000000)) / (double)10000000;
    double height = width;
    return new AnchorBox()
    {
        X = x,
        Y = y,
        Width = width,
        Height = height
    };
}

public static List<AnchorBox> GenerateRandomAnchorBoxes()
{
    var anchorBoxList = new List<AnchorBox>() { new AnchorBox() 
                   { Height = 1, Width = 1, X = 0, Y = 0 } };
    foreach (int i in Enumerable.Range(0, 1000))
    {   
        anchorBoxList.Add(AnchorBox.GenerateNew());
    }
    return anchorBoxList;
}

Each anchor box is used to produce a ROI (Region of Interest) where EmguCV can be used to crop the original image to produce another image which is scaled to the same dimensions of original image. Running these regions through the neural network determines if the object is present in those sub-regions. All detected anchor boxes are combined to produce the final bounding rectangle.

private static (Image<Gray, Byte>, System.Drawing.Rectangle) 
    GetAreaUnderAnchorBox(Image<Bgr, Byte> originalImage, AnchorBox anchorBox)
{
    var originalImageCopy = originalImage.Copy();
    var rectangle = GetRectangleFromAnchroBox(originalImageCopy, anchorBox);
    originalImageCopy.ROI = rectangle;
    var croppedImage = LocalImage.ConvertOriginalImageToGrayScaleAndProcess
                       (originalImageCopy.Copy());
    originalImageCopy.ROI = System.Drawing.Rectangle.Empty;
    return (croppedImage, rectangle);
}

After the anchor boxes are generated, determination of which areas have valid detections can be done. Below is the implementation to detect cats in an image.

private static List<System.Drawing.Rectangle> GetDetectionRectangles
 (INeuralNetwork neuralNetwork, Image<Bgr, Byte> originalImage, Action<double> progressUpdater)
{
    List<System.Drawing.Rectangle> DetectedRectangles = new List<System.Drawing.Rectangle>();
    var anchorBoxes = AnchorBox.GenerateRandomAnchorBoxes();
    int counter = 0;
    Parallel.ForEach(anchorBoxes, new ParallelOptions() 
             { MaxDegreeOfParallelism = 10 }, (anchorBox) => {
        (var croppedImage, var rectangle) = GetAreaUnderAnchorBox(originalImage, anchorBox);
        var catDetected = DetectCat(neuralNetwork, croppedImage);
        if (catDetected)
        {
            DetectedRectangles.Add(rectangle);
        }
        progressUpdater(((double)counter / (double)anchorBoxes.Count()) * 100);
        counter++;
    });
    return DetectedRectangles;
}

This method of localization is very slow as thousands of areas are passed through the network in a loop, but this is the easiest to implement.

Implementation of Deep Belief Network using Accord.NET for Classification

This library is available on nuget and the following command installs it. The documentation is available here.

As mentioned earlier in this article, a deep belief network learns properly by unsupervised learning on each restricted boltzmann machine of a deep belief network, followed by supervised learning on the whole network. Hence, two different teacher algorithms are needed to do this. Accord.NET has many classes to play with but for creating a deep belief network, there is a DeepBeliefNetwork class. For unsupervised learning, there’s a DeepBeliefNetworkLearning class and for supervised learning BackpropagationLearning, ResilientBackpropagationLearning or ParallelResilientBackpropagationLearning classes are available to choose. The deep belief network is initialized with random weights using a NguyenWidrow class. The code below is used for binary classification hence the number of output neurons are just 2. It has two hidden layers with 1200 and 600 neurons in them respectively.

public AccordNetwork()
{
    network = new DeepBeliefNetwork(new BernoulliFunction(), inputLength, 1200, 600, 2);
    new NguyenWidrow(network).Randomize();
    network.UpdateVisibleWeights();
    unsuperVisedTeacher = GetUnsupervisedTeacherForNetwork(network);
    supervisedTeacher = GetSupervisedTeacherForNetwork(network);
}

private DeepBeliefNetworkLearning 
      GetUnsupervisedTeacherForNetwork(DeepBeliefNetwork deepNetwork)
{
    var teacher = new DeepBeliefNetworkLearning(deepNetwork)
    {
        Algorithm = (hiddenLayer, visibleLayer, i) => 
                     new ContrastiveDivergenceLearning(hiddenLayer, visibleLayer)
        {
            LearningRate = 0.1,
            Momentum = 0.5
        }
    };
    return teacher;
}

private ResilientBackpropagationLearning GetSupervisedTeacherForNetwork
                                         (DeepBeliefNetwork deepNetwork)
{
    var teacher = new ResilientBackpropagationLearning(deepNetwork)
    {
        LearningRate = 0.1
        //Momentum = 0.5
    };
    return teacher;
}

Training Unsupervised on all Restricted Boltzmann Machines

From the dataset, only the inputs are used to train in an unsupervised fashion. Below is the code which takes the inputs from our dataset and trains all restricted boltzmann machines by iterating over them.

private void TrainUnsupervised(double[][] batchInputs, int iterations, 
                               Action<double, int, string> progressCallback)
{
    for (int layerIndex = 0; layerIndex < network.Machines.Count - 1; layerIndex++)
    {
        unsuperVisedTeacher.LayerIndex = layerIndex;
        var layerData = unsuperVisedTeacher.GetLayerInput(batchInputs);
        foreach (int i in Enumerable.Range(1, iterations))
        {
            var error = unsuperVisedTeacher.RunEpoch(layerData) / batchInputs.Length;
            if (progressCallback != null)
            {
                progressCallback(error, i, $"Unsupervised Layer {layerIndex}");
            }
            if (this.ShouldStopTraning)
            {
                this.ShouldStopTraning = false;
                break;
            }
        }
    }
}

Training Supervised on Whole Network

This second step requires both inputs and outputs from the dataset. These are used to train a network in supervised fashion. Below is the code which passes the datasets to a supervised teacher and reports back the error rate. This error rate can be visualized in a graph or for other purposes.

private void TrainSupervised(double[][] batchInputs, double[][] batchOutputs, 
                 int iterations, Action<double, int, string> progressCallback)
{
    foreach (int i in Enumerable.Range(1, iterations))
    {
        var error = supervisedTeacher.RunEpoch(batchInputs, batchOutputs) / batchInputs.Length;
        if (progressCallback != null)
        {
            progressCallback(error, i, "Supervised");
        }
        if (this.ShouldStopTraning)
        {
            this.ShouldStopTraning = false;
            break;
        }
    }
}

Calculating the Output

Calculating the output of a single image input is done in a single line. The length of the output array is the same as the number of output neurons in the network.

public double[] GenerateOutput(double[] inputs)
{
    return network.Compute(inputs);
}

Serializing/Deserializing the Trained Model

After training any network successfully, serialization of its state can be done so that it can be saved, loaded and tested later anywhere else. There are two methods in Accord.NET classes, Load and Save for these purposes.

public void LoadNetworkFromFile(string filePath)
{
    network = DeepBeliefNetwork.Load(filePath);
    supervisedTeacher = GetSupervisedTeacherForNetwork(network);
    unsuperVisedTeacher = GetUnsupervisedTeacherForNetwork(network);
}

public void SaveNetwork(string filePath)
{
    network.Save(filePath);
}

Implementation of Convolutional Neural Network using ConvNetSharp for Classification

The below command installs ConvNetSharp in the project. It is still in alpha release but works very well. The documentation is available here.

In ConvNetSharp library convolutional, pooling layers and fully connected layers are available to choose from to build a neural network specific to classify images. In the code below, the parameters and sequence of layers are specified by the constructor.

public ConvNetSharpNetwork()
{
    network = new Net<double>();

    network.AddLayer(new InputLayer(50, 52, 1));
    network.AddLayer(new ConvLayer(3, 3, 8) { Stride = 1, Pad = 2 });
    network.AddLayer(new ReluLayer());
    network.AddLayer(new PoolLayer(3, 3) { Stride = 2 });
    network.AddLayer(new ConvLayer(3, 3, 16) { Stride = 1, Pad = 2 });
    network.AddLayer(new ReluLayer());
    network.AddLayer(new PoolLayer(3, 3) { Stride = 2 });
    network.AddLayer(new ConvLayer(3, 3, 32) { Stride = 1, Pad = 2 });
    network.AddLayer(new FullyConnLayer(20));
    network.AddLayer(new FullyConnLayer(50));
    network.AddLayer(new FullyConnLayer(2));
    network.AddLayer(new SoftmaxLayer(2));

    trainer = GetTrainerForNetwork(network);
}

The above network contains three convolutional layers and a fully connected neural network in the end for classification. Number of neurons in this fully connected neurons is way less than what is needed for fully connected networks.

The teacher for this network is a stochastic gradient descent – SgdTrainer class provided by the same library.

private TrainerBase<double> GetTrainerForNetwork(Net<double> net)
{
    return new SgdTrainer<double>(net)
    {
        LearningRate = 0.003
    };
}

Training the Network

In this convolution network, the inputs and output must be converted into Volumes of same Shape as the dimensions of the input and output layers. There are helper classes available in ConvNetSharp to help do this easily. First, the images are combined into batches and then are converted to Volumes to speed up the training process. Batches of size 50 works perfectly well for this network, balancing the time to converge and accuracy.

public void BatchTrain(double[][] batchInputs, double[][] batchOutputs, 
            int iterations, Action<double, int, string> progressCallback)
{
    trainer.BatchSize = batchInputs.Length;
    foreach (int currentIteration in Enumerable.Range(1, iterations))
    {
        Randomizer.Shuffle(batchInputs, batchOutputs);
        (Volume<double> inputs, Volume<double> outputs) = 
                GetVolumeDataSetsFromArrays(batchInputs, batchOutputs);
        trainer.Train(inputs, outputs);
        var error = network.GetCostLoss(inputs, outputs);
        if (progressCallback != null)
        {
            progressCallback(error, currentIteration, "Supervised");
        }
        inputs.Dispose();
        outputs.Dispose();
        if(this.ShouldStopTraning)
        {
            this.ShouldStopTraning = false;
            break;
        }
    }
}

The double array input data is first converted into a compatible Shape and Volume, fed into the network and the output of this network is a volume which is converted to a double array and returned.

public double[] GenerateOutput(double[] inputs)
{
    Volume<double> inputImageData = BuilderInstance<double>.Volume.From
                                    (inputs, new Shape(50, 52, 1));            
    Volume<double> calculatedPrediction = network.Forward(inputImageData);
    inputImageData.Dispose();
    return calculatedPrediction.ToArray();
}

Serializing/Deserializing the Trained Model

In ConvNetSharp, FromJson and ToJson are two methods available for serializing and deserializing the network model.

public void LoadNetworkFromFile(string filePath)
{
    var networkJSON = File.ReadAllText(filePath);
    network = SerializationExtensions.FromJson&lt;double&gt;(networkJSON);
    trainer = GetTrainerForNetwork(network);
}

public void SaveNetwork(string filePath)
{
    var networkJSON = network.ToJson();
    FileHelper.WriteTextAsync
         (filePath, Encoding.ASCII.GetBytes(networkJSON)).RunSynchronously();
}

The Application – Cat Classifier

Using the techniques detailed above, the application part of this article is implementing a cat classifier where classification of an image into a binary classification of “cat”, or “not cat” is done. The application created using WPF for ease of use by users and for the looks of it. It will run under .NET 4.5, hence users must download the .NET 4.5 runtime if they are still under .NET 4 in case of older Windows 7 OS. The source code of the application is also linked above which opens with Visual Studio, for the more code-inclined.

Network Selection

The first window that the app shows is the network selection window which can be used to choose one of the two implemented neural networks, from either Accord.NET or ConvNetSharp. An interface INeuralNetwork is defined to make sure networks from different sources can operate seemlessly with a common code and UI.

Once the network is selected, it can be used to train or test using the next window.

Data Collection

The images are collected using a CSV file which contains URL of the image and the type of image, if it is a cat or not. The file looks like this:

ImageType,Url
CAT,https://proxy.duckduckgo.com/iu/?u=http%3A%2F%2Fi.dailymail.co.uk% ...
CAT,https://proxy.duckduckgo.com/iu/?u=https%3A%2F%2Ftse3.mm.bing.net% ...
NOT_CAT,https://proxy.duckduckgo.com/iu/?u=https%3A%2F%2Ftse3.mm.bing. ...
NOT_CAT,https://proxy.duckduckgo.com/iu/?u=https%3A%2F%2Ftse2.mm.bing. ...
NOT_CAT,https://proxy.duckduckgo.com/iu/?u=https%3A%2F%2Ftse3.mm.bing. ...

Using the CsvHelper library, this file can be read into ‘Remote Images’ list. In the UI, these images can be downloaded using the ‘Train Network’ window and the images get collected into a directory under “Images/” and into the UI into the ‘Local Images’ list. About 500 images are collected in the csv file provided in the zip file in the link above, along with the application. The name if the file is 'remote-image-urls.csv'.

If the images are already downloaded and exist in the folder “Images/CAT” for cats and “Images/ NOT_CAT” for non-cats, the UI can be used to load them into the application by the “Load Images Form Disk” button.

These images will then be ready to be trained or tested using the selected neural network.

Testing the Selected Network

In the application window, there is a button “Load Model” where a pre-trained model can be selected to continue testing the application. The Accord.NET network models have an extension ‘*.accmodel’ and ConvNetSharp network models have ‘*.convmodel’. Once the model is selected, the application is ready to test using any image by dragging and dropping image into the window or testing all the local images at once in the ‘Train Network’ window. Currently, both models included in the “Models/” folder of the application have 100% accuracy rate against the trained 500 images.

Training the Selected Network

Once images are loaded into the Local Image List, the training can start with “Start Training with Images” button. The error rate can be monitored in the “Show Error Graph” window. Decreasing error rates is a sign of a proper parameters and training. As images are passed to neural network in batches, the graph will appear in the following way as sets of images are being trained. The error rate reaches a lower threshold and then the training proceeds to the next batch of images, shooting the error rate back up.

The batch size and thresholds vary in both the networks, selected after some trial and error.

ConvNetSharp Parameters

private int batchSubSize = 50;
private int maxIterationsOnEachPage = 500;
private double stopPageTrainingErrorRate = 0.02;
private double stopIterationTrainingErrorRate = 1;

Accord.NET Parameters

private int batchSubSize = 500;
private int maxIterationsOnEachPage = 1000;
private int maxUnSupervisedIterationsOnEachPage = 100;
private double stopPageTrainingErrorRate = 0.0001;
private double stopIterationTrainingErrorRate = 0.001;

Footnote

With the advent of Keras, CNTK, TensorFlow, etc. creating complex neural networks has become a breeze and .NET is also has caught up. Once ConvNetSharp comes out of alpha, it can be reliably used to create equally powerful neural networks as it also includes support for ‘Computational Graph’ feature. Example of a network that can be created with this feature is given as a diagram here.

Each neural network is by itself very dumb, but combining multiple networks of different kinds and functions in a specific order, including recurrent networks along with a bit of procedural code, could create an intelligent network which we could call ‘AI’.

Image Classification Using Neural Networks in .NET

Basic Theory

Pre-processing the Images

Normalizing the Pixel Values

Training Networks

Generating Outputs

Classification using Fully Connected Networks

Classification using Deep Belief Networks

Classification using Convolutional Neural Networks

Image Localization using Anchor Boxes

Implementation of Deep Belief Network using Accord.NET for Classification

Training Unsupervised on all Restricted Boltzmann Machines

Training Supervised on Whole Network

Calculating the Output

Serializing/Deserializing the Trained Model

Implementation of Convolutional Neural Network using ConvNetSharp for Classification

Training the Network

Serializing/Deserializing the Trained Model

The Application – Cat Classifier

Network Selection

Data Collection

Testing the Selected Network

Training the Selected Network

ConvNetSharp Parameters

Accord.NET Parameters

Footnote

References