Self-organizing Map Implementation

25 Jul 2014
Get real with an implementation of a mini SOM project.

A demo video of this project is available on YouTube.

Introduction

In my previous article, Self-organizing Map Demystified, you learned the concept, architecture, and algorithm of the self-organizing map (SOM). From here on, you will embark on a journey to design and implement a mini SOM for clustering handwritten digits. For training the SOM, I obtained the training dataset from the Machine Learning Repository of the Center for Machine Learning and Intelligent Systems. MATLAB will be used for programming the SOM. Some of the design considerations are generic and can be applied to other types of machine learning projects. Let's get started...

Preparing the Ingredients

The original training dataset file is called "optdigits-orig.tra.Z". It consists of a total of 1934 handwritten digits from 0 to 9, collected from 30 people. Each digit sample has been normalized to a 32x32 binary image in which every pixel is either 0 or 1. The distribution of the training dataset is well balanced, as shown in Table 1. A balanced training dataset, in which every class has roughly the same number of samples, is important for preventing bias during training: samples from an overwhelmingly large class would be drawn more often than those from minority classes, skewing the accuracy and reliability of the training result.

Table 1: Distribution of Classes
Class Number of Samples
0 189
1 198
2 195
3 199
4 186
5 187
6 195
7 201
8 180
9 204

Figure 1 shows some of the binary images contained in the training dataset:

Figure 1: Some of the Binary Images in the Training Dataset

In the original data file, each block of 32x32 binary image is followed by a class label that indicates the digit that this sample belongs to. For example, the "8" bitmap block (in Figure 1) is followed by its class label of 8 and so on. To facilitate processing by MATLAB, I have further pre-processed the data as such:

  1. Separate the class labels from their bitmaps in the original data file into two different files.

  2. Make the class labels, a total of 1934 digits, into a 1x1934 vector called "train_label" and save it as a MATLAB data file called "training_label.mat".

  3. Make the bitmap data, originally 61888 rows x 32 columns of bits after removing the class labels, into a 1024x1934 matrix called "train_data", and save it as "training_data.mat". In the train_data matrix, each column represents one training sample of 1024 bits, i.e. each training sample has been transformed from its original 32x32 bitmap into a 1024x1 vector.

These two files, "training_label.mat" and "training_data.mat", are available for download. Unzip and place them in a folder, say "som_experiment".
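If you want to experiment with this column-per-sample layout outside MATLAB, it can be sketched in NumPy. This is not the author's preprocessing script; the bitmap below is a made-up stand-in for a real digit:

```python
import numpy as np

# Hypothetical 32x32 binary bitmap standing in for one digit sample;
# the real samples come from "optdigits-orig.tra.Z"
bitmap = np.zeros((32, 32), dtype=np.uint8)
bitmap[8:24, 14:18] = 1  # a crude vertical stroke, purely for illustration

# Flatten the bitmap row by row into a 1024x1 column vector (step 3 above)
column = bitmap.reshape(1024, 1)

# Stacking 1934 such columns side by side would give the 1024x1934
# train_data matrix; here we stack just 3 identical copies as a toy example
train_data = np.hstack([column] * 3)
print(train_data.shape)  # prints (1024, 3)
```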

If you are curious to see what these digits look like, fire up MATLAB, set the Current Folder to "som_experiment", enter the following code in the Command Window, and run it.

% View a digit from train_data
clear
clc
load 'training_data';
img = reshape(train_data(:,10), 32, 32)';
imshow(double(img));

This code will:

  • load the "training_data.mat" file that contains the 1024x1934 train_data matrix into memory;

  • train_data(:, n) will extract the nth column (a 1024x1 vector) from the train_data matrix, where n is the position of the digit in the dataset. In this code, n is 10, so it will extract the tenth digit; you can change it to any number up to the total number of digits in the dataset, i.e. 1934;

  • reshape(train_data(:,10), 32, 32)' will reshape the 1024x1 column vector into a 32x32 matrix (MATLAB's reshape fills column by column) and then transpose it, effectively reverting it to its original shape like those shown in Figure 1;

  • imshow(double(img)) will display the digit as a binary image, where pixels with value 0 are shown as black and those with value 1 as white, as in

    Figure 2: A Binary Image on Screen
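The transpose in the snippet above matters because MATLAB's reshape fills column-first. For readers without MATLAB, a quick NumPy check (using a toy random bitmap, not the real data) confirms the round trip:

```python
import numpy as np

# A random 32x32 binary image standing in for a real digit sample
img = np.random.default_rng(0).integers(0, 2, size=(32, 32))

# The preprocessing flattened each bitmap row by row into a 1024-vector
vec = img.reshape(1024)

# Mimic MATLAB's reshape(train_data(:,n), 32, 32)': MATLAB fills the 32x32
# matrix column-first (Fortran order), and the trailing transpose then
# recovers the original row-by-row image
restored = vec.reshape((32, 32), order='F').T

print(np.array_equal(img, restored))  # prints True
```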

Setting the Parameters

Let's set the parameters for subsequent development.

  • The size of the SOM map is 10x10.

  • The total number of iterations is set at 1000. In this experiment, we will only attempt the first sub-phase of the adaptation phase.

  • The neighborhood function:

    h_{j,c(X)}(n) = exp( -d²_{j,c} / (2σ²(n)) )

    where

    d_{j,c} is the Euclidean distance on the map lattice from neuron j to the winner neuron c(X).

    σ(n) is the effective width of the topological neighborhood at the nth iteration, computed as:

    σ(n) = σ₀ exp( -n / τ₁ )

    where

    σ₀ is the initial effective width, which is set to 5, i.e. the radius of the 10x10 map.

    τ₁ is the time constant, set to N / ln(σ₀), where N is the total number of iterations.

  • The weight updating equation:

    W_j(n+1) = W_j(n) + η(n) h_{j,c(X)}(n) ( X - W_j(n) )

    where

    η(n) is the time-varying learning rate at the nth iteration and is computed as:

    η(n) = η₀ exp( -n / τ₂ )

    where

    η₀ is the initial learning rate, which is set to 0.1.

    τ₂ is a time constant, which is set to N.

We have designed the parameters for the mini SOM. It is time to make things happen.
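To get a feel for how these two schedules evolve over the 1000 iterations, here is a small Python sketch using the constants above (plain math only, no SOM machinery):

```python
import math

N = 1000                      # total number of iterations
sigma0 = 5.0                  # initial effective width
tau1 = N / math.log(sigma0)   # time constant for sigma
eta0 = 0.1                    # initial learning rate
tau2 = N                      # time constant for eta

def sigma(n):
    """Effective width at iteration n."""
    return sigma0 * math.exp(-n / tau1)

def eta(n):
    """Learning rate at iteration n, floored at 0.01 as in the script."""
    return max(eta0 * math.exp(-n / tau2), 0.01)

# sigma shrinks from 5 down to exactly 1 at n = N
print(sigma(0), round(sigma(N), 6))
# with tau2 = N, eta falls from 0.1 to 0.1/e (about 0.037) at n = N,
# so the 0.01 floor is never actually reached within this run
print(eta(0), round(eta(N), 4))
```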

Ready to Cook (Code)

The MATLAB script for implementing the mini SOM is saved in a file called "training_som.m" and is available for download. I created this script as a proof of concept, for the sole purpose of reinforcing your learning of the SOM algorithm. You are free to implement it in any programming language. After all, programming languages are just media for implementing problem-solving techniques on a computer.

Unzip and place it in the "som_experiment" folder. Open the "training_som.m" script in MATLAB's Editor, and you will see the recipe shown below. The code has been painstakingly commented and is therefore largely self-explanatory. Nevertheless, I will still walk through the parts of the code that correspond to the various phases of the SOM algorithm so that you can understand and relate to them better.

% ==========================================================
%
%             Self-organizing Map
%       Clustering of Handwritten Digits
%               training_som.m
%                 Peter Leow
%                10 July 2014
%
% ==========================================================

% clean up the previous act
close all;
clear;  % delete all memory
clc;    % clear windows screen
clf;    % clear figure screen
shg;    % put figure screen on top of all screens

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Ground breaking!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Load training_data.mat that consists of train_data
% of 1024x1934 matrix
% Consists of a total of 1934 input samples and
% each sample possesses 1024 attributes (dimensions)
load training_data;

% dataRow = number of attributes (dimensions) of each sample, i.e. 1024
% dataCol = total number of training samples, i.e. 1934
[dataRow, dataCol] = size(train_data);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SOM Architecture
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Determine the number of rows and columns of som map
somRow = 10;
somCol = 10;

% Initialize 10x10x1024 som matrix
% This is the SOM map of 10x10 neurons
% Each neuron carries a weight vector of 1024 elements
som = zeros(somRow, somCol, dataRow);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Parameters Settings
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Max number of iteration
N = 1000;

% Initial effective width
sigmaInitial = 5;

% Time constant for sigma
t1 = N / log(sigmaInitial);

% Initialize 10x10 matrix to store Euclidean distances
% of each neuron on the map
euclideanD = zeros(somRow, somCol);


% Initialize 10x10 matrix to store neighbourhood functions
% of each neuron on the map
neighbourhoodF = zeros(somRow, somCol);

% initial learning rate
etaInitial = 0.1;

% time constant for eta
t2 = N;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Initialization
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Generate random weight vectors [dataRow x 1]
% and assign it to the third dimension of som
for r=1:somRow
    for c=1:somCol
        som(r, c, :) = rand(dataRow, 1);
    end
end

% Initialize iteration count to one
n = 1;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Start of Iterative Training
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Start of one iterative loop
while n <= N
    
    sigma = sigmaInitial * exp(-n/t1);
    
    variance = sigma^2;
    
    eta = etaInitial * exp(-n/t2);
    
    % Prevent eta from falling below 0.01
    if (eta < 0.01)
        eta = 0.01;
    end
    
    % Randomly generate a column integer between 1 and 1934
    % corresponding to the column index of a training sample
    % in train_data
    i = randi([1,dataCol]);

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Competition Phase
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Find the Euclidean Distances from the input vector
    % to all the neurons
    for r=1:somRow
        for c=1:somCol
            v = train_data(:,i) - reshape(som(r,c,:),dataRow,1);
            euclideanD(r,c) = sqrt(v' * v);
        end
    end
    
    % Determine the winner neuron, i.e. the neuron that is
    % the closest to the input vector.
    % winnerRow and winnerCol is the index position of the
    % winner neuron on the SOM Map
    [vector, winnerRowVector]=min(euclideanD,[],1); % 1 stands for 1st dimension, i.e. row
    
    [winnerEuclidean, winnerCol]=min(vector,[],2); % 2 stands for 2nd dimension, i.e. column
    
    winnerRow = winnerRowVector(winnerCol);
    
    %%%%%%% End of Competition Phase %%%%%%%%%
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Cooperation Phase
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Compute the neighborhood function of every neuron
    for r=1:somRow
        for c=1:somCol
            if (r == winnerRow && c == winnerCol)   % Is the winner
                neighbourhoodF(r, c) = 1;
                continue;
            else   % Not the winner
                distance = (winnerRow - r)^2 + (winnerCol - c)^2;
                neighbourhoodF(r, c) = exp(-distance/(2*variance));
            end
        end
    end
    
    %%%%%%% End of Cooperation Phase %%%%%%%%%
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Adaptation Phase - only the first sub-phase is considered
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    for r=1:somRow
        for c=1:somCol
            oldWeightVector = reshape(som(r, c,:),dataRow,1);
            
            % Update weight vector of neuron
            som(r, c,:) = oldWeightVector + eta*neighbourhoodF(r, c)*(train_data(:,i) - oldWeightVector);
        end
    end
    
    %%%%%%% End of Adaptation Phase %%%%%%%%%
    
    %%%%%%% Draw updated SOM map %%%%%%%%%
    
    f1 = figure(1);
    set(f1,'name',strcat('Iteration #',num2str(n)),'numbertitle','off');
    for r=1:somRow
        for c=1:somCol
            region = 10 * (r - 1) + c;
            subplot(somRow,somCol,region,'align')
            img=reshape(som(r, c, :),32,32)';
            imshow(double(img));
        end
    end
    
    %%%%%%% End of Draw updated SOM map %%%%%%%%%
    
    % Increase iteration count by one
    n = n + 1;
    
end
%%%%%%% End of while loop %%%%%%%%%

%%%%%%% Save the trained SOM %%%%%%%%%
save('trained_som', 'som');

Initialization

for r=1:somRow
    for c=1:somCol
        som(r, c, :) = rand(dataRow, 1);
    end
end

The block of code above will generate arrays of random numbers, uniformly distributed in the interval (0,1), for the weight vectors of all neurons on the map. These weight vectors are stored in the third dimension of the som matrix, i.e. som(r, c, :). Note that the som matrix represents the SOM map. The actual iterative training shall begin...

Update Parameters

sigma = sigmaInitial * exp(-n/t1);

variance = sigma^2;

eta = etaInitial * exp(-n/t2);

% Prevent eta from falling below 0.01
if (eta < 0.01)
    eta = 0.01;
end

The block of code above will update the time-varying effective width (sigma) and learning rate (eta) at the start of every iteration.

Sampling

i = randi([1,dataCol]);

The line of code above will select a training sample (input vector) randomly from the pool of 1934 samples at the start of every iteration.

Competition

for r=1:somRow
    for c=1:somCol
        v = train_data(:,i) - reshape(som(r,c,:),dataRow,1);
        euclideanD(r,c) = sqrt(v' * v);
    end
end

The block of code above will compute the Euclidean distances from the input vector selected in the preceding code to all the neurons on the map. The resulting Euclidean distances will be stored in a matrix euclideanD. The index positions of these Euclidean distances in euclideanD coincide with those of their corresponding neurons in the som matrix.

[vector, winnerRowVector]=min(euclideanD,[],1);

[winnerEuclidean, winnerCol]=min(vector,[],2);

winnerRow = winnerRowVector(winnerCol);

The block of code above will determine the winner neuron, i.e. the neuron with the minimum Euclidean distance in euclideanD matrix.
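In a vectorized language the whole competition phase collapses to a single argmin. Here is a NumPy sketch of the same idea; the weights and input are random stand-ins, not the trained map:

```python
import numpy as np

rng = np.random.default_rng(42)
som = rng.random((10, 10, 1024))   # toy SOM weights standing in for the map
x = rng.random(1024)               # toy input vector standing in for a digit

# Euclidean distance from x to every neuron's weight vector
# (the counterpart of the euclideanD matrix)
dists = np.linalg.norm(som - x, axis=2)

# The winner is the neuron with the smallest distance; a single argmin
# replaces the two-step row/column min used in the MATLAB script
winner_row, winner_col = np.unravel_index(np.argmin(dists), dists.shape)
print(dists[winner_row, winner_col] == dists.min())  # prints True
```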

Cooperation

for r=1:somRow
    for c=1:somCol
        if (r == winnerRow && c == winnerCol)   % Is the winner
            neighbourhoodF(r, c) = 1;
            continue;
        else   % Not the winner
            distance = (winnerRow - r)^2 + (winnerCol - c)^2;
            neighbourhoodF(r, c) = exp(-distance/(2*variance));
        end
    end
end

The block of code above will compute the neighborhood functions of all the neurons. The resulting neighborhood functions will be stored in a matrix neighbourhoodF. The index positions of these neighborhood functions in neighbourhoodF will coincide with those of their corresponding neurons on the som matrix.
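The same neighborhood computation can be done without loops. A NumPy sketch with a hypothetical winner position and effective width:

```python
import numpy as np

winner_row, winner_col = 4, 7    # hypothetical winner coordinates
sigma = 3.0                      # hypothetical current effective width
variance = sigma ** 2

# squared lattice distance from every neuron to the winner
rows, cols = np.indices((10, 10))
d2 = (rows - winner_row) ** 2 + (cols - winner_col) ** 2

# Gaussian neighborhood function (the counterpart of neighbourhoodF)
h = np.exp(-d2 / (2 * variance))

print(h[winner_row, winner_col])         # prints 1.0 at the winner
print(bool(h[0, 0] < h[winner_row, 6]))  # prints True: closer neurons get larger values
```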

Adaptation

for r=1:somRow
    for c=1:somCol
        oldWeightVector = reshape(som(r, c,:),dataRow,1);

        % Update weight vector of neuron
        som(r, c,:) = oldWeightVector + eta*neighbourhoodF(r, c)*(train_data(:,i) - oldWeightVector);
    end
end

The block of code above will implement the weight updating equation to update the weight vectors of all the neurons on the som matrix.
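Vectorized, the adaptation step is a one-liner. A toy NumPy sketch with random stand-ins for the weights, input, and neighborhood values:

```python
import numpy as np

rng = np.random.default_rng(0)
som = rng.random((10, 10, 1024))    # toy weight vectors
x = rng.random(1024)                # toy input sample
eta = 0.1                           # learning rate for this iteration
h = np.full((10, 10), 0.5)          # toy neighborhood values (1.0 at the winner)

# W(n+1) = W(n) + eta * h * (x - W(n)), applied to every neuron at once
som_new = som + eta * h[:, :, None] * (x - som)

# every weight vector has moved a fraction eta*h of the way toward x
print(bool(np.all(np.abs(x - som_new) <= np.abs(x - som))))  # prints True
```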

Seeing is Believing

f1 = figure(1);
set(f1,'name',strcat('Iteration #',num2str(n)),'numbertitle','off');
for r=1:somRow
    for c=1:somCol
        region = 10 * (r - 1) + c;
        subplot(somRow,somCol,region,'align')
        img=reshape(som(r, c, :),32,32)';
        imshow(double(img));
    end
end

The block of code above will display the som matrix as a two-dimensional map after the weight updates at the end of every iteration. In this way, we are able to visualize the progress of the training. Enjoy!

Saving the Fruit of your Labor

save('trained_som', 'som');

At the end of the whole training, i.e. after the 1000th iteration, the line of code above will save the final som matrix to a file called "trained_som.mat".

Labeling the SOM

We will proceed to label the neurons on the SOM using the class labels of their most similar training samples, as illustrated in Figures 3 and 4.

Figure 3: Before Labeling
  Figure 4: After Labeling

The code to achieve this is contained in a script file called "display_som_with_class_labels.m", as shown below. (Forgive me, I'm really very poor at naming.) This file is available for download.

% ==========================================================
%
%             Self-organizing Map
%       Clustering of Handwritten Digits
%       display_som_with_class_labels.m
%                 Peter Leow
%                10 July 2014
%
% ==========================================================

% clean up the previous act
close all;
clear;  % delete all memory
clc;    % clear windows screen
clf;    % clear figure screen
shg;    % put figure screen on top of all screens

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Draw SOM map
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Load trained_som.mat that consists of trained som of 10x10x1024 matrix
load trained_som;
[somRow, somCol, somWeight] = size(som);
%%%%%%% End of Draw SOM map %%%%%%%%%

f1 = figure(1);
set(f1,'name','Trained SOM','numbertitle','off');
for r=1:somRow
    for c=1:somCol
        region = 10 * (r - 1) + c;
        subplot(somRow,somCol,region,'align')
        img=reshape(som(r, c, :),32,32)';
        imshow(double(img));
    end
end

%%%%%%% End of Draw SOM map %%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Determine class label for each neuron on the map
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Load training_data.mat that consists of train_data
% of 1024x1934 matrix
% Consists of a total of 1934 input samples and
% each sample possesses 1024 attributes (dimensions)
load training_data;
[dataRow, dataCol] = size(train_data);

% Load training_label.mat that consists of train_label of 1x1934 matrix
% Each element in train_label carries one of the 10 digits
% from 0 to 9, which is the class label for the corresponding input sample
% that has the same column index in train_data
load training_label;

% Matrix to store the column indices of training samples that are
% closest to each neuron on the map
somLabelIndex = zeros(somRow, somCol);

for r=1:somRow
    for c=1:somCol
        
        % distance is an 1x1934 vector that stores the euclidean distance
        % between each training sample and a neuron
        distance = zeros(1, length(train_data(1,:)));
        
        % loop through train_data
        for i=1:length(train_data(1,:))
            v = train_data(:,i) - reshape(som(r,c,:),dataRow,1);
            distance(i) = sqrt(v' * v);
        end
        
        % Find the index in distance matrix of the training sample
        % that is the closest to the neuron
        [value, colIndex]=min(distance,[],2); % 2 stands for 2nd dimension, i.e. column
        
        % Assign colIndex to the somLabelIndex matrix at the same
        % index position as that of the neuron on the SOM map
        somLabelIndex(r, c) = colIndex;
        
    end
end

%%%%%%% Draw SOM map with class labels %%%%%%%%%

f2 = figure(2);
set(f2,'name','Trained SOM with Class Labels','numbertitle','off');

for r=1:somRow
    for c=1:somCol
        region = 10 * (r - 1) + c;
        subplot(somRow,somCol,region,'align')
        set(gca,'xtick',[], 'xticklabel',{},'ytick',[], 'yticklabel',{})
        text(0.5, 0.5, int2str(train_label(1,somLabelIndex(r, c))));
    end
end

%%%%%%% End of Draw SOM map with class labels %%%%%%%%%

Is That All?

Of course not. We can use the labeled SOM to classify new handwritten digits. How? Feed any new digit into the SOM map, find the neuron that is most similar to it based on Euclidean distance, and the class label of that neuron becomes the class label of the new digit. Got it? How to cook (oops, I mean code) it? Well, it is similar to the code we used to label the SOM. Alright, stop, no more questions; I shall leave the rest to you as homework. ;-)
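As a starting point for that homework, here is a NumPy sketch of the nearest-neuron classification rule. Everything here is a made-up stand-in (random weights, random labels, random digit); only the rule itself comes from the article:

```python
import numpy as np

rng = np.random.default_rng(1)
som = rng.random((10, 10, 1024))             # stand-in for the trained weights
som_labels = rng.integers(0, 10, (10, 10))   # stand-in for the neuron labels
new_digit = rng.random(1024)                 # stand-in for a new handwritten digit

# find the neuron closest to the new digit ...
dists = np.linalg.norm(som - new_digit, axis=2)
r, c = np.unravel_index(np.argmin(dists), dists.shape)

# ... and adopt its class label as the prediction
predicted = int(som_labels[r, c])
print(0 <= predicted <= 9)  # prints True
```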

It has been a long journey. I hope you have found it fruitful.

Bibliography

  • Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Peter Leow
Instructor / Trainer
Singapore Singapore
Graduated from the National University of Singapore with the following qualifications:
1. Graduate Diploma in Systems Analysis (2002)
2. Master of Technology (Knowledge Engineering) (2013)
 
Currently, lecturing in the area of social media and web development at a technical institution.
 
Having hibernated for ages, finally woke up to serve the Community in Oct 2013.

Article Copyright 2014 by Peter Leow