
Self-organizing Map Demystified

25 Jul 2014
Discover the concept, architecture, and algorithm of the self-organizing map.

Introduction

Whether we are at work, walking down a busy street, or shopping in a marketplace, we meet people from all walks of life. Through our senses, we can readily identify different people and group them into similar classes based on common features such as the audible characteristics of their languages or their visual appearances. We would then label each group by nationality, culture, or any other suitable name depending on the circumstances. This human capability of identifying similarities and distinguishing differences based solely on the perceived input patterns forms a core part of our nature. However, it is not without limitations. One obvious limitation is that we can only see and comprehend two or at most three dimensions (attributes). Anything higher than three dimensions requires the help of machine learning techniques. In this article, we will discuss one of them - the Self-Organizing Map (SOM).

Overview

The Self-Organizing Map (SOM) was introduced as an unsupervised, competitive learning algorithm for artificial neural networks by Finnish professor Teuvo Kohonen in the early 1980s, and as such is also called a Kohonen map. The development of the SOM was neurobiologically motivated by the way different sensory inputs, such as motor, visual, and acoustic inputs, are mapped onto corresponding areas of the cerebral cortex in a topographically ordered computational map (Figure 1).

Figure 1: Cerebral Cortex

(From Wikipedia, the free encyclopedia)

A SOM is made up of a network of neurons arranged in a lattice that is usually one- or two-dimensional. At the beginning, the neurons are initialized with random weight vectors of the same size as the dimension (number of features) of the input samples. Input samples from the input dataset take turns to stimulate the neurons and cause their weight vectors to be tuned in such a way that the most stimulated (winning) neuron and its neighbors become more like the stimulating input sample. At the same time, the tuning effect on the neurons that neighbour the winning neuron decreases with their topological distance from the winning neuron. Gradually, this network of neurons progresses from an initially unordered map to a stable, topologically ordered map of clusters of neurons. Each cluster contains neurons that belong to a particular class of the input dataset - does it remind you of the idiom "Birds of a feather flock together"? All this takes place in a self-organizing fashion without any external help, i.e. unsupervised. Watch this video on self-organization of digits in a digits recognition project.

The SOM algorithm is one of the most powerful algorithms for data visualization and exploration and is widely used for clustering analysis in data mining. Its ability to map high-dimensional input vectors onto a two-dimensional grid of prototype vectors and order them topologically greatly facilitates interpretation by the human eye.

SOM Architecture

A typical SOM architecture consists of two layers - the Input layer and the Computational layer (Map) (Figure 2).

Figure 2: Two-Dimensional SOM Architecture

The input layer consists of source nodes representing the various features (also known as attributes) of an input sample. For example, the input layer may consist of three source nodes representing the color values R, G, and B of a screen pixel, making it a three-dimensional input sample. There are as many source nodes in the input layer as there are features (dimensions), and they are represented in the form of an input vector, i.e. X = [x1, x2, ..., xd] for an input sample, where d is the number of dimensions.

On the other hand, the computational layer (Map) consists of neurons placed at the nodes of a one- or two-dimensional lattice of size R x C. (Figure 2 shows a 4 x 4 lattice.) This map is where the topological transformation and visualization of patterns occur. Each neuron is identified by its index position j on the map and is associated with a weight vector, i.e. Wj = {wji : j = 1, ..., n; i = 1, ..., d}, whose size coincides with the dimension of the input vector. That is, if d is the dimension of the input space, then every neuron on the map carries a d-dimensional vector of weights.

All the source nodes in the input layer are fully connected to all the neurons on the map in a feed forward manner.
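As a concrete illustration of this architecture, the whole map can be stored as a simple three-dimensional array, with one d-dimensional weight vector per lattice position. Below is a minimal sketch in Python/NumPy; the 4 x 4 lattice and the RGB example come from the text, while the variable names and values are merely illustrative.

    import numpy as np

    R, C = 4, 4        # rows and columns of the computational layer (the 4 x 4 lattice in Figure 2)
    d = 3              # number of source nodes in the input layer (e.g. R, G, B of a pixel)

    # Each neuron at lattice position (r, c) carries a d-dimensional weight vector,
    # so the whole map fits in an R x C x d array.
    weights = np.random.rand(R, C, d)

    # One input sample X is a d-dimensional vector, fully connected to every neuron.
    X = np.array([0.8, 0.1, 0.3])
    print(weights.shape, X.shape)   # (4, 4, 3) (3,)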

We are now ready to explore the algorithm of SOM.

Self-organization Algorithm

Let's get an overview of the self-organization algorithm first and leave the details to subsequent sections. The algorithm proceeds iteratively through the following phases:

  1. Initialization: Randomly initialize the weight vectors of all the neurons on the map, e.g. neuron j in the map can be expressed as Wj = {wji : j = 1, ..., n; i = 1, ..., d}, where n is the total number of neurons and d is the number of dimensions of the input vector.

  2. Sampling: Select an input sample from the training dataset, e.g. the input sample X can be expressed as X = {xi : i = 1, ..., d}.

  3. Competition: Compare the vector of the input sample with the weight vectors of individual neurons to find the most similar neuron which will become the winner.

  4. Cooperation: Compute the influence that the winner neuron has over its neighboring neurons.

  5. Adaptation: Update the weight vector of the winner neuron and those of its neighbors so that they become more like the input vector.

  6. Iteration: Repeat from phase 2 with another input sample until the map has converged (stabilized), i.e. there are no noticeable changes in the weight vectors, or a pre-defined number of iterations has been reached.

A self-organizing map will take many iterations to converge. We will now walk through the different phases in detail one by one.
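But first, to see how the phases fit together, here is a minimal sketch of the whole loop in Python/NumPy. It is only an illustration under simple assumptions (a 4 x 4 map, exponential decay of both the learning rate and the neighborhood width, time constants set to the number of iterations); the function name train_som and all parameter values are illustrative, not prescribed by this article.

    import numpy as np

    def train_som(data, R=4, C=4, n_iterations=1000, sigma0=2.0, eta0=0.1):
        d = data.shape[1]
        weights = np.random.rand(R, C, d)                       # 1. Initialization
        rows, cols = np.indices((R, C))                         # lattice coordinates of every neuron
        tau = n_iterations                                      # time constant for the decay schedules
        for n in range(n_iterations):
            x = data[np.random.randint(len(data))]              # 2. Sampling
            dists = np.linalg.norm(weights - x, axis=2)         # 3. Competition: Euclidean distances
            wr, wc = np.unravel_index(np.argmin(dists), (R, C)) #    winner c(X)
            sigma = sigma0 * np.exp(-n / tau)                   # 4. Cooperation: shrinking neighborhood
            eta = eta0 * np.exp(-n / tau)                       #    decaying learning rate
            d2 = (rows - wr) ** 2 + (cols - wc) ** 2            #    squared lattice distance to winner
            h = np.exp(-d2 / (2 * sigma ** 2))
            weights += eta * h[:, :, None] * (x - weights)      # 5. Adaptation
        return weights                                          # 6. Iteration handled by the loop

    # Example: self-organize 200 random RGB colours on a 4 x 4 map.
    som = train_som(np.random.rand(200, 3))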

Initialization

At the start, all the neurons on the map are initialized with random weight vectors of small values. There are different techniques for doing this.

One way is to use random values that are completely independent of the training dataset. In this way, as the map has no knowledge of the input data, the self-organization will take significantly more training cycles to converge.

Another technique is to pick random samples from the training dataset, and assign their values to the weight vectors. In this way, as the map is already inside a subset of the input space, it will reduce the number of training cycles needed to converge. However, since the choice of input samples used for the initialization is random and the number of neurons on the map is much smaller than the size of the training dataset, the initial map is unlikely to be truly representative of the dataset.

As to which technique is better, this can only be answered by trying them out in separate trials and comparing their outcomes.
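The two initialization strategies might be sketched as follows; the function names and the dummy dataset below are purely illustrative, assuming the R x C x d map representation shown earlier.

    import numpy as np

    def init_random(R, C, d, low=0.0, high=1.0):
        # Technique 1: random values that are completely independent of the training dataset.
        return np.random.uniform(low, high, size=(R, C, d))

    def init_from_samples(R, C, data):
        # Technique 2: copy randomly chosen training samples into the weight vectors,
        # so the initial map already lies inside a subset of the input space.
        idx = np.random.choice(len(data), size=R * C, replace=False)
        return data[idx].reshape(R, C, data.shape[1]).copy()

    data = np.random.rand(500, 3)          # a dummy training dataset of RGB values
    map_a = init_random(4, 4, 3)
    map_b = init_from_samples(4, 4, data)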

Once the map is initialized, it is set for an exciting iterative process of automated self-organizational evolution.

Sampling

At every iteration, choose an input sample X from the training dataset. One way is to choose input samples randomly. Another way is to choose them in a deterministic sequence. The random method introduces an "element of surprise" and "uncertainty" into the self-organizing process, which could produce a better result than the deterministic method. But the trade-offs would be a longer convergence time and a more complicated implementation.

Competition

The chosen input sample X from the training dataset now seeks its best-matching neuron on the map. All the neurons on the map compete to be the winner, denoted as c(X), which is the neuron most similar to the input sample X. The winner is found by (a code sketch follows the list):

  • Computing the Euclidean distances from the input sample to the weight vectors of all the neurons on the map, i.e. d(j) = ||X - Wj|| for every neuron j.

  • Identifying the neuron that has the smallest Euclidean distance as the winner, i.e. c(X) = arg min over j of ||X - Wj||.
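A minimal sketch of this competition step in Python/NumPy, assuming the map is stored as an R x C x d array of weight vectors (the function name find_winner is illustrative):

    import numpy as np

    def find_winner(weights, x):
        # Euclidean distance from the input sample to every neuron's weight vector.
        dists = np.linalg.norm(weights - x, axis=2)
        # The winner c(X) is the neuron with the smallest distance.
        return np.unravel_index(np.argmin(dists), dists.shape)

    weights = np.random.rand(4, 4, 3)      # stands in for the current map
    x = np.array([0.8, 0.1, 0.3])          # the chosen input sample X
    print(find_winner(weights, x))         # e.g. (2, 1), the lattice position of c(X)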

Cooperation

In this phase, the winner of the competition will exert its influence over its neighboring neurons.

In neurobiological studies, it has been discovered that a neuron that fires tends to excite the neurons in its immediate neighborhood more than those farther away from it. It is observed that:

  • The topological neighborhood is symmetric about the firing neuron (the winner neuron in the case of SOM).

  • The degree of excitement decreases monotonically with increasing lateral distance.

This phenomenon is best illustrated in Figure 3 as shown:

Figure 3: Neighborhood Effect of Winner Neuron

Does the shape look familiar to you? Yes, it is the Gaussian function. We can express this phenomenon as a Gaussian function like this:

    h(j, c(X)) = exp( -d(j, c(X))^2 / (2 * sigma^2) )

where

h(j, c(X)) is called the neighborhood function, a measure of the influence (degree of excitement) of the winner neuron c(X) on neuron j.

d(j, c(X)) is the Euclidean distance from neuron j to the winner neuron c(X) on the lattice.

sigma is the standard deviation that determines the effective width of the topological neighborhood.

When a neuron is a distance of one sigma away from the winner neuron, its excitement level is about 0.61, compared to about 0.14 when it is two sigma away. The excitement is at its maximum value of 1 at ground zero, that is, where the winner neuron is situated. In this way, the neighborhood function can mimic the neurobiological phenomenon of lateral interaction within a set of excited neurons in our brain.

In addition, the SOM algorithm adds a unique feature in which the size of the effective width shrinks with time. This can be achieved by introducing a time-varying component like this:

    sigma(n) = sigma0 * exp( -n / tau1 )

where

sigma(n) is the effective width at the nth iteration and sigma0 is the initial effective width.

tau1 is the time constant.

n1 is the total number of iterations in the first phase of the adaptation process (a common choice is to derive tau1 from n1 and sigma0, e.g. tau1 = n1 / ln(sigma0)).

The effect of this time-varying feature is illustrated in Figure 4 as shown:

Figure 4: Time-varying Neighborhood of a Winner Neuron

This results in a time-varying neighborhood function as shown:

    h(j, c(X))(n) = exp( -d(j, c(X))^2 / (2 * sigma(n)^2) )

In this way, the influence of the winner neuron on its neighbors decreases with time.
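A small sketch of the neighborhood function and its shrinking width, assuming lattice coordinates are used to measure the distance between neurons; sigma0 = 2.0 and tau1 = 1000 are illustrative values only.

    import numpy as np

    def neighborhood(winner, lattice_shape, sigma):
        # Gaussian influence h of the winner on every neuron of the lattice.
        rows, cols = np.indices(lattice_shape)
        d2 = (rows - winner[0]) ** 2 + (cols - winner[1]) ** 2   # squared lattice distance
        return np.exp(-d2 / (2 * sigma ** 2))

    def effective_width(n, sigma0=2.0, tau1=1000):
        # The effective width shrinks exponentially with the iteration count n.
        return sigma0 * np.exp(-n / tau1)

    h_early = neighborhood((1, 2), (4, 4), effective_width(0))     # wide influence at the start
    h_late  = neighborhood((1, 2), (4, 4), effective_width(900))   # influence confined near the winner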

Adaptation

Once the winner is found and its neighborhood function is computed at each iteration, the topological change of the map takes place through the adaptation of weights. Specifically, the adaptation phase is divided into two sub-phases - an ordering or self-organizing phase followed by a convergence phase. We will take an overview of the whole adaptation first and then proceed to the respective sub-phases for specific details.

Overview

Generally, the weight vector of the winner c(X) is updated so that its new weight vector moves closer to the input vector. The weight vectors of all its neighbors are also updated in a similar manner, but the amount of update decreases with their distance from the winner. Upon repeated feeding of the training data, followed by updating of the weight vectors of the winners and their neighborhoods, the weight vectors of the neurons gradually follow the distribution of the input vectors. This eventually leads to a topological ordering of the map where neurons that are adjacent in the lattice tend to have similar weight vectors, a manifestation of "Birds of a feather flock together".

The weight updating equation is expressed as such:

    Wj(n + 1) = Wj(n) + eta(n) * h(j, c(X))(n) * ( X - Wj(n) )

where

h(j, c(X))(n) is the neighborhood function between neuron j and the winner neuron c(X) at the nth iteration.

eta(n) is the time-varying learning rate at the nth iteration. The learning rate serves to moderate the learning step of each iteration. It decreases with time so as to facilitate the convergence of the map. The learning rate at the nth iteration is computed as such:

    eta(n) = eta0 * exp( -n / tau2 )

where

eta0 is the initial learning rate.

tau2 is a time constant which is set to the total number of iterations in the first phase of the adaptation process, i.e. tau2 = n1.
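The adaptation step and the decaying learning rate can be sketched as follows, again assuming the R x C x d map representation; the function names and example values are illustrative only.

    import numpy as np

    def learning_rate(n, eta0=0.1, tau2=1000):
        # eta(n) = eta0 * exp(-n / tau2), with tau2 set to the length of the ordering phase.
        return eta0 * np.exp(-n / tau2)

    def adapt(weights, x, h, eta):
        # Wj(n+1) = Wj(n) + eta(n) * h_j(n) * (X - Wj(n)), applied to every neuron at once.
        return weights + eta * h[:, :, None] * (x - weights)

    weights = np.random.rand(4, 4, 3)
    x = np.array([0.8, 0.1, 0.3])
    rows, cols = np.indices((4, 4))
    h = np.exp(-((rows - 1) ** 2 + (cols - 2) ** 2) / (2 * 2.0 ** 2))   # neighborhood of winner (1, 2)
    weights = adapt(weights, x, h, learning_rate(10))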

Ordering or Self-organizing phase

The ordering phase may take as many as 1000 iterations, i.e. n1 = 1000.

The learning rate should begin with a value close to 0.1, i.e. eta0 = 0.1, and gradually decrease, but remain above 0.01. Too large a learning rate at the start will cause the learning to swing wildly from one iteration to the next, virtually undoing the previous learning and rendering it ineffective. On the contrary, too small a rate will unnecessarily prolong the convergence time. A properly selected learning rate allows bigger updates at the earlier learning stages, followed by gradually smaller updates nearing the end, and finally settles down to a stable state.

The neighborhood function should initially include as many neurons as possible and then shrink with time. This can be achieved by setting the initial effective width equal to the radius of the lattice. For example, if the size of the two-dimensional lattice is R x C, then the initial effective width can be set as:

    sigma0 = max(R, C) / 2

At the end of the first phase, the winner can only affect its immediate neighbors.

Convergence phase

This phase is introduced to fine-tune the quality of the ordered map derived from the first phase of adaptation, with the aim of providing a more accurate statistical quantification of the input dataset. As a general rule, the number of iterations in this phase must be at least 500 times the number of neurons on the map. In this phase, the learning rate should be maintained at a small constant value like 0.01, and the neighborhood function should contain only the immediate neighbors of the winner neuron, which may eventually reduce to zero (the winner alone). However, the convergence phase may not be needed in applications where convergence of the parameters is not critical.
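Putting the two sub-phases together, a simple parameter schedule might look like the sketch below; the concrete numbers are just the rules of thumb quoted above, and the function name is illustrative.

    import numpy as np

    n1 = 1000                      # ordering phase: up to about 1000 iterations
    n_neurons = 4 * 4
    n2 = 500 * n_neurons           # convergence phase: at least 500 x number of neurons
    eta0, sigma0 = 0.1, 2.0        # initial learning rate and effective width (lattice radius)

    def schedule(n):
        if n < n1:                                 # ordering / self-organizing phase
            eta = eta0 * np.exp(-n / n1)           # decays from 0.1 but stays above 0.01
            sigma = sigma0 * np.exp(-n / n1)       # shrinks towards the immediate neighbors
        else:                                      # convergence phase
            eta, sigma = 0.01, 1.0                 # small constant rate, immediate neighbors only
        return eta, sigma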

How to Use the Map?

After training the SOM, what can we do with it? How can we use it? To answer these questions, we must find out what each neuron on the map represents. There are many possible ways to interpret the neurons. We may be able to assign a label or class to each neuron based on its visual presentation on the map, such as the images of digits in a digit recognition SOM (as shown in Figure 6). Another way is simply to find out which input sample in the training dataset stimulates a particular neuron the most, based on the smallest Euclidean distance, and then mark that neuron as representing this input sample. In more complicated cases, domain experts can be engaged to interpret the map and assign meaning to each neuron.

Figure 5: Before Training
  Figure 6: After Training

The resulting map is called a "contextual map" or "semantic map" as it now carries "meanings". This contextual map (such as that in Figure 6) can then be used to classify new data based on the best-matching neuron found on the map using some similarity measure like the Euclidean distance.
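Once the neurons have been labeled, classifying a new sample is just another best-matching-neuron lookup. Here is a sketch, with a purely hypothetical label grid standing in for a properly interpreted, trained map.

    import numpy as np

    def classify(weights, labels, x):
        # Find the best-matching neuron for the new sample and return its label.
        dists = np.linalg.norm(weights - x, axis=2)
        winner = np.unravel_index(np.argmin(dists), dists.shape)
        return labels[winner]

    weights = np.random.rand(4, 4, 3)                      # stands in for a trained map
    labels = np.array([['dark', 'dark', 'light', 'light'],
                       ['dark', 'dark', 'light', 'light'],
                       ['red',  'red',  'green', 'green'],
                       ['red',  'red',  'green', 'green']])  # hypothetical neuron labels
    print(classify(weights, labels, np.array([0.9, 0.9, 0.9])))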

What's Next?

You have learned the concept, architecture, and algorithm of the SOM. I shall follow up with another article to showcase an implementation of a mini SOM that can be used to cluster and classify digits. So stay tuned.

For now, you can make do with this trailer while I take a breather.

News Flash: The follow-up article Self-organizing Map Implementation has been released as of 10 Jul 2014.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Peter Leow
Instructor / Trainer
Singapore Singapore
Graduated from the National University of Singapore with the following qualifications:
1. Graduate Diploma in Systems Analysis (2002)
2. Master of Technology (Knowledge Engineering) (2013)
 
Currently, lecturing in the area of social media and web development at a technical institution.
 
Having hibernated for ages, finally woke up to serve the Community in Oct 2013.

