Introduction
Whether we are at work, walking down a busy street, or shopping in a marketplace, we meet people from all walks of life. Through our senses, we can readily identify different people and group them into classes based on common features like the audible characteristics of their languages or their visual appearances. We then start to label each group on the basis of nationality, culture, or any suitable name depending on the circumstances. This human capability of identifying similarities and distinguishing differences based solely on perceived input patterns lies at the core of our nature. However, it is not without limitations. One obvious limitation is that we can only see and comprehend in two or at most three dimensions (attributes). Anything higher than three dimensions requires the help of machine learning techniques. In this article, we will discuss one of them: the Self-Organizing Map (SOM).
Overview
The Self-Organizing Map (SOM) was introduced as an unsupervised competitive learning algorithm of the artificial neural network family by the Finnish professor Teuvo Kohonen in the early 1980s, and as such is also called a Kohonen map. The development of SOM was neurobiologically motivated by the way different sensory inputs, such as motor, visual, and acoustic inputs, are mapped onto corresponding areas of the cerebral cortex in a topographically ordered computational map (Figure 1).


A SOM is made of a network of neurons arranged in a lattice that is usually one- or two-dimensional. At the beginning, the neurons are initialized with random weight vectors of the same size as the dimension (number of features) of the input samples. Input samples from the input dataset take turns to stimulate the neurons and tune their weight vectors, in such a way that the most stimulated (winning) neuron and its neighbors are tuned the most, becoming more like the stimulating input sample. At the same time, the tuning effect on the neurons neighboring the winning neuron decreases with their topological distance from it. Gradually, this network of neurons progresses from an initially unordered map to a stable, topologically ordered map of clusters of neurons. Each cluster contains neurons that belong to a particular class of the input dataset. Does it remind you of the idiom "Birds of a feather flock together"? All this takes place in a self-organizing fashion without any external help, i.e. unsupervised. Watch this video on the self-organization of digits in a digit recognition project.
The SOM algorithm is a powerful tool for data visualization and exploration and is widely used for clustering analysis in data mining. Its ability to map high-dimensional input vectors onto a two-dimensional grid of prototype vectors and order them topologically greatly facilitates interpretation by the naked human eye.
SOM Architecture
A typical SOM architecture consists of two layers: the Input layer and the Computational layer (Map) (Figure 2).

Figure 2: Two-Dimensional SOM Architecture

The input layer consists of source nodes representing the various features (also known as attributes) of an input sample. For example, the input layer may consist of three source nodes representing the color values R, G, and B of a screen pixel, making it a three-dimensional input sample. There are as many source nodes as there are features (dimensions); together they are represented in the form of an input vector, i.e. X = [x_{1}, x_{2}, ..., x_{d}] for an input sample, where d is the number of dimensions.
On the other hand, the computational layer (Map) consists of neurons placed at the nodes of a one- or two-dimensional lattice of size R x C. (Figure 2 shows a 4 x 4 lattice.) This map is where the topological transformation and visualization of patterns occur. Each neuron is identified by its index position j on the map and is associated with a weight vector W_{j} = {w_{ji} : i = 1, ..., d}, whose size coincides with the dimension of the input vector. That is, if d is the dimension of the input space, then every neuron on the map carries a d-dimensional vector of weights.
All the source nodes in the input layer are fully connected to all the neurons on the map in a feedforward manner.
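To make these two layers concrete, here is a minimal sketch in Python with NumPy; the array names and the 4 x 4 lattice of three-dimensional (R, G, B) neurons are illustrative assumptions, not part of any particular library.

```python
import numpy as np

# A 4 x 4 lattice of neurons, each carrying a weight vector of the same
# dimension as the input (here d = 3, e.g. the R, G, B values of a pixel).
R, C, d = 4, 4, 3
rng = np.random.default_rng(0)
weights = rng.random((R, C, d))    # one 3-dimensional weight vector per neuron
x = np.array([0.9, 0.1, 0.2])      # one input sample X = [x1, x2, x3]

print(weights.shape)               # (4, 4, 3)
```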
We are now ready to explore the algorithm of SOM.
Self-organization Algorithm
Let's get an overview of the self-organization algorithm first and leave the details to subsequent sections. The algorithm proceeds iteratively through the following phases:

Initialization: Randomly initialize the weight vectors of all the neurons on the map, e.g. neuron j on the map can be expressed as W_{j} = {w_{ji} : i = 1, ..., d}, j = 1, ..., n, where n is the total number of neurons and d is the number of dimensions of the input vector.

Sampling: Select an input sample from the training dataset, e.g. the input sample X can be expressed as X = {x_{i} : i = 1, ..., d}.

Competition: Compare the vector of the input sample with the weight vectors of the individual neurons to find the most similar neuron, which becomes the winner.

Cooperation: Compute the influence that the winner neuron has over its neighboring neurons.

Adaptation: Update the weight vector of the winner neuron and those of its neighbors so that they become more like the input vector.
Iteration: Repeat from the Sampling phase with another input sample until the map has converged (stabilized), i.e. there are no noticeable changes in the weight vectors, or a predefined number of iterations has been reached.
A self-organizing map will take many iterations to converge. The sketch below strings the phases together; we will then walk through them in detail one by one.
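Here is a compact end-to-end sketch of the six phases in Python with NumPy. The function name, the hyperparameter values, and the exponential decay schedules are assumptions for illustration (each is elaborated in the sections that follow), not the settings of any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, R=10, C=10, n_iter=1000, lr0=0.1, sigma0=5.0):
    d = data.shape[1]
    weights = rng.random((R, C, d))                      # 1. Initialization
    # Each neuron's (row, col) position on the lattice, shape (R, C, 2).
    pos = np.stack(np.meshgrid(np.arange(R), np.arange(C),
                               indexing="ij"), axis=-1)
    tau = n_iter / np.log(sigma0)                        # width time constant
    for n in range(n_iter):
        x = data[rng.integers(len(data))]                # 2. Sampling
        dist = np.linalg.norm(weights - x, axis=-1)      # 3. Competition:
        c = np.unravel_index(dist.argmin(), dist.shape)  #    winner c(X)
        sigma = sigma0 * np.exp(-n / tau)                # 4. Cooperation:
        lattice_d2 = ((pos - np.array(c)) ** 2).sum(-1)  #    Gaussian
        h = np.exp(-lattice_d2 / (2 * sigma ** 2))       #    neighborhood
        lr = lr0 * np.exp(-n / n_iter)                   # 5. Adaptation:
        weights += lr * h[..., None] * (x - weights)     #    move toward x
    return weights                                       # 6. Iterate to end

# Usage: cluster 500 random RGB colors on a 10 x 10 map.
som = train_som(rng.random((500, 3)))
```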
Initialization
At the start, all the neurons on the map are initialized with random weight vectors of small values. There are different techniques for doing this.
One way is to use random values that are completely independent of the training dataset. In this case, as the map has no knowledge of the input data, the self-organization will take significantly more training cycles to converge.
Another technique is to pick random samples from the training dataset and assign their values to the weight vectors. In this way, as the map already lies inside a subset of the input space, the number of training cycles needed to converge is reduced. However, since the choice of input samples used for the initialization is random and the number of neurons on the map is much smaller than the size of the training dataset, the initial map is unlikely to be truly representative of the dataset.
As to which technique is better, this can only be answered by trying them out in separate trials and comparing their outcomes.
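For illustration, the two initialization techniques might be sketched like this in NumPy; the function names are hypothetical, and the sample-based version assumes the dataset holds at least R x C samples.

```python
import numpy as np

rng = np.random.default_rng(42)

def init_random(R, C, d):
    # Technique 1: small random values, independent of the training data.
    return rng.random((R, C, d))

def init_from_samples(R, C, data):
    # Technique 2: copy weight vectors from randomly chosen training samples
    # (sampling without replacement assumes len(data) >= R * C).
    idx = rng.choice(len(data), size=R * C, replace=False)
    return data[idx].reshape(R, C, -1).astype(float)
```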
Once the map is initialized, it is set for an exciting iterative process of automated self-organizational evolution.
Sampling
At every iteration, choose an input sample X from the training dataset. One way is to choose input samples randomly; another is to choose them in a deterministic sequence. The random method introduces an "element of surprise" and "uncertainty" into the self-organizing process, which could produce a better result than the deterministic method. The trade-offs are a longer convergence time and a more complicated implementation.
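Both sampling strategies are trivial to express; this sketch assumes the training dataset is a NumPy array, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_random(data):
    # Random sampling: each iteration draws an arbitrary training sample.
    return data[rng.integers(len(data))]

def sample_cyclic(data, n):
    # Deterministic sampling: iteration n simply cycles through the dataset.
    return data[n % len(data)]
```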
Competition
The chosen input sample X from the training dataset now seeks its best-matching neuron on the map. All the neurons on the map compete to be the winner, denoted c(X), which is the neuron most similar to the input sample X. The winner is found by:
Computing the Euclidean distances from the input sample to all the neurons on the map, i.e. d_{j}(X) = ||X - W_{j}|| = sqrt(Σ_{i=1}^{d} (x_{i} - w_{ji})^{2}) for j = 1, ..., n.
Identifying the neuron that has the smallest Euclidean distance as the winner, i.e. c(X) = argmin_{j} ||X - W_{j}||.
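A minimal NumPy sketch of this competition phase, assuming the weight lattice is stored as an R x C x d array as in the earlier snippets:

```python
import numpy as np

def best_matching_unit(weights, x):
    # Euclidean distance from the sample x to every neuron's weight vector;
    # the neuron with the smallest distance is the winner c(X).
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(dist.argmin(), dist.shape)
```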
Cooperation
In this phase, the winner of the competition will exert its influence over its neighboring neurons.
Neurobiological studies have discovered that a neuron that fires tends to excite the neurons in its immediate neighborhood more than those farther away from it, and that this excitation decays smoothly with lateral distance. This phenomenon is best illustrated in Figure 3 as shown:

Figure 3: Neighborhood Effect of Winner Neuron

Does the shape look familiar to you? Yes! It is the Gaussian function. We can express this phenomenon as a Gaussian function like this:

h_{j,c(X)} = exp(-d_{j,c}^{2} / (2σ^{2}))

where

h_{j,c(X)} is called the neighborhood function, a measure of the influence (degree of excitement) of the winner neuron c(X) on neuron j.
d_{j,c} is the Euclidean distance from neuron j to the winner neuron c(X) on the lattice.
σ is the standard deviation, which determines the effective width of the topological neighborhood.
When a neuron is one lattice unit away from the winner neuron (taking σ = 1), its effectiveness is 0.61, compared to 0.14 when it is 2 units away. Its effectiveness is at its maximum value of 1 at ground zero, that is, where the winner neuron is situated. In this way, the neighborhood function mimics the neurobiological phenomenon of lateral interaction within a set of excited neurons in our brain.
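A one-line check reproduces these figures, assuming σ = 1 as above:

```python
import numpy as np

def neighborhood(dist, sigma=1.0):
    # Gaussian neighborhood function h = exp(-d^2 / (2 * sigma^2)).
    return np.exp(-dist ** 2 / (2 * sigma ** 2))

# Distances 0, 1, and 2 from the winner give the values quoted above.
print(neighborhood(np.array([0.0, 1.0, 2.0])))  # [1.      0.6065  0.1353]
```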
In addition, the SOM algorithm adds a unique feature in which the size of the effective width shrinks with time. This is achieved by introducing a time-varying component like this:

σ(n) = σ_{0} exp(-n / τ_{1})

where

σ_{0} is the initial effective width.
τ_{1} = n_{1} / log(σ_{0}) is the time constant.
n_{1} is the total number of iterations in the first phase of the adaptation process.
The effect of this timevarying feature is illustrated in Figure 4 as shown:

Figure 4: Time-varying Neighborhood of a Winner Neuron

This results in a time-varying neighborhood function as shown:

h_{j,c(X)}(n) = exp(-d_{j,c}^{2} / (2σ^{2}(n)))
In this way, the influence of the winner neuron on its neighbors decreases with time.
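A sketch of the shrinking width and the resulting time-varying neighborhood function; choosing τ_{1} = n_{1} / log(σ_{0}) as above (which assumes σ_{0} > 1) makes the width decay to exactly 1 after n_{1} iterations, so that by the end of the ordering phase only the immediate neighbors are affected.

```python
import numpy as np

def sigma_at(n, sigma0, n1):
    # Effective width sigma(n) = sigma0 * exp(-n / tau1); tau1 is chosen so
    # that sigma shrinks to 1 (immediate neighbors only) after n1 iterations.
    tau1 = n1 / np.log(sigma0)
    return sigma0 * np.exp(-n / tau1)

def neighborhood_at(dist, n, sigma0=2.0, n1=1000):
    # Time-varying neighborhood h(n) = exp(-d^2 / (2 * sigma(n)^2)).
    s = sigma_at(n, sigma0, n1)
    return np.exp(-dist ** 2 / (2 * s ** 2))
```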
Adaptation
Once the winner is found and its neighborhood function is computed at each iteration, the topological change of the map takes place through the adaptation of weights. Specifically, the adaptation phase is divided into two sub-phases: an ordering or self-organizing phase followed by a convergence phase. We will take an overview of the whole adaptation first and then proceed to the respective sub-phases for specific details.
Overview
Generally, the weight vector of the winner c(X) is updated so that its new weight vector moves closer to the input vector. The weight vectors of all its neighbors are also updated in a similar manner, but the amount of update decreases with the distance from the winner. Upon repeated feeding of the training data, followed by updating of the weight vectors of winners and their neighborhoods, the weight vectors of the neurons gradually follow the distribution of the input vectors. This eventually leads to a topological ordering of the map where neurons that are adjacent in the lattice tend to have similar weight vectors, a manifestation of "Birds of a feather flock together".
The weight updating equation is expressed as:

W_{j}(n+1) = W_{j}(n) + η(n) h_{j,c(X)}(n) (X - W_{j}(n))

where

h_{j,c(X)}(n) is the neighborhood function between neuron j and the winner neuron c(X) at the n-th iteration.
η(n) is the time-varying learning rate at the n-th iteration. The learning rate serves to moderate the learning step of each iteration. It decreases with time so as to facilitate the convergence of the map. The learning rate at the n-th iteration is computed as:

η(n) = η_{0} exp(-n / τ_{2})

where

η_{0} is the initial learning rate.
τ_{2} is a time constant which is set to the total number of iterations in the first phase of the adaptation process, i.e. τ_{2} = n_{1}.
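In vectorized form, the adaptation step reduces to a single NumPy expression; the function name is hypothetical, and h is the array of neighborhood values computed in the cooperation phase.

```python
import numpy as np

def update_weights(weights, x, h, lr):
    # W_j(n+1) = W_j(n) + eta(n) * h_{j,c(X)}(n) * (X - W_j(n)), applied to
    # every neuron at once; h holds each neuron's neighborhood value.
    return weights + lr * h[..., None] * (x - weights)
```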
Ordering or Self-organizing phase
The ordering phase may take as many as 1000 iterations, i.e. n_{1} = 1000.
The learning rate should begin with a value close to 0.1, i.e. η_{0} = 0.1, and gradually decrease, but remain above 0.01. Too large a learning rate at the start will cause the learning to swing wildly from one iteration to the next, virtually undoing the previous learning and rendering it ineffective. On the contrary, too small a rate will unnecessarily prolong the convergence time. A properly selected learning rate allows bigger updates at the earlier learning stages, followed by gradually smaller updates nearing the end, finally settling down to a stable state.
The neighborhood function should initially include as many neurons as possible and then shrink with time. This can be achieved by setting the initial effective width equal to the radius of the lattice. For example, if the size of the two-dimensional lattice is R x C, then the initial effective width can be set as:

σ_{0} = max(R, C) / 2
At the end of the first phase, the winner affects only its immediate neighbors.
Convergence phase
This phase is introduced to fine-tune the quality of the ordered map derived from the first phase of adaptation, with the aim of providing a more accurate statistical quantification of the input dataset. As a general rule, the number of iterations in this phase should be at least 500 times the number of neurons on the map. In this phase, the learning rate should be maintained at a small constant value such as 0.01, and the neighborhood function should contain only the immediate neighbors of the winner neuron, eventually perhaps shrinking to the winner alone. However, the convergence phase may not be needed in applications where convergence of the parameters is not critical.
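Putting the guidelines of the two sub-phases together, a hypothetical parameter schedule for a 10 x 10 map might look like this:

```python
# Assumed two-phase schedule, following the rules of thumb above.
R, C = 10, 10
ordering_iters = 1000                # n1: ordering / self-organizing phase
convergence_iters = 500 * (R * C)    # at least 500 x the number of neurons
eta0 = 0.1                           # initial learning rate, decaying
eta_convergence = 0.01               # small constant rate in the second phase
sigma0 = max(R, C) / 2               # initial width: the lattice radius
```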
How to Use the Map?
After training the SOM, what can we do with it? How can we use it? To answer these questions, we must find out what each neuron on the map represents. There are many possible ways to interpret the neurons. We may be able to assign a label or class to each neuron based on its visual presentation on the map, such as the images of digits in the case of a digit recognition SOM (as shown in Figure 6). Another way is simply to find out which input sample in the training dataset stimulates a particular neuron the most, based on the smallest Euclidean distance, and then mark that neuron as representing this input sample. In more complicated cases, domain experts can be engaged to interpret the map and assign meaning to each neuron.
The resulting map is called a "contextual map" or "semantic map", as it now carries "meanings". This contextual map (such as that in Figure 6) can then be used to classify new data based on the best-matching neuron found on the map using some similarity measure like the Euclidean distance.
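Classification then reduces to a best-matching-unit lookup; this sketch assumes a labels array produced by one of the labelling strategies above.

```python
import numpy as np

def classify(weights, labels, x):
    # labels is an R x C array assigning a class to each neuron; a new
    # sample simply takes the label of its best-matching neuron.
    dist = np.linalg.norm(weights - x, axis=-1)
    winner = np.unravel_index(dist.argmin(), dist.shape)
    return labels[winner]
```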
What's Next?
You have learned the concept, architecture, and algorithm of SOM. I shall follow up with another article to showcase an implementation of a mini SOM that can be used to cluster and classify digits. So stay tuned.
For now, you can make do with this trailer while I take a breather.
News Flash: The follow-up article Self-organizing Map Implementation was released on 10 Jul 2014.