This is the fifth in a series of articles demonstrating how to build a .NET AI library from scratch. In this article, we will learn about Artificial Neural Networks (ANN). We will go through some of the most common types of ANN, classified mainly by their architecture or layout.
Here are the links for previous articles in the series:
My objective is to create a simple AI library that covers a couple of advanced AI topics such as genetic algorithms, ANN, fuzzy logic and other evolutionary algorithms. The only challenge in completing this series is finding enough time to work on the code and articles.
The code itself might not be the main target; understanding these algorithms is. I hope it will be useful to someone someday.
Please feel free to comment and ask for any clarifications or hopefully suggest better approaches.
Article Introduction - Part 5 "ANN"
The inventor of the first neurocomputer, Dr. Robert Hecht-Nielsen, defines a neural network as:
"...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs."
Early studies of ANN date back to the 1940s.
In the last article, we reached the most common layout of ANN, consisting of an input layer, hidden layer(s) and an output layer. Again, this is all meant to imitate human brain intelligence. The average human brain has on the order of 100 billion neurons (or nerve cells); commonly cited estimates range from about 86 to 100 billion.
The way individual artificial neurons are interconnected is called the topology, architecture, layout or graph of an artificial neural network. From this perspective, many types of ANN have been developed over the years, but all of them share one thing: they are built from multiple Perceptrons or neurons arranged and connected in layers.
Let's go through some of the most common types of ANN, classified mainly by architecture or layout.
Types of ANN
1. Feed-Forward ANN (FF ANN)
This is an MLP (Multi-Layer Perceptron) ANN with data flowing from the input layer to the output layer:
An FF ANN can have multiple hidden layers; with multiple hidden layers, it is called a Deep Neural Network.
Each layer normally works on a certain set of features. Overall, an ANN can be viewed as a function approximator or a classifier, which is why its applications are so broad.
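As a rough illustration of data flowing from the input layer to the output layer, here is a minimal forward-pass sketch. It is written in Python for brevity rather than in the series' .NET code, and the names `layer_forward` and `feed_forward`, the sigmoid activation and the hand-picked 2-3-1 weights are all assumptions made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feed_forward(inputs, layers):
    """Push the input through every (weights, biases) pair, input side first."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical 2-3-1 network with hand-picked (untrained) weights.
hidden = ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1])
output = ([[0.7, -0.2, 0.4]], [0.05])
result = feed_forward([1.0, 0.0], [hidden, output])
print(result)
```

Adding more `(weights, biases)` pairs to the list gives more hidden layers without changing the loop, which is all that "deep" means at this level.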
For training, FF ANNs are used with both supervised and unsupervised learning problems, and there are many training algorithms. The most common one is the Backpropagation algorithm.
I am sure that the next article will be dedicated just to FF ANN, so we will save further details for later.
1.1 - Self-Organizing Map (SOM)
SOM is a feed-forward ANN but it is fundamentally different in arrangement of neurons and motivation.
SOM uses unsupervised learning, in particular competitive learning, in which the output neurons compete amongst themselves to be activated, with the result that only one is activated at any one time.
This activated neuron is called the winning neuron. Such competition can be implemented by having lateral inhibition connections (negative feedback paths) between the neurons. The result is that the neurons are forced to organize themselves. Hence, such a network is called a Self-Organizing Map (SOM).
SOMs are used to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, so a SOM can be considered a method for dimensionality reduction.
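The competitive step can be sketched as follows. This is a simplified Python illustration, not the article's implementation: the winner is picked directly as the neuron closest to the sample (instead of modelling lateral inhibition explicitly), and the Gaussian neighborhood, `learning_rate` and `radius` parameters are assumptions of the sketch.

```python
import math

def best_matching_unit(weights, sample):
    """Index of the neuron whose weight vector is closest to the sample."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, sample))
    return min(range(len(weights)), key=lambda i: dist2(weights[i]))

def som_step(weights, positions, sample, learning_rate=0.5, radius=1.0):
    """One competitive-learning step: find the winner, then pull it and its
    map neighbors toward the sample (updates `weights` in place)."""
    winner = best_matching_unit(weights, sample)
    for i, w in enumerate(weights):
        # Distance on the map grid, not in the input space.
        grid_d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[winner]))
        influence = math.exp(-grid_d2 / (2 * radius ** 2))
        weights[i] = [wi + learning_rate * influence * (xi - wi)
                      for wi, xi in zip(w, sample)]
    return winner

# Hypothetical 1-D map of three neurons receiving 2-D inputs.
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
positions = [(0,), (1,), (2,)]
winner = som_step(weights, positions, [0.9, 1.0])
```

Repeating this step over many samples, usually while shrinking the learning rate and radius, is what lets neighboring neurons on the map end up representing neighboring regions of the input space.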
1.2 - Radial Basis Function Network (RBF)
Radial functions are a special class of functions whose response decreases or increases monotonically with distance from a central point. A typical example is the bell-shaped Gaussian function, exp(-r^2 / (2σ^2)), where r is the distance from the center and σ is the width of the bell.
Radial Basis Function Network (RBF) is an ANN that uses radial basis functions as its activation functions.
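To make this concrete, here is a small Python sketch of a Gaussian radial function and of an RBF network output computed as a weighted sum of such units. The function names, the choice of one width per unit and the sample centers are assumptions for illustration only.

```python
import math

def gaussian_rbf(x, center, width):
    """Gaussian radial function: 1 at the center, decaying with distance."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2 * width ** 2))

def rbf_network(x, centers, widths, out_weights, bias=0.0):
    """Hidden layer of Gaussian units followed by a linear output neuron."""
    hidden = [gaussian_rbf(x, c, s) for c, s in zip(centers, widths)]
    return bias + sum(w * h for w, h in zip(out_weights, hidden))

# Hypothetical network with two hidden units centered at (0, 0) and (1, 1).
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
y = rbf_network([0.0, 0.0], centers, widths, out_weights=[1.0, -1.0])
```

At the input (0, 0), the first unit responds fully and the second barely responds, so the output stays close to the first unit's weight.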
2. Recurrent ANN (RNN)
So far, in our discussion of FF ANN, we have assumed that each input set is independent of the others. What if it is not, and there is some kind of relation between input sets?
For example, consider building an ANN for a speech-to-text application. The ANN will mostly not be able to translate speech instantaneously, word by word. Instead, it should wait for a full sentence in order to provide as accurate a translation as possible. This is how human translators work.
So, the ANN should somehow keep track of not only the current input but past inputs as well. In other words, it needs internal memory to process this kind of input, which is sequential.
So what is memory? Memory is the ability to record past information and use it during current time processing. One way to impose memory is to have a delay function with specific capacity of time span.
So let's add delay to a single Perceptron by simply adding self-interconnection to Neuron.
Not only can each neuron have its own memory, it can also have interconnections from other neurons on the same layer.
A full network built on the above layout is called a Recurrent NN (RNN).
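The effect of a self-connection can be seen with a single neuron. The Python sketch below is only an illustration under assumed choices (a tanh activation, scalar weights named `w_in` and `w_self`, and a one-step delay): the neuron's previous output is fed back as an extra input at the next time step.

```python
import math

def recurrent_neuron(inputs, w_in, w_self, bias=0.0):
    """Run one neuron over a sequence; `state` is its delayed own output."""
    state = 0.0
    outputs = []
    for x in inputs:
        state = math.tanh(w_in * x + w_self * state + bias)
        outputs.append(state)
    return outputs

impulse = [1.0, 0.0, 0.0]
with_memory = recurrent_neuron(impulse, w_in=1.0, w_self=0.9)
without_memory = recurrent_neuron(impulse, w_in=1.0, w_self=0.0)
print(with_memory)     # the first input still echoes in later outputs
print(without_memory)  # later outputs drop to zero immediately
```

With `w_self` set to zero, the neuron behaves like a plain feed-forward Perceptron; with a non-zero self-weight, the response to the first input fades gradually instead of disappearing.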
RNNs are used for sequential inputs, so applications include speech to text, language modeling and text generation.
Over time, many variations of RNN have been introduced as special cases. Below are a couple of common types.
2.1 - Hopfield Network
Let's first examine human memory. Human memory works in an associative or content-addressable way: there is no single location in the brain's neural network for a particular memory, say, of an individual.
Rather, the memory of the individual is retrieved by a string of associations about the physical features and/or personality characteristics and social relations of that individual, which are dealt with by different parts of the brain.
Using advanced imaging techniques, a sophisticated pattern of activation of various neural regions is observed in the process of recalling an individual.
Human beings are also able to fully recall a memory by first remembering only particular aspects or features of that memory.
Back in 1982, John Hopfield introduced a special case of RNN that stores and retrieves memory like the human brain, where a neuron is either ON (firing) or OFF (not firing).
The state of a neuron (on: +1 or off: -1) is updated depending on the input it receives from other neurons.
A Hopfield network is initially trained to store a number of patterns or memories. It is then able to recognize any of the learned patterns by exposure to only partial or even some corrupted information about that pattern, i.e., it eventually settles down and returns the closest pattern or the best guess.
A Hopfield network is single-layered and fully connected, i.e., every neuron is connected to every other neuron, and there are no self-connections. Also, all weights are symmetrical (given two neurons i and j, Wij = Wji).
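One common way to obtain weights with exactly these properties is a Hebbian-style rule, where each weight is the averaged product of the two neurons' values over the stored patterns. The article does not say which storage rule it has in mind, so treat the rule and the name `train_hopfield` in this Python sketch as assumptions.

```python
def train_hopfield(patterns):
    """Build a weight matrix from +1/-1 patterns (Hebbian-style rule)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections: the diagonal stays 0
                    weights[i][j] += p[i] * p[j] / len(patterns)
    return weights

# Two hypothetical 4-neuron memories.
weights = train_hopfield([[1, -1, 1, -1], [1, 1, -1, -1]])
for row in weights:
    print(row)
```

Because `p[i] * p[j]` equals `p[j] * p[i]`, the resulting matrix is symmetric by construction.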
Here is a 3-neuron Hopfield NN (HNN):
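The activation function used with Hopfield NN is the sign function, in the form of:
$$ s_i = \begin{cases} +1 & \text{if } \sum_j w_{ij} s_j \ge \theta_i \\ -1 & \text{otherwise} \end{cases} $$
Or, with states from 0 to 1:
$$ s_i = \begin{cases} 1 & \text{if } \sum_j w_{ij} s_j \ge \theta_i \\ 0 & \text{otherwise} \end{cases} $$
where:
$w_{ij}$ is the weight of the connection between unit j and unit i
$s_j$ is the state of unit j
$\theta_i$ is the threshold of unit i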
There are two ways to update the neurons' states:
- Asynchronously - At each point in time, update one node, chosen randomly or according to some rule. This is more biologically realistic.
- Synchronously - At each point in time, update all nodes together.
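The two update modes can be sketched in Python as below. The hand-written weight matrix stores the single pattern [1, -1, 1, -1], the thresholds are all zero, and the asynchronous version visits the neurons one at a time in a shuffled order; these choices, and the function names, are assumptions made for the example.

```python
import random

def sign(x):
    return 1 if x >= 0 else -1

def field(weights, state, i):
    """Weighted input that neuron i receives from the other neurons."""
    return sum(w * s for w, s in zip(weights[i], state))

def update_sync(weights, state, thresholds):
    """All neurons read the same old state and switch together."""
    return [sign(field(weights, state, i) - thresholds[i])
            for i in range(len(state))]

def update_async(weights, state, thresholds, rng):
    """Neurons switch one at a time, each seeing the latest state."""
    state = list(state)
    order = list(range(len(state)))
    rng.shuffle(order)
    for i in order:
        state[i] = sign(field(weights, state, i) - thresholds[i])
    return state

# Symmetric, zero-diagonal weights storing the pattern [1, -1, 1, -1].
stored = [1, -1, 1, -1]
weights = [[0, -1, 1, -1],
           [-1, 0, -1, 1],
           [1, -1, 0, -1],
           [-1, 1, -1, 0]]
thresholds = [0, 0, 0, 0]
noisy = [1, 1, 1, -1]  # second value corrupted

recalled_sync = update_sync(weights, noisy, thresholds)
recalled_async = update_async(weights, noisy, thresholds, random.Random(0))
```

In this small case both modes repair the corrupted value and settle on the stored pattern; in larger networks, synchronous updates can oscillate between states, which is one reason the asynchronous mode is often preferred.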
When training an HNN, the term energy function is used. You may think of the energy function as a cost function, but one that is a function of the network states (whereas a cost function is a function of the weights). I am sure I will have a separate article about HNN, so I will stop here.
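For reference, the energy function usually associated with a Hopfield network is written as follows, where $s_i$ is the state of unit i, $w_{ij}$ the weight between units i and j, and $\theta_i$ the threshold of unit i. This is the standard textbook form rather than something derived in this article:

```latex
E = -\frac{1}{2} \sum_{i} \sum_{j} w_{ij}\, s_i\, s_j + \sum_{i} \theta_i\, s_i
```

With symmetric weights and no self-connections, an asynchronous update of a single neuron never increases E, so the network keeps moving downhill until it settles in a local minimum, which corresponds to a stored (or closest) pattern.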
2.2 - Elman Networks and Jordan Networks
An Elman network, also known as a Simple Recurrent Network, is a special case of RNN in which the first hidden layer has a recurrent connection. It is a simple three-layer RNN that has a back-loop from the hidden layer to the input layer through a so-called context layer or context units. The weights between the hidden layer and the context layer are fixed at 1. This type of RNN has memory that allows it to both detect and generate time-varying patterns.
A simplified diagram for Elman network is shown below:
And here is one full diagram for Elman RNN:
The Elman artificial neural network typically has sigmoid artificial neurons in its hidden layer and linear artificial neurons in its output layer.
At each time step, the input is fed forward and then a learning rule is applied. The fixed back connections save a copy of the previous values of the hidden units in the context units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform such tasks as sequence prediction that are beyond the power of a standard multilayer Perceptron.
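The forward part of this process can be sketched in Python as below, leaving the learning rule out. The function names and the small hand-picked weight matrices are assumptions of the sketch; the sigmoid hidden layer, linear output layer and weight-1 copy into the context units follow the description above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def elman_step(x, context, w_in, w_context, w_out):
    """One time step: hidden units see the input plus the context units."""
    hidden = [sigmoid(sum(w * v for w, v in zip(w_in[h], x)) +
                      sum(w * c for w, c in zip(w_context[h], context)))
              for h in range(len(w_in))]
    output = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    # Fixed weight-1 back connections: the context becomes a copy of hidden.
    return output, list(hidden)

def run_elman(sequence, w_in, w_context, w_out):
    context = [0.0] * len(w_in)  # context units start empty
    outputs = []
    for x in sequence:
        y, context = elman_step(x, context, w_in, w_context, w_out)
        outputs.append(y)
    return outputs

# Hypothetical network: 1 input, 2 hidden (and 2 context) units, 1 output.
w_in = [[0.5], [-0.5]]
w_context = [[0.3, 0.2], [0.1, -0.4]]
w_out = [[1.0, 1.0]]
outputs = run_elman([[1.0], [1.0]], w_in, w_context, w_out)
print(outputs)
```

The same input is presented twice, yet the two outputs differ, because the second step also sees the hidden values saved from the first one.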
A Jordan network is similar to an Elman network. The only difference is that the context units are fed from the output layer instead of the hidden layer.
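In code, that difference comes down to what gets copied into the context units at the end of each step. The following Python sketch uses hypothetical names and weights, with a sigmoid hidden layer and a linear output assumed; note that the context now has one entry per output neuron.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def jordan_step(x, context, w_in, w_context, w_out):
    """One time step of a Jordan network."""
    hidden = [sigmoid(sum(w * v for w, v in zip(w_in[h], x)) +
                      sum(w * c for w, c in zip(w_context[h], context)))
              for h in range(len(w_in))]
    output = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    # The context is a copy of the OUTPUT layer, not of the hidden layer.
    return output, list(output)

# Hypothetical network: 1 input, 2 hidden units, 1 output (so 1 context unit).
w_in = [[0.5], [-0.5]]
w_context = [[0.3], [-0.2]]
w_out = [[1.0, 1.0]]

context = [0.0]
outputs = []
for x in [[1.0], [1.0]]:
    y, context = jordan_step(x, context, w_in, w_context, w_out)
    outputs.append(y)
```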
And here is the full sample network:
2.3 - Long Short Term Memory (LSTM)
Long Short Term Memory (LSTM) addresses one limitation of RNN: it keeps information only for the short term. What about long-term dependencies?
LSTM is one of the recurrent ANN topologies. In contrast with basic RNN, it can learn from its experience to process, classify and predict time series with very long time lags of unknown size between important events.
An LSTM ANN is built from Long Short Term Memory blocks that are capable of remembering a value for any length of time. This is achieved with gates that determine when the input is significant enough to remember, when to continue to remember or forget it, and when to output the value.
The memory units in LSTMs are called cells, and you can think of them as blocks that take as input the previous state h(t-1) and the current input x(t). Internally, these cells decide what to keep in (and what to erase from) memory. They then combine the previous state, the current memory, and the input.
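A single cell step can be sketched in Python as follows. This uses the commonly published LSTM gate equations with scalar values to keep it short; the parameter dictionary and its key names (`wf`, `uf`, `bf` and so on) are hypothetical, and a real implementation would use vectors and matrices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM cell step; returns the new state h and memory c."""
    forget = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])       # keep old memory?
    write = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])        # store new input?
    candidate = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # value to store
    read = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])         # expose memory?
    c = forget * c_prev + write * candidate
    h = read * math.tanh(c)
    return h, c

# Hypothetical parameters: all zero except the gate biases.
keys = ["wf", "uf", "bf", "wi", "ui", "bi", "wg", "ug", "bg", "wo", "uo", "bo"]
remember = dict.fromkeys(keys, 0.0)
remember.update(bf=20.0, bi=-20.0)   # forget gate open, input gate closed
forgetful = dict.fromkeys(keys, 0.0)
forgetful.update(bf=-20.0, bi=-20.0)  # forget gate closed as well

_, kept = lstm_step(0.0, 0.0, 0.8, remember)
_, erased = lstm_step(0.0, 0.0, 0.8, forgetful)
print(kept, erased)
```

With the forget gate held open, the stored value passes through the step almost untouched, which is how a cell can carry information across long gaps; with the gate closed, the memory is wiped.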
LSTM might look confusing; however, it can be implemented, and we will have a separate discussion about it later. This is just an introduction.
2.4 - Bi-directional RNN (Bi-ANN)
Bi-ANNs are based on the idea that the output at time t may not only depend on the past elements in the sequence, but also future elements.
For example, to predict a missing word in a sequence, you want to look at both the left and the right context.
Bidirectional RNNs are conceptually simple and are designed to predict complex time series. They consist of two interconnected ANN sub-networks that perform the direct and inverse (bidirectional) transformations. The two networks are stacked on top of each other.
The sub-networks are interconnected through two dynamic artificial neurons that are capable of remembering their internal states. This type of interconnection between future and past values of the processed signals increases time series prediction capabilities. As such, these ANNs predict not only future values of the input data but also past values.
This brings the need for two-phase learning: in the first phase, we teach one ANN sub-network to predict the future, and in the second phase, we teach the second ANN sub-network to predict the past.
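The core idea of combining past and future context can be sketched in Python as below. This follows the common formulation in which one recurrent pass reads the sequence left to right and a second one reads it right to left, and it may differ in detail from the layout described above; the single-neuron tanh sub-networks and all names are assumptions.

```python
import math

def run_rnn(sequence, w_in, w_rec):
    """Single recurrent neuron; returns its state at every time step."""
    state, states = 0.0, []
    for x in sequence:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states

def bidirectional_states(sequence, forward_params, backward_params):
    """Pair, for each time step, a summary of the past with one of the future."""
    forward = run_rnn(sequence, *forward_params)
    # Run the second sub-network over the reversed sequence, then re-align.
    backward = run_rnn(sequence[::-1], *backward_params)[::-1]
    return list(zip(forward, backward))

params = (1.0, 0.5)  # hypothetical (w_in, w_rec), shared here for simplicity
a = bidirectional_states([1.0, 0.0, 0.0, 2.0], params, params)
b = bidirectional_states([1.0, 0.0, 0.0, -2.0], params, params)
print(a[0], b[0])
```

Changing only the last element of the sequence leaves the forward state at the first step untouched but changes the backward state there, showing that each position now also carries information about what comes later.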
Here is a sample layout:
Points of Interest
Many other types of ANN are out there; however, the above list covers the most common ones. I hope this article provided a fair introduction to ANN and its types, which will be referenced in future articles.
A nice infographic (from the internet, but I cannot recall the source) is attached to this article, with a nice graphical presentation of many ANNs.
Next, I will start coding to implement some of the most common ANNs.
- 18th September, 2017: Initial version