This paper presents a word recognition technique for an online handwriting recognition system that uses multiple component neural networks (MCNN) as the exchangeable parts of the classifier. Like most recent approaches, the system segments handwritten words into smaller pieces (usually characters), which are recognized separately. The recognition result for a word is then composed from the individually recognized parts; the candidate compositions are passed to a word recognition module, which selects the best one using dictionary search algorithms. The proposed classifier overcomes the difficulties that traditional classifiers face with large character classes. Furthermore, it is expandable: it can recognize additional character classes when component networks and built-in dictionaries are added or changed dynamically.
Nowadays, touch user interfaces (TUI) are becoming increasingly popular and will play an important role in human-computer interaction. Tablets, smartphones and TUI computers accepting finger- or pen-based input have become indispensable to many people. Using a finger or a pen as an input device takes over many functions of the conventional mouse and keyboard. One major advantage of the pen over the mouse is that a pen is a natural writing tool, whereas the mouse is very cumbersome when used for writing. However, this requires a reliable transformation of handwritten text into a coding that can be processed directly by a computer, e.g., ASCII. A traditional transformation model usually includes a preprocessor that extracts each word from the image or input screen and divides it into segments. A neural network classifier then estimates the likelihood of each possible character class given the segments. These likelihoods are used as the input to a special algorithm that recognizes the entire word. In recent years, research in handwriting recognition has advanced to a level that makes commercial applications feasible. Nevertheless, single neural network classifiers have significant disadvantages: the complexity of organizing a large network and limited expandability.
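The traditional pipeline above can be sketched as follows. This is an illustrative toy, not the paper's implementation: `recognize_word` and the plain likelihood sum are assumptions chosen to keep the sketch short (summing log-likelihoods would be the more common choice).

```python
# Toy sketch of the classical word recognition step: given one
# likelihood table per segment, pick the dictionary word whose
# characters best match the segment scores. Names are illustrative.

def recognize_word(segment_scores, dictionary):
    """segment_scores: one dict per segment, mapping character -> likelihood.
    Returns the best-scoring word from the dictionary."""
    best_word, best_score = None, float("-inf")
    for word in dictionary:
        if len(word) != len(segment_scores):
            continue  # consider only words matching the segment count
        # combine per-segment likelihoods (plain sum keeps the sketch simple)
        score = sum(scores.get(ch, 0.0)
                    for ch, scores in zip(word, segment_scores))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Three segments, each with classifier likelihoods per character class.
scores = [{"c": 0.9, "e": 0.1}, {"a": 0.8, "o": 0.2}, {"t": 0.95}]
print(recognize_word(scores, ["cat", "cot", "ear"]))  # -> cat
```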
A neural network with a highly reliable recognition rate is easy to build for a small character class, but not for large ones. Larger inputs and outputs increase the number of the network's layers, neurons, and connections. This makes the training process more difficult, and the recognition rate can decrease significantly. Furthermore, a single neural network classifier works only for a particular character class: it is neither exchangeable nor expandable to additional character classes without recreating or retraining the network.
This paper presents a new online handwriting recognition system based on multiple convolutional neural networks (CNNs). Unlike traditional single neural network classifiers, the new one combines a collection of component CNNs, each with a very high recognition rate, that work together. Each CNN correctly recognizes only a part of the large character class (digits, alphabet, etc.), but when these networks are combined by programming algorithms they form a flexible classifier that can recognize different large character classes simply by adding or removing component CNNs and language dictionaries.
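One way the "add or remove component networks" idea could be structured is sketched below. This is an assumption-laden illustration, not the paper's code: the class names, the stub `score_fn` standing in for a trained CNN, and the maximum-confidence combination rule are all hypothetical.

```python
# Illustrative sketch of a composite classifier built from component
# classifiers, each covering one character subset. The combiner takes
# the most confident answer across all registered components.

class ComponentClassifier:
    def __init__(self, charset, score_fn):
        self.charset = charset    # characters this component covers
        self.score_fn = score_fn  # stub standing in for a trained CNN

    def classify(self, stroke):
        scores = {c: self.score_fn(stroke, c) for c in self.charset}
        best = max(scores, key=scores.get)
        return best, scores[best]

class CompositeClassifier:
    def __init__(self):
        self.components = []

    def add(self, component):     # coverage can be extended dynamically
        self.components.append(component)

    def classify(self, stroke):
        results = [c.classify(stroke) for c in self.components]
        return max(results, key=lambda r: r[1])  # highest confidence wins

# Stub components: each "network" is confident only about its own charset.
digits = ComponentClassifier("0123456789", lambda s, c: 1.0 if c == s else 0.1)
letters = ComponentClassifier("abc", lambda s, c: 0.9 if c == s else 0.05)
clf = CompositeClassifier()
clf.add(digits)
clf.add(letters)
print(clf.classify("7"))  # ('7', 1.0)
print(clf.classify("a"))  # ('a', 0.9)
```

Removing a component (or swapping its dictionary) changes the recognizable class set without retraining the remaining networks, which is the flexibility the text describes.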
Convolutional neural network
Convolutional Neural Networks (CNNs) are a special kind of multi-layer neural network. Like most other neural networks, they are trained with a version of the back-propagation algorithm; where they differ is in the architecture. CNNs are designed to recognize visual patterns directly from pixel images with minimal preprocessing. They can recognize patterns with extreme variability (such as handwritten characters), and are robust to distortions and simple geometric transformations.
Fig. 1. A typical convolutional neural network (LeNet-5)
The convolutional neural network LeNet-5 for handwritten digit recognition achieves a reliable recognition rate of up to 99% on the MNIST dataset. The input layer is of size 32×32 and receives the gray-level image containing the digit to recognize. The pixel intensities are normalized between −1 and +1. The first hidden layer C1 consists of six feature maps, each having 25 weights constituting a 5×5 trainable kernel, and a bias. The values of a feature map are computed by convolving the input layer with the respective kernel and applying an activation function to the results. All values of a feature map are constrained to share the same trainable kernel, i.e., the same weight values. Because of border effects, the feature maps are of size 28×28, smaller than the input layer.
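The size arithmetic of a C1-style layer can be checked with a minimal sketch: a "valid" convolution of a 32×32 input with a 5×5 kernel yields a 28×28 map (32 − 5 + 1 = 28). The tanh activation and random data here are illustrative assumptions.

```python
# Minimal NumPy demonstration of one shared-kernel feature map:
# convolve a 32x32 input with one 5x5 trainable kernel plus a bias,
# then apply an activation function, yielding a 28x28 feature map.
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # every output value reuses the SAME kernel and bias
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel) + bias
    return np.tanh(out)  # activation function applied to the results

image = np.random.uniform(-1.0, 1.0, (32, 32))  # normalized gray levels
fmap = conv2d_valid(image, np.random.randn(5, 5))
print(fmap.shape)  # (28, 28): border effects shrink the map
```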
Each convolution layer is followed by a sub-sampling layer, which reduces the dimension of the respective convolution layer's feature maps by a factor of two. Hence the sub-sampling maps of hidden layer S2 are of size 14×14. Similarly, layer C3 has 16 convolution maps of size 10×10 and layer S4 has 16 sub-sampling maps of size 5×5; these layers perform exactly the same functions as C1 and S2. The S4 feature maps, at 5×5, are too small for a third convolution layer. Layers C1 to S4 can thus be viewed as a trainable feature extractor, to which a trainable classifier is added in the form of three fully connected layers (a universal classifier).
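The factor-two sub-sampling step can be sketched as 2×2 block averaging (the trainable coefficient and bias of the original sub-sampling layer are omitted here as a simplifying assumption):

```python
# Sub-sampling by a factor of two: average each 2x2 block, so a
# 28x28 feature map becomes 14x14 (and, later, 10x10 becomes 5x5).
import numpy as np

def subsample2(fmap):
    h, w = fmap.shape
    # group pixels into 2x2 blocks and average over each block
    return fmap.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

fmap = np.random.randn(28, 28)
print(subsample2(fmap).shape)  # (14, 14)
```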
Fig. 2. A convolutional network based on Dr. Patrice Simard's model