
Neural Network for Recognition of Handwritten Digits in C#

By , 14 Mar 2012
 

Introduction

This article is another example of an artificial neural network designed to recognize handwritten digits, based on the brilliant article Neural Network for Recognition of Handwritten Digits by Mike O'Neill. Although many systems and classification algorithms have been proposed in recent years, handwriting recognition remains a challenging task in pattern recognition. Mike O'Neill's program is an excellent demo for programmers who want to study neural networks for pattern recognition in general, and convolutional neural networks in particular. The program is written in MFC/C++, which is a little difficult for someone who is not familiar with it, so I decided to rewrite it in C# and add some of my own experiments. My program achieves results close to those of the original, but it is still not well optimized (convergence speed, error rate, etc.). It is initial code that simply gets the job done and helps in understanding the network; it is rather confusing and needs to be restructured. I am trying to rebuild it as a library so that it is flexible and its parameters are easy to change through an INI file. Hopefully, I will get the expected results someday.

Character Detection

Pattern detection, or character candidate detection, is one of the most important problems I had to face in my program. I did not only want to rewrite Mike's program in another language; I also wanted to recognize characters in a picture of a document. I found some research on the Internet proposing very good object detection algorithms, but they are too complicated for a free-time project like mine. A small algorithm I found while teaching my daughter to draw solved my problem. Of course, it still has limitations, but it exceeded my expectations in the first test. Normally, character candidate detection is divided into row detection, word detection and character detection, each with its own algorithm. My approach is a little different: all three detections use the same algorithm, through the functions:

public static Rectangle GetPatternRectangeBoundary
	(Bitmap original,int colorIndex, int hStep, int vStep, bool bTopStart) 

and:

public static List<Rectangle> PatternRectangeBoundaryList
	(Bitmap original, int colorIndex, int hStep, int vStep, 
	bool bTopStart,int widthMin,int heightMin)

Rows, words or characters can be detected simply by changing the parameters hStep (horizontal step) and vStep (vertical step). Rectangle boundaries can be scanned from top to bottom or from left to right by setting bTopStart to true or false. Rectangles can be limited to a minimum size with widthMin and heightMin. The biggest advantage of my algorithm is that it can detect words or character groups that do not lie on the same row.
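The article does not spell out how GetPatternRectangeBoundary works internally, so the sketch below is not the author's algorithm. It shows a conventional projection-profile segmenter, a different and well-known technique, in which a gap tolerance plays a role similar to the one hStep and vStep appear to play here: bridging wider blank gaps merges characters into words and words into lines. The class ProfileSegmenter, the bool[,] ink representation and the maxGap and minLength parameters are hypothetical.

```csharp
using System.Collections.Generic;

public static class ProfileSegmenter
{
    // Splits one axis of a binary image into segments using a projection profile.
    // ink[y, x] is true for foreground (ink) pixels. If alongRows is true the profile
    // is taken per row (e.g. to find text lines); otherwise per column (e.g. to find
    // words or characters inside a line). Blank gaps of at most maxGap lines are
    // bridged, so a larger maxGap gives coarser segments. Segments shorter than
    // minLength are discarded. Start and End are inclusive indices.
    public static List<(int Start, int End)> Segment(bool[,] ink, bool alongRows, int maxGap, int minLength)
    {
        int n = alongRows ? ink.GetLength(0) : ink.GetLength(1);   // positions along the scan axis
        int m = alongRows ? ink.GetLength(1) : ink.GetLength(0);   // pixels per position
        var segments = new List<(int Start, int End)>();
        int start = -1, lastInk = -1;

        for (int i = 0; i < n; i++)
        {
            // Does this row (or column) contain any ink?
            bool hasInk = false;
            for (int j = 0; j < m && !hasInk; j++)
                hasInk = alongRows ? ink[i, j] : ink[j, i];
            if (!hasInk) continue;

            // The blank gap since the last inked position is too wide: close the segment.
            if (start >= 0 && i - lastInk - 1 > maxGap)
            {
                if (lastInk - start + 1 >= minLength) segments.Add((start, lastInk));
                start = -1;
            }
            if (start < 0) start = i;
            lastInk = i;
        }
        if (start >= 0 && lastInk - start + 1 >= minLength) segments.Add((start, lastInk));
        return segments;
    }
}
```

For example, with ink in columns 0, 1, 3, 7 and 8 of a one-row image, maxGap = 0 yields three segments, maxGap = 1 yields two, and maxGap = 3 yields a single segment. Unlike the author's method, a plain projection profile cannot separate word groups that do not lie on a common row.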

Character candidates are detected and passed to the recognizer by the following function:

public void PatternRecognitionThread(Bitmap bitmap)
{
    _originalBitmap = bitmap;
    // Detect text rows once, scanning from top to bottom (bTopStart = true)
    // with hStep = 30 and vStep = 1; ignore rectangles smaller than 5x5.
    if (_rowList == null)
    {
        _rowList = AForge.Imaging.Image.PatternRectangeBoundaryList(
            _originalBitmap, 255, 30, 1, true, 5, 5);
        _irowIndex = 0;
    }
    foreach (Rectangle rowRect in _rowList)
    {
        _currentRow = AForge.Imaging.ImageResize.ImageCrop(_originalBitmap, rowRect);
        // Detect words inside the row, scanning from left to right.
        if (_iwordIndex == 0)
        {
            _currentWordsList = AForge.Imaging.Image.PatternRectangeBoundaryList(
                _currentRow, 255, 20, 10, false, 5, 5);
        }

        foreach (Rectangle wordRect in _currentWordsList)
        {
            _currentWord = AForge.Imaging.ImageResize.ImageCrop(_currentRow, wordRect);
            _iwordIndex++;
            // Detect single characters inside the word (steps of 1 pixel).
            if (_icharIndex == 0)
            {
                _currentCharsList = AForge.Imaging.Image.PatternRectangeBoundaryList(
                    _currentWord, 255, 1, 1, false, 5, 5);
            }

            foreach (Rectangle charRect in _currentCharsList)
            {
                _currentChar = AForge.Imaging.ImageResize.ImageCrop(_currentWord, charRect);
                _icharIndex++;
                // Normalize the character to the 29x29 network input:
                // resize to 21x21, pad with white, convert to 8-bit grayscale.
                Bitmap bmptemp = AForge.Imaging.ImageResize.FixedSize(_currentChar, 21, 21);
                bmptemp = AForge.Imaging.Image.CreateColorPad(bmptemp, Color.White, 4, 4);
                bmptemp = AForge.Imaging.Image.CreateIndexedGrayScaleBitmap(bmptemp);
                byte[] graybytes = AForge.Imaging.Image.GrayscaletoBytes(bmptemp);
                PatternRecognitionThread(graybytes);   // overload that runs the network
                m_bitmaps.Add(bmptemp);
            }
            // Send a word separator to the UI thread.
            string s = " \n";
            _form.Invoke(_form._DelegateAddObject, new Object[] { 1, s });
            if (_icharIndex == _currentCharsList.Count)
            {
                _icharIndex = 0;
            }
        }
        if (_iwordIndex == _currentWordsList.Count)
        {
            _iwordIndex = 0;
        }
    }
}
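Each detected character ends up as a 29x29 network input: FixedSize scales it to 21x21 and CreateColorPad adds a white border, presumably 4 pixels on each side since 21 + 2x4 = 29. Below is a minimal sketch of that normalization on a plain grayscale array, assuming black ink on a white (255) background and nearest-neighbour scaling. The AForge-based helpers may interpolate differently, and GlyphNormalizer is a hypothetical name.

```csharp
public static class GlyphNormalizer
{
    // Scales a grayscale glyph (0 = black ink, 255 = white paper) of any size to
    // inner x inner pixels with nearest-neighbour sampling, then surrounds it with
    // a white border of pad pixels on every side.
    // With the defaults the result is (21 + 2 * 4) x (21 + 2 * 4) = 29 x 29.
    public static byte[,] Normalize(byte[,] glyph, int inner = 21, int pad = 4)
    {
        int srcH = glyph.GetLength(0), srcW = glyph.GetLength(1);
        int size = inner + 2 * pad;
        var result = new byte[size, size];

        // Start from an all-white image; the border keeps this value.
        for (int y = 0; y < size; y++)
            for (int x = 0; x < size; x++)
                result[y, x] = 255;

        // Copy the scaled glyph into the centre.
        for (int y = 0; y < inner; y++)
        {
            int sy = y * srcH / inner;            // nearest source row
            for (int x = 0; x < inner; x++)
            {
                int sx = x * srcW / inner;        // nearest source column
                result[y + pad, x + pad] = glyph[sy, sx];
            }
        }
        return result;
    }
}
```

Stretching every candidate to a fixed 21x21 box does not preserve the character's aspect ratio, so tall thin digits such as "1" are widened.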

Character Recognition

The convolutional neural network (CNN) in the original program has five layers, including the input layer. The details of the convolutional architecture are described by Mike, and by Dr. Simard in the article "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis". The general strategy of this network is to extract simple features at a higher resolution and then convert them into more complex features at a coarser resolution. The simplest way to generate a coarser resolution is to sub-sample a layer by a factor of 2. This, in turn, is a clue to the size of the convolution kernel. The kernel width is chosen to be odd, so that the kernel is centered on a unit, with enough overlap not to lose information (3 would be too small, with only one unit of overlap) but without redundant computation (7 would be too large, with 5 units or over 70% overlap). Therefore, a convolution kernel of size 5 is chosen in this network. Padding the input (making it larger so that there are feature units centered on the border) did not improve performance significantly. With no padding, a subsampling of 2, and a kernel size of 5, each convolutional layer reduces the feature size from n to (n-3)/2. Since the MNIST input size is 28x28, the nearest size that gives an integer feature size after 2 layers of convolution is 29x29 (29x29 becomes 13x13, then 5x5). After 2 layers of convolution, the feature size of 5x5 is too small for a third convolutional layer. Dr. Simard also noted that using fewer than five different features in the first layer decreased performance, while using more than five did not improve it (Mike used six). Similarly, in the second layer, fewer than 50 features decreased performance, while more (100 features) did not improve it. A summary of the neural network is as follows:
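The size arithmetic above can be checked with a small sketch of one subsampling convolution: a 5x5 window moved two pixels at a time over an unpadded input. It assumes square inputs, a single input map with one kernel and one bias, and a plain tanh squashing function (the exact activation scaling of the original network is not shown in this article). In the real layer #2, each unit sums such windows over all six input maps. SubsamplingConvolution is a hypothetical name.

```csharp
using System;

public static class SubsamplingConvolution
{
    // Valid (unpadded) convolution with stride 2 over a square input.
    // Each output unit sees a k x k window whose top-left corner moves two pixels
    // at a time, so the output size is (n - k) / 2 + 1; for k = 5 that is (n - 3) / 2.
    public static double[,] Apply(double[,] input, double[,] kernel, double bias)
    {
        int n = input.GetLength(0);
        int k = kernel.GetLength(0);
        if ((n - k) % 2 != 0)
            throw new ArgumentException("(n - k) must be even so the windows tile the input exactly");

        int m = (n - k) / 2 + 1;
        var output = new double[m, m];
        for (int oy = 0; oy < m; oy++)
        {
            for (int ox = 0; ox < m; ox++)
            {
                double sum = bias;
                for (int ky = 0; ky < k; ky++)
                    for (int kx = 0; kx < k; kx++)
                        sum += kernel[ky, kx] * input[2 * oy + ky, 2 * ox + kx];
                output[oy, ox] = Math.Tanh(sum);   // squashing function (plain tanh assumed)
            }
        }
        return output;
    }
}
```

Applied to a 29x29 input it returns a 13x13 map, and applied again to that map it returns 5x5, while a 28x28 input is rejected because (28 - 5) is odd.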

Layer #0: the grayscale image of the handwritten character from the MNIST database, padded to 29x29 pixels. There are 29x29 = 841 neurons in the input layer.

Layer #1: a convolutional layer with six (6) feature maps. There are 13x13x6 = 1014 neurons, (5x5+1)x6 = 156 weights, and 1014x26 = 26364 connections from layer #1 to the previous layer.

Layer #2: a convolutional layer with fifty (50) feature maps. There are 5x5x50 = 1250 neurons, (5x5x6+1)x50 = 7550 weights, and 1250x(5x5x6+1) = 188750 connections from layer #2 to the previous layer.

(Not the 32500 connections given in Mike's article. Note also that the first version of my program mistakenly allocated (5x5+1)x6x50 = 7800 weights for this layer.)

Layer #3: a fully-connected layer with 100 units. There are 100 neurons, 100x(1250+1) = 125100 weights, and 100x1251 = 125100 connections.

Layer #4: the final, fully-connected output layer. There are 10 neurons, 10x(100+1) = 1010 weights, and 10x101 = 1010 connections.
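The neuron, weight and connection counts in this list follow from two small formulas. The sketch below, with hypothetical names, computes them under the assumption that every feature map of a convolutional layer is connected to all feature maps of the previous layer, with one shared k x k kernel per input map plus one bias per output map.

```csharp
public static class LayerCounts
{
    // Convolutional layer: outFeatures maps of outSize x outSize units, each unit
    // looking at a k x k window in every one of the inFeatures input maps.
    // Weights are shared by all units of a map; connections are counted per unit.
    public static (int Neurons, int Weights, int Connections) Conv(
        int outSize, int outFeatures, int inFeatures, int k)
    {
        int neurons = outSize * outSize * outFeatures;
        int weightsPerMap = k * k * inFeatures + 1;        // +1 for the bias
        return (neurons, weightsPerMap * outFeatures, neurons * weightsPerMap);
    }

    // Fully-connected layer: every unit has one weight per input plus a bias,
    // and no weights are shared, so weights and connections are equal.
    public static (int Neurons, int Weights, int Connections) Full(int units, int inputs)
    {
        int weights = units * (inputs + 1);
        return (units, weights, weights);
    }
}
```

Conv(13, 6, 1, 5) gives 1014 neurons, 156 weights and 26364 connections; Conv(5, 50, 6, 5) gives 1250, 7550 and 188750; Full(100, 1250) and Full(10, 100) give 125100 and 1010 weights respectively.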

Back Propagation

Back propagation is the process of updating the weights of each layer. It starts with the last layer and moves backwards through the layers until the first layer is reached.

In standard back propagation, each weight is updated according to the following formula:

$$w_{new} = w_{old} - \eta \, \frac{\partial E}{\partial w} \qquad (1)$$

Here η (eta) is the "learning rate", typically a small number like 0.0005 that is gradually decreased during training. However, the original program does not use standard back propagation, because it converges slowly. Instead, it applies a second-order technique called the "stochastic diagonal Levenberg-Marquardt method", proposed by Dr. LeCun in his article "Efficient BackProp". Although Mike said that it is not dissimilar to standard back propagation, a little theory should help newcomers like me understand the code more easily.

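In the Levenberg-Marquardt method, the weight update Δw is calculated as follows:

$$\Delta w = -\left(H + \mu I\right)^{-1} \frac{\partial E}{\partial w} \qquad (2)$$

Assume a squared cost function over P training patterns, where $M(Z^p, w)$ is the network output for input pattern $Z^p$ and $d^p$ is the desired output:

$$E(w) = \frac{1}{2P}\sum_{p=1}^{P}\left(d^p - M(Z^p, w)\right)^T\left(d^p - M(Z^p, w)\right) \qquad (3)$$

Then, writing $J_p = \partial M(Z^p, w) / \partial w$ for the Jacobian of pattern p, the gradient is:

$$\frac{\partial E}{\partial w} = -\frac{1}{P}\sum_{p=1}^{P} J_p \left(d^p - M(Z^p, w)\right) \qquad (4)$$

And the Hessian follows as:

$$H(w) = \frac{1}{P}\sum_{p=1}^{P} J_p J_p^T + \frac{1}{P}\sum_{p=1}^{P} \frac{\partial^2 M(Z^p, w)}{\partial w \, \partial w}\left(M(Z^p, w) - d^p\right) \qquad (5)$$

A simplifying approximation of the Hessian is the square of the Jacobian. The Jacobian $J_p$ is an N x O matrix (N parameters, O outputs), so its square is a positive semi-definite N x N matrix:

$$H(w) \approx \frac{1}{P}\sum_{p=1}^{P} J_p J_p^T \qquad (6)$$

Back propagation procedures for computing the diagonal Hessian in neural networks are well known. Assume that each layer in the network has the functional form:

$$x_k = f(y_k), \qquad y_k = \sum_i w_{ki} \, x_i$$

Using the Gauss-Newton approximation (dropping the term that contains f''(y)), we obtain:

$$\frac{\partial^2 E}{\partial y_k^2} = f'(y_k)^2 \, \frac{\partial^2 E}{\partial x_k^2} \qquad (7)$$

$$\frac{\partial^2 E}{\partial w_{ki}^2} = \frac{\partial^2 E}{\partial y_k^2} \, x_i^2 \qquad (8)$$

and:

$$\frac{\partial^2 E}{\partial x_i^2} = \sum_k w_{ki}^2 \, \frac{\partial^2 E}{\partial y_k^2} \qquad (9)$$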

A Stochastic Diagonal Levenberg-Marquardt Method

In fact, techniques that use full Hessian information (Levenberg-Marquardt, Gauss-Newton, etc.) can only be applied to very small networks trained in batch mode, not in stochastic mode. To obtain a stochastic version of the Levenberg-Marquardt algorithm, Dr. LeCun proposed computing the diagonal Hessian through a running estimate of the second derivative with respect to each parameter. The instantaneous second derivative can be obtained via back propagation, as shown in formulas (7, 8, 9). Once we have these running estimates, we can use them to compute an individual learning rate for each parameter:

$$\eta_{ki} = \frac{\epsilon}{\left\langle \frac{\partial^2 E}{\partial w_{ki}^2} \right\rangle + \mu} \qquad (10)$$

where ε is the global learning rate, and $\left\langle \partial^2 E / \partial w_{ki}^2 \right\rangle$ is a running estimate of the diagonal second derivative with respect to $w_{ki}$. μ is a parameter that prevents $\eta_{ki}$ from blowing up when the second derivative is small, i.e., when the optimization moves through flat parts of the error function. The second derivatives can be computed on a subset of the training set (500 randomly chosen patterns out of the 60000 training patterns). Since they change very slowly, they only need to be re-estimated every few epochs. In the original program, the diagonal Hessian is re-estimated every epoch.
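A minimal sketch of the difference between the single global learning rate of equation (1) and per-parameter rates ε / (μ + h). It works on flat arrays rather than the program's weight objects, runs on a single thread, and assumes that the second derivatives accumulated over the sample of patterns are averaged before use; the class and method names are hypothetical.

```csharp
public static class DiagonalLevenbergMarquardt
{
    // Standard back propagation update, equation (1): one global rate eta.
    public static void SgdStep(double[] w, double[] grad, double eta)
    {
        for (int i = 0; i < w.Length; i++)
            w[i] -= eta * grad[i];
    }

    // Turns second derivatives summed over patternCount patterns (for example
    // about 500 randomly chosen training patterns) into the running estimate h.
    public static double[] AverageHessian(double[] accumulated, int patternCount)
    {
        var h = new double[accumulated.Length];
        for (int i = 0; i < h.Length; i++)
            h[i] = accumulated[i] / patternCount;
        return h;
    }

    // Stochastic diagonal Levenberg-Marquardt update: weight i uses its own rate
    // epsilon / (mu + h[i]). mu keeps the rate bounded where h[i] is close to zero,
    // i.e. where the error surface is flat along that weight.
    public static void DiagonalLmStep(double[] w, double[] grad, double[] h, double epsilon, double mu)
    {
        for (int i = 0; i < w.Length; i++)
            w[i] -= epsilon / (mu + h[i]) * grad[i];
    }
}
```

Weights with large curvature take small steps and weights in flat directions take steps of up to ε / μ. The C# program performs the same per-weight update but wraps it in an Interlocked.CompareExchange retry loop (see the "Code Issue" thread in the comments) so that several threads can update the shared weights.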

Here is the second derivative computation function in C#:

public void BackpropagateSecondDerivatives(DErrorsList d2Err_wrt_dXn /* in */,
                                           DErrorsList d2Err_wrt_dXnm1 /* out */)
{
    // nomenclature (repeated from NeuralNetwork class)
    // NOTE: even though we are addressing SECOND derivatives (and not first
    // derivatives), we use nearly the same notation as if there were first
    // derivatives, since otherwise the ASCII look would be confusing. We add
    // one "2" but not two "2's", such as "d2Err_wrt_dXn", to give a gentle
    // emphasis that we are using second derivatives.
    //
    // Err is output error of the entire neural net
    // Xn is the output vector on the n-th layer
    // Xnm1 is the output vector of the previous layer
    // Wn is the vector of weights of the n-th layer
    // Yn is the activation value of the n-th layer,
    //   i.e., the weighted sum of inputs BEFORE the squashing function is applied
    // F is the squashing function: Xn = F(Yn)
    // F' is the derivative of the squashing function
    //   Conveniently, for F = tanh, then F'(Yn) = 1 - Xn^2, i.e.,
    //   the derivative can be calculated from the output,
    //   without knowledge of the input

    const uint BiasNeuronIndex = 0xffffffff;   // marks the phantom bias neuron (output "1")

    int ii;
    uint kk;
    double output;
    double dTemp;

    // Second derivatives of the current pattern's error with respect to the
    // weights of this layer (a new C# array is already zero-initialized).
    double[] d2Err_wrt_dWn = new double[m_Weights.Count];

    // calculate d2Err_wrt_dYn = ( F'(Yn) )^2 * d2Err_wrt_Xn
    var d2Err_wrt_dYn = new DErrorsList(m_Neurons.Count);
    for (ii = 0; ii < m_Neurons.Count; ++ii)
    {
        output = m_Neurons[ii].output;
        dTemp = m_sigmoid.DSIGMOID(output);
        d2Err_wrt_dYn.Add(d2Err_wrt_dXn[ii] * dTemp * dTemp);
    }

    // calculate d2Err_wrt_Wn = ( Xnm1 )^2 * d2Err_wrt_Yn
    // For each neuron in this layer, go through the list of connections from the
    // prior layer and ACCUMULATE into the corresponding weight: in convolutional
    // layers many connections share one weight, so "=" would keep only the last one.
    ii = 0;
    foreach (NNNeuron nit in m_Neurons)
    {
        foreach (NNConnection cit in nit.m_Connections)
        {
            kk = (uint)cit.NeuronIndex;
            if (kk == BiasNeuronIndex)
            {
                output = 1.0;   // bias connection: implied neuron output of "1"
            }
            else
            {
                output = m_pPrevLayer.m_Neurons[(int)kk].output;
            }
            d2Err_wrt_dWn[cit.WeightIndex] += d2Err_wrt_dYn[ii] * output * output;
        }
        ii++;
    }

    // calculate d2Err_wrt_Xnm1 = ( Wn )^2 * d2Err_wrt_dYn
    // d2Err_wrt_Xnm1 is needed as the input value of d2Err_wrt_Xn for the
    // backpropagation of second derivatives in the next (i.e., spatially
    // previous) layer; it must already hold one zero entry per previous neuron.
    ii = 0;
    foreach (NNNeuron nit in m_Neurons)
    {
        foreach (NNConnection cit in nit.m_Connections)
        {
            kk = (uint)cit.NeuronIndex;
            if (kk != BiasNeuronIndex)   // the phantom bias neuron cannot be trained
            {
                dTemp = m_Weights[(int)cit.WeightIndex].value;
                d2Err_wrt_dXnm1[(int)kk] += d2Err_wrt_dYn[ii] * dTemp * dTemp;
            }
        }
        ii++;   // ii tracks the neuron iterator
    }

    // Finally, add this pattern's contribution to the diagonal Hessian of every
    // weight in this layer. By design, this function (called for about 500
    // patterns) runs while a single thread holds the network, so no other thread
    // can change the Hessian and a plain update is sufficient.
    for (int jj = 0; jj < m_Weights.Count; ++jj)
    {
        m_Weights[jj].diagHessian += d2Err_wrt_dWn[jj];
    }
}
//////////////////////////////////////////////////////////////////
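The function above depends on the program's NNNeuron, NNConnection and DErrorsList classes. The sketch below applies the same three Gauss-Newton rules (f'(y)^2 times the output second derivative; that result times x_i^2 for each weight; the sum of w_ki^2 terms for the previous layer) to a single fully-connected tanh layer stored in plain arrays. It is an illustration with hypothetical names, not code from the program; the bias weight is stored in the last column of the weight matrix.

```csharp
public static class DenseSecondDerivatives
{
    // Gauss-Newton backpropagation of diagonal second derivatives through one
    // fully-connected tanh layer with K units and I inputs.
    //   x      : outputs of the previous layer (length I)
    //   output : outputs of this layer, x_k = tanh(y_k) (length K)
    //   w      : weights w[k, i]; column I holds the bias weight (implied input 1)
    //   d2E_dx : second derivatives of the error w.r.t. this layer's outputs
    public static (double[,] D2E_dw, double[] D2E_dxPrev) Backpropagate(
        double[] x, double[] output, double[,] w, double[] d2E_dx)
    {
        int inputs = x.Length, units = output.Length;
        var d2E_dw = new double[units, inputs + 1];
        var d2E_dxPrev = new double[inputs];

        for (int k = 0; k < units; k++)
        {
            double fPrime = 1.0 - output[k] * output[k];      // tanh'(y_k) = 1 - x_k^2
            double d2E_dy = fPrime * fPrime * d2E_dx[k];      // f'(y_k)^2 * d2E/dx_k^2
            for (int i = 0; i < inputs; i++)
            {
                d2E_dw[k, i] = d2E_dy * x[i] * x[i];          // d2E/dy_k^2 * x_i^2
                d2E_dxPrev[i] += w[k, i] * w[k, i] * d2E_dy;  // sum over k of w_ki^2 * d2E/dy_k^2
            }
            d2E_dw[k, inputs] = d2E_dy;                       // bias input is 1, so x^2 = 1
        }
        return (d2E_dw, d2E_dxPrev);
    }
}
```

Here each weight receives exactly one contribution. In the convolutional layers many connections share one weight, so there the per-connection contributions must be summed into that weight's entry rather than overwritten.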

Training and Experiments

Despite the differences between MFC/C++ and C#, my program is similar to the original. On the MNIST database, the network made 291 mis-recognitions out of the 60,000 patterns of the training set, an error rate of only 0.485%. However, it made 136 mis-recognitions out of the 10,000 patterns of the test set, an error rate of 1.36%. The result is not as good as the benchmark, but it was good enough for experiments with my own handwritten character set. First, the input picture was divided into character groups from top to bottom; then the characters in each group were detected from left to right and resized to 29x29 pixels before being recognized by the neural network. In general, the program met my requirements: my own handwritten digits were recognized well. The detection functions have been added to AForge.NET's image processing library for future work. However, because the program was written only in my free time, I am sure it has serious bugs that need to be fixed. Back propagation time is one example: an epoch usually takes around 3800 seconds with a distorted training set, but only 2400 seconds without distortion (on an Intel Pentium Dual-Core E6500). This is rather slow compared with Mike's program. I also hope to find a better handwritten character database, or to cooperate with someone to continue my experiments and develop my algorithms into a real application.


Bibliography

License

This article, along with any associated source code and files, is licensed under The MIT License

About the Author

Vietdungiitb
Vietnam Maritime University
Vietnam
Member
No Biography provided

Comments and Discussions

 
QuestionError : "Input string was not in a correct format."memberMember 100232435 May '13 - 21:37 
Hi all, first of all thanks a lot for this great tutorial. I was going to run the solution in my PC and I'm getting this error "Input string was not in a correct format". I can't get this corrected. Can anybody please help? Thanks in advance.
AnswerRe: Error : "Input string was not in a correct format."mvpVietdungiitb5 May '13 - 21:41 
You should download the full source code here: Multiple convolution neural networks approach for online handwriting recognition
QuestionNice prototype, need your contact to cooperatememberMember 100232351 May '13 - 18:06 
Hi Vietdung,
 
I'm Duong. I'm very interested in this research of yours and have a project I would like to collaborate on; I'd be glad to get to know you. You can contact me by email: duonguh@gmail.com, or by mobile: 0936026822. I look forward to hearing from you. Thanks.
 
Regards
QuestionLooking for people passionate about recognition/AI/artificial intelligencememberMember 999845422 Apr '13 - 20:25
Hello everyone,

I'm in Hanoi. My company provides services on mobile phone platforms and is also investing in recognition research. I would like to find people who share this passion, so that we can share tools and documents and learn from each other.

Anyone interested, please send an email to habbsh@gmail.com.

Sincere thanks,
Ha
QuestionMnistLabelFileHeadermemberLaurent.iss29 Jan '13 - 5:33 
I downloaded the MNIST data bases from
 
Neural Network for Recognition of Handwritten Digits
and I get the message "Item numbers are different".

Where can I find a correct database?

Thanks anyway for this great project!
QuestionMay I askmemberHuy Ngo20 Nov '12 - 22:50
May I ask: what method do you use to convert an image of arbitrary resolution into the standard form of the MNIST data (29x29 pixels)? I see images of different sizes above, so there must be a way to normalize them before feeding them into the ANN. :-D
GeneralMy vote of 5mvpKanasz Robert6 Nov '12 - 4:39 
Another great job. Well done!
QuestionThe program has errorsmembercosnet6 Oct '12 - 6:21
I downloaded it and ran it in VS2008/VS2010, and it reports that NeuralNetworkLibrary.dll is missing.
When I checked, the file is right there... Rebuilding gives 25 errors...

Please send me a complete version of the code, thanks!
AnswerRe: The program has errorsmemberVietdungiitb7 Oct '12 - 16:37
Add the library to the References and the error will go away.
Good luck.
Questionregarding increase in input sizememberATISH VAZE21 Sep '12 - 3:30 
First of all, thanks for such an informative article.
I have been using a similar CNN-based approach for HINDI characters [42 classes].
As Hindi characters are complex in shape, I want to increase the input size to, say, 60x60. Can you please suggest how to proceed?
GeneralMy vote of 3memberlordbasset21 Aug '12 - 1:12 
nnt
GeneralI need the resource code! [modified]memberVirusing29 Jun '12 - 17:37 
Your code is helpful, but I need the source code. I need the source code or the function interfaces of "Neurons.dll", "UPImage.dll" and "UPUnipenLib.dll".
Please.
Thank you.
Best wishes.

modified 1 Jul '12 - 21:45.

Questionfeature mapmemberphuong303010 Apr '12 - 23:02 
Hi, may I ask about feature maps? The image fed into the network is 29x29. How are the 6 feature maps of size 13x13 obtained? Thank you.
AnswerRe: feature mapmemberVietdungiitb10 Apr '12 - 23:21 
The 13x13 feature maps are produced by the convolutional layer using a subsampling technique with a 5x5 kernel matrix. The kernel slides over the input image from left to right and from top to bottom with a step of 2, and the result is a 13x13 feature map. You can learn more from the papers on convolutional networks listed in the article's references. If you need to know more about convolutional networks, just email me at vietdungIITB@gmail.com.
Cheers.
GeneralRe: feature mapmemberphuong303012 Apr '12 - 17:25 
Thank you.
Questionrunning errormemberngoclx_ro6 Apr '12 - 4:47 
Hi there,
Thank you for sharing the code, but I get an error when running the project. VS raises the error "FormatException was unhandled. Input string was not in a correct format..." on the line "Preferences _Preference = new Preferences();". Do you know why it shows this error?
Thank you so much!
Questionquestionmembernghialethanh20 Dec '11 - 7:58 
The solution doesn't have an "open image file" menu, so it's hard to understand how this works from the solution.
GeneralMy vote of 5memberSergio Andrés Gutiérrez Rojas12 Dec '11 - 8:50 
Excellent Work friend.
QuestionHow to run itmemberdungbarca9013 Nov '11 - 3:04 
I use Visual Studio 2008. When I run it, there is an error because NeuralNetworkLibrary.dll is missing. I added it to the references, but it still had errors.
AnswerRe: How to run itmemberKaguMcGallen3 Jul '12 - 23:39 
Your problem is probably caused by NeuralNetworkLibrary which expects a Default-Ini.ini file with some default parameters for the neural network. Just download the Demo and copy the Data folder to your bin/Debug (or bin/Release) folder.
BugThere is a bug in Mike's code, and yours toomemberpgpvn20 Sep '11 - 2:12 
There is a bug in Mike's code, you may see it at http://www.codeproject.com/Messages/2004682/Re-Weights-in-Level-sharp2.aspx
The correct weights per feature map is 5x5x6+1 = 151 so the total weights are 151* 50 = 7550; total connections are 1250x26 = 32500. Your code used 188750 connections, that's why it's slower.
 
Regarding error rate for training set is only 0.485% but jumps to 1.36 % in test set, it's likely that you had over-fitted the network.
 
You may look at this article Convolutional Neural Network MNIST Workbench. He reached 0.46% error rate in the test set after 40 epochs. More epochs raised error rate.

By the way, how are things in Do Son recently? ;-P
GeneralRe: There is a bug in Mike's code, and yours toomemberngoclx_ro8 Apr '12 - 0:56 
You've calculated only the connections between layer 1 (the one with 6 feature maps, each 13x13) and layer 2 (the one with 50 feature maps, each 5x5). But for all (between #1, #2, #3, #4 and #5), it's all 188750 connections :) I think the author calculated well :)
GeneralRe: There is a bug in Mike's code, and yours toomemberVietdungiitb8 Apr '12 - 3:13 
My mistake in this program is the weight calculation for layer02 in the CreateNetwork function. It was: iNumWeight = fm * 156; but the correct one is iNumWeight = fm * (5*5*6+1).
This mistake has been corrected in the next version, so now it is:
iNumWeight = fm * ((int)Math.Pow(kernelsize, 2) * prevLayer.FeatureMapCount + 1);

In the next version, the CreateNetwork function is very simple, like this:
void CreateNetwork()
{
    network = new ConvolutionNetwork();
    //layer 0: inputlayer
    network.Layers = new NNLayer[6];
    network.LayerCount = 6;
    NNLayer layer = new NNLayer("00-Layer Input", null, new Size(29, 29), 1, 5);
    network.InputDesignedPatternSize = new Size(29, 29);
    layer.Initialize();
    network.Layers[0] = layer;
    layer = new NNLayer("01-Layer ConvolutionalSubsampling", layer, new Size(13, 13), 10, 5);
    layer.Initialize();
    network.Layers[1] = layer;
    layer = new NNLayer("02-Layer ConvolutionalSubsampling", layer, new Size(5, 5), 60, 5);
    layer.Initialize();
    network.Layers[2] = layer;
    layer = new NNLayer("03-Layer FullConnected", layer, new Size(1, 200), 1, 5);
    layer.Initialize();
    network.Layers[3] = layer;
    layer = new NNLayer("04-Layer FullConnected", layer, new Size(1, 100), 1, 5);
    layer.Initialize();
    network.Layers[4] = layer;
    layer = new NNLayer("05-Layer FullConnected", layer, new Size(1, Letters.Count), 1, 5);
    layer.Initialize();
    network.Layers[5] = layer;
    network.TagetOutputs = Letters;
}
I will publish the next version of this program when it is finished. ;)
GeneralCode Issue [modified]memberkebomix21 May '11 - 13:00 
Hello Again,
 
there is a part of code confusing me very much, i even read this part more than 10 times, it is in your code and Mike's code as well, in the weights update step in back-propagation function,
 
 for (jj = 0; jj < m_Weights.Count; ++jj)
                {
                    divisor = m_Weights[jj].diagHessian + dMicron;
 
                    epsilon = etaLearningRate / divisor;
                    oldValue = m_Weights[jj].value;
                    newValue = oldValue - epsilon * dErr_wrt_dWn[jj];
                    while ( oldValue != Interlocked.CompareExchange(
                           ref (m_Weights[jj].value),
                            (double)newValue,(double) oldValue)) 
                    {
                        // another thread must have modified the weight.

                        // Obtain its new value, adjust it, and try again

                        oldValue = m_Weights[jj].value;
                        newValue = oldValue - epsilon * dErr_wrt_dWn[jj];
                    }              
                }
 
suppose I'm not going to use threading, so i will ignore while loop, we stored the new weight value in newValue double variable, so where is weight update step ?! we only stored new weight value but we didn't apply it on actual weight,
shouldn't it be like this ??? (I'm re-coding this in Java BTW).
 
  for (jj = 0; jj < m_Weights.Count; ++jj)
                {
                    divisor = m_Weights[jj].diagHessian + dMicron;
 
                    epsilon = etaLearningRate / divisor;
                    oldValue = m_Weights[jj].value;
                    newValue = oldValue - epsilon * dErr_wrt_dWn[jj];
                    m_Weights[jj].value= newValue;
                }
looking forward your quick response, I'm stuck at CNN for a while.
 
UPDATE: one last question, how long does it takes for the network to start to converge, During First Epoch MSE started at about 31 and kept decreasing till it reached 1.90, then it never been less than that for next 2 epochs ?! is that normal (i stopped the execution after 3 epochs since in the 2nd and 3rd epoch since the recognized value of each pattern was the same!!! ) ?

modified on Sunday, May 22, 2011 6:57 AM

GeneralRe: Code IssuememberVietdungiitb22 May '11 - 15:44 
Hi,
for (jj = 0; jj < m_Weights.Count; ++jj)
{
divisor = m_Weights[jj].diagHessian + dMicron;

epsilon = etaLearningRate / divisor;
oldValue = m_Weights[jj].value;
newValue = oldValue - epsilon * dErr_wrt_dWn[jj];
while ( oldValue != Interlocked.CompareExchange(
ref (m_Weights[jj].value),
(double)newValue,(double) oldValue))
{
// another thread must have modified the weight.
 
// Obtain its new value, adjust it, and try again
 
oldValue = m_Weights[jj].value;
newValue = oldValue - epsilon * dErr_wrt_dWn[jj];
}
}
The above loop updates the network's weights, which can also be accessed and changed by other threads (please see Interlocked.CompareExchange in MSDN for the details). If m_Weights[jj].value == oldValue, it will be replaced by newValue. If you do not use threading, you can skip this function and simply set m_Weights[jj].value to newValue. However, this will noticeably affect training speed (a longer time per epoch).
Because we use a distortion method to train the network, the input patterns are different (distorted) each time, so the network's recognition ability improves if you train longer (run more epochs).
Hope it could help
best regards.
Vietdungiitb.


Last Updated 14 Mar 2012
Article Copyright 2011 by Vietdungiitb
Everything else Copyright © CodeProject, 1999-2013