Posted
28 Dec 2010

Comments and Discussions


Good afternoon, dear respected Filip D'haene!
Could you please tell me whether the SGDLevenbergMarquardt strategy implemented here is the Stochastic Diagonal Levenberg-Marquardt method, which uses the diagonal of the Hessian matrix of second-order partial derivatives? I ask because the only stochastic version of the Levenberg-Marquardt algorithm I can find on the internet is the Stochastic Diagonal Levenberg-Marquardt described in "Gradient-Based Learning Applied to Document Recognition", page 41, Appendix C. How could I inspect the contents of the diagonal Hessian matrix after each epoch? Also, would it be possible to avoid calculating second-order derivatives by approximating the diagonal Hessian with the J^T J formula from the classic LM method (the transposed Jacobian multiplied by the Jacobian)? If so, how would that affect recognition quality? I would be very grateful for a reply. Thank you!





Good evening, dear respected Mr. Filip D'haene!
If possible, may I ask you a few questions about the Convolutional Neural Network Workbench? Could you please tell me in which lines of code I should add modifications to see the contents of all feature maps and kernels in each layer of LeNet5? Would it be possible to launch the Convolutional Neural Network Workbench in forward-propagation mode, so that I can submit one MNIST image and see the output of the last fully connected layer with 10 neurons? Could you also tell me where I should add code to see all the generated weights? Thank you in advance for your reply!
Sincerely





Hey,
Thank you for your application. I ran it on my laptop for 41 hours and got an efficiency of only 58%. What tweaks to the design should one make to increase its efficiency?
Also, could you please help us change the dataset? We intend to use a simpler dataset with the background already eliminated, but with about 300 classes of (nonliving) objects.
Could you please get us started with this?
Thank you





Lovely program. And thank you so much for sharing it with us.
I would like to change the database, include some of my own classes, and work on improving the efficiency of the program.
Do you have any documentation, or some links through which I can understand the nitty-gritty of the code?
We are trying to integrate the code with a robotic arm, so efficiency is of paramount importance. So is the change of database.
Could you please help us with this?
Sincere request





Hi,
It is me again.
Here is what is confusing me. In the CNN I used before:
NeuralNetwork network = new NeuralNetwork(DataProvider, "CNNCIFAR10Z2", 10, 1D, LossFunctions.CrossEntropy, DataProviderSets.CIFAR10, TrainingStrategy.SGDLevenbergMarquardt);
network.AddLayer(LayerTypes.Input, 3, 32, 32);
bool[] maps = new bool[3 * 64]
{.............................................};
network.AddLayer(LayerTypes.Convolutional, ActivationFunctions.ReLU, 64, 28, 28, 5, 5, 1, 1, 0, 0, new Mappings(maps));
network.AddLayer(LayerTypes.StochasticPooling, ActivationFunctions.Ident, 64, 14, 14, 3, 3, 2, 2);
network.AddLayer(LayerTypes.LocalResponseNormalizationCM, ActivationFunctions.None, 64, 14, 14, 3, 3);
network.AddLayer(LayerTypes.Convolutional, ActivationFunctions.ReLU, 64, 10, 10, 5, 5, 1, 1, 0, 0, new Mappings(64, 64, 66, 1));
network.AddLayer(LayerTypes.LocalResponseNormalizationCM, ActivationFunctions.None, 64, 10, 10, 3, 3);
network.AddLayer(LayerTypes.StochasticPooling, ActivationFunctions.Ident, 64, 5, 5, 3, 3, 2, 2);
network.AddLayer(LayerTypes.Local, ActivationFunctions.ReLU, 64, 1, 1, 5, 5, 1, 1, 0, 0, new Mappings(64, 64, 66, 2));
network.AddLayer(LayerTypes.Local, ActivationFunctions.Logistic, 384, 1, 1, 5, 5, 1, 1, 0, 0, 50);
network.AddLayer(LayerTypes.FullyConnected, ActivationFunctions.SoftMax, 10);
network.InitializeWeights();
...................................................................................
each neuron in the local layer has several separate bias connections.
For example, in layer 7, neuron #0 has biases with weight indexes #26, #52, etc. (map #0 is connected to previous maps #1, #2, etc.). In general, if the map size is 1 x 1, the number of biases per neuron equals the number of previous maps it is connected to. Is this by design?
Layer #7 is: network.AddLayer(LayerTypes.Local, ActivationFunctions.ReLU, 64, 1, 1, 5, 5, 1, 1, 0, 0, new Mappings(64, 64, 66, 2));





Hi,
There's a mistake in the network definition above. More precisely in the definition of layer #8. You can't have a layer with a receptive field of 5x5 in that position because the size of a map in layer #7 is exactly 1x1. Try changing the receptive field size in layer #8 to 1x1 instead of 5x5.





Yes, I know. On my computer the receptive field is 1 x 1. But this exacerbates the problem I am reporting: each neuron in layer 8 has 64 biases, all with different weight indexes.





You're absolutely right!!! That's a very nasty big bug! Thanks for debugging the code!
Try changing the code for the Local layer type to this:
WeightCount = (totalMappings * MapSize * ReceptiveFieldSize) + NeuronCount;
...
if (!IsFullyMapped)
{
    int mapping = 0;
    int[] mappingCount = new int[MapCount * PreviousLayer.MapCount];
    for (int curMap = 0; curMap < MapCount; curMap++)
        for (int prevMap = 0; prevMap < PreviousLayer.MapCount; prevMap++)
        {
            mappingCount[prevMap + (curMap * PreviousLayer.MapCount)] = mapping;
            if (Mappings.IsMapped(curMap, prevMap, MapCount))
                mapping++;
        }
    Parallel.For(0, MapCount, curMap =>
    {
        for (int prevMap = 0; prevMap < PreviousLayer.MapCount; prevMap++)
        {
            int positionPrevMap = prevMap * maskSize;
            if (Mappings.IsMapped(curMap, prevMap, MapCount))
            {
                int iNumWeight = (mappingCount[prevMap + (curMap * PreviousLayer.MapCount)] * ReceptiveFieldSize * MapSize) + NeuronCount;
                for (int y = 0; y < MapHeight; y++)
                    for (int x = 0; x < MapWidth; x++)
                    {
                        int position = x + (y * MapWidth) + (curMap * MapSize);
                        AddBias(ref Connections[position], curMap);
                        int pIndex;
                        for (int row = 0; row < ReceptiveFieldHeight; row++)
                            for (int column = 0; column < ReceptiveFieldWidth; column++)
                            {
                                pIndex = x + (y * maskWidth) + kernelTemplate[column + (row * ReceptiveFieldWidth)] + positionPrevMap;
                                if (maskMatrix[pIndex] != 1)
                                    AddConnection(ref Connections[position], maskMatrix[pIndex], iNumWeight++);
                            }
                    }
            }
        }
    });
}
else
{
    if (totalMappings > MapCount)
    {
        Parallel.For(0, MapCount, curMap =>
        {
            for (int prevMap = 0; prevMap < PreviousLayer.MapCount; prevMap++)
            {
                int positionPrevMap = prevMap * maskSize;
                int mapping = prevMap + (curMap * PreviousLayer.MapCount);
                int iNumWeight = (mapping * ReceptiveFieldSize * MapSize) + NeuronCount;
                for (int y = 0; y < MapHeight; y++)
                    for (int x = 0; x < MapWidth; x++)
                    {
                        int position = x + (y * MapWidth) + (curMap * MapSize);
                        AddBias(ref Connections[position], curMap);
                        int pIndex;
                        for (int row = 0; row < ReceptiveFieldHeight; row++)
                            for (int column = 0; column < ReceptiveFieldWidth; column++)
                            {
                                pIndex = x + (y * maskWidth) + kernelTemplate[column + (row * ReceptiveFieldWidth)] + positionPrevMap;
                                if (maskMatrix[pIndex] != 1)
                                    AddConnection(ref Connections[position], maskMatrix[pIndex], iNumWeight++);
                            }
                    }
            }
        });
    }
    else
    {
        Parallel.For(0, MapCount, curMap =>
        {
            int iNumWeight = (curMap * ReceptiveFieldSize * MapSize) + NeuronCount;
            for (int y = 0; y < MapHeight; y++)
                for (int x = 0; x < MapWidth; x++)
                {
                    int position = x + (y * MapWidth) + (curMap * MapSize);
                    AddBias(ref Connections[position], curMap);
                    int pIndex;
                    for (int row = 0; row < ReceptiveFieldHeight; row++)
                        for (int column = 0; column < ReceptiveFieldWidth; column++)
                        {
                            pIndex = x + (y * maskWidth) + kernelTemplate[column + (row * ReceptiveFieldWidth)];
                            if (maskMatrix[pIndex] != 1)
                                AddConnection(ref Connections[position], maskMatrix[pIndex], iNumWeight++);
                        }
                }
        });
    }
}





Sorry, made a mistake. Should be:
if (!IsFullyMapped)
{
    int mapping = 0;
    int[] mappingCount = new int[MapCount * PreviousLayer.MapCount];
    for (int curMap = 0; curMap < MapCount; curMap++)
        for (int prevMap = 0; prevMap < PreviousLayer.MapCount; prevMap++)
        {
            mappingCount[prevMap + (curMap * PreviousLayer.MapCount)] = mapping;
            if (Mappings.IsMapped(curMap, prevMap, MapCount))
                mapping++;
        }
    Parallel.For(0, MapCount, curMap =>
    {
        for (int prevMap = 0; prevMap < PreviousLayer.MapCount; prevMap++)
        {
            int positionPrevMap = prevMap * maskSize;
            if (Mappings.IsMapped(curMap, prevMap, MapCount))
            {
                int iNumWeight = (mappingCount[prevMap + (curMap * PreviousLayer.MapCount)] * ReceptiveFieldSize * MapSize) + NeuronCount;
                for (int y = 0; y < MapHeight; y++)
                    for (int x = 0; x < MapWidth; x++)
                    {
                        int position = x + (y * MapWidth) + (curMap * MapSize);
                        AddBias(ref Connections[position], position);
                        int pIndex;
                        for (int row = 0; row < ReceptiveFieldHeight; row++)
                            for (int column = 0; column < ReceptiveFieldWidth; column++)
                            {
                                pIndex = x + (y * maskWidth) + kernelTemplate[column + (row * ReceptiveFieldWidth)] + positionPrevMap;
                                if (maskMatrix[pIndex] != 1)
                                    AddConnection(ref Connections[position], maskMatrix[pIndex], iNumWeight++);
                            }
                    }
            }
        }
    });
}
else
{
    if (totalMappings > MapCount)
    {
        Parallel.For(0, MapCount, curMap =>
        {
            for (int prevMap = 0; prevMap < PreviousLayer.MapCount; prevMap++)
            {
                int positionPrevMap = prevMap * maskSize;
                int mapping = prevMap + (curMap * PreviousLayer.MapCount);
                int iNumWeight = (mapping * ReceptiveFieldSize * MapSize) + NeuronCount;
                for (int y = 0; y < MapHeight; y++)
                    for (int x = 0; x < MapWidth; x++)
                    {
                        int position = x + (y * MapWidth) + (curMap * MapSize);
                        AddBias(ref Connections[position], position);
                        int pIndex;
                        for (int row = 0; row < ReceptiveFieldHeight; row++)
                            for (int column = 0; column < ReceptiveFieldWidth; column++)
                            {
                                pIndex = x + (y * maskWidth) + kernelTemplate[column + (row * ReceptiveFieldWidth)] + positionPrevMap;
                                if (maskMatrix[pIndex] != 1)
                                    AddConnection(ref Connections[position], maskMatrix[pIndex], iNumWeight++);
                            }
                    }
            }
        });
    }
    else
    {
        Parallel.For(0, MapCount, curMap =>
        {
            int iNumWeight = (curMap * ReceptiveFieldSize * MapSize) + NeuronCount;
            for (int y = 0; y < MapHeight; y++)
                for (int x = 0; x < MapWidth; x++)
                {
                    int position = x + (y * MapWidth) + (curMap * MapSize);
                    AddBias(ref Connections[position], position);
                    int pIndex;
                    for (int row = 0; row < ReceptiveFieldHeight; row++)
                        for (int column = 0; column < ReceptiveFieldWidth; column++)
                        {
                            pIndex = x + (y * maskWidth) + kernelTemplate[column + (row * ReceptiveFieldWidth)];
                            if (maskMatrix[pIndex] != 1)
                                AddConnection(ref Connections[position], maskMatrix[pIndex], iNumWeight++);
                        }
                }
        });
    }
}





Thanks for the reply.
The net result is right, but we have multiple assignments of the same bias.
For example, for a 1 x 1 map, we assign bias #0 once for every previous map connected to neuron (map) #0. Not a big deal for 64 neurons, but I have seen an article with many thousands of neurons in a layer.
Since each and every neuron has a bias, and you placed the biases at the beginning of the weight array, it might be simpler to just assign the biases to the connections outside of the prevMap loop.





Hi,
I understand your reasoning, but I don't see a proper way to implement it as you describe without altering all the fprop, bprop & bbprop steps. I'm no longer using this codebase myself. I now have a much faster C++ implementation that I'm still tinkering with. Thanks anyway for debugging the code!





Yes, C++ is faster.
Interested...





The fix you suggested does link the biases to the right weights. But it introduces a new bias connection in Connections[i] for each connected previous map.
For example, layer 7 (Local) consists of 64 maps of size 1 x 1. The maps are connected to the previous layer's 64 maps of size 5 x 5. The first map is not connected to the first previous map, but is connected to previous maps #2 and #3. The function AddBias(ref Connections[position], position) is called on position #0 for each connected map. On each call it resizes the array Connections[position] and appends the new connection to the end of the array.
As a result, we have
Connections[0][0] with Neuron ID MAX_INT and Weight ID 0
Connections[0][26] with Neuron ID MAX_INT and Weight ID 0,
etc.
So for biases we still have many connections to the same weight (the bias).
Does this compromise the forward and backprop calculations? It seems that the forward calculation adds the bias to the layer's neuron multiple times.





Maybe something like this will address the issue:
Parallel.For(0, MapCount, curMap =>
{
    for (int prevMap = 0; prevMap < PreviousLayer.MapCount; prevMap++)
    {
        int positionPrevMap = prevMap * maskSize;
        if (Mappings.IsMapped(curMap, prevMap, MapCount))
        {
            int iNumWeight = (mappingCount[prevMap + (curMap * PreviousLayer.MapCount)] * ReceptiveFieldSize * MapSize) + NeuronCount;
            for (int y = 0; y < MapHeight; y++)
                for (int x = 0; x < MapWidth; x++)
                {
                    int position = x + (y * MapWidth) + (curMap * MapSize);
                    int pIndex;
                    for (int row = 0; row < ReceptiveFieldHeight; row++)
                        for (int column = 0; column < ReceptiveFieldWidth; column++)
                        {
                            pIndex = x + (y * maskWidth) + kernelTemplate[column + (row * ReceptiveFieldWidth)] + positionPrevMap;
                            if (maskMatrix[pIndex] != 1)
                                AddConnection(ref Connections[position], maskMatrix[pIndex], iNumWeight++);
                        }
                }
        }
    }
});
for (int i = 0; i < NeuronCount; i++)
    AddBias(ref Connections[i], i);
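The difference between adding the bias inside the previous-map loop and adding it once per neuron can be reproduced in a small stand-alone sketch. Everything here is hypothetical and only imitates the behaviour described in this thread: the `Connection` struct, the `BiasMarker` constant and the `Build` method are my own names, not the Workbench's types, and the layer is simplified to 1 x 1 maps.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a connection record (not the Workbench type).
public struct Connection
{
    public int ToNeuronIndex;   // BiasMarker means "this entry is a bias" (assumption)
    public int ToWeightIndex;
    public Connection(int neuron, int weight) { ToNeuronIndex = neuron; ToWeightIndex = weight; }
}

public static class LocalLayerSketch
{
    public const int BiasMarker = int.MaxValue;

    // Builds connections for a local layer of 1 x 1 maps.
    // isMapped[cur, prev] tells whether map cur reads from previous map prev;
    // every mapping contributes rfSize unshared weights.
    public static List<Connection>[] Build(bool[,] isMapped, int rfSize, bool biasInsideLoop)
    {
        int mapCount = isMapped.GetLength(0);
        int prevMapCount = isMapped.GetLength(1);
        var connections = new List<Connection>[mapCount];
        int nextWeight = mapCount; // biases occupy weight indexes 0..mapCount-1

        for (int cur = 0; cur < mapCount; cur++)
        {
            connections[cur] = new List<Connection>();
            if (!biasInsideLoop)
                connections[cur].Add(new Connection(BiasMarker, cur)); // once per neuron

            for (int prev = 0; prev < prevMapCount; prev++)
            {
                if (!isMapped[cur, prev]) continue;
                if (biasInsideLoop)
                    connections[cur].Add(new Connection(BiasMarker, cur)); // once per mapping
                for (int k = 0; k < rfSize; k++)
                    connections[cur].Add(new Connection(prev * rfSize + k, nextWeight++));
            }
        }
        return connections;
    }

    // Counts how many bias entries a single neuron ended up with.
    public static int CountBiases(List<Connection> neuron)
    {
        int count = 0;
        foreach (var c in neuron)
            if (c.ToNeuronIndex == BiasMarker) count++;
        return count;
    }
}
```

With a 5 x 5 receptive field (`rfSize` = 25) and a map connected to two previous maps, the inside-the-loop variant gives that neuron two bias entries, the second one at list index 26, which matches the indexes reported above. The outside-the-loop variant gives exactly one.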





Yes, I already did it. I placed the for loop before the Parallel.For. No difference.





Don't you notice a speed improvement in training time?





I did not look at training time because, first, C# is not as quick as C++, and second, IMHO there are a lot of corrections that could speed up the existing C# program. What I am doing is learning from the great knowledge of the ANN field that you have embedded in the program. I appreciate it very much.
Years ago I compared C# and C++ versions of the same small and simple ANN program and got about a 70% gain for C++. I am not sure that comparison was correct. It was before MS Concurrency





Thanks! The speed of C# will be much better in the next generation of the .NET Framework with .NET Native.





Not sure if .NET Native will really improve this kind of thing. Being managed requires runtime bounds checks, which, given all the array indexing, is probably the culprit here. Some of those can be optimized away by a compiler, but most remain. You can work around that in C# using "unsafe" constructs.





Hi,
I have tried your Workbench on one of the networks you suggested:
NeuralNetwork network = new NeuralNetwork(DataProvider, "CNNCIFAR10Z2", 10, 1D, LossFunctions.CrossEntropy, DataProviderSets.CIFAR10, TrainingStrategy.SGDLevenbergMarquardt);
When adding a local layer
network.AddLayer(LayerTypes.Local, ActivationFunctions.Logistic, 384, 1, 1, 5, 5, 1, 1, 0, 0, 50);
the application crashes.
The reason is reading beyond boundaries.
The maskMatrix for this layer is:
maskMatrix = new int[maskSize * PreviousLayer.MapCount];
where maskSize = 1 and PreviousLayer.MapCount = 64
So maskMatrix has 64 entries.
But when setting connections for a local layer we have:
for (int y = 0; y < MapHeight; y++)
    for (int x = 0; x < MapWidth; x++)
    {
        int position = x + (y * MapWidth) + (curMap * MapSize);
        AddBias(ref Connections[position], iNumWeight++);
        int pIndex;
        for (int row = 0; row < ReceptiveFieldHeight; row++)
            for (int column = 0; column < ReceptiveFieldWidth; column++)
            {
                pIndex = x + (y * maskWidth) + kernelTemplate[column + (row*receptiveFieldWidth)] + positionPrevMap;
                if (maskMatrix[pIndex] != 1)
                    AddConnection(ref Connections[position], maskMatrix[pIndex], iNumWeight++);
            }
    }
Because maskWidth = 1, the receptive field is 5 x 5, and the maximum of positionPrevMap is 63, the maximum of pIndex is 71. This is well beyond the 64 entries of maskMatrix.
Any help?
By the way, what is the Local layer?
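The figure of 71 can be checked with a short sketch of the index arithmetic. It rests on one assumption that the post does not state: that `kernelTemplate[column + row * ReceptiveFieldWidth]` holds `column + row * maskWidth`. The class and method names are hypothetical.

```csharp
public static class MaskIndexSketch
{
    // Largest pIndex produced by
    //   x + (y * maskWidth) + kernelTemplate[...] + positionPrevMap
    // under the assumed kernelTemplate layout (column + row * maskWidth).
    public static int MaxIndex(int mapWidth, int mapHeight,
                               int maskWidth, int maskHeight,
                               int rfWidth, int rfHeight,
                               int prevMapCount)
    {
        int maskSize = maskWidth * maskHeight;
        int maxXY = (mapWidth - 1) + (mapHeight - 1) * maskWidth;
        int maxKernelOffset = (rfWidth - 1) + (rfHeight - 1) * maskWidth;
        int maxPositionPrevMap = (prevMapCount - 1) * maskSize;
        return maxXY + maxKernelOffset + maxPositionPrevMap;
    }

    // Length of maskMatrix: one mask per map of the previous layer.
    public static int MaskLength(int maskWidth, int maskHeight, int prevMapCount)
    {
        return maskWidth * maskHeight * prevMapCount;
    }
}
```

For the crashing layer (1 x 1 map, 1 x 1 mask, 5 x 5 receptive field, 64 previous maps) this gives a maximum index of 0 + 8 + 63 = 71 against a length of 64. For a layer whose previous maps are 5 x 5, the maximum index is 1599 against a length of 1600, so the last access is exactly in bounds.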





Hi,
Can you please give me the definition of the whole network you would like to construct?
A locally connected layer is like a convolutional layer but without the weight sharing.
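To make the effect of weight sharing concrete, here is a rough sketch that compares weight counts. The local-layer formula follows the `WeightCount` expression quoted elsewhere in this thread; the convolutional formula (one shared kernel per mapping plus one bias per map) is my assumption about a typical layout, not something confirmed by the Workbench source. All names are hypothetical.

```csharp
public static class WeightCountSketch
{
    // Assumed convolutional layout: one shared kernel per mapping, one bias per map.
    public static int Convolutional(int totalMappings, int rfWidth, int rfHeight, int mapCount)
    {
        return totalMappings * rfWidth * rfHeight + mapCount;
    }

    // Local layer: a separate kernel at every map position, one bias per neuron,
    // i.e. (totalMappings * MapSize * ReceptiveFieldSize) + NeuronCount.
    public static int Local(int totalMappings, int mapWidth, int mapHeight,
                            int rfWidth, int rfHeight, int mapCount)
    {
        int mapSize = mapWidth * mapHeight;
        int neuronCount = mapCount * mapSize;
        return totalMappings * mapSize * rfWidth * rfHeight + neuronCount;
    }
}
```

For 3 input maps fully mapped to 64 maps of 28 x 28 with a 5 x 5 receptive field (192 mappings), the convolutional count is 192 * 25 + 64 = 4,864 weights, while the local count is 192 * 784 * 25 + 50,176 = 3,813,376 weights.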





Thank you for the reply.
The network is from NeuralNetwork InitializeDefaultNeuralNetwork(). I just uncommented the definition:
NeuralNetwork network = new NeuralNetwork(DataProvider, "CNNCIFAR10Z2", 10, 1D, LossFunctions.CrossEntropy, DataProviderSets.CIFAR10, TrainingStrategy.SGDLevenbergMarquardt);
network.AddLayer(LayerTypes.Input, 3, 32, 32);
bool[] maps = new bool[3 * 64]
{.............................................};
network.AddLayer(LayerTypes.Convolutional, ActivationFunctions.ReLU, 64, 28, 28, 5, 5, 1, 1, 0, 0, new Mappings(maps));
network.AddLayer(LayerTypes.StochasticPooling, ActivationFunctions.Ident, 64, 14, 14, 3, 3, 2, 2);
network.AddLayer(LayerTypes.LocalResponseNormalizationCM, ActivationFunctions.None, 64, 14, 14, 3, 3);
network.AddLayer(LayerTypes.Convolutional, ActivationFunctions.ReLU, 64, 10, 10, 5, 5, 1, 1, 0, 0, new Mappings(64, 64, 66, 1));
network.AddLayer(LayerTypes.LocalResponseNormalizationCM, ActivationFunctions.None, 64, 10, 10, 3, 3);
network.AddLayer(LayerTypes.StochasticPooling, ActivationFunctions.Ident, 64, 5, 5, 3, 3, 2, 2);
network.AddLayer(LayerTypes.Local, ActivationFunctions.ReLU, 64, 1, 1, 5, 5, 1, 1, 0, 0, new Mappings(64, 64, 66, 2));
network.AddLayer(LayerTypes.Local, ActivationFunctions.Logistic, 384, 1, 1, 5, 5, 1, 1, 0, 0, 50);
network.AddLayer(LayerTypes.FullyConnected, ActivationFunctions.SoftMax, 10);
network.InitializeWeights();
...................................................................................
The exception is thrown when the layer before the last one is being instantiated.
This is not exactly a bug; it is a violation of an implicit constraint.
Obviously, a receptive field should fit into a map of its previous layer. But here the previous map is 1 x 1 neurons and the receptive field is 5 x 5 neurons. Because the mask has the dimensions of the previous layer's map, and maskMatrix consists of one mask per map of the previous layer, we go out of the maskMatrix bounds when we instantiate connections to the last of the previous layer's maps.
Changing the definition to network.AddLayer(LayerTypes.Local, ActivationFunctions.Logistic, 384, 1, 1, 1, 1, 1, 1, 0, 0, 50) solves the problem, but then this layer becomes just a fully connected layer.
If we want to connect each map to 25 (5 x 5) previous maps, we have to use a mapping.
A similar configuration appears in another (commented-out) network in NeuralNetwork InitializeDefaultNeuralNetwork().
I think it would not hurt to add some validation (an exception) for this constraint to AddLayer(). If C# has something like C++'s static_assert, a compile-time check (meta function) would be an excellent solution.
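C# has no direct counterpart to C++'s static_assert, so a check like this would normally run when the layer is added. Below is a minimal sketch of such a guard. The class, method and parameter names are hypothetical, and the assumption that padding widens the usable area by twice its value on each axis is mine, not taken from the Workbench source.

```csharp
using System;

public static class LayerValidationSketch
{
    // Throws if the receptive field cannot fit into a (padded) map of the previous layer.
    public static void CheckReceptiveField(int prevMapWidth, int prevMapHeight,
                                           int rfWidth, int rfHeight,
                                           int padX = 0, int padY = 0)
    {
        if (rfWidth < 1 || rfHeight < 1)
            throw new ArgumentOutOfRangeException(nameof(rfWidth),
                "Receptive field dimensions must be at least 1.");

        // Assumed rule: padding adds padX columns and padY rows on each side.
        int usableWidth = prevMapWidth + 2 * padX;
        int usableHeight = prevMapHeight + 2 * padY;

        if (rfWidth > usableWidth || rfHeight > usableHeight)
            throw new ArgumentException(
                $"Receptive field {rfWidth}x{rfHeight} does not fit into the previous " +
                $"layer's map of {usableWidth}x{usableHeight} (including padding).");
    }
}
```

Calling `CheckReceptiveField(1, 1, 5, 5)` for the failing layer would throw an `ArgumentException` up front, while `CheckReceptiveField(5, 5, 5, 5)` and `CheckReceptiveField(1, 1, 1, 1)` pass, which is friendlier than an out-of-range index deep inside the connection setup.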





More and better input validation is always good practice. I didn't put enough time into it; I shall give it more effort in the next release or project.





I've added another DataProviderSet, modified mainly from CIFAR10, to do regression, specifically to locate 8 key points. I changed the number of output neurons to 16. However, I am not quite clear what the code below is for.
for (int i = 0; i < ClassCount; i++)
    D2ErrX[i] = 1D;
In my case, all the neuron outputs represent the relative locations of the points. Would you please clarify it for me?





This value must be the second derivative of the cost function.
For MSE (0.5 * sumof((actual - target)^2)) this derivative is 1; for Cross Entropy I'm not sure. Don't use TrainToValue; it is plain wrong. This value only matters when you're using Levenberg-Marquardt based learning strategies.
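For the MSE case, the value 1 follows from differentiating the cost twice with respect to each output. In the short derivation below, y_i is the i-th network output and t_i its target; the symbol names are my own and do not come from the Workbench code.

```latex
E = \frac{1}{2}\sum_{i}\,(y_i - t_i)^2, \qquad
\frac{\partial E}{\partial y_i} = y_i - t_i, \qquad
\frac{\partial^2 E}{\partial y_i^2} = 1
```

Under this loss, the second derivative is 1 for every output regardless of what the output represents, so the same initialisation would apply to 16 regression outputs as to 10 class scores.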







