Tags: C#, C# 4.0
Hey there all
 
Background:
I am working with a massive two-dimensional array of a custom Neuron type for a neural network that seems to work well. With smaller array sizes, e.g.
Neuron[,] NodeList = new Neuron[800,600];
it will serialize fine.
 
However, with the larger array below, the serializer gets stuck in some object-tree flattening loop (according to a forum thread in my browser history) and runs out of memory:
Neuron[,] NodeList = new Neuron[10000,9000];
After a little research on Google I found that I will have to use another serializer or implement my own way of saving the data.
I opted for a simple data writer, as it would be easier.
 

 

 
Problem (skip here if you want :p):
For my custom data writer I'm writing all the data needed to a file.
Each Neuron stores a reference to another Neuron.
All Neurons, including the linked ones, are stored in the NodeList array mentioned above.
class Neuron
{
    //Some Old Neuron Info
    float IThreshold = 1;
    float IFiringValue = 1;
 
    //Link
    float IWeight = 1;
    Neuron NextNeuron;   //This Line HERE
    ...
}
For my data writer I am going through each neuron in the array and storing its info, such as its location in the array, its weighting, etc.
I also want to store the location of its linked neuron in the array.
 
I would like to do something like
int NewX = NodeList[x,y].NextNeuron.ParentArrayIndex(0);
int NewY = NodeList[x,y].NextNeuron.ParentArrayIndex(1);
where the 0 and 1 merely represent the array dimension.
Is anything like this possible, or is there a different way of getting the indices without searching through the array and checking each element for equality?
 
Any help appreciated, and thank you :)
Posted 6-May-11 7:50am
Edited 6-May-11 8:11am
Comments
   
I changed your tags. If you're using C#4, none of the rest of them really added anything.
SAKryukov at 6-May-11 14:31pm
   
This problem is interesting enough. My 5. --SA
yesotaso at 6-May-11 14:36pm
   
How many neurons are we talking about? 10000*9000? If so, adding even one int to a neuron would increase the total size by 360,000,000 bytes... damn, I lost count of the zeros there.
Thomas.D Williams at 6-May-11 14:47pm
   
Yep... With successful early networks of around 1,000 neurons it was fine. I then had an idea of how I could train it to be more intelligent, but that required a lot more neurons, so 90,000,000 neurons as a result :)
yesotaso at 6-May-11 15:12pm
   
Well, it doesn't seem a very hopeful situation, but if you are going to use the disk, minimize whatever you can. On that scale even 1 byte makes a difference; most probably such a hindrance will divert your focus from the main objective... Anyway, all I can add is: keep the 4KB sector size in mind (it may vary with system settings, but by default it is 4). Even if you read 1 byte, that whole sector is read from disk into memory.
Thomas.D Williams at 6-May-11 15:06pm
   
To all those interested: a 90,000,000-neuron network saved to a CSV file of 348 MB... I think either it stopped writing to the file in the last few repetitions or Notepad++ isn't coping, as it only shows lines up to 87,043,237. About 3 million fewer than expected :)
yesotaso at 6-May-11 15:16pm
   
Text output may be deceptive: a 1-digit integer takes 1 byte as text, while a 9-digit integer takes 9 bytes.
Thomas.D Williams at 6-May-11 15:31pm
   
I'm not too familiar with other file types or with writing my own. This part of the system will only be needed temporarily anyway, just for training :)
Thomas.D Williams at 6-May-11 15:33pm
   
I estimate that the training of the neural network will use 100 GB of hard drive space and take 20 hours to complete :) A lot less than I anticipated :)

Solution 3

As I promised, I'm posting my sketch following the discussion of storage and access to the neuron data.
 
These are just general techniques, which you may or may not already be familiar with.
They illustrate the following:
 
  1. Separation of the storage layer from the semantic layer.
  2. You don't need to work with arrays; instead, you need an array-like interface to the network, provided by the indexed property this (an indexer).
  3. You do not need a rank-2 array for storage; the storage is essentially a rank-1 array.
  4. How to access the same data through the interface of a rank-1 and a rank-2 (rank-3, or whatever else) array at the same time. See GetStorageIndex.
  5. How to develop a class supporting more than one indexed property this with different signatures. Here this is done by implementing interfaces that declare the indexed property this.
  6. How to limit the visibility of the storage layer.
  7. How to use an identical storage schema for different dimensions of the neural network. See NeuralNetworkMetadata, used to support, store and load the dimensions.
 
I hope your disk storage is binary? It should be, for the sake of performance. Also, the stream should be buffered.
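
A minimal sketch of what binary, buffered storage could look like. Everything in it is an assumption for illustration: the NeuronRecord and NeuronFile names are hypothetical, the record holds only the three float fields shown in the question, and the 64 KB buffer size is an arbitrary choice.

```csharp
using System.IO;

// Hypothetical fixed-size record: the three floats from the question's Neuron class.
public struct NeuronRecord
{
    public float Threshold;
    public float FiringValue;
    public float Weight;

    // Three 4-byte floats, so every record occupies exactly 12 bytes on disk.
    public const int SizeInBytes = 3 * sizeof(float);

    public void Write(BinaryWriter writer)
    {
        writer.Write(Threshold);
        writer.Write(FiringValue);
        writer.Write(Weight);
    }

    public static NeuronRecord Read(BinaryReader reader)
    {
        NeuronRecord record = new NeuronRecord();
        record.Threshold = reader.ReadSingle();
        record.FiringValue = reader.ReadSingle();
        record.Weight = reader.ReadSingle();
        return record;
    }
}

public static class NeuronFile
{
    // Assumed buffer size; tune it for the actual workload.
    private const int BufferSize = 64 * 1024;

    // Writes all records sequentially through a buffered stream.
    public static void Save(string path, NeuronRecord[] records)
    {
        using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (BufferedStream buffered = new BufferedStream(file, BufferSize))
        using (BinaryWriter writer = new BinaryWriter(buffered))
        {
            foreach (NeuronRecord record in records)
                record.Write(writer);
        }
    }

    // Reads back every record; the count follows from the file length.
    public static NeuronRecord[] Load(string path)
    {
        using (FileStream file = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (BufferedStream buffered = new BufferedStream(file, BufferSize))
        using (BinaryReader reader = new BinaryReader(buffered))
        {
            long count = file.Length / NeuronRecord.SizeInBytes;
            NeuronRecord[] records = new NeuronRecord[count];
            for (long i = 0; i < count; i++)
                records[i] = NeuronRecord.Read(reader);
            return records;
        }
    }
}
```

Because every record has the same size, the file needs no separators and no per-record index, which is what makes random access by position possible later.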
 
using System.IO;
using StreamPosition = System.Int64;
using StorageIndex = System.UInt64;
using NeuronIndex = System.UInt32;
using NeuronLayerIndex = System.UInt32;
 
class NeuronStorage {
    internal static Neuron Load(Stream stream) { return null; }
    internal void Store(Stream stream) { /*...*/ }
    //...
} //class NeuronStorage

class Neuron : NeuronStorage { /*...*/ }
 
interface INeuralNetworkStorage {
    Neuron this[StorageIndex index] { get; set; }
} //interface INeuralNetworkStorage

interface INeuralNetwork : INeuralNetworkStorage {
    Neuron this[NeuronLayerIndex layer, NeuronIndex index] { get; set; }
} //interface INeuralNetwork

class NeuralNetworkMetadata {
    internal NeuronIndex NeuronsPerLayer { get { /*...*/ } }
    internal NeuronLayerIndex LayerCount { get { /*...*/ } }
    //...
    internal static NeuralNetworkMetadata Load(Stream stream) { /*...*/ }
    internal void Store(Stream stream) { /*...*/ }
    //...
} //class NeuralNetworkMetadata

class NeuralNetworkStorage : INeuralNetworkStorage {
    public NeuralNetworkStorage(string fileName) { /*...*/ }
    public Neuron this[StorageIndex index] {
        get {
            Stream.Seek(GetNeuronStreamPosition(index), SeekOrigin.Begin);
            return Neuron.Load(Stream);
        }
        set {
            Stream.Seek(GetNeuronStreamPosition(index), SeekOrigin.Begin);
            value.Store(Stream);
        }
    } //this
    //...
    internal protected NeuralNetworkMetadata Metadata;
    StreamPosition GetNeuronStreamPosition(StorageIndex index) { /*...*/ }
    Stream Stream;
} //class NeuralNetworkStorage

class NeuralNetwork : NeuralNetworkStorage, INeuralNetwork {
    public NeuralNetwork(string fileName) : base(fileName) {  /*...*/ }
    public NeuralNetwork(string fileName, NeuralNetworkMetadata metadata) : base(fileName) { /*store meta-data*/ }
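    // Parameter order matches INeuralNetwork: the layer first, then the neuron's index within that layer.
    public Neuron this[NeuronLayerIndex layer, NeuronIndex index] {
        get { return this[GetStorageIndex(index, layer)]; }
        set { this[GetStorageIndex(index, layer)] = value; }
    } //this
    // Maps (index, layer) to the rank-1 storage index, layer by layer.
    // Note: the product is evaluated in 32 bits (UInt32); cast layer to StorageIndex first if it can exceed UInt32.MaxValue.
    StorageIndex GetStorageIndex(NeuronIndex neutron, NeuronLayerIndex layer) {
        return layer * Metadata.NeuronsPerLayer + neutron;
    } //GetStorageIndex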
    //...
} //class NeuralNetwork
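
To try the rank-1/rank-2 mapping without any file access, here is a small in-memory sketch. The LayeredGrid class and its member names are hypothetical and not part of the sketch above; it uses the same layer * NeuronsPerLayer + index formula (with a cast to 64 bits) and adds the inverse mapping.

```csharp
using System;

// Hypothetical in-memory stand-in for the file-backed storage:
// a rank-1 array exposed through both a rank-1 and a rank-2 indexer.
public sealed class LayeredGrid<T>
{
    private readonly T[] storage; // the only real storage is rank-1

    public uint LayerCount { get; private set; }
    public uint NeuronsPerLayer { get; private set; }

    public LayeredGrid(uint layerCount, uint neuronsPerLayer)
    {
        LayerCount = layerCount;
        NeuronsPerLayer = neuronsPerLayer;
        storage = new T[(ulong)layerCount * neuronsPerLayer];
    }

    // Rank-1 view: direct access by storage index.
    public T this[ulong storageIndex]
    {
        get { return storage[storageIndex]; }
        set { storage[storageIndex] = value; }
    }

    // Rank-2 view: the same data addressed by (layer, index).
    public T this[uint layer, uint index]
    {
        get { return storage[GetStorageIndex(layer, index)]; }
        set { storage[GetStorageIndex(layer, index)] = value; }
    }

    // (layer, index) -> storage index; the cast keeps the product in 64 bits.
    public ulong GetStorageIndex(uint layer, uint index)
    {
        if (layer >= LayerCount) throw new ArgumentOutOfRangeException("layer");
        if (index >= NeuronsPerLayer) throw new ArgumentOutOfRangeException("index");
        return (ulong)layer * NeuronsPerLayer + index;
    }

    // storage index -> (layer, index); the inverse of GetStorageIndex.
    public void GetLayerAndIndex(ulong storageIndex, out uint layer, out uint index)
    {
        layer = (uint)(storageIndex / NeuronsPerLayer);
        index = (uint)(storageIndex % NeuronsPerLayer);
    }
}
```

With 3 layers of 4 neurons, the element at layer 2, index 1 is the same element as storage index 9, and GetLayerAndIndex(9, ...) gives back layer 2, index 1. No search is involved in either direction.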
 
A separate section of the storage should be designated for the relationships between neurons. The technique may depend on several factors: first of all, how sparse the Cartesian square is, how big the total number of associations is, and how they should be used.
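
As one possible illustration for the simplest case, where each neuron links to at most one other neuron (as with NextNeuron in the question), the link can be stored as the target's storage index instead of an object reference. This is only a sketch under that assumption; the NeuronLinks class and its members are hypothetical names.

```csharp
using System;

// Hypothetical link table: for every neuron, the storage index of the neuron it links to.
public sealed class NeuronLinks
{
    public const long NoLink = -1; // marker for "no outgoing link"

    private readonly long[] next;
    private readonly uint layerCount;
    private readonly uint neuronsPerLayer;

    public NeuronLinks(uint layerCount, uint neuronsPerLayer)
    {
        this.layerCount = layerCount;
        this.neuronsPerLayer = neuronsPerLayer;
        next = new long[(long)layerCount * neuronsPerLayer];
        for (long i = 0; i < next.LongLength; i++)
            next[i] = NoLink;
    }

    private long ToStorageIndex(uint layer, uint index)
    {
        if (layer >= layerCount) throw new ArgumentOutOfRangeException("layer");
        if (index >= neuronsPerLayer) throw new ArgumentOutOfRangeException("index");
        return (long)layer * neuronsPerLayer + index;
    }

    // Records that the neuron at (fromLayer, fromIndex) links to (toLayer, toIndex).
    public void Link(uint fromLayer, uint fromIndex, uint toLayer, uint toIndex)
    {
        next[ToStorageIndex(fromLayer, fromIndex)] = ToStorageIndex(toLayer, toIndex);
    }

    // Recovers the linked neuron's position by arithmetic alone, without searching.
    public bool TryGetNext(uint layer, uint index, out uint nextLayer, out uint nextIndex)
    {
        long target = next[ToStorageIndex(layer, index)];
        if (target == NoLink)
        {
            nextLayer = 0;
            nextIndex = 0;
            return false;
        }
        nextLayer = (uint)(target / neuronsPerLayer);
        nextIndex = (uint)(target % neuronsPerLayer);
        return true;
    }
}
```

The table costs 8 bytes per neuron and can be written to its own section of the file as a plain sequence of 64-bit integers. If neurons can have several links, a different structure is needed.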
 
I hope all the techniques and ideas I provide can be useful.
 
—SA
Comments
Thomas.D Williams at 9-May-11 13:19pm
   
Thank you for your alternative solution. There are a lot of new concepts in the above code that I am going to have a go at researching; I think they would be good for me to learn about. E.g. interfaces, which I have never used. At the top you also append = System.Int64; to what you are using. I don't quite know how that works or what it affects with the said library you're using. I'll have a go at implementing something like the above. I'm getting the feeling that, as I am entirely self-taught, I have never used keywords like internal and interface. I'll get back to you if I get stuck or need something clarified :) Once again, a great thank you for all the effort :)
SAKryukov at 9-May-11 13:40pm
   
System.Int64 is a must because this is a position in the stream. This form of "using" is convenient for maintainability; it works almost like "typedef", which is not available in C#. Come on, I'm self-taught, too. Just reading. But... don't even play with the idea of using a language without reading all about it, every single construct. It will take minimal time but will repay a lot. I like re-inventing wheels, but if your wheels are invented based on a lack of fundamental knowledge, chances are many of your wheels are not as good. In particular, interfaces are an absolute must. I just demonstrated a simple technique for using NeuralNetwork[pos] and NeuralNetwork[layer, index] at the same time. This is a very important thing (let's say just a convenience, but an important one) which you cannot do in C# in any other way. OK, these techniques are kind of minimal, and you can use some variation. You really need to understand them. If you ask what needs explanation, I will try to help you. Thanks for accepting this answer (and voting). Come back if you need to. Cheers, --SA
Thomas.D Williams at 9-May-11 16:18pm
   
I've only been programming since the summer before college (not college as you know it in the States; it's whatever comes before higher ed) :p and I'm going to uni this September. I want to learn all the formal techniques (and get clarity on others) and fill the holes in my knowledge :) I suppose I consider myself a practical programmer: I program to solve an intriguing problem and pick up skills on the way (probably not the best way). Anyway, yeah... thank you once again for the good answer. I voted it five for a reason :)
SAKryukov at 9-May-11 21:20pm
   
Thanks. Good luck, call back. --SA
yesotaso at 18-May-11 5:24am
   
Nice read :) My 5. But something bugs me. Not that I can say I know what the ... a neural network is, but as if the biology involvement were not enough, why would you drag nuclear chemistry stuff in as well? :P "return layer * Metadata.NeuronsPerLayer + neutron;" It looks like you built your own NBC... nvm

Solution 2

Oh… it may not be so simple. I remember your previous question. Let's see.
 
As I understand, you store the full neuron array on disk. You did not show your search algorithm, so it's hard to say how good or bad it is. The general idea is to create a data cache which you could load and keep in memory during run time. In this case the cache can store less information per neuron than the full data on each neuron. The problem is: would you have enough memory to keep even the cache data all in memory? If not, there could be two levels of cache…
 
The question is: does the persistent representation of each neuron take exactly the same space on disk? If yes, you simply calculate the position of each neuron from its index. If not, your cache data should index the array on disk, so each cache element representing a neuron should contain the stream position of that neuron. (Don't think it is slow: a while ago I developed a dictionary system based on this idea; it's as fast as or faster than the best commercial systems, regardless of the dictionary size, which can be orders of magnitude bigger than the available memory.)
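
For the fixed-size case, calculating the position from the index can be sketched as follows. The header size, the record layout (three floats) and the FixedRecordFile name are assumptions made for the example, not details given in this thread.

```csharp
using System.IO;

// Hypothetical layout: a fixed-size header followed by fixed-size neuron records.
public static class FixedRecordFile
{
    public const int HeaderSize = 8;  // assumed: two UInt32 dimensions
    public const int RecordSize = 12; // assumed: three 4-byte floats

    // With fixed-size records the stream position is pure arithmetic on the index.
    public static long GetRecordPosition(ulong index)
    {
        return HeaderSize + checked((long)index * RecordSize);
    }

    // Overwrites one record in place; the rest of the stream is untouched.
    public static void Write(Stream stream, ulong index, float threshold, float firingValue, float weight)
    {
        stream.Seek(GetRecordPosition(index), SeekOrigin.Begin);
        BinaryWriter writer = new BinaryWriter(stream); // not disposed, so the stream stays open
        writer.Write(threshold);
        writer.Write(firingValue);
        writer.Write(weight);
        writer.Flush();
    }

    // Reads one record as { threshold, firingValue, weight }.
    public static float[] Read(Stream stream, ulong index)
    {
        stream.Seek(GetRecordPosition(index), SeekOrigin.Begin);
        BinaryReader reader = new BinaryReader(stream); // not disposed, so the stream stays open
        float threshold = reader.ReadSingle();
        float firingValue = reader.ReadSingle();
        float weight = reader.ReadSingle();
        return new float[] { threshold, firingValue, weight };
    }
}
```

With these assumed sizes, record 5 starts at byte 8 + 5 * 12 = 68. The same methods work on a FileStream or on a MemoryStream for testing.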
 
Now, the relationship between neurons (how they are connected) is a real mathematical relationship; that is, if you have a set of neurons S, the relationship is by definition a subset of the Cartesian square S×S. (If you think it's more complex than that, it simply means that a neuron participates in more than one relationship.) The graph of a relationship introduced in this way is always directed, but the direction can be ignored or not. You need to decide how to store it on disk and in (cache) memory. First of all, it depends on how sparse this Cartesian square is. Ultimately, you need to develop a way to access all of a neuron's neighbors through a relationship.
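
If the Cartesian square is sparse, one common in-memory representation is to store, for each neuron, only the list of neurons it is related to. The following is a rough sketch of that idea, not a recommendation for the on-disk format; SparseRelation is a hypothetical name and neurons are identified by their storage index.

```csharp
using System.Collections.Generic;

// Hypothetical sparse representation of a relation on S: only the pairs that exist are stored.
public sealed class SparseRelation
{
    // from -> all "to" values such that (from, to) belongs to the relation
    private readonly Dictionary<ulong, List<ulong>> successors =
        new Dictionary<ulong, List<ulong>>();

    // Adds the directed pair (from, to); adding the same pair twice has no effect.
    public void Add(ulong from, ulong to)
    {
        List<ulong> list;
        if (!successors.TryGetValue(from, out list))
        {
            list = new List<ulong>();
            successors[from] = list;
        }
        if (!list.Contains(to))
            list.Add(to);
    }

    // True if the directed pair (from, to) is in the relation.
    public bool Contains(ulong from, ulong to)
    {
        List<ulong> list;
        return successors.TryGetValue(from, out list) && list.Contains(to);
    }

    // All neighbors reachable from the given neuron in one step (empty if none).
    public IList<ulong> GetNeighbors(ulong from)
    {
        List<ulong> list;
        if (successors.TryGetValue(from, out list))
            return list.AsReadOnly();
        return new ulong[0];
    }
}
```

Memory use grows with the number of pairs actually present rather than with the size of S×S, which for 90,000,000 neurons would be far too large to store densely. To ignore direction, add each pair in both orders.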
 
These are just some basic ideas. They all need some thinking. What do you think?
 
[EDIT]
 
Please see my other solution for details on how the storage layer should be isolated from the semantic layer and how neurons and neuron layers can be accessed in a transparent way. It also shows that the storage layer should always be organized as a rank-1 array-like interface, which does not have to be the same interface as that of the semantic layer.
 
—SA
Comments
Thomas.D Williams at 6-May-11 14:41pm
   
They all sound good. As you say, it is hard to search through a large dictionary, which would be hard with my 90,000,000 neurons... Forgive me, I don't know much, if anything at all, about Windows memory management. All I know is I'm allowed up to 1.5 GB of memory for my application regardless of how much RAM I have. The program already suffers from a lack of available memory and is slow to run. Each cycle of the big network takes 3-ish minutes. Resource manager says I use 1.2 GB. Would it be worth forcing the Neuron constructor to be passed its location within the array (consuming an x and a y value's worth of extra memory for each neuron)? I have only really started on neural networks but had great success with my early stages :) And with the large networks, how I plan to train them requires that they be written to a file first. Thank you for your constant help here on the Code Project :)
SAKryukov at 6-May-11 15:11pm
   
If you're pushing your 2 GB limit of user memory (and with 1.5 GB you're pushing it too hard already!) on a 32-bit system, it is a good reason to move to file-mapped memory or a 64-bit system with a lot more memory. Don't come close to this limit. You can implement a (specialized, hence simplified) disk-based memory system on the same principles as a heap API is organized: pointers, allocations... It needs some serious analysis in terms of performance, based on knowledge of typical use cases. You can try to collect this experience later during development. With this goal, you should try to abstract the memory/retrieval system from the main functionality as clearly as possible; this is because future development (scaling, adding complexity or whatever) may force you to replace or modify the (internal, hopefully) memory/retrieval model, so better be prepared. What do you think? Well, thank you for your good words. I just got a notification that someone up-voted my answer to your old question on navigation keys. I took a look, and it reminded me that all your questions are quite adequate and you tend to pay close attention to the answers and understand them well (which is unfortunately pretty unusual :-) Cheers, --SA
Thomas.D Williams at 6-May-11 15:29pm
   
My system is x64... If by disk-based you mean something like a text file, I have just made it a simple CSV structure. Initial tests seem promising. My test program, using my neural network class, builds a random network every time on startup and saves it to a txt file. I don't know if it was the right choice, but it allowed me to set the buffer size down low using StreamWriter. I'm sure there are more elegant solutions. However, it will only be used in training. Famous last programmer's words: shouldn't need changing in the future, hopefully. I now plan to attempt to train a network by breeding: generating 100 networks, keeping the top performing, killing the rest (deletion). It should all be automated once I'm done. Let's hope this doesn't go Terminator on us :p I'll let you know my progress here at some point. Once again, thank you... Saved me hours once again :) :) :)
SAKryukov at 6-May-11 16:09pm
   
I was confused by the 1.5 GB. Well, then your memory is limited by its physical volume and swap size. I have a different idea. Why do you need a rank-2 array? Is the top/right/left/bottom neuron relationship really important and used in the algorithm? Please, just answer. Even if you say "yes, important", I'll offer a rank-1 model, but I'll just need more words to explain how to present rank-2. On the level of the memory structure it's a rank-1 array; on the functional level it can be rank-2, rank-3, whatever. The mapping is just a few lines and takes no time. It can look very elegant to you. Probably you already know the techniques. What do you think? --SA
SAKryukov at 6-May-11 16:10pm
   
Yes, I have an idea of what neural networks basically do, not too deep though. --SA
Thomas.D Williams at 6-May-11 16:52pm
   
I need a rank-2 array because the first dimension represents the layer in the network and the second dimension is the neurons in that layer. Ohhh... if you're referencing my other question about the keys and the inputs, that was a different project :p I don't know an awful lot about algorithms related to neural networks. I just researched some neural network theories before this project. If you don't mind showing your algorithm, it would be a good insight :)
SAKryukov at 8-May-11 20:25pm
   
Sure, please see the updated solution and details in a separate solution. We can discuss it if you need. Good luck. --SA

Solution 1

You're talking about a MINIMUM file size of 432MB (4 bytes for each float, and 4 bytes for each of the two indexes). That's a huge freakin' file. If it were me, I'd use SQLite and store the data as records in a database. It affords fast retrieval and indexed access to individual neuron records.
Comments
Thomas.D Williams at 6-May-11 14:21pm
   
That would be a better solution still :) My 5. However, I would still have to get the array position of each neuron's linked neuron. I don't do much with databases but I'm willing to try :p I accepted defeat a long time ago in this project on having a small save file size. That's not including the multiple networks I'll use to train it. On that topic, is there any way I could estimate an object's size programmatically in C#?
SAKryukov at 6-May-11 14:35pm
   
For performance and other reasons I would stay with a file. It is not a problem to access data in a file in a random-access manner pretty fast, and you don't need many database features like transactions (important). With a file, you can avoid mapping from the linear table structure to rank-2 arrays. With a pure relational model you can model it all perfectly, but at a price. Something to think about, anyway. --SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 17 May 2011
Copyright © CodeProject, 1999-2014