Location of a referenced object in an array?

Question

Hey there all

Background:::::
I am working with a massive (2-dimensional) array of a custom neuron type for a Neural Network that seems to work well. When working with the smaller Neuron array sizes e.g.
Neuron[,] NodeList = new Neuron[800,600];
it will serialize fine.

However, with the larger size below it gets stuck in some object-tree flattening loop (according to a forum thread in my history) and runs out of memory:
Neuron[,] NodeList = new Neuron[10000,9000];
After a little research on Google I found that I will have to use another serializer or implement my own way of saving the data.
I opted for a simple data writer, as it would be easier.

Problem(Skip here if you want :p):::
For my custom data writer I'm writing all the data needed to a file.
My Neuron type stores a reference to another Neuron.
All Neurons are stored in the NodeList array as mentioned (including the linked ones).

C#

class Neuron
{
    //Some Old Neuron Info
    float IThreshold = 1;
    float IFiringValue = 1;

    //Link
    float IWeight = 1;
    Neuron NextNeuron;   //This Line HERE
    ...
}

For my data writer I am going through each neuron in the array and storing its info... like location in the array, weighting etc
I also want to store the location where its linked neuron is in the array

I would like to do something like

C#

int NewX = NodeList[x,y].NextNeuron.ParentArrayIndex(0);
int NewY = NodeList[x,y].NextNeuron.ParentArrayIndex(1);

Where the 0 & 1 merely represent the array dimension.
Is anything like this possible, or is there a different way of getting the position without searching through the array and checking each element for equality?
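One possible way to avoid a search per neuron, sketched here under assumptions rather than taken from the thread: walk the array once and record each neuron's coordinates in a dictionary keyed by the neuron reference, so the position of any NextNeuron can then be looked up directly. The names ArrayPosition, NeuronIndexMap and Build are hypothetical, and the Neuron class below is a cut-down stand-in for the real one.

```csharp
using System.Collections.Generic;

// Cut-down stand-in for the Neuron class in the question. Assumption: it does
// not override Equals/GetHashCode, so dictionary keys compare by reference.
class Neuron
{
    public float IWeight = 1;
    public Neuron NextNeuron;
}

// Hypothetical helper type: a pair of array coordinates.
struct ArrayPosition
{
    public int X, Y;
    public ArrayPosition(int x, int y) { X = x; Y = y; }
}

static class NeuronIndexMap
{
    // Single pass over the array: remember where every non-null neuron lives.
    public static Dictionary<Neuron, ArrayPosition> Build(Neuron[,] nodes)
    {
        var map = new Dictionary<Neuron, ArrayPosition>();
        for (int x = 0; x < nodes.GetLength(0); x++)
        {
            for (int y = 0; y < nodes.GetLength(1); y++)
            {
                if (nodes[x, y] != null)
                {
                    map[nodes[x, y]] = new ArrayPosition(x, y);
                }
            }
        }
        return map;
    }
}
```

With the map built from NodeList, `map[NodeList[x, y].NextNeuron]` gives the X and Y of the linked neuron. For 90,000,000 neurons the dictionary itself needs a lot of memory, so storing the two indices inside each neuron, or replacing the reference with an index, may be the more practical trade-off.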

Any help appreciated,... and thank you :)

Posted 6-May-11 7:50am

Thomas.D Williams

Updated 6-May-11 8:11am

#realJSOP


Comments

#realJSOP 6-May-11 14:11pm

I changed your tags. If you're using C#4, none of the rest of them really added anything.

Sergey Alexandrovich Kryukov 6-May-11 14:31pm

This problem is interesting enough. My 5.
--SA

yesotaso 6-May-11 14:36pm

How many Neurons are we talking about? 10000*9000? If so, adding even 1 int to a neuron would increase the total size by 360,000,000 bytes... damn, I lost count of zeros there.

Thomas.D Williams 6-May-11 14:47pm

Yep... With successful early networks, like 1000, it was fine.
I then had an idea of how I could train it to be more intelligent... but required a lot more neurons
so 90,000,000 neurons as a result :)

yesotaso 6-May-11 15:12pm

Well, it doesn't seem a very hopeful situation, but if you are going to use disk, minimize whatever you can. On that scale even 1 byte makes a difference, and most probably such a hindrance will divert your focus from the main objective... Anyway, all I can add is: keep the 4KB sector size in mind (it may vary with system settings, but by default it is 4KB); even if you read 1 byte, the whole sector is read from disk into memory.

Thomas.D Williams 6-May-11 15:06pm

To all those interested: a 90,000,000 neuron network saved to a CSV file of 348MB... I think it must have stopped writing to the file in the last few repetitions, or Notepad++ doesn't cope, as it only shows lines up to 87,043,237. 3 millionish fewer than expected :)

yesotaso 6-May-11 15:16pm

Text output may be deceptive: in text, a 1-digit integer takes 1 byte, while a 9-digit integer takes 9 bytes.
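To illustrate the point about text size, here is a small sketch comparing the bytes needed for an integer written as decimal text with the fixed 4 bytes that BinaryWriter emits for an int. SizeComparison, TextBytes and BinaryBytes are hypothetical names used only for this example, and separators and line endings in a real CSV file would add to the text size.

```csharp
using System.Globalization;
using System.IO;
using System.Text;

static class SizeComparison
{
    // Bytes needed to write the value as decimal text (ASCII digits only,
    // no separator or line ending).
    public static int TextBytes(int value)
    {
        string text = value.ToString(CultureInfo.InvariantCulture);
        return Encoding.ASCII.GetByteCount(text);
    }

    // Bytes BinaryWriter emits for the same value: always 4 for an int.
    public static long BinaryBytes(int value)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(value);
            writer.Flush();
            return stream.Length;
        }
    }
}
```

Under these assumptions a value such as 5 takes 1 byte as text and 123456789 takes 9, while both take 4 bytes in binary form.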

Thomas.D Williams 6-May-11 15:31pm

I'm not too familiar with other file types or with writing my own. This part of the system will only be needed temporarily anyway. Just for training :)

Thomas.D Williams 6-May-11 15:33pm

I estimate that the training of the Neural Network will use 100GB hard drive space and take 20hours to complete :) A lot less than I anticipated :)

3 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

#realJSOP · Accepted Answer · 2011-05-06T08:10:00

Solution 1

You're talking about a MINIMUM file size of 432MB (4 bytes for each float, and 4 bytes for each of the two indexes). That's a huge freakin' file. If it were me, I'd use SQLite and store the data in records in a database. It affords fast retrieval, and indexed access to individual neuron records.

Posted 6-May-11 8:10am

#realJSOP

Comments

Thomas.D Williams 6-May-11 14:21pm

That would be a better solution still :) My 5
However, I would still have to get the array position of each neuron's linked neuron.
I don't do much with databases but I'm willing to try :p
I accepted defeat on having a small save file size a long time ago in this project
That's not including the multiple networks I'll use to train it

On that topic, is there any way I could estimate an object's size programmatically in C#?

Sergey Alexandrovich Kryukov 6-May-11 14:35pm

For performance and other reasons I would stay with a file. It is not a problem to access data in a file in a random-access manner pretty fast, and you don't need many database features like transactions (important). With a file, you can avoid mapping from a linear table structure to the rank-2 arrays. With a pure relational model you can perfectly model it all, but at a price. Something to think about, anyway.
--SA

Sergey Alexandrovich Kryukov · Accepted Answer · 2011-05-06T08:29:00

Solution 2

Oh… it can be not so simple. I remember your previous question. Let's see.

As I understand, you store the full neuron array on disk. You did not show your search algorithm, so it's hard to say how good or bad it is. The general idea is to create a data cache which you could load and keep in memory during run time. In this case you can store less information per neuron than the full data on each neuron. The problem is: would you have enough memory to keep even the cache data all in memory? If not, there could be two levels of cache…

The question is: does the persistent representation of each neuron take exactly the same space on disk? If yes, you simply calculate the position of each neuron from its index. If not, your cache data should index the array on disk, so each cache element representing a neuron should contain the stream position of that neuron. (Don't think it is slow: a while ago I developed a dictionary system based on this idea; it's as fast as or faster than the best commercial systems regardless of the dictionary size, which can be orders of magnitude bigger than the available memory.)
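The fixed-size case can be sketched as follows. This is only an illustration under an assumed record layout (three floats followed by a 64-bit index of the linked neuron); FixedRecordFile, RecordSize, PositionOf, Write and ReadNextIndex are hypothetical names, and headerSize stands for whatever metadata precedes the records.

```csharp
using System.IO;

static class FixedRecordFile
{
    // Assumed layout: threshold, firing value, weight (3 floats) + link (1 long).
    public const int RecordSize = 3 * sizeof(float) + sizeof(long);

    // Every record has the same size, so its position follows from its index.
    public static long PositionOf(long index, long headerSize)
    {
        return headerSize + index * RecordSize;
    }

    public static void Write(Stream stream, long index, long headerSize,
                             float threshold, float firingValue, float weight,
                             long nextIndex)
    {
        stream.Seek(PositionOf(index, headerSize), SeekOrigin.Begin);
        // The writer is deliberately not disposed: that would close the stream.
        var writer = new BinaryWriter(stream);
        writer.Write(threshold);
        writer.Write(firingValue);
        writer.Write(weight);
        writer.Write(nextIndex);
        writer.Flush();
    }

    // Read only the link field of one record, skipping the three floats.
    public static long ReadNextIndex(Stream stream, long index, long headerSize)
    {
        long linkOffset = 3 * sizeof(float);
        stream.Seek(PositionOf(index, headerSize) + linkOffset, SeekOrigin.Begin);
        var reader = new BinaryReader(stream);
        return reader.ReadInt64();
    }
}
```

The same calls work against a FileStream; wrapping it in a BufferedStream would be one way to get the buffering, at the cost of having to flush before seeking elsewhere for a read.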

Now the relationship between neurons (how they are connected) is a real mathematical relationship; that is, if you have a set of neurons S, the relationship is by definition a subset of the Cartesian square S×S. (If you think it's more complex than that, it simply means that a neuron participates in more than one relationship.) The graph of a relationship introduced in this way is always directed, but the direction can be ignored or not. You need to decide how to store it on disk and in (cache) memory. First of all, it depends on how sparse this Cartesian square is. Ultimately, you need to develop a way to access all the neighbors of a neuron by a relationship.

This is just some basic ideas. They all need some thinking. What do you think?

[EDIT]

Please see my other solution for details on how the storage layer should be isolated from the semantic layer and how neurons and neuron layers can be accessed in a transparent way. It also shows that the storage layer should always be organized behind a rank-1 array-like interface, which does not have to be the same interface as the semantic layer.

—SA

Posted 6-May-11 8:29am

Sergey Alexandrovich Kryukov

Updated 8-May-11 17:27pm


Comments

Thomas.D Williams 6-May-11 14:41pm

They all sound good. As you say, it is hard to search through a large dictionary, and it would be hard with my 90,000,000 neurons... Forgive me, I don't know much, if anything at all, about Windows memory management. All I know is I'm allowed up to 1.5GB of memory for my application regardless of how much RAM I have.
The program already suffers from a lack of available memory and is slow to run. Each cycle of the big network takes 3ish mins.
Resource manager says I use 1.2GB
Would it be worth forcing the Neuron constructor to be passed its location within the array (consuming an x and a y value's worth of extra memory for each neuron)?
I have only really started on neural networks but had great success with my early stages :) And the large networks, and how I plan to train them, require that they be written to a file first.
Thank you for your constant help here on the Code Project :)

Sergey Alexandrovich Kryukov 6-May-11 15:11pm

If you're pushing the 2G limit of user memory (and with 1.5G you're already pushing it too hard!) on a 32-bit system, that is a good reason to move to file-mapped memory or to a 64-bit system with a lot more memory. Don't come close to this limit.

You can implement a (specialized hence simplified) disk-based memory system on the same principles as heap API is organized: pointers, allocations... It needs some serious analysis in terms of performance based on knowledge of typical use cases. You can try to collect this experience later during development. With this goal, you should try to abstract memory/retrieval system from main functionality as clearly as possible; this is because future development (scaling, adding complexity or whatever) may force you to replace/modify the memory/retrieval (internal, hopefully) model -- better be prepared.
What do you think?

Well, thank you for your good words. I just got a notification that someone up-voted my answer to your old question on navigation keys. I took a look, and it reminded me that all your questions are quite adequate and you tend to pay close attention to the answers and understand them well (which is unfortunately pretty unusual :-)

Cheers,
--SA

Thomas.D Williams 6-May-11 15:29pm

My system is x64... If by disk-based you mean something like a text file, I have just made it a simple CSV structure. Initial tests seem promising. My test program using my neural network class builds a random network every time on startup, saving it to a txt file.
Don't know if it was the right choice, but it allowed me to set the buffer size down low using StreamWriter. I'm sure there are more elegant solutions. However, it will be used in training.

Famous Last Programmers Words: Shouldn't need changing in the future, hopefully

I plan to attempt to now train a network by attempting breeding. Generating 100 networks. Keeping the top performing. Killing the rest (Deletion)
Should all be automated once i'm done.
Lets hope this doesn't go Terminator on us :p

I'll let you know my progress here at some point
Once again thank you... Saved me hours once again :) :) :)

Sergey Alexandrovich Kryukov 6-May-11 16:09pm

I was confused by 1.5G. Well, then your memory is limited by its physical size and swap size.
I have a different idea. Why do you need a rank-2 array? Is the neuron relationship top/right/left/bottom really important, used in the algorithm? Please, just answer. Even if you say "yes, important", I'll offer a rank-1 model, but I'll just need more words to explain how to present rank-2. On the level of memory structure it's a rank-1 array; on the functional level it can be rank-2, rank-3, whatever. The mapping is just a few lines and takes no time. It can look very elegant to you. Probably you already know the techniques.

What do you think?
--SA

Sergey Alexandrovich Kryukov 6-May-11 16:10pm

Yes, I have an idea what neuron networks basically do, not too deep though.
--SA

Thomas.D Williams 6-May-11 16:52pm

I need a rank-2 array because the first dimension represents the layer in the network and the second dimension is the Neurons in that layer. Ohhh... if you're referencing my other question about the keys and the inputs, that was a different project :p I don't know an awful lot about algorithms related to neural networks. I just researched some neural network theories before this project. If you don't mind showing your algorithm... It would be a good insight :)

Sergey Alexandrovich Kryukov 8-May-11 20:25pm

Sure, please see the updated solution and details in a separate solution. We can discuss it if you need.
Good luck.
--SA

Sergey Alexandrovich Kryukov · Accepted Answer · 2011-05-08T14:15:00

Solution 3

As I promised, I'm posting my sketch following the discussion of storage and access to the neuron data.

These are just general techniques which you may or may not already be familiar with.
They illustrate the following:

Separation of the storage layer from the semantic layer.
You don't need to work with arrays; instead, you need an array-like interface to the network through the indexed property this.
You do not need a rank-2 array for storage; it is essentially a rank-1 array.
How to use the same data through a rank-1 and a rank-2 (rank-3, or whatever else) array interface at the same time. See GetStorageIndex.
How to develop a class supporting more than one indexed property this with different signatures. C# lets indexers be overloaded by parameter list, so this works directly; declaring each one in its own interface, as below, also lets a caller see only the view it needs.
How to limit the visibility of the storage layer.
How to use an identical storage schema for different dimensions of the neural network. See NeuralNetworkMetadata, used to support, store and load dimensions.

I hope your disk storage is binary? It should be, for the sake of performance. Also, the stream should be buffered.

C#

using System.IO;
using StreamPosition = System.Int64;
using StorageIndex = System.UInt64;
using NeuronIndex = System.UInt32;
using NeuronLayerIndex = System.UInt32;

class NeuronStorage {
    internal static Neuron Load(Stream stream) { return null; }
    internal void Store(Stream stream) { /*...*/ }
    //...
} //class NeuronStorage

class Neuron : NeuronStorage { /*...*/ }

interface INeuralNetworkStorage {
    Neuron this[StorageIndex index] { get; set; }
} //interface INeuralNetworkStorage

interface INeuralNetwork : INeuralNetworkStorage {
    Neuron this[NeuronLayerIndex layer, NeuronIndex index] { get; set; }
} //interface INeuralNetwork

class NeuralNetworkMetadata {
    internal NeuronIndex NeuronsPerLayer { get { /*...*/ } }
    internal NeuronLayerIndex LayerCount { get { /*...*/ } }
    //...
    internal static NeuralNetworkMetadata Load(Stream stream) { /*...*/ }
    internal void Store(Stream stream) { /*...*/ }
    //...
} //class NeuralNetworkMetadata

class NeuralNetworkStorage : INeuralNetworkStorage {
    public NeuralNetworkStorage(string fileName) { /*...*/ }
    public Neuron this[StorageIndex index] {
        get {
            Stream.Seek(GetNeuronStreamPosition(index), SeekOrigin.Begin);
            return Neuron.Load(Stream);
        }
        set {
            Stream.Seek(GetNeuronStreamPosition(index), SeekOrigin.Begin);
            value.Store(Stream);
        }
    } //this
    //...
    internal protected NeuralNetworkMetadata Metadata;
    StreamPosition GetNeuronStreamPosition(StorageIndex index) { /*...*/ }
    Stream Stream;
} //class NeuralNetworkStorage

class NeuralNetwork : NeuralNetworkStorage, INeuralNetwork {
    public NeuralNetwork(string fileName) : base(fileName) {  /*...*/ }
    public NeuralNetwork(string fileName, NeuralNetworkMetadata metadata) : base(fileName) { /*store meta-data*/ }
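    // Parameter order matches INeuralNetwork: layer first, then the neuron within the layer.
    public Neuron this[NeuronLayerIndex layer, NeuronIndex index] {
        get { return this[GetStorageIndex(layer, index)]; }
        set { this[GetStorageIndex(layer, index)] = value; }
    } //this
    StorageIndex GetStorageIndex(NeuronLayerIndex layer, NeuronIndex neuron) {
        // Widen before multiplying so the product cannot overflow 32 bits.
        return (StorageIndex)layer * Metadata.NeuronsPerLayer + neuron;
    } //GetStorageIndex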
    //...
} //class NeuralNetwork

A separate section of storage should be designated for the relationships between neurons. The technique may depend on several factors: first of all how sparse the Cartesian square is, how big the total number of associations is, and how they should be used.
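For the simple case in the question, where each neuron links to exactly one other neuron, one option (an assumption, not something settled in this thread) is to store the link as the storage index of the target neuron. The inverse of the GetStorageIndex mapping then recovers the layer and the position within the layer, so no search is needed. StorageIndexMapping, ToStorageIndex and FromStorageIndex are hypothetical names for this sketch.

```csharp
static class StorageIndexMapping
{
    // Rank-2 (layer, neuron) to rank-1 storage index, row-major by layer.
    public static ulong ToStorageIndex(uint layer, uint neuron, uint neuronsPerLayer)
    {
        // Widen first so the multiplication is done in 64 bits.
        return (ulong)layer * neuronsPerLayer + neuron;
    }

    // Inverse mapping: recover (layer, neuron) from a stored link.
    public static void FromStorageIndex(ulong storageIndex, uint neuronsPerLayer,
                                        out uint layer, out uint neuron)
    {
        layer = (uint)(storageIndex / neuronsPerLayer);
        neuron = (uint)(storageIndex % neuronsPerLayer);
    }
}
```

With 9000 neurons per layer, for example, layer 3 and neuron 7 map to storage index 27007, and dividing and taking the remainder by 9000 gives back 3 and 7.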

I hope all the techniques and ideas I provide can be useful.

—SA
