Hi,
I have an application where I need to read records from a database and build a graph of users from them.
Each node will have some attributes and will be connected to other nodes by weighted edges, and the weights can change depending on the type of analysis. The number of nodes is extremely large (more than 10 million). I tried representing the graph as hashes and saving it with Storable once it was built, but when I try to load it back, the system runs out of memory.
I would like to know the best way of representing graphs in such applications. Should they be stored as disk files, i.e. each node in a separate file, in which case I would need a good algorithm for distributing them across a filesystem hierarchy? Or can they be stored in a single file without using too much memory?
Please enlighten. Thanks

I think the question is less about how to store the data (a single file versus many files) and more about how to load it, and how much of it to load. I am sure all 10 million nodes won't be needed at the same time, so you need some way to determine which nodes are currently useful, and then a system that dynamically loads only those nodes. You may also want to predict which data is likely to be used next and cache it in advance.
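
As a rough illustration of loading only the nodes that are needed, here is a minimal Python sketch. It is not from the original application: the `nodes` and `edges` tables, the `LazyGraph` class and the `max_cached` parameter are all assumptions made for the example, and SQLite stands in for whatever database actually holds the records. Each node is read from disk the first time it is requested, and a small least-recently-used cache keeps memory bounded.

```python
import sqlite3
from collections import OrderedDict


class LazyGraph:
    """Loads a node's attributes and outgoing edges only when requested."""

    def __init__(self, conn, max_cached=100_000):
        self.conn = conn
        self.max_cached = max_cached
        self.cache = OrderedDict()  # node_id -> (attrs, edges), oldest first
        self.loads = 0              # database reads so far, for illustration

    def node(self, node_id):
        if node_id in self.cache:
            self.cache.move_to_end(node_id)  # mark as recently used
            return self.cache[node_id]
        row = self.conn.execute(
            "SELECT attrs FROM nodes WHERE id = ?", (node_id,)
        ).fetchone()
        if row is None:
            raise KeyError(node_id)
        # Each row is a (dst, weight) pair, so dict() builds {dst: weight}.
        edges = dict(self.conn.execute(
            "SELECT dst, weight FROM edges WHERE src = ?", (node_id,)
        ))
        self.loads += 1
        self.cache[node_id] = (row[0], edges)
        if len(self.cache) > self.max_cached:
            self.cache.popitem(last=False)  # evict the least recently used node
        return self.cache[node_id]

    def neighbors(self, node_id):
        return self.node(node_id)[1]


def build_demo():
    """Creates a tiny in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, attrs TEXT)")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, weight REAL)")
    conn.execute("CREATE INDEX edges_src ON edges (src)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                     [(1, 2, 0.5), (1, 3, 2.0), (2, 3, 1.0)])
    return conn


graph = LazyGraph(build_demo(), max_cached=2)
print(graph.neighbors(1))
```

With the index on `edges.src`, fetching one node's neighbours touches only that node's rows, so a traversal pays only for the part of the graph it actually visits. Prefetching could be layered on top by calling `node()` for the neighbours of a node before they are asked for.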

-Saurabh
 
Comments
Nemo145 25-Jul-10 7:12am    
thank you for your answer!
I was part of a WPF project a couple of years ago, and we implemented an ad-hoc filter the user could use to target more specific data. That made the queries take less time, and the returned data was more manageable.

Part of that filter was a setting that dictated how many records to return. If the filter produced more than the specified number of results, a message would pop up telling the user that the query had matched more records than requested (and how many were found), and only the requested number would be shown. This let them go back and specify a more targeted filter.

For instance, if they specified a single search criterion, such as a last name of "Smith", and set the number of results to 5, the database would invariably find many more than 5 "Smith" records, but only five records would appear in the tree control. The filter was still on the screen, so they could add another search criterion to narrow the returned results further.
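
The capped filter described above could be sketched roughly as follows. This is not the code from that WPF project: it is a Python illustration against SQLite, and the `people` table, its columns and the `filtered_search` function are hypothetical names chosen for the example. The query counts all matches but returns at most `limit` rows, so the caller can tell the user how many records were found.

```python
import sqlite3

ALLOWED_COLUMNS = {"first_name", "last_name", "city"}


def filtered_search(conn, criteria, limit):
    """Returns (rows, total_matches) for column=value criteria, capped at limit."""
    for column in criteria:
        # Column names cannot be bound as parameters, so check them explicitly.
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
    where = " AND ".join(f"{column} = ?" for column in criteria) or "1"
    params = list(criteria.values())
    total = conn.execute(
        f"SELECT COUNT(*) FROM people WHERE {where}", params
    ).fetchone()[0]
    rows = conn.execute(
        f"SELECT first_name, last_name, city FROM people "
        f"WHERE {where} ORDER BY id LIMIT ?",
        params + [limit],
    ).fetchall()
    return rows, total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, "
             "first_name TEXT, last_name TEXT, city TEXT)")
people = [(f"Person{i}", "Smith", "Toronto" if i % 4 == 0 else "Ottawa")
          for i in range(8)]
people.append(("Ann", "Jones", "Toronto"))
conn.executemany(
    "INSERT INTO people (first_name, last_name, city) VALUES (?, ?, ?)", people)

rows, total = filtered_search(conn, {"last_name": "Smith"}, limit=5)
if total > len(rows):
    print(f"Query matched {total} records; showing the first {len(rows)}.")

# Adding a second criterion narrows the result below the cap.
narrowed_rows, narrowed_total = filtered_search(
    conn, {"last_name": "Smith", "city": "Toronto"}, limit=5)
```

Running the count and the capped select as two queries keeps the example simple. On a very large table the count alone can be expensive, so a real system might cap the count as well.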
 
 
Comments
Nemo145 25-Jul-10 7:12am    
thank you for your answer!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


