Hello,
 
I'm building a web tool for the bioinformatics field. I have to deal with DNA sequence files, each of which will exceed 2 GB, and I'll have dozens of these text files. I will have a database that keeps track of these DNA sequences. My question: is it more efficient to read each text file and store its content in a table record, or to store the path of the text file and read the file every time I need to access it? Or is there another way?
I'm using SQL Server 2008, Visual Studio 2010, ASP.NET and C#.
Posted 23-Feb-13 7:43am

1 solution


Solution 1

Storing the path of the file would probably be better, but since you haven't explained how you're going to access and use the data, it's pretty hard to tell.

For example, are your clients just going to read the entire file and use every bit of it, or are you expected to index all of this data and search for random parts of it?

There's a lot more information required than just "what's the best way to store this"...
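
To make the "store the path" option concrete, here is a minimal sketch of a catalog table that keeps only a name, a path and a size for each file. It is written in Python with the standard sqlite3 module purely as a stand-in for SQL Server and C#. The table name dna_file, its columns and the three helper functions are made up for illustration and are not from the question.

```python
import sqlite3

def create_catalog(conn):
    # One row per DNA file; the sequence itself stays on disk.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dna_file ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE,"
        " path TEXT NOT NULL,"
        " size_bytes INTEGER NOT NULL)"
    )

def register_file(conn, name, path, size_bytes):
    # Parameterized insert, so names and paths are never spliced into SQL.
    conn.execute(
        "INSERT INTO dna_file (name, path, size_bytes) VALUES (?, ?, ?)",
        (name, path, size_bytes),
    )

def lookup_path(conn, name):
    # Returns the stored path, or None if the name is unknown.
    row = conn.execute(
        "SELECT path FROM dna_file WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

With this layout the database only answers "where is the file?", and whatever search code you use opens the file itself. Whether that beats storing the content depends on the access pattern asked about above.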
Comments
AseelHadlaq at 23-Feb-13 14:55pm
   
Thanks Dave,
 
The user will enter a motif, and the system will search for that motif in one or more of the text files, as the user requests. I have a search algorithm that I want to apply.
Dave Kreskowiak at 23-Feb-13 19:48pm
   
Soooo... how are you going to search for this "motif", whatever that is? Since you already have a search algorithm you want to use, that is going to dictate how you store this data.
AseelHadlaq at 24-Feb-13 13:51pm
   
A motif is a short piece of DNA, like "ACCCGTA", and I want to find its occurrences in the whole DNA file.
I will access one or many files, as many times as the user wants to search for that motif in different DNA files.
The search algorithm does not depend on the file size, but my concern is whether it is faster to store the data in a record, or to just store the path and read the file each time a user requests it.
 
Thanks a lot
Dave Kreskowiak at 24-Feb-13 14:57pm
   
Since you're searching these files, your problem isn't storing the file, but indexing the data. How long do you think it takes to read an entire 2 GB file and search it for a substring?
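
For reference, the no-index baseline being described looks roughly like the sketch below: stream the file in chunks and scan each chunk, carrying over the last len(motif) - 1 bytes so that a match straddling a chunk boundary is not missed. This is Python for brevity, and the function name find_motif_in_file and the chunk_size default are made up for illustration. It also assumes the file is one continuous run of bases with no header lines or line breaks, which the thread does not confirm.

```python
def find_motif_in_file(path, motif, chunk_size=1 << 20):
    """Yield the 0-based byte offset of every occurrence of motif in the file.

    Overlapping occurrences are reported. Assumes the file holds one
    continuous sequence (no header lines, no line breaks).
    """
    needle = motif.encode("ascii")
    if not needle:
        raise ValueError("motif must not be empty")
    overlap = len(needle) - 1  # bytes carried over between chunks
    tail = b""                 # end of the previous window
    base = 0                   # file offset of tail[0]
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            start = window.find(needle)
            while start != -1:
                yield base + start
                start = window.find(needle, start + 1)
            # Keep fewer bytes than a full motif, so no match is seen twice.
            keep = min(overlap, len(window))
            tail = window[len(window) - keep:]
            base += len(window) - keep
```

Every query still reads the whole file from start to finish, so the cost per search grows with the file size no matter where the bytes are stored.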
Pranit Kothari at 23-Feb-13 15:11pm
   
My 5, Dave! It's good advice to keep the path rather than the complete file data.
AseelHadlaq at 24-Feb-13 13:53pm
   
What's the difference? And why did you suggest that?
Dave Kreskowiak at 24-Feb-13 15:44pm
   
You said you already had an algorithm to search for data in these files. What is it??
 
How are you going to search an entire 2+ GB file without an index?? It'll take a LONG time to find any one string in a file that big if there is no index at all.
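
One common form such an index can take is a k-mer table: record the positions of every length-k substring once, then answer a query by looking up the motif's first k characters and verifying only those candidate positions. The sketch below is Python, with made-up names (build_kmer_index, find_with_index), and it works on an in-memory string purely to show the idea. For 2+ GB files the table would have to live on disk or in the database, and that part is not shown.

```python
from collections import defaultdict

def build_kmer_index(sequence, k):
    # Map every length-k substring to the list of positions where it starts.
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return index

def find_with_index(sequence, index, k, motif):
    # Only positions whose first k characters match are checked in full.
    if len(motif) < k:
        raise ValueError("motif must be at least k characters long")
    return [
        pos
        for pos in index.get(motif[:k], [])
        if sequence.startswith(motif, pos)
    ]
```

The index is built once per file; after that, each search touches a handful of candidate positions instead of the whole sequence.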
AseelHadlaq at 25-Feb-13 13:05pm
   
I'm going to use a trie structure for the search algorithm. I was asking from the DB design angle which is more efficient.
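
To illustrate, here is a rough sketch of one way a trie can drive the search. It assumes the trie holds the motifs and the sequence is scanned against it; a trie built over the sequence's suffixes would be a different design. It is in Python for brevity, and build_trie and search_with_trie are made-up names.

```python
def build_trie(motifs):
    # Nested dicts keyed by base; the key None marks the end of a motif.
    root = {}
    for motif in motifs:
        node = root
        for base in motif:
            node = node.setdefault(base, {})
        node[None] = motif
    return root

def search_with_trie(sequence, root):
    # From each start position, walk the trie as far as the sequence allows
    # and record every motif that ends along the way.
    hits = []
    for start in range(len(sequence)):
        node = root
        pos = start
        while True:
            if None in node:
                hits.append((start, node[None]))
            if pos == len(sequence) or sequence[pos] not in node:
                break
            node = node[sequence[pos]]
            pos += 1
    return hits
```

This finds several motifs in a single pass, but it still has to read the whole sequence, wherever the sequence is stored.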
Dave Kreskowiak at 25-Feb-13 15:26pm
   
Since this is DNA data and you're thinking of using a trie, you really have no choice but to store all the data in the database, since the data is going to be the key/path anyway.
 
Have you read http://books.google.com/books?id=aioKEPWSyMoC&pg=PA41&lpg=PA41&dq=sql+store+dna+data&source=bl&ots=WKXWUKWkr1&sig=UQheQ77Hxs23n4s2SEd_q9BIpK4&hl=en&sa=X&ei=W8YrUbF4htzJAaitgPAG&ved=0CG8Q6AEwCA#v=onepage&q=sql%20store%20dna%20data&f=false
 
Your problems are much larger than you seem to know about.
AseelHadlaq at 1-Mar-13 4:42am
   
Thanks for the link :)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 23 Feb 2013
Copyright © CodeProject, 1999-2014