
I'm building a web tool for the bioinformatics field. I have to deal with DNA sequence files that will exceed 2 GB each, and I'll have dozens of these text files. I will have a database that keeps these DNA sequences for me. My question is: will it be more efficient to read the text file and store its content in a table record, or to store the path of the text file and read it every time I need to access it? Or is there another way?
I'm using SQL Server 2008, VS 2010 and C#.
Posted 23-Feb-13 7:43am

1 solution


Solution 1

Probably storing the path of the file would be better, but since you haven't explained anything about how you're going to access and use the data, it's pretty hard to tell.

For example, are your clients just going to read the entire file and use every bit of it, or are you expected to index all of this data and search for random parts of it??

There's a lot more information required than just "what's the best way to store this"...
AseelHadlaq 23-Feb-13 14:55pm
Thanks Dave,

The user will enter a motif, and the system will search for that motif in one or more of the text files, as the user requests. I have a search algorithm that I want to apply.
Pranit Kothari 23-Feb-13 15:11pm
My 5, Dave! It's good advice to keep the path rather than the complete file data.
Dave Kreskowiak 23-Feb-13 19:48pm
How are you going to search for this "motif", whatever that is?? Since you already have a search algorithm you want to use, that is kind of going to dictate how you store this data.
AseelHadlaq 24-Feb-13 13:51pm
A motif is like "ACCCGTA", a part of the DNA whose occurrences I want to search for in the whole DNA file.
I will access a file, or many files, as many times as the user wants to search for that motif in different DNA files.
The search algorithm does not depend on the file size; my concern is whether it is faster to store the data in a record, or to just store the path and read the file each time a user requests it.

Thanks a lot
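If the path is stored in the database, the file can still be read without loading 2 GB into memory at once. Below is a minimal sketch of that approach, assuming the sequence files are plain text; the class and method names are illustrative, not from the original thread. `StreamReader` over a file reads in fixed-size chunks, so memory use stays flat regardless of file size (unlike `File.ReadAllText`, which would try to materialize the whole file as one string).

```csharp
using System;
using System.IO;

// Sketch: stream a large sequence file in 64 KB chunks so memory use
// stays constant even for files larger than 2 GB.
class StreamingReader
{
    public static long CountBases(string path)
    {
        long total = 0;
        var buffer = new char[1 << 16]; // 64 KB chunk
        using (var reader = new StreamReader(path))
        {
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                total += read; // a real tool would process the chunk here
        }
        return total;
    }
}
```

The same chunked loop works whether the bytes come from a file on disk or from a `varbinary(max)` column streamed out of SQL Server, which is one reason the path-vs-record choice matters less for memory than it might seem.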
AseelHadlaq 24-Feb-13 13:53pm
What's the difference? And why did you suggest that?
Dave Kreskowiak 24-Feb-13 14:57pm
Since you're searching these files, your problem isn't storing the file, but indexing the data. How long do you think it takes to read an entire 2 GB file and search for a substring?
Dave Kreskowiak 24-Feb-13 15:44pm
You said you already had an algorithm to search for data in these files. What is it??

How are you going to search an entire 2+ GB file without an index?? It'll take a LONG time to find any one string in a file that big if there is no index at all.
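To make Dave's point concrete, here is a sketch of the "no index" baseline: scan the whole file for a motif, chunk by chunk. The names are hypothetical. An overlap of `motif.Length - 1` characters is carried between chunks so a match straddling a chunk boundary is not missed. Even implemented correctly, this costs a full pass over the file for every query, which is the argument for building an index instead.

```csharp
using System;
using System.IO;

// Sketch: naive streaming substring search over an arbitrarily large file.
// Cost is O(file size) per query; no index is used.
class NaiveScan
{
    public static int CountOccurrences(string path, string motif)
    {
        int count = 0;
        var buffer = new char[1 << 16];
        string carry = ""; // tail of the previous chunk, to catch boundary matches
        using (var reader = new StreamReader(path))
        {
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                string window = carry + new string(buffer, 0, read);
                for (int i = window.IndexOf(motif, StringComparison.Ordinal);
                     i >= 0;
                     i = window.IndexOf(motif, i + 1, StringComparison.Ordinal))
                    count++;
                int keep = Math.Min(motif.Length - 1, window.Length);
                carry = window.Substring(window.Length - keep);
            }
        }
        return count;
    }
}
```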
AseelHadlaq 25-Feb-13 13:05pm
I'm going to use a trie structure for the search algorithm. I was asking, from the DB design angle, what's more efficient.
Dave Kreskowiak 25-Feb-13 15:26pm
Since this is DNA data and you're thinking of using a trie, you really have no choice but to store all the data in the database, since the data itself is going to be the key/path anyway.

Have you read

Your problems are much larger than you seem to know about.
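For illustration, here is a minimal trie sketch over the DNA alphabet, in the spirit of the approach discussed above; the class design is an assumption, not code from the thread. Every substring of the sequence up to a maximum motif length is inserted, after which looking up a motif costs only O(motif length). Building such a structure for a 2 GB sequence would need serious memory planning (a suffix array or compressed suffix tree is more realistic at that scale); this only shows the idea.

```csharp
using System;
using System.Collections.Generic;

// Sketch: index all substrings up to maxMotifLength in a trie, then answer
// "how many times does this motif occur?" in time proportional to motif length.
class DnaTrie
{
    private readonly Dictionary<char, DnaTrie> children = new Dictionary<char, DnaTrie>();
    public int Count; // occurrences of the prefix ending at this node

    public void Index(string sequence, int maxMotifLength)
    {
        for (int start = 0; start < sequence.Length; start++)
        {
            var node = this;
            int end = Math.Min(sequence.Length, start + maxMotifLength);
            for (int i = start; i < end; i++)
            {
                DnaTrie child;
                if (!node.children.TryGetValue(sequence[i], out child))
                    node.children[sequence[i]] = child = new DnaTrie();
                node = child;
                node.Count++;
            }
        }
    }

    public int CountMotif(string motif)
    {
        var node = this;
        foreach (char c in motif)
            if (!node.children.TryGetValue(c, out node))
                return 0;
        return node.Count;
    }
}
```

This is why the data ends up in (or alongside) the database either way: the index is built from the sequence itself, so the sequence effectively becomes the key.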
AseelHadlaq 1-Mar-13 4:42am
Thnx for the link :)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
