See more: VB.NET
Hi, I want to read bytes from a large file (> 2 GB).
When I do that, an OutOfMemoryException is thrown because I read the whole file into memory. All I know is that I can split the file into small chunks...
So what is the best code to do that?
 
The reason for reading the file is to find some bytes that are stored in it.
 
Any suggestions will be really appreciated.
Posted 10-Feb-13 12:09pm

Solution 1

Have a look at:
FileStream.Read
FileStream.Seek
 
That pretty much covers what you need to know.
 
[Update]
Your implementation should look a bit like this:
using System.IO;

const int megabyte = 1024 * 1024; // 1 MiB; 1024 * 1024 * 1024 would be 1 GiB
 
public void ReadAndProcessLargeFile(string theFilename, long whereToStartReading = 0)
{
    using (FileStream fileStream = new FileStream(theFilename, FileMode.Open, FileAccess.Read))
    {
        // One buffer, reused for every chunk, so memory use stays constant.
        byte[] buffer = new byte[megabyte];
        fileStream.Seek(whereToStartReading, SeekOrigin.Begin);
        int bytesRead = fileStream.Read(buffer, 0, megabyte);
        while (bytesRead > 0) // Read returns 0 at the end of the file
        {
            // Only the first bytesRead bytes of the buffer are valid.
            ProcessChunk(buffer, bytesRead);
            bytesRead = fileStream.Read(buffer, 0, megabyte);
        }
    }
}
 
private void ProcessChunk(byte[] buffer, int bytesRead)
{
    // Do the processing here
}
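 
Since the goal is to find some bytes in the file, here is a minimal sketch of a search built on the same chunked loop. It assumes a plain byte-for-byte match is enough; the class name ChunkedSearch, the method FindFirst and the chunkSize parameter are made up for this example. The one subtle point is that a match can straddle two chunks, so the last pattern.Length - 1 bytes of each chunk are carried over to the front of the buffer before the next read.

```csharp
using System;
using System.IO;

public static class ChunkedSearch
{
    // Returns the file offset of the first occurrence of pattern, or -1 if absent.
    public static long FindFirst(string path, byte[] pattern, int chunkSize = 1024 * 1024)
    {
        if (pattern == null || pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", "pattern");

        int overlap = pattern.Length - 1;
        byte[] buffer = new byte[chunkSize + overlap];
        int carried = 0;       // bytes kept from the end of the previous chunk
        long bufferStart = 0;  // file offset that buffer[0] corresponds to

        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int bytesRead;
            while ((bytesRead = stream.Read(buffer, carried, chunkSize)) > 0)
            {
                int valid = carried + bytesRead;

                // Naive scan over the valid part of the buffer.
                for (int i = 0; i + pattern.Length <= valid; i++)
                {
                    int j = 0;
                    while (j < pattern.Length && buffer[i + j] == pattern[j]) j++;
                    if (j == pattern.Length) return bufferStart + i;
                }

                // Keep the tail so a match spanning two chunks is still found.
                int keep = Math.Min(overlap, valid);
                Array.Copy(buffer, valid - keep, buffer, 0, keep);
                bufferStart += valid - keep;
                carried = keep;
            }
        }
        return -1;
    }
}
```

Calling ChunkedSearch.FindFirst(path, new byte[] { 0xDE, 0xAD }) would return the offset of the first match, and the loop ends on its own when Read returns 0. The scan is deliberately simple; for long patterns a smarter algorithm may be worth it.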
 
Best regards
Espen Harlinn
Comments
Sergey Alexandrovich Kryukov at 10-Feb-13 18:14pm
   
Basically, this is all one needs, my 5, but proper design of well optimized and encapsulated code needs considerable experience or brain work.
I provided some basic directions, please see.
—SA
Espen Harlinn at 11-Feb-13 4:25am
   
Thank you, Sergey :-D
Leecherman at 11-Feb-13 0:33am
   
Thank you both for your replies. I already know the first solution and it didn't help me: it works with small files but not with big ones. The second one is not what I want; all I want is to read bytes from a large file.
Any posted code will be really appreciated...
Thanks
Espen Harlinn at 11-Feb-13 4:25am
   
The Read method works well enough for me, even when the file is > 2GB.
You just read a chunk into memory, process it, and reuse the byte array for the next chunk.
Leecherman at 12-Feb-13 20:50pm
   
Thanks again for your replies. Could you please post the code?
I tried to chunk it, but with no luck at all...
Espen Harlinn at 13-Feb-13 5:57am
   
Ok, have another look at the solution
Leecherman at 13-Feb-13 7:43am
   
Thank you very much, 5 stars. I really appreciate it.
OK, this processes a 1 GB chunk in a loop, and it doesn't exit the while loop until I write 'Exit While' when I find some bytes. Also, if I don't use the ProcessChunk sub, it loops without end. Is there any workaround for this?
Espen Harlinn at 13-Feb-13 7:52am
   
The update was quick & dirty - turns out it was a bit too dirty, which I've corrected ...
Leecherman at 13-Feb-13 18:33pm
   
Thanks a lot again, this solved my question :)
Espen Harlinn at 13-Feb-13 18:36pm
   
Brilliant - seems I was a bit tired this morning ...
Leecherman at 13-Feb-13 22:47pm
   
Oh, I meant it solved my question, not my answer! (I will edit it.)
Also, thank you again for the useful answer :)
Marcus Kramer at 13-Feb-13 9:36am
   
+5.
Espen Harlinn at 13-Feb-13 9:42am
   
Thank you, Marcus :-D

Solution 2

In addition to the correct answer by Espen Harlinn:
 
Breaking a file into chunks will hardly help you, unless those chunks are of different natures (different formats, representing different data structures) and were put in one file without proper justification.
 
In other cases, it's better to keep the big file as it is and keep it open. There are cases when you need to split the file into two pieces. This is just the basic idea; see below.
 
So, I would assume that the file is big just because it represents a collection of objects of the same type, or of a few different types. If all the items are of the same size (in file storage units), addressing them is trivial: you simply multiply the item size by the index of the required item to get the position parameter for Stream.Seek. So the only non-trivial case is a collection of items of different sizes. In that case, you should index the file and build an index table. The index table consists of units of the same size; typically it is a list or array of file positions, one per item. Because its entries are all the same size, the index table itself can be addressed by index (shift), as described above. You then read from it the position in the "big" file, move the file position there and read the data.
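 
For the fixed-size case, a minimal sketch could look like this. The class FixedRecordReader, the method ReadRecord and the recordSize parameter are hypothetical names for this example, and the file is assumed to contain nothing but records of exactly recordSize bytes (no header).

```csharp
using System;
using System.IO;

public static class FixedRecordReader
{
    // Reads the record at the given zero-based index without touching the rest of the file.
    public static byte[] ReadRecord(string path, long index, int recordSize)
    {
        byte[] record = new byte[recordSize];
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Position = item size * index (long arithmetic, so files > 2 GB work).
            stream.Seek(index * recordSize, SeekOrigin.Begin);

            // Read may return fewer bytes than requested, so loop until the record is full.
            int total = 0;
            while (total < recordSize)
            {
                int n = stream.Read(record, total, recordSize - total);
                if (n == 0)
                    throw new EndOfStreamException("Record " + index + " is past the end of the file.");
                total += n;
            }
        }
        return record;
    }
}
```

If many records are read, it is probably better to keep one stream open and reuse it instead of opening the file on every call.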
 
You will have two options: 1) keep the index table in memory; you could recalculate it each time, but it's better to build it once and cache it in a file, either the same file or a separate one; 2) keep it in a file and read that file at the required position. In the second case, you seek in the file(s) in two steps: first in the index, then in the data. In principle, this method lets you access files of any size (limited only by System.Int64.MaxValue, the range of a stream position).
 
After you position the stream of the "big" file, you can read a single item. You can use serialization for this purpose. Please see:
http://en.wikipedia.org/wiki/Serialization#.NET_Framework
http://msdn.microsoft.com/en-us/library/vstudio/ms233843.aspx
http://msdn.microsoft.com/en-us/library/system.runtime.serialization.formatters.binary.binaryformatter.aspx
 
A neat way to implement the index-table approach is to encapsulate it all in a class with an indexed property (an indexer).
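 
A rough sketch of such a class, using the index-in-a-file option with the two-step seek. Everything here is an assumption made for illustration: the class name IndexedFile, a separate index file, and an index format of one little-endian 8-byte start offset per item, in ascending order. Items are returned as raw bytes; deserializing them is left out.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: the index file is assumed to hold one 8-byte offset per item.
public sealed class IndexedFile : IDisposable
{
    private readonly FileStream data;
    private readonly FileStream index;
    private readonly BinaryReader indexReader;

    public IndexedFile(string dataPath, string indexPath)
    {
        data = new FileStream(dataPath, FileMode.Open, FileAccess.Read);
        index = new FileStream(indexPath, FileMode.Open, FileAccess.Read);
        indexReader = new BinaryReader(index);
    }

    public long Count { get { return index.Length / sizeof(long); } }

    public byte[] this[long i]
    {
        get
        {
            if (i < 0 || i >= Count) throw new ArgumentOutOfRangeException("i");

            // Step 1: index entries have a fixed size, so position = i * entry size.
            index.Seek(i * sizeof(long), SeekOrigin.Begin);
            long start = indexReader.ReadInt64();
            // The item ends where the next one starts, or at the end of the data file.
            long end = (i + 1 < Count) ? indexReader.ReadInt64() : data.Length;

            // Step 2: seek in the big file and read just this item.
            byte[] item = new byte[end - start];
            data.Seek(start, SeekOrigin.Begin);
            int total = 0;
            while (total < item.Length)
            {
                int n = data.Read(item, total, item.Length - total);
                if (n == 0) throw new EndOfStreamException();
                total += n;
            }
            return item;
        }
    }

    public void Dispose()
    {
        indexReader.Close(); // also closes the index stream
        data.Dispose();
    }
}
```

Usage would be along the lines of: using (var file = new IndexedFile(dataPath, indexPath)) { byte[] item = file[42]; }. Both files stay open for the lifetime of the object, and only the requested item is ever loaded into memory.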
 
—SA
Comments
Espen Harlinn at 11-Feb-13 4:26am
   
Good points :-D
Sergey Alexandrovich Kryukov at 11-Feb-13 4:41am
   
Thank you, Espen.
—SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 13 Feb 2013