I am writing a program in C# that will extract five files from a .DAT file. I have the original files saved on my computer for reference: a .pdf, a .tif, a .txt, and two .docx files.

My question is: how am I supposed to know where a specific file, say the .pdf, begins and ends, and where the next one begins? I have looked at the hexadecimal dump of the .DAT file and was thinking about spotting a certain byte pattern to use as a reference, so I know "that's a .pdf file, and this is where it ends." I don't know whether this approach is correct, which is why I need help figuring out where the files start and end. Do I have to go in by hand, count the hexadecimal digits, and work out which bytes belong to which file? That seems like a hassle, but if I have to, I will do it.
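Spotting a byte pattern is a real technique (often called file carving), and it can be automated instead of done by hand. A minimal sketch, assuming the five files were simply concatenated with no index: it searches for the published "magic number" at the start of each format, `%PDF-` for PDF, `II*\0` or `MM\0*` for TIFF, and `PK\x03\x04` for .docx (a .docx is a ZIP package). `SignatureScanner` and `FindCandidates` are made-up names for illustration. Treat every hit as a candidate only: a .docx contains one `PK\x03\x04` header per internal part, so it matches many times; a .txt file has no signature at all; and a PDF normally ends with `%%EOF`, but incrementally updated PDFs can contain that marker more than once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: lists every offset where a well-known file signature occurs.
// A hit is only a candidate start; it still has to be confirmed.
public static class SignatureScanner
{
    // Magic numbers for the formats in the question (.txt has none).
    private static readonly (string Kind, byte[] Magic)[] Signatures =
    {
        ("pdf",      new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D }), // "%PDF-"
        ("tif (II)", new byte[] { 0x49, 0x49, 0x2A, 0x00 }),       // little-endian TIFF
        ("tif (MM)", new byte[] { 0x4D, 0x4D, 0x00, 0x2A }),       // big-endian TIFF
        ("zip/docx", new byte[] { 0x50, 0x4B, 0x03, 0x04 }),       // ZIP local file header
    };

    public static List<(long Offset, string Kind)> FindCandidates(byte[] data)
    {
        var hits = new List<(long Offset, string Kind)>();
        for (int i = 0; i < data.Length; i++)
        {
            foreach (var (kind, magic) in Signatures)
            {
                // Compare the bytes at position i against this signature.
                if (i + magic.Length <= data.Length &&
                    data.AsSpan(i, magic.Length).SequenceEqual(magic))
                {
                    hits.Add((i, kind));
                }
            }
        }
        return hits;
    }
}
```

Candidates from a scan like this still need checking, for example by confirming that the bytes between two hits open correctly as that file type.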

If you have any ideas about how to do this, please elaborate!

What I have tried:

So far I have written code that converts the file data from bytes to hex and back, and it works: I have verified it with the .tif and .pdf files.
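Converting to hex is handy for inspecting the data, but the extraction itself can work on the raw bytes. If the project targets .NET 5 or later (an assumption), the built-in `Convert.ToHexString` and `Convert.FromHexString` already cover both directions; the `HexView` wrapper and its method names are hypothetical.

```csharp
using System;

public static class HexView
{
    // Returns up to `count` leading bytes as uppercase hex, e.g. "255044462D" for "%PDF-".
    // Useful for eyeballing signatures; extraction itself does not need hex.
    public static string Head(byte[] data, int count) =>
        Convert.ToHexString(data, 0, Math.Min(count, data.Length));

    // Round trip: hex text back to bytes. Throws FormatException on odd length or non-hex characters.
    public static byte[] Parse(string hex) => Convert.FromHexString(hex);
}
```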
Updated 15-Dec-22 12:10pm (v2)
Comments
PIEBALDconsult 15-Dec-22 15:35pm    
Ummm what? I don't understand the question.
A file begins where it begins and ends when it ends, I don't.

Or are you looking at a file which contains two or more other files? That would depend on the definition of that file.
There is no standard for a "DAT" file.
If it's a TAR file, you should look at the spec for TAR files.
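If the .DAT did turn out to be a TAR archive (nothing in the question confirms this), the "index" is a 512-byte header in front of each entry. A minimal sketch, assuming the POSIX ustar layout (name in bytes 0 to 99 of the header, size as octal ASCII in bytes 124 to 135, entry data padded to 512-byte blocks); `TarWalker` is a hypothetical name, and GNU extensions such as long names are ignored.

```csharp
using System.Collections.Generic;
using System.Text;

// Sketch: walks an in-memory ustar TAR image and lists each entry's
// name, the offset where its data starts, and its size in bytes.
public static class TarWalker
{
    private const int Block = 512;

    public static List<(string Name, int DataOffset, int Size)> ListEntries(byte[] tar)
    {
        var entries = new List<(string Name, int DataOffset, int Size)>();
        int pos = 0;
        while (pos + Block <= tar.Length)
        {
            // An empty name field means we've hit the zero-filled end-of-archive blocks.
            if (tar[pos] == 0) break;

            string name = ReadString(tar, pos, 100);        // name: offset 0, 100 bytes
            int size = (int)ReadOctal(tar, pos + 124, 12);  // size: offset 124, 12 bytes, octal ASCII
            entries.Add((name, pos + Block, size));

            // Entry data follows its header and is padded to a multiple of 512 bytes.
            int padded = (size + Block - 1) / Block * Block;
            pos += Block + padded;
        }
        return entries;
    }

    // Reads a NUL-terminated ASCII string from a fixed-width field.
    private static string ReadString(byte[] buf, int offset, int length)
    {
        int end = 0;
        while (end < length && buf[offset + end] != 0) end++;
        return Encoding.ASCII.GetString(buf, offset, end);
    }

    // Parses an octal ASCII number, skipping leading spaces/NULs and stopping at the terminator.
    private static long ReadOctal(byte[] buf, int offset, int length)
    {
        long value = 0;
        bool seenDigit = false;
        for (int i = 0; i < length; i++)
        {
            byte b = buf[offset + i];
            if (b >= (byte)'0' && b <= (byte)'7')
            {
                value = value * 8 + (b - (byte)'0');
                seenDigit = true;
            }
            else if (seenDigit)
            {
                break; // NUL or space after the digits ends the number
            }
        }
        return value;
    }
}
```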
Steven Villarreal 15-Dec-22 15:51pm    
Sorry for not clarifying: the .DAT file contains more than one file, five of them in fact. I ended up looking at the hex, converting that to bytes, and then writing the bytes to separate files depending on which file they belonged to.
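Once the start offset and length of each contained file are known, there is no need to go through hex at all: the bytes can be copied straight from the .DAT into the output file. A sketch with `ExtractRange` as a hypothetical helper name; the offsets and lengths still have to come from an index or from your own analysis of the file.

```csharp
using System;
using System.IO;

public static class RangeExtractor
{
    // Copies `length` bytes starting at `offset` in the .DAT to a new file, in chunks,
    // so large embedded files never have to be held in memory at once.
    public static void ExtractRange(string datPath, long offset, long length, string outPath)
    {
        using var input = File.OpenRead(datPath);
        using var output = File.Create(outPath);
        input.Seek(offset, SeekOrigin.Begin);

        var buffer = new byte[81920];
        long remaining = length;
        while (remaining > 0)
        {
            int read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (read == 0)
                throw new EndOfStreamException("Range runs past the end of the .DAT file.");
            output.Write(buffer, 0, read);
            remaining -= read;
        }
    }
}
```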
PIEBALDconsult 15-Dec-22 15:57pm    
There must be some form of index in there.
PIEBALDconsult 15-Dec-22 15:57pm    
See if 7ZIP can tell you anything about it.
[no name] 15-Dec-22 15:43pm    
If the .dat file does not contain an index with the details then you will have to work it out for yourself. It all depends what software was used to create the file.

1 solution

If your code is creating this .DAT file, you can NOT just copy the bytes from your source files and append them to the .DAT file. You need to keep track of how many bytes are written to the .DAT file for each "file" you add to it. You're tracking stuff like the offsets into the .DAT file for the start and end of each of your contained files, and probably other metadata, like the original filename, size in bytes, and whatever else you want to track.

Once you are done adding files and you've built this index, you have to write that index to your .DAT file.
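The bookkeeping described above can be sketched as follows. The on-disk layout (file data first, then the index, then an 8-byte pointer to where the index starts) is just one possible choice, invented for illustration; `DatWriter` and `Pack` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical layout (an illustration, not a standard ".DAT" format):
//   [file 1 bytes][file 2 bytes]...[index][index offset: Int64]
// index = entry count (Int32), then per entry: name (string), offset (Int64), length (Int64).
public static class DatWriter
{
    public static void Pack(Stream dat, IEnumerable<(string Name, byte[] Content)> files)
    {
        using var writer = new BinaryWriter(dat, Encoding.UTF8, leaveOpen: true);
        var index = new List<(string Name, long Offset, long Length)>();

        foreach (var (name, content) in files)
        {
            writer.Flush();
            // Record where this file starts in the .DAT before appending its bytes.
            index.Add((name, dat.Position, content.Length));
            writer.Write(content);
        }

        // The index goes after all the data; remember where it starts.
        writer.Flush();
        long indexOffset = dat.Position;
        writer.Write(index.Count);
        foreach (var (name, offset, length) in index)
        {
            writer.Write(name);
            writer.Write(offset);
            writer.Write(length);
        }

        // Fixed-size trailer so a reader can locate the index from the end of the file.
        writer.Write(indexOffset);
        writer.Flush();
    }
}
```

Putting the index at the end lets the writer append files one after another without knowing the total layout in advance; a header-first index would also work but needs every size up front or a second pass.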

Oh, and if you want to read the files back, your code is going to have to be able to read the index to tell it what the .DAT file contains and where each file starts and stops in the .DAT file.
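A reader sketch for a hypothetical layout of `[file data][index][Int64 offset of the index]`, where the index is an Int32 entry count followed by name, offset, and length for each entry. This is an invented format for illustration, not a standard .DAT; `DatReader`, `ReadIndex`, and `Extract` are hypothetical names. The reader seeks to the last 8 bytes to find the index, reads the entries, then seeks to an entry's offset and reads exactly its length.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class DatReader
{
    public static List<(string Name, long Offset, long Length)> ReadIndex(Stream dat)
    {
        using var reader = new BinaryReader(dat, Encoding.UTF8, leaveOpen: true);
        // The last 8 bytes hold the position where the index begins.
        dat.Seek(-8, SeekOrigin.End);
        long indexOffset = reader.ReadInt64();

        dat.Seek(indexOffset, SeekOrigin.Begin);
        int count = reader.ReadInt32();
        var index = new List<(string Name, long Offset, long Length)>(count);
        for (int i = 0; i < count; i++)
        {
            // Field order must match whatever wrote the file: name, offset, length.
            string name = reader.ReadString();
            long offset = reader.ReadInt64();
            long length = reader.ReadInt64();
            index.Add((name, offset, length));
        }
        return index;
    }

    public static byte[] Extract(Stream dat, string name)
    {
        foreach (var entry in ReadIndex(dat))
        {
            if (entry.Name != name) continue;

            dat.Seek(entry.Offset, SeekOrigin.Begin);
            var buffer = new byte[entry.Length];
            int filled = 0;
            // Stream.Read may return fewer bytes than requested, so loop until full.
            while (filled < buffer.Length)
            {
                int n = dat.Read(buffer, filled, buffer.Length - filled);
                if (n == 0) throw new EndOfStreamException("Entry runs past the end of the .DAT file.");
                filled += n;
            }
            return buffer;
        }
        throw new FileNotFoundException($"'{name}' is not in the index.");
    }
}
```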
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900