Hello All,

I have some data files (~2 MB each) which contain data encoded with mixed little- and big-endian byte ordering. Data chunks are delimited by UTF-8 strings which act as labels for data fields. Since .NET's BinaryReader doesn't support such mixed data streams, I have tried to implement a custom reader. To find the offset of a particular data field I have tried something like the following:

C#
byte[] byteBuffer = File.ReadAllBytes("SomeFilePath");
string byteBufferAsString = System.Text.Encoding.UTF8.GetString(byteBuffer);
Int32 offset1 = byteBufferAsString.IndexOf("StringToFind");


However, this gives variable results. Sometimes the offset value points exactly to the start position of the StringToFind text in the buffer, and other times it points two bytes in front of the actual start position, i.e. to an Int16 which indicates the byte length of the string immediately following.

Has anyone had a similar experience? Otherwise, does anyone have any advice for working with binary files and searching for string positions?

cheers
Comments
Sergey Alexandrovich Kryukov 18-Oct-12 16:16pm    
Little- and big-endian mixed? And that means it's not UTF-8 (which has no endianness), which is also mixed in... What a mess! Why develop all that at all?
Anyway, no matter what you do, you need prior knowledge of the boundaries between encodings, or you have to do some trial-and-error guessing programmatically. Nothing can really save a stupid design.
--SA

I think this step
Quote:
string byteBufferAsString = System.Text.Encoding.UTF8.GetString(byteBuffer)
is weak.

You should instead do the opposite: get the array of bytes representing the search string and search for it inside the data buffer.
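To illustrate why the decode step is weak, here is a minimal sketch (not taken from the original files). It assumes a hypothetical six-byte buffer in which three binary bytes happen to form one valid UTF-8 sequence in front of the label "ABC". The class and method names are made up for the example. The decoder collapses those three bytes into a single character, so the character index reported by IndexOf no longer matches the byte offset.

```csharp
using System;
using System.Text;

static class OffsetDriftDemo
{
    // Character index of the label in the decoded string,
    // i.e. what string.IndexOf reports in the question's approach.
    public static int CharIndex(byte[] buffer, string label) =>
        Encoding.UTF8.GetString(buffer).IndexOf(label, StringComparison.Ordinal);

    // Hypothetical sample data: E2 82 AC is binary payload that happens to be
    // a valid 3-byte UTF-8 sequence (U+20AC), followed by the ASCII label "ABC".
    // The label really starts at byte offset 3.
    public static byte[] Sample() =>
        new byte[] { 0xE2, 0x82, 0xAC, 0x41, 0x42, 0x43 };
}
```

With this sample, OffsetDriftDemo.CharIndex(OffsetDriftDemo.Sample(), "ABC") returns 1 while the label sits at byte offset 3, a drift of two positions. Whether this is exactly what happens in the asker's files is an assumption, but any binary bytes that decode as a multi-byte sequence will shift the index in the same way.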
 
Comments
Espen Harlinn 18-Oct-12 18:42pm    
Good point :-D
You need to search for the UTF-8 string as binary data. Something like this (not tested):

C#
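// Requires: using System; using System.IO; using System.Text;
byte[] ByteBuffer = File.ReadAllBytes("SomeFilePath");
byte[] StringBytes = Encoding.UTF8.GetBytes("StringToFind");
for (int i = 0; i <= (ByteBuffer.Length - StringBytes.Length); i++)
{
    // Cheap first-byte check before comparing the rest of the pattern
    if (ByteBuffer[i] == StringBytes[0])
    {
        int j; // declared outside the loop because it is tested afterwards
        for (j = 1; j < StringBytes.Length && ByteBuffer[i + j] == StringBytes[j]; j++) ;
        // j reaches the pattern length only if every byte matched
        if (j == StringBytes.Length)
            Console.WriteLine("String was found at offset {0}", i);
    }
}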


Please note that this is a case-sensitive search!
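The same byte-wise search can also be written with spans, which is available on newer .NET versions. The sketch below is only an illustration: LabelScanner, FindAll and ReadLengthPrefix are hypothetical names, and the 2-byte length prefix directly in front of each label is an assumption taken from the question. Because the byte order of that prefix is not known, the helper returns both interpretations.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

static class LabelScanner
{
    // Returns the byte offset of every occurrence of the label in the buffer.
    public static List<int> FindAll(byte[] buffer, string label)
    {
        ReadOnlySpan<byte> haystack = buffer;
        ReadOnlySpan<byte> pattern = Encoding.UTF8.GetBytes(label);
        var offsets = new List<int>();
        if (pattern.Length == 0) return offsets;

        int start = 0;
        while (start <= haystack.Length - pattern.Length)
        {
            // IndexOf returns a position relative to the slice, or -1.
            int hit = haystack.Slice(start).IndexOf(pattern);
            if (hit < 0) break;
            offsets.Add(start + hit);
            start += hit + 1; // continue right after the start of this match
        }
        return offsets;
    }

    // Assumption: a 2-byte length prefix sits directly before the label.
    // Both byte orders are returned so the caller can pick the plausible one.
    public static (short Little, short Big) ReadLengthPrefix(byte[] buffer, int labelOffset)
    {
        if (labelOffset < 2 || labelOffset > buffer.Length)
            throw new ArgumentOutOfRangeException(nameof(labelOffset));
        ReadOnlySpan<byte> prefix = buffer.AsSpan(labelOffset - 2, 2);
        return (BinaryPrimitives.ReadInt16LittleEndian(prefix),
                BinaryPrimitives.ReadInt16BigEndian(prefix));
    }
}
```

A plausible check is to compare each interpretation of the prefix with the byte length of the label that was found: the one that matches suggests the byte order used for that field.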
 
Well, I don't have such experience, simply because I thoroughly avoid dealing with extreme stupidities, so my only advice would be: give up; throw out all software dealing with data structured in this weird way and make sure you prevent such mistakes in future; write brand new software which uses some reasonable persistence format; and save a huge amount of time and nerves. If this advice does not suit you, you are very welcome to ram the "problem" on your own.

I seriously doubt you can get better advice from anyone who "had similar experience". Some experiences are not really helpful.

--SA
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


