C# binary files finding strings

Question

Hello All,

I have some data files (~2 MB each) which contain data encoded with mixed little- and big-endian byte ordering. Data chunks are delimited by UTF-8 strings which act as labels for the data fields. Since .NET's BinaryReader doesn't support such mixed data streams, I have tried to implement a custom reader. To find the offset of a particular data field, I tried something like the following:

C#

byte[] byteBuffer = File.ReadAllBytes("SomeFilePath");
string byteBufferAsString = System.Text.Encoding.UTF8.GetString(byteBuffer);
Int32 offset1 = byteBufferAsString.IndexOf("StringToFind");

However, this seems to give variable results. Sometimes the offset value points exactly to the start position of the StringToFind text in the buffer, and other times it points two bytes before the actual start position, i.e. to an Int16 which indicates the byte length of the string immediately following.

Has anyone had similar experience? Otherwise does anyone have any advice for working with binary-files and searching for string positions?

cheers

Posted 18-Oct-12 10:11am

Ylno

Comments

Sergey Alexandrovich Kryukov 18-Oct-12 16:16pm

Little- and big-endian mixed? And that means it's not UTF-8 (which has no endianness), which is also mixed in... What a mess! Why develop all that at all?
Anyway, no matter what you do, you should have prior knowledge of the boundaries between encodings, or do some programmatic trial-and-error guessing. Clearly, nothing can save a stupid design.
--SA
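
For what it's worth, once the field boundaries are known, values of either byte order can be read from the same buffer. The following is only a minimal sketch: it assumes a runtime that ships System.Buffers.Binary (.NET Core 2.1 or later, which postdates this thread), and the class name, method names and field layout are hypothetical, not taken from the poster's file format.

```csharp
using System;
using System.Buffers.Binary;

static class MixedEndian
{
    // Reads a 16-bit big-endian value starting at the given byte offset.
    public static short ReadInt16BE(byte[] buffer, int offset) =>
        BinaryPrimitives.ReadInt16BigEndian(buffer.AsSpan(offset, 2));

    // Reads a 32-bit little-endian value starting at the given byte offset.
    public static int ReadInt32LE(byte[] buffer, int offset) =>
        BinaryPrimitives.ReadInt32LittleEndian(buffer.AsSpan(offset, 4));
}
```

Which fields are big-endian and which are little-endian still has to come from knowledge of the format; the helpers only make the choice explicit at each read.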

3 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CPallini · Answer 1 · 2012-10-18T10:27:00

Solution 2

I think this step

Quote:
string byteBufferAsString = System.Text.Encoding.UTF8.GetString(byteBuffer)

is weak.

You should instead do the opposite: get the array of bytes representing the search string and search for it inside the data buffer.
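
A small illustration of why the decoded string is fragile: a character index in the string is not a byte offset in the file, because every multi-byte UTF-8 sequence collapses into a single char and bytes that are not valid UTF-8 are replaced during decoding. The sample buffer and the names below are made up for the demonstration, and this may or may not be the exact cause of the two-byte shift described in the question.

```csharp
using System;
using System.Text;

static class OffsetDemo
{
    // Character index of the label after decoding the whole buffer as UTF-8.
    public static int CharIndex(byte[] buffer, string label)
    {
        string decoded = Encoding.UTF8.GetString(buffer);
        return decoded.IndexOf(label, StringComparison.Ordinal);
    }

    // Hypothetical sample: one 2-byte UTF-8 sequence (0xC3 0xA9),
    // a 2-byte length field, then the ASCII label "ABC" at byte offset 4.
    public static readonly byte[] Sample =
        { 0xC3, 0xA9, 0x00, 0x03, 0x41, 0x42, 0x43 };
}
```

Here OffsetDemo.CharIndex(OffsetDemo.Sample, "ABC") returns 3, although the label starts at byte offset 4 in the buffer.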

Posted 18-Oct-12 10:27am

CPallini

Comments

Espen Harlinn 18-Oct-12 18:42pm

Good point :-D

Rey Terativo · Answer 2 · 2012-10-18T19:57:00

You need to search the UTF-8 string as binary. Something like this (not tested):

C#

using System;
using System.IO;
using System.Text;

byte[] ByteBuffer = File.ReadAllBytes("SomeFilePath");
byte[] StringBytes = Encoding.UTF8.GetBytes("StringToFind");
// Try every start position at which the whole pattern still fits.
for (int i = 0; i <= (ByteBuffer.Length - StringBytes.Length); i++)
{
    // Count how many leading bytes of the pattern match at position i.
    int j = 0;
    while (j < StringBytes.Length && ByteBuffer[i + j] == StringBytes[j])
        j++;
    if (j == StringBytes.Length)
        Console.WriteLine("String was found at offset {0}", i);
}

Please note that this is a case-sensitive search!
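
On a newer runtime the same byte-wise search can probably be written with spans instead of hand-rolled loops. This is only a sketch under that assumption (ReadOnlySpan<byte> and its IndexOf extension need .NET Core 2.1 or later, or the System.Memory package); ByteSearch and FindAll are illustrative names, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class ByteSearch
{
    // Returns every byte offset at which the UTF-8 encoding of label occurs in buffer.
    public static List<int> FindAll(byte[] buffer, string label)
    {
        var offsets = new List<int>();
        ReadOnlySpan<byte> haystack = buffer;
        ReadOnlySpan<byte> needle = Encoding.UTF8.GetBytes(label);
        if (needle.Length == 0)
            return offsets; // nothing meaningful to search for

        int start = 0;
        while (start <= haystack.Length - needle.Length)
        {
            // IndexOf returns a position relative to the slice, or -1 if absent.
            int relative = haystack.Slice(start).IndexOf(needle);
            if (relative < 0)
                break;
            offsets.Add(start + relative);
            start += relative + 1; // step by one so overlapping matches are found
        }
        return offsets;
    }
}
```

Calling ByteSearch.FindAll(File.ReadAllBytes("SomeFilePath"), "StringToFind") would return the byte offsets of all occurrences, which can then be used directly as positions in the file. Like the loop above, the comparison is byte-for-byte and therefore case-sensitive.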

Sergey Alexandrovich Kryukov · Answer 3 · 2012-10-18T10:21:00

Well, I don't have such experience, just because I thoroughly avoid dealing with extreme stupidities, so my only advice would be: give up; throw out all software dealing with data structured in this weird way and make sure you prevent such fallacies in the future; write brand new software which uses some reasonable persistence; and save a huge amount of time and nerves. If this advice does not seem suitable for you, you are very welcome to ram the "problem" on your own.

I seriously doubt you can get better advice from anyone who "had similar experience". Some experiences are not really helpful.

—SA
