LZW Compressor Byte Problem

Question

Hey guys!

I've been working on an LZW compressor in C# and I've run into some problems...

So I initially build the dictionary with the 256 known single-byte codes (0 to 255) and then I start coding. The problem is that, for example, when I try to write the int 256 as a byte, it goes wrong :S Can someone help me?

Regards, and thanks in advance

This part builds the dictionary and compresses a file:

C#

while (br.BaseStream.Position < br.BaseStream.Length)
  {

      Console.WriteLine(omg);
      omg++;

      t = br.ReadByte();
      chr = t;
      int aux=-1;

      byte[] res = new byte[str.Count() + 1];

      for (int i = 0; i < str.Count(); i++)
      {
          res[i] = str[i];
      }

      res[str.Count()] = chr;

      int pos = isEqual(res, lista);

      if (pos != -1)
      {
          str = new byte[res.Count()];
          for(int k=0;k<res.Count();k++)
          {
              str[k] = res[k];
          }

      }
      else if (pos==-1)
      {

          aux = isEqual(str, lista);

          byte uh = (byte)aux;
          _FileStream.WriteByte(uh);

          Node nv = new Node();
          nv.by = new Byte[res.Count()];

          for (int k = 0; k < res.Count(); k++)
          {
              nv.by[k] = res[k];
          }

          lista.Add(nv);

          str = new byte[1];
          str[0] = chr;

      }

  }

lista = the dictionary
isEqual = a function that returns the position in the dictionary of the byte sequence we are searching for (or -1 if it is not there)

This part is for decompression, and this is where the problem shows up: when I read the bytes back, I don't get what I wrote...

C#

while (br.BaseStream.Position < br.BaseStream.Length)
           {
               t = br.ReadByte();
               if (cnt == 0)
               {
                   NCODE = new byte[1];
                   NCODE[0] = t;
               }
               else
               {
                   NCODE = new byte[NCODE.Count()];
                   NCODE[NCODE.Count()] = t;
               }

               pcr = isEqual(NCODE, lista);

               if (pcr == -1)
               {
                   pcr = isEqual(OCODE, lista);
                   str = new byte[OCODE.Count()];

                   for (int i = 0; i < OCODE.Count(); i++)
                   {
                       str[i] = OCODE[i];
                   }

                   if (cnt > 0)
                   {
                       str = new byte[OCODE.Count() + 1];
                       str[OCODE.Count()] = chr;
                   }
               }
               else if (pcr > -1)
               {
                   pcr = isEqual(NCODE, lista);
                   str = new byte[OCODE.Count()];

                   for (int i = 0; i < OCODE.Count(); i++)
                   {
                       str[i] = OCODE[i];
                   }
               }

               for (int i = 0; i < str.Count(); i++)
               {
                   _FileStream.WriteByte((byte)str[i]);
               }

               chr = str[0];

               Node nv = new Node();
               nv.by = new byte[OCODE.Count() + 1];

               for (int i = 0; i < OCODE.Count(); i++)
               {
                   nv.by[i] = OCODE[i];
               }

               nv.by[OCODE.Count()] = chr;

               OCODE = new byte[NCODE.Count()];

               for (int i = 0; i < NCODE.Count(); i++)
               {
                   OCODE[i] = NCODE[i];
               }

           }

           _FileStream.Close();

Posted 12-Nov-12 15:18pm

SSilver009

Updated 19-Nov-12 23:05pm

v3

Comments

lewax00 12-Nov-12 21:37pm

Well for starters, you can't represent 256 with a single byte...

Sergey Alexandrovich Kryukov 12-Nov-12 22:28pm

:-)

SSilver009 13-Nov-12 1:02am

I've read that it's possible, through shifts and ORs I think, to have 9 bits in one byte.

lewax00 13-Nov-12 9:45am

Then what you read is mistaken. On a PC CPU a byte is 8 bits. Period. This is a limitation that exists on a physical level. 8 bits only have 256 possible values, and since one of those is 0, the maximum is 255.
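A single byte cannot hold 9 bits, but what the asker may have read about is packing a stream of 9-bit codes across byte boundaries with shifts and ORs, which is one common way LZW output is stored. The following is only a minimal sketch of that idea, not code from this thread: the class `CodePacker`, its method names, and the most-significant-bit-first layout are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original post): packs fixed-width codes
// into a byte array, most significant bit first, and unpacks them again.
public static class CodePacker
{
    // Assumes every code fits in bitsPerCode bits and bitsPerCode is at most 16.
    public static byte[] Pack(IEnumerable<int> codes, int bitsPerCode)
    {
        var output = new List<byte>();
        int buffer = 0;    // pending bits, right-aligned
        int bitCount = 0;  // how many pending bits the buffer holds

        foreach (int code in codes)
        {
            buffer = (buffer << bitsPerCode) | code;
            bitCount += bitsPerCode;
            while (bitCount >= 8)
            {
                bitCount -= 8;
                output.Add((byte)(buffer >> bitCount)); // emit the top 8 pending bits
                buffer &= (1 << bitCount) - 1;          // keep only the unwritten bits
            }
        }
        if (bitCount > 0)
            output.Add((byte)(buffer << (8 - bitCount))); // pad the last byte with zeros
        return output.ToArray();
    }

    // codeCount is needed because the padding bits carry no length information.
    public static List<int> Unpack(byte[] data, int bitsPerCode, int codeCount)
    {
        var codes = new List<int>();
        int buffer = 0, bitCount = 0, index = 0;
        while (codes.Count < codeCount)
        {
            while (bitCount < bitsPerCode)
            {
                buffer = (buffer << 8) | data[index++];
                bitCount += 8;
            }
            bitCount -= bitsPerCode;
            codes.Add(buffer >> bitCount);
            buffer &= (1 << bitCount) - 1;
        }
        return codes;
    }
}
```

With this layout, three 9-bit codes occupy 27 bits, so `CodePacker.Pack(new[] { 256, 257, 65 }, 9)` produces four bytes instead of the six that two bytes per code would need. The price is that the reader must know the code width and the number of codes.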

Sergey Alexandrovich Kryukov 12-Nov-12 22:28pm

"Gives problems..." What problems?
--SA

SSilver009 13-Nov-12 1:03am

When I write the byte 256 to the file, it writes the byte 0 instead :S and this causes problems when I try to decompress the file :X

Sergey Alexandrovich Kryukov 13-Nov-12 9:55am

There is no such thing as byte 256! I would advise you to gain confidence with simpler tasks before moving on to compression.
--SA

lukeer 20-Nov-12 3:20am

Some comments would be great, especially where exactly the code doesn't behave like it should.

For now I suspect that you see an error in
_FileStream.WriteByte((byte)str[i]); correct?

If that's correct, the next interesting part is the byte-copying loop just above: str is created having the size of NCODE, but is copied from OCODE. Is that intentional or a possible source of the error?

SSilver009 20-Nov-12 4:59am

Yes, you're right. It doesn't raise an error, but it writes the wrong thing :X It's what I said before: it writes 1 instead of 257 :X so when I decode, it decodes the wrong bytes :X

You were right, the byte-copying loop into str was an error.

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

BobJanova · Answer 1 · 2012-11-12T22:26:00

Solution 2

If you need to encode more than 256 different values, i.e. 0-255, you need to use more than 8 bits. In any normal scenario that means going from one byte to two, giving you a 16 bit ushort (0-65535).
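As a rough illustration of the two-byte approach, the sketch below writes each dictionary index as a 16-bit `ushort` and reads it back. The class `CodeFile` and its method names are hypothetical, and it assumes the dictionary never grows beyond 65536 entries; `BinaryWriter.Write(ushort)` and `BinaryReader.ReadUInt16()` are standard .NET calls.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not from the original post): stores each LZW code
// in two bytes so that indexes above 255 survive the round trip.
public static class CodeFile
{
    public static void WriteCodes(Stream output, IEnumerable<int> codes)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            foreach (int code in codes)
            {
                if (code < 0 || code > ushort.MaxValue)
                    throw new ArgumentOutOfRangeException(nameof(codes), "Code does not fit in 16 bits.");
                writer.Write((ushort)code); // always writes exactly two bytes
            }
        }
    }

    public static List<int> ReadCodes(Stream input)
    {
        var codes = new List<int>();
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            while (input.Position < input.Length)
                codes.Add(reader.ReadUInt16()); // reads the two bytes back as one code
        }
        return codes;
    }
}
```

Applied to the code in the question, this would replace the `(byte)aux` cast and `WriteByte` call on the compression side, and `ReadByte` on the decompression side. Note that every code then costs two bytes, including codes below 256.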

Posted 12-Nov-12 22:26pm

BobJanova

Comments

Sergey Alexandrovich Kryukov 13-Nov-12 9:57am

No wonder OP has problems -- look at the comments to the question -- trying to write byte 256 (!). See what I advised...
--SA

SSilver009 13-Nov-12 10:15am

It's like lukeer said: "Do you mean your problem is not a value greater than 255 but instead the bytes at indexes greater than 255 in an input file?" It's supposed to write the dictionary index to the file when coding with LZW, right?

BobJanova 13-Nov-12 12:01pm

If it's file offsets it probably needs to be 4 or maybe even 8 bytes (though I doubt this person is working on seriously large 64-bit data just yet).

lukeer · Answer 2 · 2012-11-12T19:04:00

Solution 1

Before compressing with whatever technique, you first have to properly serialize your data. That means transforming what you have, in a reversible way, into a series of individual bytes.

An int is an alias for an Int32. That name hints at its 32 bits of memory consumption. You therefore have to break every int up into four bytes.

Handle all other types in a similar way. Always keep in mind that you have to restore your data from that byte sequence afterwards (after decompression). Then, you can compress byte by byte.
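For the int case described above, a minimal sketch using the standard `BitConverter` class could look like this. The wrapper class `IntSerialization` and its method names are made up for the example, and the byte order of the result depends on the machine (see `BitConverter.IsLittleEndian`).

```csharp
using System;

// Hypothetical wrapper (not from the original post) around BitConverter.
public static class IntSerialization
{
    // Breaks a 32-bit int into its four bytes.
    public static byte[] ToBytes(int value)
    {
        return BitConverter.GetBytes(value);
    }

    // Restores the int from four bytes starting at the given offset.
    public static int FromBytes(byte[] bytes, int offset)
    {
        return BitConverter.ToInt32(bytes, offset);
    }
}
```

Because the same machine-dependent byte order is used in both directions, `IntSerialization.FromBytes(IntSerialization.ToBytes(257), 0)` gives back 257.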

Posted 12-Nov-12 19:04pm

lukeer

Comments

SSilver009 13-Nov-12 1:18am

I understand what you're saying... but all bytes in the file are in the range 0 to 255, so is that really necessary? The LZW algorithm adds new dictionary entries after 255; the problem is in writing those codes :S

lukeer 13-Nov-12 2:02am

Do you mean your problem is not a value greater than 255 but instead the bytes at indexes greater than 255 in an input file?

SSilver009 13-Nov-12 10:12am

Exactly! It's supposed to write the dictionary index to the file when coding with LZW, right?

lukeer 14-Nov-12 2:59am

IIRC, you're right. You create a dictionary of frequently used byte sequences. Whenever one of those appears in the input file, you replace it with its dictionary index in the output file.

This only reduces file size if there are frequently used byte sequences longer than one byte. Otherwise all you're doing is adding overhead and complexity.

Use the "Improve question" link beneath your original question and add the portion of your source that "gives problems". At least show definitions of the dictionary and indexing variables.
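The dictionary-and-index idea described above can be sketched roughly as follows. This is a simplified textbook-style LZW encoder, not the asker's code: the class `LzwSketch` is hypothetical, it keys the dictionary on strings whose chars stand in for bytes, and it returns the codes as ints so that nothing is truncated before a separate step decides how to write them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not from the original post) of LZW encoding.
public static class LzwSketch
{
    public static List<int> Compress(byte[] input)
    {
        // Start with one entry per possible byte value: codes 0 to 255.
        var dictionary = new Dictionary<string, int>();
        for (int i = 0; i < 256; i++)
            dictionary.Add(((char)i).ToString(), i);

        var codes = new List<int>();
        string current = string.Empty;

        foreach (byte b in input)
        {
            string candidate = current + (char)b;
            if (dictionary.ContainsKey(candidate))
            {
                current = candidate; // keep extending the known sequence
            }
            else
            {
                codes.Add(dictionary[current]);             // emit the longest known sequence
                dictionary.Add(candidate, dictionary.Count); // new entries get 256, 257, ...
                current = ((char)b).ToString();
            }
        }

        // Emit whatever is still pending once the input is exhausted.
        if (current.Length > 0)
            codes.Add(dictionary[current]);
        return codes;
    }
}
```

For the six input bytes of "ABABAB" this yields the codes 65, 66, 256, 256, so codes above 255 show up almost immediately. The final flush after the loop is easy to forget; the loop shown in the question does not appear to include one, although it may exist in code that was not posted.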

SSilver009 16-Nov-12 20:17pm

The thing is that I'm only building my dictionary and then writing the index codes to a file. The problem appears when I get to indexes bigger than 255 :/ Writing doesn't raise any error, but, for example, when I write byte 256 and then read it back, the program reads byte 0 (257 = 1, 258 = 2) :/ What should I do? :X Do you have any idea?

lukeer 19-Nov-12 1:33am

If the dictionary holds only 256 entries, then you shouldn't ask for indexes above 255.

If the dictionary is larger, then search your code for a cast. There may be an integer, or long or whatever that holds the index to read from the dictionary. If this is being cast to byte then the behaviour you describe occurs.

There are many "if"s in this post. Seeing your code would make it easier for us to help you (remember the "Improve question" link).
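The effect of such a cast can be reproduced in isolation. The sketch below is only a demonstration; the class `CastDemo` and its method are hypothetical, and `unchecked` is spelled out even though it is the default for non-constant expressions unless the project is compiled with overflow checking enabled.

```csharp
using System;

// Hypothetical demonstration (not from the original post) of what an
// int-to-byte cast does to dictionary indexes above 255.
public static class CastDemo
{
    public static byte Truncate(int index)
    {
        // Only the low 8 bits survive, so the result is index modulo 256
        // for non-negative indexes.
        return unchecked((byte)index);
    }
}
```

`CastDemo.Truncate(256)` returns 0, `CastDemo.Truncate(257)` returns 1 and `CastDemo.Truncate(258)` returns 2, which matches the values the asker reports reading back.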

SSilver009 19-Nov-12 8:29am

Lukeer, I've updated the question with the parts of the code that do the compression and the decompression. I will always end up writing values above 255, because the dictionary is full up to that position with the ASCII bytes... I think the problem is in the write, because I'm not reading what I'm supposed to read :X (it's like I said before: I write 257 and when I read it back I get 1) :X Thanks