Hey guys!

I've been working on an LZW compressor in C# and I've run into some problems...

I initially build the dictionary with the 256 known single-byte codes (0-255) and then I start coding. The problem comes, for example, when I try to write the int 256 as a byte: this gives problems :S Can someone help me?

Regards, and thanks in advance

This part is to build the dictionary and compress a file

C#
while (br.BaseStream.Position < br.BaseStream.Length)
  {

      Console.WriteLine(omg);
      omg++;

      t = br.ReadByte();
      chr = t;
      int aux=-1;

      byte[] res = new byte[str.Count() + 1];

      for (int i = 0; i < str.Count(); i++)
      {
          res[i] = str[i];
      }

      res[str.Count()] = chr;

      int pos = isEqual(res, lista);

      if (pos != -1)
      {
          str = new byte[res.Count()];
          for(int k=0;k<res.Count();k++)
          {
              str[k] = res[k];
          }

      }
      else if (pos==-1)
      {

          aux = isEqual(str, lista);

          byte uh = (byte)aux;
          _FileStream.WriteByte(uh);

          Node nv = new Node();
          nv.by = new Byte[res.Count()];

          for (int k = 0; k < res.Count(); k++)
          {
              nv.by[k] = res[k];
          }

          lista.Add(nv);

          str = new byte[1];
          str[0] = chr;

      }

  }


lista = the dictionary.
isEqual = a function that returns the position in the dictionary of the byte sequence we are searching for, or -1 if it is not found.


This part is to decompress, and this is where the problem is: when I read the bytes, I don't get what I wrote...

C#
while (br.BaseStream.Position < br.BaseStream.Length)
           {
               t = br.ReadByte();
               if (cnt == 0)
               {
                   NCODE = new byte[1];
                   NCODE[0] = t;
               }
               else
               {
                   NCODE = new byte[NCODE.Count()];
                   NCODE[NCODE.Count()] = t;
               }

               pcr = isEqual(NCODE, lista);

               if (pcr == -1)
               {
                   pcr = isEqual(OCODE, lista);
                   str = new byte[OCODE.Count()];

                   for (int i = 0; i < OCODE.Count(); i++)
                   {
                       str[i] = OCODE[i];
                   }

                   if (cnt > 0)
                   {
                       str = new byte[OCODE.Count() + 1];
                       str[OCODE.Count()] = chr;
                   }
               }
               else if (pcr > -1)
               {
                   pcr = isEqual(NCODE, lista);
                   str = new byte[OCODE.Count()];

                   for (int i = 0; i < OCODE.Count(); i++)
                   {
                       str[i] = OCODE[i];
                   }
               }

               for (int i = 0; i < str.Count(); i++)
               {
                   _FileStream.WriteByte((byte)str[i]);
               }

               chr = str[0];

               Node nv = new Node();
               nv.by = new byte[OCODE.Count() + 1];

               for (int i = 0; i < OCODE.Count(); i++)
               {
                   nv.by[i] = OCODE[i];
               }

               nv.by[OCODE.Count()] = chr;

               OCODE = new byte[NCODE.Count()];

               for (int i = 0; i < NCODE.Count(); i++)
               {
                   OCODE[i] = NCODE[i];
               }

           }

           _FileStream.Close();
Posted
Updated 19-Nov-12 23:05pm
v3
Comments
lewax00 12-Nov-12 21:37pm    
Well for starters, you can't represent 256 with a single byte...
Sergey Alexandrovich Kryukov 12-Nov-12 22:28pm    
:-)
SSilver009 13-Nov-12 1:02am    
I've read that it's possible, through shifts and ORs I think, to have 9 bits in one byte
lewax00 13-Nov-12 9:45am    
Then what you read is mistaken. In a PC CPU a byte is 8 bits. Period. This is a limitation that exists on a physical level. 8 bits only have 256 possible values, and since one of those is 0, the maximum is 255.
Sergey Alexandrovich Kryukov 12-Nov-12 22:28pm    
"Gives problems..." What problems?
--SA
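
On the shifts-and-ORs idea from the comments above: a single byte cannot hold 9 bits, but a sequence of bytes can carry 9-bit codes back to back if each code is spread across neighbouring bytes. The following is only a minimal sketch of that idea, assuming a fixed code width and most-significant-bit-first packing; the class name BitPacker and its methods are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class BitPacker
{
    // Packs each code into `width` bits (most significant bit first).
    // Assumes 0 <= code < 2^width and width <= 16.
    public static byte[] Pack(IEnumerable<int> codes, int width)
    {
        var output = new List<byte>();
        int buffer = 0;    // bits waiting to be written
        int bitCount = 0;  // number of valid bits in buffer

        foreach (int code in codes)
        {
            buffer = (buffer << width) | code;  // append the new code
            bitCount += width;
            while (bitCount >= 8)
            {
                bitCount -= 8;
                output.Add((byte)(buffer >> bitCount));  // emit the top 8 bits
                buffer &= (1 << bitCount) - 1;           // keep the leftover bits
            }
        }
        if (bitCount > 0)
            output.Add((byte)(buffer << (8 - bitCount)));  // pad the last byte with zeros
        return output.ToArray();
    }

    // Reads `count` codes of `width` bits each back out of the packed bytes.
    public static List<int> Unpack(byte[] data, int width, int count)
    {
        var codes = new List<int>();
        int buffer = 0, bitCount = 0, index = 0;
        while (codes.Count < count)
        {
            while (bitCount < width)
            {
                buffer = (buffer << 8) | data[index++];  // pull in one more byte
                bitCount += 8;
            }
            bitCount -= width;
            codes.Add(buffer >> bitCount);
            buffer &= (1 << bitCount) - 1;
        }
        return codes;
    }
}
```

With a width of 9, the three codes 256, 257 and 65 take 27 bits, so they fit in four bytes instead of the six that fixed 16-bit codes would need. The reader has to know the code width and how many codes to expect, since the final byte is padded.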

If you need to encode more than the 256 different values (0-255) that fit in one byte, you need to use more than 8 bits. In any normal scenario that means going from one byte to two, giving you a 16-bit ushort (0-65535).
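
As a rough illustration of the two-bytes-per-code approach, the sketch below writes each dictionary index as a ushort with BinaryWriter and reads it back with BinaryReader.ReadUInt16. The class name CodeIO and its methods are hypothetical, and it assumes the dictionary never grows beyond 65536 entries.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CodeIO
{
    // Writes every code as two bytes (BinaryWriter uses little-endian order).
    public static void WriteCodes(Stream output, IEnumerable<int> codes)
    {
        var writer = new BinaryWriter(output);
        foreach (int code in codes)
        {
            if (code < 0 || code > ushort.MaxValue)
                throw new ArgumentOutOfRangeException(nameof(codes), "code does not fit in 16 bits");
            writer.Write((ushort)code);
        }
        writer.Flush();  // the caller stays responsible for closing the stream
    }

    // Reads two bytes at a time until the end of the stream.
    public static List<int> ReadCodes(Stream input)
    {
        var reader = new BinaryReader(input);
        var codes = new List<int>();
        while (input.Position < input.Length)
            codes.Add(reader.ReadUInt16());
        return codes;
    }
}
```

The decompressor must read with the same width the compressor wrote with: reading a file of 16-bit codes one byte at a time with ReadByte would split every code in half.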
 
Comments
Sergey Alexandrovich Kryukov 13-Nov-12 9:57am    
No wonder OP has problems -- look at the comments to the question -- trying to write byte 256 (!). See what I advised...
--SA
SSilver009 13-Nov-12 10:15am    
It's like lukeer said: "Do you mean your problem is not a value greater than 255 but instead the bytes at indexes greater than 255 in an input file?" It's supposed to write the dictionary index to the file when coding with LZW, right?
BobJanova 13-Nov-12 12:01pm    
If it's file offsets it probably needs to be 4 or maybe even 8 bytes (though I doubt this person is working on seriously large 64-bit data just yet).
Before compressing with whatever technique, you first have to properly serialize your data. That means transforming what you have, in a reversible way, into a series of individual bytes.

An int is an alias for an Int32. That name hints at its 32 bits of memory consumption. You therefore have to break every int up into four bytes.

Handle all other types in a similar way. Always keep in mind that you have to restore your data from that byte sequence afterwards (after decompression). Then, you can compress byte by byte.
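
One possible way to break an int into four bytes and restore it afterwards is sketched below. The class name IntBytes and the big-endian byte order are assumptions made for the example; any order works as long as writing and reading agree.

```csharp
using System;

public static class IntBytes
{
    // Splits a 32-bit int into four bytes, most significant byte first.
    public static byte[] ToBytes(int value)
    {
        unchecked
        {
            return new byte[]
            {
                (byte)(value >> 24),
                (byte)(value >> 16),
                (byte)(value >> 8),
                (byte)value
            };
        }
    }

    // Rebuilds the int from the four bytes produced by ToBytes.
    public static int FromBytes(byte[] b)
    {
        return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
    }
}
```

For example, ToBytes(256) gives the bytes 0, 0, 1, 0, and FromBytes turns them back into 256. The built-in BitConverter.GetBytes and BitConverter.ToInt32 do the same job, but in the byte order of the machine they run on.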
 
Comments
SSilver009 13-Nov-12 1:18am    
I understand what you're saying... but all the bytes in the file are in the range 0 to 255, so is that really necessary? The LZW algorithm adds new dictionary entries after 255; the problem is writing the codes after that :S
lukeer 13-Nov-12 2:02am    
Do you mean your problem is not a value greater than 255 but instead the bytes at indexes greater than 255 in an input file?
SSilver009 13-Nov-12 10:12am    
Exactly! It's supposed to write the dictionary index to the file when coding with LZW, right?
lukeer 14-Nov-12 2:59am    
IIRC, you're right. You create a dictionary of frequently used byte sequences. Whenever one of those appears in the input file, you replace it with its dictionary index in the output file.

This only reduces file size if there are frequently used byte sequences longer than one byte. Otherwise all you're doing is adding overhead and complexity.

Use the "Improve question" link beneath your original question and add the portion of your source that "gives problems". At least show definitions of the dictionary and indexing variables.
SSilver009 16-Nov-12 20:17pm    
The thing is that I'm only building my dictionary and then writing the index codes to a file. The problem is when I get to indexes bigger than 255 :/ Writing gives no errors, but, for example, when I write 256 as a byte and then read it back, the program reads byte 0 (257 = 1, 258 = 2) :/ What should I do? :X Do you have any idea?
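
The wrap-around described here (256 read back as 0, 257 as 1) is what a cast to byte does: only the low 8 bits of the index survive. For comparison, here is a small LZW round trip that keeps the codes as ints, so indexes above 255 stay intact until they are written with a wider format. It is a sketch rather than a fix for the posted code: the class name LzwSketch is invented, the dictionary is keyed by strings for brevity, and it grows without any size limit.

```csharp
using System;
using System.Collections.Generic;

public static class LzwSketch
{
    public static List<int> Compress(byte[] input)
    {
        // Start with the 256 single-byte sequences, codes 0..255.
        var dict = new Dictionary<string, int>();
        for (int i = 0; i < 256; i++)
            dict.Add(((char)i).ToString(), i);

        var codes = new List<int>();
        string current = "";
        foreach (byte b in input)
        {
            string extended = current + (char)b;
            if (dict.ContainsKey(extended))
            {
                current = extended;              // keep growing the match
            }
            else
            {
                codes.Add(dict[current]);        // emit the longest known sequence
                dict.Add(extended, dict.Count);  // next free code: 256, 257, ...
                current = ((char)b).ToString();
            }
        }
        if (current.Length > 0)
            codes.Add(dict[current]);            // flush the last pending sequence
        return codes;
    }

    public static byte[] Decompress(List<int> codes)
    {
        var dict = new List<byte[]>();
        for (int i = 0; i < 256; i++)
            dict.Add(new[] { (byte)i });

        var output = new List<byte>();
        if (codes.Count == 0) return output.ToArray();

        byte[] previous = dict[codes[0]];
        output.AddRange(previous);
        for (int i = 1; i < codes.Count; i++)
        {
            int code = codes[i];
            byte[] entry;
            if (code < dict.Count)
                entry = dict[code];
            else if (code == dict.Count)
                entry = Append(previous, previous[0]);  // code the decoder has not built yet
            else
                throw new InvalidOperationException("invalid code " + code);

            output.AddRange(entry);
            dict.Add(Append(previous, entry[0]));       // mirror the compressor's new entry
            previous = entry;
        }
        return output.ToArray();
    }

    private static byte[] Append(byte[] prefix, byte last)
    {
        var result = new byte[prefix.Length + 1];
        Array.Copy(prefix, result, prefix.Length);
        result[prefix.Length] = last;
        return result;
    }
}
```

Compressing the ASCII bytes of "ABABABA" yields the codes 65, 66, 256, 258. The last two do not fit in a byte, so writing them with WriteByte would store 0 and 2 and the decompressor could not recover the input.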

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


