GZipStream is helpful, but has some missing features

GarethI, 28 Jul 2009

I recently had to work around a problem in a particularly ugly way (which I won't detail :-) ), so after that painful experience I decided to create a class that solves my specific issue in a sane and reusable manner. Out of this unexpected need the class “GZipHelper” was born. It is really just a wrapper around the base .NET System.IO.Compression.GZipStream. It was kind of a sad day, as I really didn't want to be writing this type of wrapper code; I was hoping the functionality would be natively available in the existing GZipStream class so I could get on with solving the real business problem at hand.

First, it should be said that the standard GZipStream provides the functionality I'm sure the MS engineers intended, which was HTTP-based compression (at least I think that was its expected purpose). However, it is certainly not a fully featured class that gives programmers quick and easy access to GZip compression.

Specifically, I needed to know how big any given “.GZ” file would be once decompressed, without fully reading and decompressing it. It seemed trivial enough, since “gzip.exe -l” does what I needed, but no amount of hunting within MSDN helped. So it was on to the ever handy GZip Wikipedia entry, which detailed enough of the file format and provided the reference to the “GZIP file format specification version 4.3”.

So, armed with this information, we can start to decode the GZip file format to extract the length. In fact, this class checks whether the file is GZip compressed and returns the decompressed length if it is, or the regular file length if it is not.
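To make the idea concrete, here is a minimal sketch of that check. It is not the GZipHelper source; the class and method names are made up for illustration. It relies only on what RFC 1952 specifies: a GZip file starts with the bytes 0x1F 0x8B followed by the compression method (8 for deflate), and ends with a four-byte little-endian ISIZE field holding the uncompressed size modulo 2^32.

```csharp
using System.IO;

// Hypothetical helper for illustration only (not part of GZipHelper).
public static class GZipLengthSketch
{
    // Returns the stored uncompressed size for a GZip file,
    // or the plain file length if the file is not GZip compressed.
    public static long GetDecompressedLength(string path)
    {
        using (FileStream file = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (BinaryReader reader = new BinaryReader(file))
        {
            // A GZip member has a 10-byte header and an 8-byte trailer at minimum.
            if (file.Length < 18)
                return file.Length;

            // RFC 1952 header: ID1 = 0x1F, ID2 = 0x8B, CM = 8 (deflate).
            bool isGZip = reader.ReadByte() == 0x1F
                       && reader.ReadByte() == 0x8B
                       && reader.ReadByte() == 8;
            if (!isGZip)
                return file.Length;

            // ISIZE is the last four bytes of the file, little-endian.
            file.Seek(-4, SeekOrigin.End);
            return reader.ReadUInt32();
        }
    }
}
```

Two caveats follow from the format itself: because ISIZE is stored modulo 2^32, the value wraps for inputs of 4 GB or more, and for a file made of several concatenated GZip members it only describes the last member.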

The following class functions have been implemented (see the bottom of the article for the link to the full project):

   /// <summary>
   /// Utility class to help with managing GZip (.gz) files in .Net
   /// </summary>
   /// <remarks>
   /// This is a trivial wrapper class on top of <see cref="GZipStream"/> that does a little magic
   /// under the covers by looking at the underlying data format and retrieves the
   /// stored data information within the GZip compressed file.
   /// </remarks>
   public class GZipHelper
   {
      /// <summary>
      /// Gets the compressed file details
      /// </summary>
      /// <param name="filename">The filename.</param>
      /// <returns>True if file exists, else false</returns>
      public bool GetFileDetails(string filename);

      /// <summary>
      /// Gets the compressed file information from a file stream
      /// </summary>
      /// <param name="fileStream">The file stream.</param>
      /// <remarks>
      /// Definitions provided by RFC 1952 -GZIP File Format Specification (May 1996).
      /// Coding was performed against ftp://ftp.isi.edu/in-notes/rfc1952.txt
      /// </remarks>
      public void GetFileInformation(FileStream fileStream);

      /// <summary>
      /// Compresses the file.
      /// </summary>
      /// <param name="filename">The filename.</param>
      /// <param name="overWriteExisting">if set to <c>true</c> [over write existing].</param>
      public void CompressFile(string filename, bool overWriteExisting);

      /// <summary>
      /// Decompresses the file.
      /// </summary>
      /// <param name="filename">The filename.</param>
      /// <param name="overWriteExisting">if set to <c>true</c> [over write existing].</param>
      /// <returns></returns>
      public bool DecompressFile(string filename, bool overWriteExisting);

      /// <summary>
      /// Returns a seekable stream into either a file or compressed file (defaults read-only)
      /// </summary>
      /// <remarks>
      /// Decompresses the stream into a <see cref="MemoryStream"/> if the file is compressed
      /// otherwise just returns back a regular <see cref="FileStream"/> as a <see cref="Stream"/>
      /// </remarks>
      /// <param name="filename">The filename to open.</param>
      /// <returns>Reference to opened stream</returns>
      public Stream GetSeekableStream(string filename);
   }

In addition, the following properties are available:

  • CompressedLength – Size of the compressed file (or regular file size if not compressed)
  • DecompressedLength – Size of the file if it were uncompressed (or regular file size if not compressed)
  • IsTextFile – Indicates if GZip thought the file was text based, potentially leading to better compression
  • CompressionModeValue – Numeric indication of the compression mode used
  • CRC16Present – Indicates a CRC16 is available for the file
  • ExtraFieldsPresent – Additional meta fields are available in the file
  • FileNamePresent – GZip contains the original file name
  • FileCommentPresent – Compressed file has a comment associated with it
  • IsCompressed – Indicates if the file is GZip compressed or not
  • CompressedDate – If stored this is the date the file was compressed.
  • CRC32 – CRC32 value associated with the file

Along with the project there are MSTest harnesses to test the class (trivial implementations). So the features of the class are:

  • Can trivially determine the true file size (regardless of whether the file was compressed via GZip or is uncompressed). This makes your code path much more readable if you are dealing with mixed file types.
  • Provides a seekable stream into the compressed file via a MemoryStream. The key is that you don't need to worry about the compression (unless you are reading in BIG files, since the whole file is decompressed into memory): you get back a Stream for either a regular file or a compressed file, and both support seeking. This can be handy if your program assumes it can Seek in the stream and you need to access GZip files!
  • Trivial file decompression, which also honors the CompressedDate. If that date is set then the decompressed file is given that creation date.
  • Trivial file compression. Unfortunately, at the time of writing I've not updated the header to include the date of the compressed file. This may come in a later version (and if so I'll update the blog :-) but definitely no promises!).
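The seekable-stream feature is the one that most benefits from an example. The following is a minimal sketch of the approach described above, written for illustration rather than copied from GZipHelper; the class and method names are hypothetical. It checks the two GZip magic bytes, and either returns the FileStream as-is or decompresses the whole file into a MemoryStream.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical sketch for illustration only (not the GZipHelper source).
public static class SeekableStreamSketch
{
    // Returns a read-only, seekable stream over the file's uncompressed contents.
    public static Stream Open(string path)
    {
        FileStream file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
        try
        {
            // GZip files start with the magic bytes 0x1F 0x8B (RFC 1952).
            bool isGZip = file.Length >= 18
                       && file.ReadByte() == 0x1F
                       && file.ReadByte() == 0x8B;
            file.Seek(0, SeekOrigin.Begin);
            if (!isGZip)
                return file; // a FileStream is already seekable

            // GZipStream cannot seek, so inflate everything into memory first.
            MemoryStream memory = new MemoryStream();
            using (GZipStream gzip = new GZipStream(file, CompressionMode.Decompress))
            {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
                    memory.Write(buffer, 0, read);
            } // disposing the GZipStream also closes the underlying file
            memory.Seek(0, SeekOrigin.Begin);
            return memory;
        }
        catch
        {
            file.Dispose();
            throw;
        }
    }
}
```

The trade-off is memory: the caller gets uniform Seek support, but a compressed file costs its full uncompressed size in RAM, which is why this only suits files of modest size.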

Simple example usages are (taken straight from the unit tests!):


// Perform a file compression
GZipHelper actual = new GZipHelper();
actual.CompressFile(_fileName, true);

// Perform a file decompression
GZipHelper actual = new GZipHelper();
string fileName = "CSharpHackerSmallTest.txt.gz";
actual.DecompressFile(fileName, true);

// Get a seekable stream
GZipHelper actual = new GZipHelper();
using (Stream dataStream = actual.GetSeekableStream("CSharpHackerSmallTest.txt.gz"))
{
    // Silly seek - but it just shows it can be done
    dataStream.Seek(0, SeekOrigin.Begin);
    StreamReader sr = new StreamReader(dataStream);
    string contents = sr.ReadToEnd();

    Assert.AreEqual(119, contents.Length);
}

// Get the natural (decompressed) file length from a compressed file.
// GetFileDetails is the overload that takes a filename; GetFileInformation expects a FileStream.
GZipHelper actual = new GZipHelper();
actual.GetFileDetails("CSharpHackerSmallTest.txt.gz");
Assert.AreEqual(119, actual.DecompressedLength);

Finally, it should be noted that by all accounts the standard implementation of GZipStream in the base .NET libraries (actually the DeflateStream) has a problem when attempting to compress random or already compressed data. There is a Microsoft Connect article [http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=93930] that details the issue, quoted here:

The GZipStream and DeflateStream classes can _significantly_ increase the size of “compressed” data. That means, they don’t just add a few header bytes as stand-alone compressors do, but they _inflate_ the data by as much as 50%. This is apparently because these classes do not check for incompressible data which is a standard feature of all stand-alone compressors. Both classes work fine when the data actually can be compressed.

Please refer to this thread for more details:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=179704&SiteID=1
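If you want to see how your own runtime behaves, a quick measurement is easy to put together. The sketch below is only an illustration (the class and method names are hypothetical): it compresses a buffer with GZipStream into a MemoryStream and reports the resulting size, so you can compare it against the input length. The numbers depend on the .NET version you run it on, so none are quoted here.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical sketch for illustration only.
public static class IncompressibleDataSketch
{
    // Returns the number of bytes GZipStream produces for the given input.
    public static long CompressedSize(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen = true so the MemoryStream can be measured after the
            // GZipStream has been disposed and has flushed its trailer.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.Length;
        }
    }

    // Compresses pseudo-random bytes, which are effectively incompressible,
    // and returns the ratio of compressed size to original size.
    public static double RandomDataRatio(int length)
    {
        byte[] random = new byte[length];
        new Random(12345).NextBytes(random);
        return (double)CompressedSize(random) / length;
    }
}
```

A ratio noticeably above 1.0 from RandomDataRatio(1024 * 1024) would indicate the inflation described in the quote; a ratio just over 1.0 means only header and framing overhead was added.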

The base implementation worked for me and met my specific needs without bringing in any third-party DLLs. Incidentally, that also has a nice benefit for those looking to bring this into proprietary software: it avoids any licensing discussions with supervisors! If you want a more robust GZipStream implementation you can check out http://dotnetzip.codeplex.com/. This apparently offers a drop-in replacement, but this class could still be useful even if you use that drop-in replacement as well.

I hope this helps someone out there :-)

[Download GZipHelper (Source + Project) Here]

This download link will always have the latest and greatest version.

Gareth

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

GarethI

United States
I'm Gareth, a guy who loves software! My day job is working for a retail company, where I'm involved in a large-scale C# project that processes large amounts of data into upstream data repositories.
 
My work rule of thumb is that everyone spends much more time working than not, so you better enjoy what you do!
 
Needless to say - I'm having a blast.
 
Have fun,
 
Gareth

Article Copyright 2009 by GarethI