C# Use Zip Archives without External Libraries

By D. Christian Ohle, 12 Jun 2011
 

Introduction

I found a lot of articles on how to access Zip archives in C#, but all of them have significant disadvantages. The main problem is that Microsoft has implemented Zip archives in the operating system, but there is no official API that we can use. In C#, for example, we have System.IO.Compression.GZipStream, but there is no corresponding System.IO.Compression.Zip class.

There are some free .NET compression libraries like SharpZipLib and .NET Zip Library, but this leads to additional installation effort and licensing problems.

It is also possible to use the free J# Library. J# has included Zip to keep compatible with the Java libraries. But to bundle a 3.6 MB DLL vjslib.dll, just to support Zip, seems like a really goofy hack.

Since .NET 3.0, we can use the System.IO.Packaging ZipPackage class in WindowsBase.DLL. It's just 1.1 MB, and it just seems to fit a lot better than importing Java libraries.

The only problem is that the ZipPackage class isn't a generic Zip implementation. It's a packaging library for formats like XPS and Office Open XML that happen to use Zip.

Accessing plain Zip archives with ZipPackage fails because the content is checked against the Package conventions.

For example, there has to be a file [Content_Types].xml in the root and only files with specified extensions are accessible. Filenames with special characters and spaces are not allowed and the access time is not the best because of the additional Package link logic.
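
To make these conventions concrete, here is a minimal sketch of writing one file through the public Package API. The class name PackageConventionsDemo and the part name /test1.xml are made up for illustration. Every part needs a valid part URI and a content type, which the package records in [Content_Types].xml.

```csharp
using System;
using System.IO;
using System.IO.Packaging;   // WindowsBase.dll on .NET Framework
using System.Linq;

// Hypothetical demo class, not part of the article's download.
public static class PackageConventionsDemo
{
    // Writes one part into a new package, reopens it and returns the part URIs.
    public static string[] RoundTrip(Stream stream)
    {
        using (var package = Package.Open(stream, FileMode.Create, FileAccess.ReadWrite))
        {
            // A part cannot be created without a part URI and a content type.
            Uri uri = PackUriHelper.CreatePartUri(new Uri("/test1.xml", UriKind.Relative));
            PackagePart part = package.CreatePart(uri, "text/xml");
            using (var writer = new StreamWriter(part.GetStream(FileMode.Create, FileAccess.Write)))
                writer.Write("<root/>");
        }

        using (var package = Package.Open(stream, FileMode.Open, FileAccess.Read))
            return package.GetParts().Select(p => p.Uri.ToString()).ToArray();
    }
}
```

A Zip file produced by another tool has no such content type list, which is why the Package API is a poor fit for plain archives.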

However, the assembly WindowsBase.DLL is preinstalled and the generic Zip implementation is inside. The only problem is that the generic Zip classes are not public and therefore not visible to programmers. But there is a simple way to get access to this hidden API, and I wrote a small wrapper class for it.

Background

A quick check in the Object Browser shows us that WindowsBase.DLL has a namespace MS.Internal.IO.Zip. This sounds good, but there are no public classes visible.

However, the following call:

var types = typeof(System.IO.Packaging.Package).Assembly.GetTypes();

gives us 824 types, public and non-public, and in particular one with the name MS.Internal.IO.Zip.ZipArchive. Now it is easy to get this type and its methods and properties:

var type = typeof(System.IO.Packaging.Package).Assembly.GetType
		("MS.Internal.IO.Zip.ZipArchive");
var staticMethods = type.GetMethods(BindingFlags.Static | 
		BindingFlags.Public | BindingFlags.NonPublic);
var instanceMethods = type.GetMethods(BindingFlags.Instance | 
		BindingFlags.Public | BindingFlags.NonPublic);

and we get the most important methods:

static ZipArchive OpenOnFile(string path, FileMode mode, 
	FileAccess access, FileShare share, bool streaming);
static ZipArchive OpenOnStream(Stream stream, FileMode mode, 
	FileAccess access, bool streaming);
ZipFileInfo AddFile(string path, 
	CompressionMethodEnum compmeth, DeflateOptionEnum option);
ZipFileInfo GetFile(string name);
void DeleteFile(string name);
ZipFileInfoCollection GetFiles();
void Dispose();
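
The discovery step can be tried without WindowsBase. The following is only a sketch on a stand-in type: HiddenArchive and ReflectionDiscovery are hypothetical names that play the role of the internal class and of our own code. It shows that Assembly.GetType finds non-public types by full name and that BindingFlags.NonPublic exposes their members.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for an assembly-internal class such as MS.Internal.IO.Zip.ZipArchive.
internal sealed class HiddenArchive
{
    private HiddenArchive() { }
    internal static HiddenArchive OpenOnFile(string path) { return new HiddenArchive(); }
    internal string GetFile(string name) { return "entry:" + name; }
}

public static class ReflectionDiscovery
{
    const BindingFlags Any = BindingFlags.Public | BindingFlags.NonPublic;

    // Looks the type up by its full name in this assembly, public or not.
    public static Type FindType(string fullName)
    {
        return typeof(ReflectionDiscovery).Assembly.GetType(fullName);
    }

    // Returns the names of the methods declared on the type itself (static or instance).
    public static string[] DeclaredMethodNames(Type type, bool isStatic)
    {
        var flags = Any | BindingFlags.DeclaredOnly |
                    (isStatic ? BindingFlags.Static : BindingFlags.Instance);
        return type.GetMethods(flags).Select(m => m.Name).OrderBy(n => n).ToArray();
    }
}
```

DeclaredOnly is used here only to hide the members inherited from System.Object. The article's own calls omit it and therefore list those as well.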

The same procedure for ZipFileInfo gives us:

Stream GetStream(FileMode mode, FileAccess access);

and properties like Name, LastModFileDateTime, FolderFlag...
This is all we need to implement a small wrapper class that accesses the API through Reflection:

class ZipArchive : IDisposable
{
  private object external;
  public static ZipArchive OpenOnFile
      (string path, FileMode mode, FileAccess access, FileShare share, bool streaming)    
  {
    var type = typeof(System.IO.Packaging.Package).Assembly.GetType
		("MS.Internal.IO.Zip.ZipArchive");
    var meth = type.GetMethod("OpenOnFile", BindingFlags.Static | 
		BindingFlags.Public | BindingFlags.NonPublic);
    return new ZipArchive { external = meth.Invoke(null, new object[] 
		{ path, mode, access, share, streaming }) };
  } 
  //...
  public class ZipFileInfo //...
}

The complete ZipArchive wrapper implementation is in the demo project ZipArchiveTest, in Program.cs.
The class takes only 97 lines, and we can use it in a code sequence like this:

var str = new MemoryStream();

//create some files:
using (var arc = ZipArchive.OpenOnStream(str))
{
  var doc1 = new XDocument(new XElement
	("root", new XElement("item"), new XElement("item"), new XElement("item")));
  var doc2 = new XDocument(new XElement("root", Enumerable.Repeat
		("item", 1000).Select(p => new XElement(p))));
  using (var fs = arc.AddFile("test1.xml").GetStream
		(FileMode.Open, FileAccess.ReadWrite)) doc1.Save(fs);
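  using (var fs = arc.AddFile("test2.xml").GetStream
		(FileMode.Open, FileAccess.ReadWrite)) doc2.Save(fs);
  // the read block below also expects an entry in a subfolder
  using (var fs = arc.AddFile("dir/test3.xml").GetStream
		(FileMode.Open, FileAccess.ReadWrite)) doc2.Save(fs);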
}

// read the files:
using (var arc = ZipArchive.OpenOnStream(str))
{
  var doc1 = XDocument.Load(arc.GetFile("test1.xml").GetStream());
  var doc2 = XDocument.Load(arc.GetFile("test2.xml").GetStream());
  var doc3 = XDocument.Load(arc.GetFile("dir/test3.xml").GetStream());
}

Using the Demo

The demo program ZipArchiveTest, which uses the ZipArchive class, is as small as possible and can show the content of Zip archives without any restrictions. Double-clicking a file in the ListBox opens a new window that shows the file content as text.

Conclusion

Microsoft should publish its hidden ZipArchive class, but till then, we can use such a simple wrapper to save terabytes of data worldwide.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Comments and Discussions

 
Question: Looks like someone has stolen your article... (Florian Storck, 22-May-13 2:59)
...take a look here:
 
http://chiragkanzariya.blogspot.de/2012/10/c-use-zip-archives-without-external.html
Answer: Re: Looks like someone has stolen your article... (D. Christian Ohle, 22-May-13 6:01)
Thanks, no problem, I have enough money :)
Question: Great! But... (CainKellye, 22-Apr-13 1:36)
It's nice to have an option to use zip without external DLLs. Thanks for the tip!
But reflection, they say, has a big impact on performance. To minimize that impact, you should save the PropertyInfo for each property and the MethodInfo for each method you use, as well as the type, like this:
 
Type reflectedType = external.GetType();
PropertyInfo propInfo = reflectedType.GetProperty(propertyName, BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
MethodInfo methodInfo = reflectedType.GetMethod("GetStream", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
 
So when you need to retrieve the value of the property, you can use:
propInfo.GetValue(external, null);
or
methodInfo.Invoke(external, ...);
instead of calling GetType(), GetProperty() and GetMethod() over and over again.
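
A minimal sketch of this caching idea follows. The type HiddenFileInfo and the wrapper CachedReflection are hypothetical stand-ins, used only so that the code runs without WindowsBase. The lookups happen once in static fields and every later call reuses them.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for the non-public ZipFileInfo type.
internal sealed class HiddenFileInfo
{
    private readonly string name;
    internal HiddenFileInfo(string name) { this.name = name; }
    internal string Name { get { return name; } }
    internal int GetLength() { return name.Length; }
}

public static class CachedReflection
{
    const BindingFlags Flags =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    // Resolved once when the class is first used, then reused for every call.
    static readonly Type HiddenType = typeof(HiddenFileInfo);
    static readonly PropertyInfo NameProperty = HiddenType.GetProperty("Name", Flags);
    static readonly MethodInfo GetLengthMethod = HiddenType.GetMethod("GetLength", Flags);

    public static object Create(string name) { return new HiddenFileInfo(name); }

    public static string GetName(object external)
    {
        return (string)NameProperty.GetValue(external, null);
    }

    public static int GetLength(object external)
    {
        return (int)GetLengthMethod.Invoke(external, null);
    }
}
```

This removes the repeated name lookups, but each call still goes through PropertyInfo.GetValue or MethodInfo.Invoke.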
Answer: Re: Great! But... (D. Christian Ohle, 22-Apr-13 2:00)
Hi, thanks, please check the code in the comment "New ZipArchive class without reflection calls".
It is further down, probably on page two.
This should be as fast as possible, although the Expression compilation costs some time once, of course.
General: My vote of 5 (Ralf_, 13-Mar-13 9:07)
Very helpful !
 
With .NET 3.5/Win7SP1 I couldn't extract items with special German characters (umlaut) zipped with WinZip, since the encoder and decoder object fallback settings are char(63)='?'. But this character is forbidden for file system names.
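
The fallback behaviour described here can be reproduced with the public Encoding.ASCII alone. This is only an illustration, and AsciiFallbackDemo is a made-up name. Every character or byte outside the 7-bit range becomes '?' (char 63).

```csharp
using System;
using System.Text;

// Hypothetical demo class showing the ASCII replacement fallback.
public static class AsciiFallbackDemo
{
    // Encodes a name as ASCII and decodes it again, as an ASCII-only writer and reader would.
    public static string RoundTrip(string fileName)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(fileName);
        return Encoding.ASCII.GetString(bytes);
    }

    // Decodes raw header bytes as ASCII; bytes above 0x7F cannot be represented.
    public static string Decode(byte[] rawName)
    {
        return Encoding.ASCII.GetString(rawName);
    }
}
```

For example, AsciiFallbackDemo.RoundTrip("Gr\u00FC\u00DFe.txt") returns "Gr??e.txt", which is not a valid Windows file name.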
General: Re: My vote of 5 (D. Christian Ohle, 13-Mar-13 20:52)
Hello Ralf
There is a simple workaround: it is possible to change the encoding. Please check the earlier comment "Problem with non ascii characters in file names":
 
var blockManagerType = packageType.Assembly.GetType("MS.Internal.IO.Zip.ZipIOBlockManager");
FieldInfo blockManagerField = zipArchiveType.GetField("_blockManager", BindingFlags.Instance | BindingFlags.NonPublic);
object blockManager = blockManagerField.GetValue(zipArchiveObject);
FieldInfo encodingField = blockManagerType.GetField("_encoding", BindingFlags.Instance | BindingFlags.NonPublic);
encodingField.SetValue(blockManager, new FakeAsciiEncoding());
public sealed class FakeAsciiEncoding : ASCIIEncoding
{
    private readonly Encoding encoding = GetEncoding(858);      
    public override byte[] GetBytes(string s) { return this.encoding.GetBytes(s); }     
    public override string GetString(byte[] bytes) { return this.encoding.GetString(bytes); }
}
 
Greetings from Germany
General: Re: My vote of 5 (Ralf_, 14-Mar-13 0:30)
Sorry for my blindness, Christian. Those who can read clearly have the advantage.
Additionally I'm new with .NET.
Hence my question:
What is the type/definition of "zipArchiveObject" in call to function blockManagerField.GetValue() of this workaround ?
 
Using a ThreadPool to extract an archive of 0.5GB is very fast and takes about 30s. But I have to re-open the archive in each thread, since passing object references (stream, ZipArchive, ZipFileInfo) doesn't work between the different appdomains.
 
Kind regards from the same country
Ralf
General: Re: My vote of 5 (D. Christian Ohle, 14-Mar-13 6:13)
Hi, try this. "zipArchiveObject" is equivalent to "external", the object itself, not the type or any member info of the type.
You said .NET is new for you. In that case you can think of this object as something like an IUnknown pointer in C++, or better, something like IDispatch.
Going through reflection in .NET is something like asking IDispatch for methods and properties based on names, and this is what we are doing.
 
If you need more speed for many files, it is possible to compile such relatively slow reflection calls to delegates.
The code is in the comment "New ZipArchive class without reflection calls" (13. Jun '11)
The compilation happens once, at the first request, and after that the calls run at full speed. The additional effort therefore pays off for many files, but not for occasional use.
 
Regards
 
public static ZipArchive OpenOnFile(string path, FileMode mode = FileMode.Open, FileAccess access = FileAccess.Read, FileShare share = FileShare.Read, bool streaming = false)
{
  var type = typeof(System.IO.Packaging.Package).Assembly.GetType("MS.Internal.IO.Zip.ZipArchive");
  var meth = type.GetMethod("OpenOnFile", BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic);
 
  //old default ANSI:
  //return new ZipArchive { external = meth.Invoke(null, new object[] { path, mode, access, share, streaming }) };

  //new unicode extension:
  var zipArchiveObject = meth.Invoke(null, new object[] { path, mode, access, share, streaming });
  var blockManagerType = typeof(System.IO.Packaging.Package).Assembly.GetType("MS.Internal.IO.Zip.ZipIOBlockManager");
  var blockManagerField = type.GetField("_blockManager", BindingFlags.Instance | BindingFlags.NonPublic);
  var blockManager = blockManagerField.GetValue(zipArchiveObject);
  var encodingField = blockManagerType.GetField("_encoding", BindingFlags.Instance | BindingFlags.NonPublic);
  encodingField.SetValue(blockManager, new FakeAsciiEncoding());
  return new ZipArchive { external = zipArchiveObject };
}
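
Compiling a reflection call to a delegate, as mentioned in this reply, might look roughly like the sketch below. It is not the code from the "New ZipArchive class without reflection calls" comment. HiddenEntry and CompiledAccessors are hypothetical names, and only a property getter is shown.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical stand-in for a non-public type such as ZipFileInfo.
internal sealed class HiddenEntry
{
    private readonly string name;
    internal HiddenEntry(string name) { this.name = name; }
    internal string Name { get { return name; } }
}

public static class CompiledAccessors
{
    // Builds (object target) => (object)((T)target).Property once.
    // Calls on the returned delegate no longer go through reflection.
    public static Func<object, object> CompileGetter(Type type, string propertyName)
    {
        PropertyInfo property = type.GetProperty(propertyName,
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        ParameterExpression target = Expression.Parameter(typeof(object), "target");
        Expression body = Expression.Convert(
            Expression.Property(Expression.Convert(target, type), property),
            typeof(object));
        return Expression.Lambda<Func<object, object>>(body, target).Compile();
    }

    public static Type EntryType { get { return typeof(HiddenEntry); } }
    public static object CreateEntry(string name) { return new HiddenEntry(name); }
}
```

The delegate would normally be stored in a static field, so the cost of Compile() is paid once and then spread over all later calls.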

General: Re: My vote of 5 (Ralf_, 14-Mar-13 23:58)
Thank you very much, Christian.
 
My (beginner's) error was to pass the zipArchiveObject cast as ZipArchive to the function GetValue.
 
The solution in "New ZipArchive" is very good as long as the ZipArchive is used in the same thread context. To extract each file in a different thread (parallelization), I can't pass a ZipArchive or ZipFileInfo object, but I have to re-open the archive in each thread, and this always takes a lot of time for large archives. This, and of course the competing hard disk access, makes the parallel extraction three times slower than in a single thread. I'll probably have to give up this idea.
 
Thanks again and have a nice weekend !
Ralf
General: My vote of 5 (DanielSheets, 11-Mar-13 4:58)
Excellent!
General: My vote of 5 (Jaap Lamfers, 12-Feb-13 8:32)
Really nice!
Suggestion: .NET 3.5 version (StehtimSchilf, 11-Feb-13 8:49)
Hi
 
Christian's example is just wow! Unfortunately we've got some .NET 3.5 Applications which we cannot port that easily yet. Because of some .NET 4.0 language features which aren't supported in 3.5 (e.g. default values), I've adjusted Christian's code so it can be used in 3.5:
 
Thx Chris!
 
using System;
using System.Collections.Generic;
using System.Text;
 
using System.IO;
using System.Linq;
using System.Reflection;
 
// add ref to PresentationCore.dll for System.IO.Packaging
// add ref to WindowsBase.dll for System.IO.Packaging.Package

 
namespace codeproject.ZipArchive {
 

   /// <summary>
   /// This is a .NET 3.5 adaptation of Code Project article
   /// 'C# Use Zip Archives without External Libraries'
   /// http://www.codeproject.com/Articles/209731/Csharp-use-Zip-archives-without-external-libraries
   /// by  D. Christian Ohle
   /// </summary>
   /// <remarks>
   /// Only minor changes were needed:
   /// - set references to PresentationCore.dll and WindowsBase.dll
   /// - added ZipArchive(object) constructor
   /// - added ZipFileInfo(object) constructor
   /// - added additional OpenOnFile(), OpenOnStream(), AddFile() overloads,
   ///   because C# 3.0 (.NET 3.5) does not support default values for optional parameters
   /// - added CopyTo() method
   /// - calls of ZipArchive()-, ZipFileInfo()-constructors adjusted
   /// </remarks>
   public class ZipArchive : IDisposable {
      private object external;
 
      public enum CompressionMethodEnum { 
         Stored, 
         Deflated 
      };
      public enum DeflateOptionEnum { 
         Normal, 
         Maximum, 
         Fast, 
         SuperFast 
      };
 
      private ZipArchive() {
      }
 
      /// <remarks>
      /// added for .NET 3.5 support
      /// </remarks>
      private ZipArchive(object external) {
         this.external = external;
      }
 
      public static ZipArchive OpenOnFile(string path, FileMode mode, FileAccess access, FileShare share, bool streaming) {
         var type = typeof(System.IO.Packaging.Package).Assembly.GetType("MS.Internal.IO.Zip.ZipArchive");
         var meth = type.GetMethod("OpenOnFile", BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic);
         return new ZipArchive(meth.Invoke(null, new object[] { path, mode, access, share, streaming }));
      }
 
      public static ZipArchive OpenOnFile(string path) {
         return ZipArchive.OpenOnFile(path, FileMode.Open, FileAccess.Read, FileShare.Read, false);
      }
 
      public static ZipArchive OpenOnStream(Stream stream) {
         return ZipArchive.OpenOnStream(stream, FileMode.OpenOrCreate, FileAccess.ReadWrite, false);
      }
 
      public static ZipArchive OpenOnStream(Stream stream, FileMode mode, FileAccess access, bool streaming) {
         var type = typeof(System.IO.Packaging.Package).Assembly.GetType("MS.Internal.IO.Zip.ZipArchive");
         var meth = type.GetMethod("OpenOnStream", BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic);
         return new ZipArchive(meth.Invoke(null, new object[] { stream, mode, access, streaming }));
      }
 
      public ZipFileInfo AddFile(string path) {
         return this.AddFile(path, CompressionMethodEnum.Deflated, DeflateOptionEnum.Normal);
      }
 
      public ZipFileInfo AddFile(string path, CompressionMethodEnum compmeth, DeflateOptionEnum option) {
         var type = external.GetType();
         var meth = type.GetMethod("AddFile", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
         var comp = type.Assembly.GetType("MS.Internal.IO.Zip.CompressionMethodEnum").GetField(compmeth.ToString()).GetValue(null);
         var opti = type.Assembly.GetType("MS.Internal.IO.Zip.DeflateOptionEnum").GetField(option.ToString()).GetValue(null);
         return new ZipFileInfo(meth.Invoke(external, new object[] { path, comp, opti }));
      }
 
      public void DeleteFile(string name) {
         var meth = external.GetType().GetMethod("DeleteFile", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
         meth.Invoke(external, new object[] { name });
      }
 
      public void Dispose() {
         ((IDisposable)external).Dispose();
      }
      public ZipFileInfo GetFile(string name) {
         var meth = external.GetType().GetMethod("GetFile", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
         return new ZipFileInfo(meth.Invoke(external, new object[] { name }));
      }
 
      public IEnumerable<ZipFileInfo> Files {
         get {
            var meth = external.GetType().GetMethod("GetFiles", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
            var coll = meth.Invoke(external, null) as System.Collections.IEnumerable; //ZipFileInfoCollection
            foreach (var p in coll) yield return new ZipFileInfo(p);
         }
      }
 
      public IEnumerable<string> FileNames {
         get {
            return Files.Select(p => p.Name).OrderBy(p => p);
         }
      }
 
      /// <summary>
      /// Copies all remaining bytes from one stream to another.
      /// </summary>
      /// <param name="input">stream to read from</param>
      /// <param name="output">stream to write to</param>
      /// <remarks>
      /// added for .NET 3.5 support
      /// </remarks>
      public static void CopyTo(Stream input, Stream output) {
         byte[] buffer = new byte[16 * 1024]; // Fairly arbitrary size
         int bytesRead;
 
         while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0) {
            output.Write(buffer, 0, bytesRead);
         }
      }
 
      public struct ZipFileInfo {
         internal object external;
 
         /// <summary>
         /// Wraps the internal ZipFileInfo object.
         /// </summary>
         /// <param name="external">the MS.Internal.IO.Zip.ZipFileInfo instance</param>
         /// <remarks>
         /// added for .NET 3.5 support
         /// </remarks>
         internal ZipFileInfo(object external) {
            this.external = external;
         }
 
         private object GetProperty(string name) {
            return external.GetType().GetProperty(name, BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic).GetValue(external, null);
         }
 
         public override string ToString() {
            return Name;// base.ToString();
         }
 
         public string Name {
            get { return (string)GetProperty("Name"); }
         }
 
         public DateTime LastModFileDateTime {
            get { return (DateTime)GetProperty("LastModFileDateTime"); }
         }
 
         public bool FolderFlag {
            get { return (bool)GetProperty("FolderFlag"); }
         }
 
         public bool VolumeLabelFlag {
            get { return (bool)GetProperty("VolumeLabelFlag"); }
         }
 
         public object CompressionMethod {
            get { return GetProperty("CompressionMethod"); }
         }
 
         public object DeflateOption {
            get { return GetProperty("DeflateOption"); }
         }
 
         public Stream GetStream() {
            return this.GetStream(FileMode.Open, FileAccess.Read);
         }
 
         public Stream GetStream(FileMode mode, FileAccess access) {
            var meth = external.GetType().GetMethod("GetStream", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
            return (Stream)meth.Invoke(external, new object[] { mode, access });
         }
 
      } // struct ZipFileInfo
   } // class ZipArchive
} // namespace
 
Some examples:
 
      private void testCreateZip() {
         string newZip = @"E:\Temp\Tests\Tests.zip";
         using (var zip = ZipArchive.OpenOnFile(newZip, FileMode.Create, FileAccess.ReadWrite, FileShare.None, false)) {
            // using blocks close the entry stream and the source file even if copying fails
            using (var str = zip.AddFile(@"FolderA\FileA.txt").GetStream(FileMode.Open, FileAccess.ReadWrite))
            using (var input = new FileStream(@"E:\Temp\Tests\FolderA\FileA.txt", FileMode.Open, FileAccess.Read)) {
               ZipArchive.CopyTo(input, str);
            }
 
            using (var str = zip.AddFile(@"FolderB\FileB.txt").GetStream(FileMode.Open, FileAccess.ReadWrite))
            using (var input = new FileStream(@"E:\Temp\Tests\FolderB\FileB.txt", FileMode.Open, FileAccess.Read)) {
               ZipArchive.CopyTo(input, str);
            }
         }
      }
 
      private void testReadZip() {
         string existingZip = @"E:\Temp\links.zip";
         using (var zip = ZipArchive.OpenOnFile(existingZip)) {
            listBox1.Items.AddRange(zip.Files.OrderBy(file => file.Name).Select(file => file.Name).ToArray());
         }
      }
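
The examples above create and list archives. Extracting an entry to disk could be sketched as follows. ZipExtractSketch and ExtractEntry are hypothetical names, not part of the posted class. The helper only needs the entry stream and the entry name, and it refuses names that would end up outside the target folder.

```csharp
using System;
using System.IO;

// Hypothetical helper, independent of the wrapper class above.
public static class ZipExtractSketch
{
    // Copies an entry stream into targetDir, creating subfolders from the entry name.
    // Returns the full path of the file that was written.
    public static string ExtractEntry(Stream entry, string entryName, string targetDir)
    {
        string root = Path.GetFullPath(targetDir);
        string relative = entryName.Replace('/', Path.DirectorySeparatorChar)
                                   .Replace('\\', Path.DirectorySeparatorChar);
        string target = Path.GetFullPath(Path.Combine(root, relative));

        // Reject names such as "..\evil.txt" that would leave the target folder.
        string prefix = root.TrimEnd(Path.DirectorySeparatorChar) + Path.DirectorySeparatorChar;
        if (!target.StartsWith(prefix, StringComparison.Ordinal))
            throw new IOException("Entry name leaves the target folder: " + entryName);

        Directory.CreateDirectory(Path.GetDirectoryName(target));
        using (var output = new FileStream(target, FileMode.Create, FileAccess.Write))
        {
            byte[] buffer = new byte[16 * 1024];
            int read;
            while ((read = entry.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
        }
        return target;
    }
}
```

With the wrapper above, a caller would loop over zip.Files, skip entries whose FolderFlag is set, and pass file.GetStream() and file.Name to this helper.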
 
cheerioh
SiS
General: Thanks! (rbignu, 5-Jan-13 1:44)
very useful
Question: Awesome! (Wildbird, 3-Jan-13 10:04)
This code example is full of awesomeness!
General: My vote of 5 (Awchie, 13-Nov-12 15:28)
It removes the need for third party libraries.
General: Thanks! High 5 Dude (Awchie, 13-Nov-12 15:21)
I really need a file zipper that does not use third party libs (to not become their %!$@&). Thanks for the article!
General: My vote of 5 (gelo_one, 12-Nov-12 23:07)
Good!
General: Thanks (TheGrandBazaar, 12-Nov-12 4:21)
This is very useful for people not working with .NET 4.5 and who don't want to use third party libraries.
General: My vote of 5 (Elron02, 2-Nov-12 13:28)
exactly what i wanted! Thank you.
General: My vote of 5 (Collin Heine, 21-Sep-12 7:57)
I've been looking for some kind of guidance like this forever! Thanks!
General: My vote of 5 (Mark Lemke, 12-Sep-12 0:57)
It seems that I forgot to vote. This article is very useful, especially since I can't upgrade to .NET 4.5 and don't want to use a third party solution for multiple reasons.
Question: No file compression ?? (BillJam119-Aug-12 8:25)
Am I missing something? I get the code here to work and create a zip file and add files to it but they are all uncompressed even though I'm specifying Deflated for the compression option and Normal for the Deflate option (yes, I tried Maximum here also).
Anyone?
Answer: Re: No file compression ?? (BillJam119-Aug-12 8:49)
And I knew it had to be something stupid. My test files were already compressed; I didn't know that.
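
The effect is easy to reproduce with the public DeflateStream class. In this sketch (DeflateDemo is a made-up name), pseudo-random bytes stand in for an already compressed file such as a JPEG or another Zip, which is an assumption made only to keep the example self-contained.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical demo class comparing Deflate results for different inputs.
public static class DeflateDemo
{
    // Returns the number of bytes produced by a raw Deflate pass over data.
    public static long DeflatedSize(byte[] data)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen = true so the buffer can still be measured after the flush on Dispose
            using (var deflate = new DeflateStream(buffer, CompressionMode.Compress, true))
                deflate.Write(data, 0, data.Length);
            return buffer.Length;
        }
    }

    // Pseudo-random bytes have no redundancy left, much like compressed file formats.
    public static byte[] RandomBytes(int count)
    {
        var data = new byte[count];
        new Random(12345).NextBytes(data);
        return data;
    }
}
```

Comparing DeflatedSize(new byte[100000]) with DeflatedSize(DeflateDemo.RandomBytes(100000)) shows the difference: the block of zeros shrinks to a small fraction, while the random block stays at roughly its original size.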
Question: Problem with non ascii characters in file names [modified] (Nyarlatotep, 18-Jun-12 7:45)
This is a useful class.
 
However, a strange problem arises when file names added to the zip archive (AddFile) contain non-ASCII characters (accented characters, by the way).
The zip archive seems to contain all the files, but if I open it with Windows Explorer, those files are not shown in the archive list.
Opening the archive with 7zip shows all the files, but accented letters are replaced with question marks.
Could it be a problem with WindowsBase/Reflection not supporting Unicode strings (it sounds weird, anyway)?
 
Thanks

modified 18-Jun-12 13:55pm.

Answer: Re: Problem with non ascii characters in file names (D. Christian Ohle, 19-Jun-12 6:54)
Hello Nyarlatotep
You are right, I could reproduce it with a filename like "é.txt".
But it is not a problem with Reflection.
Internally, the MS...ZipArchive has a field _blockManager._encoding, and its type is fixed to System.Text.ASCIIEncoding.
This means we cannot change the encoding.
It seems to be a limitation. There is always a way, but unfortunately I found nothing.
Would be interesting if someone has an idea.
Regards
Christian
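
A field typed as ASCIIEncoding still accepts an instance of a subclass, which is the idea behind the FakeAsciiEncoding workaround posted elsewhere in this discussion. The sketch below shows the mechanism on a stand-in class. BlockManagerStandIn, Latin1AsAscii and EncodingSwap are hypothetical names, and ISO-8859-1 is used instead of code page 858 only as an assumption that keeps the example portable.

```csharp
using System;
using System.Reflection;
using System.Text;

// Hypothetical stand-in for the internal block manager; only the field declaration matters.
internal sealed class BlockManagerStandIn
{
    private ASCIIEncoding _encoding = new ASCIIEncoding();
    internal string Decode(byte[] raw) { return _encoding.GetString(raw); }
}

// An ASCIIEncoding by type, but the actual work is delegated to another encoding.
public sealed class Latin1AsAscii : ASCIIEncoding
{
    private readonly Encoding inner = Encoding.GetEncoding("iso-8859-1");
    public override byte[] GetBytes(string s) { return inner.GetBytes(s); }
    public override string GetString(byte[] bytes) { return inner.GetString(bytes); }
}

public static class EncodingSwap
{
    // Decodes raw before and after the private field has been replaced via reflection.
    public static string[] DecodeBeforeAndAfter(byte[] raw)
    {
        var manager = new BlockManagerStandIn();
        string before = manager.Decode(raw);

        FieldInfo field = typeof(BlockManagerStandIn).GetField(
            "_encoding", BindingFlags.Instance | BindingFlags.NonPublic);
        field.SetValue(manager, new Latin1AsAscii());

        return new[] { before, manager.Decode(raw) };
    }
}
```

For the bytes 0xE9 0x2E 0x74, the first result is "?.t" and the second is "é.t".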

Last Updated 12 Jun 2011
Article Copyright 2011 by D. Christian Ohle
Everything else Copyright © CodeProject, 1999-2013