
C# Use Zip Archives without External Libraries

By D. Christian Ohle, 12 Jun 2011
 

Introduction

I found a lot of articles on how to access Zip archives in C#, but all of the approaches have significant disadvantages. The main problem is that Microsoft has implemented Zip archives in the operating system, but there is no official API that we can use. In C#, for example, we have System.IO.Compression.GZipStream, but there is no corresponding System.IO.Compression.Zip class.

There are some free .NET compression libraries like SharpZipLib and .NET Zip Library, but these bring additional installation effort and licensing problems.

It is also possible to use the free J# Library. J# has included Zip to keep compatible with the Java libraries. But to bundle a 3.6 MB DLL vjslib.dll, just to support Zip, seems like a really goofy hack.

Since .NET 3.0, we can use the System.IO.Packaging ZipPackage class in WindowsBase.DLL. It's just 1.1 MB, and it just seems to fit a lot better than importing Java libraries.

The only problem is that the ZipPackage class isn't a generic Zip implementation; it's a packaging library for formats like XPS and Office Open XML that happen to use Zip.

Accessing plain Zip archives with ZipPackage fails because the content is checked against the Package conventions.

For example, there has to be a file [Content_Types].xml in the root, and only files with specified extensions are accessible. File names with special characters or spaces are not allowed, and access is slower than it needs to be because of the additional Package link logic.
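
To illustrate these Package conventions, here is a minimal sketch that writes a single part through the public System.IO.Packaging API. The file name demo.zip, the part name and the content type are arbitrary choices for this illustration. If you open the result with a Zip tool, you should find a [Content_Types].xml entry next to the part, which is exactly what a plain Zip archive lacks.

```csharp
using System;
using System.IO;
using System.IO.Packaging; // requires a reference to WindowsBase.dll
using System.Text;

static class PackageDemo
{
    static void Main()
    {
        // "demo.zip" is an arbitrary file name chosen for this sketch.
        using (Package package = Package.Open("demo.zip", FileMode.Create))
        {
            // Part names have to be valid part URIs.
            Uri partUri = PackUriHelper.CreatePartUri(new Uri("test1.xml", UriKind.Relative));

            // Every part needs a content type; the package records it in [Content_Types].xml.
            PackagePart part = package.CreatePart(partUri, "text/xml", CompressionOption.Normal);

            using (Stream stream = part.GetStream())
            {
                byte[] data = Encoding.UTF8.GetBytes("<root/>");
                stream.Write(data, 0, data.Length);
            }
        }
    }
}
```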

However, the WindowsBase.DLL assembly is preinstalled, and the generic Zip implementation is inside it. The only problem is that the generic Zip classes are not public and therefore not visible to programmers. But there is a simple way to get access to this hidden API, and I wrote a small wrapper class for it.

Background

A quick check in the Object Browser shows us that WindowsBase.DLL has a namespace MS.Internal.IO.Zip. This sounds good, but there are no public classes visible.

However, the following call:

var types = typeof(System.IO.Packaging.Package).Assembly.GetTypes();

gives us 824 class types, public and non-public, and in particular one with the name MS.Internal.IO.Zip.ZipArchive. Now it is easy to get this special class type and its methods and properties:

// needs: using System.Reflection;
var type = typeof(System.IO.Packaging.Package).Assembly.GetType
		("MS.Internal.IO.Zip.ZipArchive");
var staticMethods = type.GetMethods(BindingFlags.Static | 
		BindingFlags.Public | BindingFlags.NonPublic);
var instanceMethods = type.GetMethods(BindingFlags.Instance | 
		BindingFlags.Public | BindingFlags.NonPublic);

and we get the most important methods:

static ZipArchive OpenOnFile(string path, FileMode mode, 
	FileAccess access, FileShare share, bool streaming);
static ZipArchive OpenOnStream(Stream stream, FileMode mode, 
	FileAccess access, bool streaming);
ZipFileInfo AddFile(string path, 
	CompressionMethodEnum compmeth, DeflateOptionEnum option);
ZipFileInfo GetFile(string name);
ZipFileInfo DeleteFile(string name);
ZipFileInfoCollection GetFiles();
void Dispose();
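
For readers who want to reproduce this exploration, the following is a minimal sketch that lists the types in the hidden namespace and prints the method signatures of ZipArchive. The class name ZipTypeExplorer is made up for this example, and the output depends on the installed framework version, so the type may not be present everywhere.

```csharp
using System;
using System.Linq;
using System.Reflection;

static class ZipTypeExplorer
{
    static void Main()
    {
        // Any public type from WindowsBase.dll gives us a handle on the assembly.
        Assembly assembly = typeof(System.IO.Packaging.Package).Assembly;

        // List every type in the hidden Zip namespace.
        var zipTypes = assembly.GetTypes()
            .Where(x => x.Namespace == "MS.Internal.IO.Zip")
            .OrderBy(x => x.FullName);
        foreach (Type t in zipTypes)
            Console.WriteLine("{0} (public: {1})", t.FullName, t.IsPublic);

        // Print the methods declared by ZipArchive itself.
        Type zipArchive = assembly.GetType("MS.Internal.IO.Zip.ZipArchive");
        if (zipArchive == null) return; // type not present in this framework version

        const BindingFlags flags = BindingFlags.Static | BindingFlags.Instance |
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        foreach (MethodInfo m in zipArchive.GetMethods(flags))
            Console.WriteLine(m); // MethodInfo.ToString() prints the signature
    }
}
```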

The same procedure applied to ZipFileInfo gives us:

Stream GetStream(FileMode mode, FileAccess access);

and properties like Name, LastModFileDateTime, FolderFlag...
This is all we need to implement a small wrapper class that accesses the hidden API through reflection:

class ZipArchive : IDisposable
{
  private object external;
  public static ZipArchive OpenOnFile
      (string path, FileMode mode, FileAccess access, FileShare share, bool streaming)    
  {
    var type = typeof(System.IO.Packaging.Package).Assembly.GetType
		("MS.Internal.IO.Zip.ZipArchive");
    var meth = type.GetMethod("OpenOnFile", BindingFlags.Static | 
		BindingFlags.Public | BindingFlags.NonPublic);
    return new ZipArchive { external = meth.Invoke(null, new object[] 
		{ path, mode, access, share, streaming }) };
  } 
  //...
  public class ZipFileInfo //...
}
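
The instance members can be wrapped in the same way. The following is a hypothetical sketch, not the code from the demo project: the class name ZipFileInfoSketch and the default parameter values are assumptions made for illustration. It shows how a property such as Name and a method such as GetStream might be forwarded to the hidden object.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical sketch; not the code from the demo project.
class ZipFileInfoSketch
{
    private const BindingFlags Flags =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    // The hidden MS.Internal.IO.Zip.ZipFileInfo instance.
    private readonly object external;

    public ZipFileInfoSketch(object external) { this.external = external; }

    // Reads the hidden Name property by its name.
    public string Name
    {
        get { return (string)external.GetType().GetProperty("Name", Flags).GetValue(external, null); }
    }

    // Forwards to the hidden GetStream(FileMode, FileAccess) method.
    public Stream GetStream(FileMode mode = FileMode.Open, FileAccess access = FileAccess.Read)
    {
        MethodInfo meth = external.GetType().GetMethod("GetStream", Flags);
        return (Stream)meth.Invoke(external, new object[] { mode, access });
    }
}
```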

The complete ZipArchive wrapper implementation is in Program.cs of the demo project ZipArchiveTest.
The class is only 97 lines long, and we can use it in a code sequence like this:

var str = new MemoryStream();

//create some files:
using (var arc = ZipArchive.OpenOnStream(str))
{
  var doc1 = new XDocument(new XElement
	("root", new XElement("item"), new XElement("item"), new XElement("item")));
  var doc2 = new XDocument(new XElement("root", Enumerable.Repeat
		("item", 1000).Select(p => new XElement(p))));
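  using (var fs = arc.AddFile("test1.xml").GetStream
		(FileMode.Open, FileAccess.ReadWrite)) doc1.Save(fs);
  using (var fs = arc.AddFile("test2.xml").GetStream
		(FileMode.Open, FileAccess.ReadWrite)) doc2.Save(fs);
  // also add a file in a subdirectory, because it is read back below
  using (var fs = arc.AddFile("dir/test3.xml").GetStream
		(FileMode.Open, FileAccess.ReadWrite)) doc1.Save(fs);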
}

// read the files:
using (var arc = ZipArchive.OpenOnStream(str))
{
  var doc1 = XDocument.Load(arc.GetFile("test1.xml").GetStream());
  var doc2 = XDocument.Load(arc.GetFile("test2.xml").GetStream());
  var doc3 = XDocument.Load(arc.GetFile("dir/test3.xml").GetStream());
}

Using the Demo

The demo program ZipArchiveTest, which uses the ZipArchive class, is as small as possible and can show the content of Zip archives without any restrictions. Double-clicking a file in the ListBox opens a new window that shows the file content as text.

Conclusion

Microsoft should publish its hidden ZipArchive class, but till then, we can use such a simple wrapper to save terabytes of data worldwide.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

D. Christian Ohle
Germany

Comments and Discussions

Question: Looks like someone has stolen your article... (Florian Storck, 22 May '13 - 2:59)
...take a look here:
 
http://chiragkanzariya.blogspot.de/2012/10/c-use-zip-archives-without-external.html
Answer: Re: Looks like someone has stolen your article... (D. Christian Ohle, 22 May '13 - 6:01)
Thanks, no problem, I have enough money :)
Question: Great! But... (CainKellye, 22 Apr '13 - 1:36)
It's nice to have an option to use Zip without external DLLs. Thanks for the tip!
But reflection, they say, has a big impact on performance. So to minimize that impact, you should save the PropertyInfo for each property and the MethodInfo for each method you use, as well as the type, like this:
 
Type reflectedType = external.GetType();
PropertyInfo propInfo = reflectedType.GetProperty(propertyName, BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
MethodInfo methodInfo = reflectedType.GetMethod("GetStream", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
 
So when you need to retrieve the value of the property, you can use:
propInfo.GetValue(external, null);
or
methodInfo.Invoke(external, ...);
instead of calling GetType(), GetProperty() and GetMethod() over and over again.
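
A minimal sketch of this caching idea, assuming a small helper of our own: the class ReflectionCache and its method name are hypothetical and not part of the article's wrapper. It looks up each PropertyInfo once per type and property name and reuses it afterwards. It does not handle a property that cannot be found.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

// Hypothetical helper: caches PropertyInfo lookups per (type, property name).
static class ReflectionCache
{
    private const BindingFlags Flags =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    private static readonly ConcurrentDictionary<Tuple<Type, string>, PropertyInfo> Properties =
        new ConcurrentDictionary<Tuple<Type, string>, PropertyInfo>();

    public static object GetPropertyValue(object target, string propertyName)
    {
        var key = Tuple.Create(target.GetType(), propertyName);

        // GetProperty only runs when the key is not in the cache yet.
        PropertyInfo prop = Properties.GetOrAdd(key, k => k.Item1.GetProperty(k.Item2, Flags));
        return prop.GetValue(target, null);
    }
}
```

In the wrapper, a call such as ReflectionCache.GetPropertyValue(external, "Name") could then replace the repeated GetType() and GetProperty() calls.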
Answer: Re: Great! But... (D. Christian Ohle, 22 Apr '13 - 2:00)
Hi, thanks. Please check the code in the comment "New ZipArchive class without reflection calls".
It is further below, probably on page two.
This should be as fast as possible, although the Expression compilation costs time once, of course.
General: My vote of 5 (Ralf_, 13 Mar '13 - 9:07)
Very helpful !
 
With .NET 3.5/Win7SP1 I couldn't extract items with special German characters (umlaut) zipped with WinZip, since the encoder and decoder object fallback settings are char(63)='?'. But this character is forbidden for file system names.
General: Re: My vote of 5 (D. Christian Ohle, 13 Mar '13 - 20:52)
Hello Ralf
There is a simple workaround: it is possible to change the encoding. Please check the comment "Problem with non ascii characters in file names" a few lines earlier:
 
var blockManagerType = packageType.Assembly.GetType("MS.Internal.IO.Zip.ZipIOBlockManager");
FieldInfo blockManagerField = zipArchiveType.GetField("_blockManager", BindingFlags.Instance | BindingFlags.NonPublic);
object blockManager = blockManagerField.GetValue(zipArchiveObject);
FieldInfo encodingField = blockManagerType.GetField("_encoding", BindingFlags.Instance | BindingFlags.NonPublic);
encodingField.SetValue(blockManager, new FakeAsciiEncoding());
public sealed class FakeAsciiEncoding : ASCIIEncoding
{
    private readonly Encoding encoding = GetEncoding(858);      
    public override byte[] GetBytes(string s) { return this.encoding.GetBytes(s); }     
    public override string GetString(byte[] bytes) { return this.encoding.GetString(bytes); }
}
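
To make the effect of the replacement encoding visible, here is a small sketch that decodes the same raw file-name bytes once with ASCII and once with code page 858. The class name EncodingFallbackDemo and the sample bytes are made up for this illustration, and it assumes that code page 858 is available, as it is on the full .NET Framework.

```csharp
using System;
using System.Text;

static class EncodingFallbackDemo
{
    // Decodes the raw bytes of a Zip file name with the given encoding.
    public static string Decode(byte[] rawName, Encoding encoding)
    {
        return encoding.GetString(rawName);
    }

    static void Main()
    {
        // The bytes of "ü.txt" as stored with code page 858 (0x81 is the umlaut).
        byte[] rawName = { 0x81, 0x2E, 0x74, 0x78, 0x74 };

        // ASCII cannot represent 0x81 and falls back to '?', which is not a valid file name character.
        Console.WriteLine(Decode(rawName, Encoding.ASCII));

        // Code page 858 maps 0x81 to the umlaut, so the name survives.
        Console.WriteLine(Decode(rawName, Encoding.GetEncoding(858)));
    }
}
```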
 
Greetings from Germany
General: Re: My vote of 5 (Ralf_, 14 Mar '13 - 0:30)
Sorry for my blindness, Christian. Those who can read are clearly at an advantage.
Additionally, I'm new to .NET.
Hence my question:
What is the type/definition of "zipArchiveObject" in the call to the function blockManagerField.GetValue() of this workaround?
 
Using a ThreadPool to extract an archive of 0.5GB is very fast and takes about 30s. But I have to re-open the archive in each thread, since passing object references (stream, ZipArchive, ZipFileInfo) doesn't work between the different appdomains.
 
Kind regards from the same country
Ralf
General: Re: My vote of 5 (D. Christian Ohle, 14 Mar '13 - 6:13)
Hi, try this. "zipArchiveObject" is equivalent to "external": the object itself, not the type or any member info of the type.
You said .NET is new for you. In this case you could think of this object as something like an IUnknown pointer in C++ or, better, something like IDispatch.
Going through reflection in .NET is something like asking IDispatch for methods and properties by name, and this is what we are doing.

If you need more speed for many files, there is the possibility to compile such relatively slow reflection calls to delegates.
The code is in the comment "New ZipArchive class without reflection calls" (13. Jun '11)
The compilation happens once at the first request, but after that the calls run at full speed. The additional effort therefore makes sense for many files, not for occasional use.
 
Regards
 
public static ZipArchive OpenOnFile(string path, FileMode mode = FileMode.Open, FileAccess access = FileAccess.Read, FileShare share = FileShare.Read, bool streaming = false)
{
  var type = typeof(System.IO.Packaging.Package).Assembly.GetType("MS.Internal.IO.Zip.ZipArchive");
  var meth = type.GetMethod("OpenOnFile", BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic);
 
  //old default ANSI:
  //return new ZipArchive { external = meth.Invoke(null, new object[] { path, mode, access, share, streaming }) };

  //new unicode extension:
  var zipArchiveObject = meth.Invoke(null, new object[] { path, mode, access, share, streaming });
  var blockManagerType = typeof(System.IO.Packaging.Package).Assembly.GetType("MS.Internal.IO.Zip.ZipIOBlockManager");
  var blockManagerField = type.GetField("_blockManager", BindingFlags.Instance | BindingFlags.NonPublic);
  var blockManager = blockManagerField.GetValue(zipArchiveObject);
  var encodingField = blockManagerType.GetField("_encoding", BindingFlags.Instance | BindingFlags.NonPublic);
  encodingField.SetValue(blockManager, new FakeAsciiEncoding());
  return new ZipArchive { external = zipArchiveObject };
}
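
The idea of compiling reflection calls to delegates can be sketched as follows. This is not the code from the comment referenced above; the helper CompiledAccessors and its method name are hypothetical. It builds a property getter as an expression tree and compiles it once, so that later calls go through a plain delegate.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper: turns a property lookup into a compiled delegate.
static class CompiledAccessors
{
    public static Func<object, object> CompileGetter(Type type, string propertyName)
    {
        PropertyInfo prop = type.GetProperty(propertyName,
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);

        // Builds the lambda: (object o) => (object)((T)o).Property
        ParameterExpression target = Expression.Parameter(typeof(object), "o");
        Expression body = Expression.Convert(
            Expression.Property(Expression.Convert(target, type), prop),
            typeof(object));

        // Compile() is the one-time cost; the returned delegate is cheap to call.
        return Expression.Lambda<Func<object, object>>(body, target).Compile();
    }
}
```

A wrapper would keep the returned delegate in a static field, for example one getter for the Name property of the hidden ZipFileInfo type, and call it for every file.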

General: Re: My vote of 5 (Ralf_, 14 Mar '13 - 23:58)
Thank you very much, Christian.
 
My (beginner's) error was to pass the zipArchiveObject, cast to ZipArchive, to the function GetValue.

The solution in "New ZipArchive" is very good as long as the ZipArchive is used in the same thread context. To extract each file in a different thread (parallelization), I can't pass a ZipArchive or ZipFileInfo object; I have to re-open the archive in each thread, and this always takes a lot of time for large archives. This, and of course the competing hard disk accesses, makes the parallel extraction three times slower than extraction in a single thread. I'll probably have to give up this idea.
 
Thanks again and have a nice weekend !
Ralf
General: My vote of 5 (DanielSheets, 11 Mar '13 - 4:58)
Excellent!

Last Updated 12 Jun 2011
Article Copyright 2011 by D. Christian Ohle