About 7-Zip
7-Zip is an open-source archive program with a plug-in interface. New archive formats and/or archive codecs can be added as DLLs. 7-Zip ships with several archive formats preinstalled:
- 7z: 7-Zip's own format, which offers good compression (LZMA, PPMd) but can be slow to pack and unpack
- Packing / unpacking: ZIP, GZIP, BZIP2 and TAR
- Unpacking only: RAR, CAB, ISO, ARJ, LZH, CHM, Z, CPIO, RPM, DEB and NSIS
The project is written in C++.
You can find more on the official 7-Zip site.
About this Contribution
This contribution allows you to use 7-Zip archive format DLLs in your programs written in .NET languages.
I created this module for my own project, which needed the ability to work with archives. Currently my project only extracts archives, so only that part of the 7-Zip interface is translated to C#. Later I plan to translate the compression capability as well. For now, if you need such functionality, you can implement it yourself with the help of this code and the 7-Zip source code.
This translation has been tested and is already working in my own project.
Implementation Details
All communication with archive DLLs is done through COM-like interfaces (why they are COM-like rather than true COM is explained in the Known Issues section). Callbacks are also implemented as interfaces.
Every DLL contains a class that implements one or more interfaces. Some formats allow only extraction; others also provide compression. The following public interfaces are translated to C#:
- IProgress: basic progress callback
- IArchiveOpenCallback: archive open callback
- ICryptoGetTextPassword: callback that prompts for the archive password
- IArchiveExtractCallback: callback for extracting files from the archive
- IArchiveOpenVolumeCallback: callback for opening additional archive volumes
- ISequentialInStream: simple read-only stream interface
- ISequentialOutStream: simple write-only stream interface
- IInStream: input stream interface with seek capability
- IOutStream: output stream interface
- IInArchive: main archive interface
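As a rough illustration of what such a translated interface can look like, here is a minimal sketch of two of the simpler ones. The GUID values and method signatures are written from memory of the 7-Zip headers and should be treated as assumptions to verify against IProgress.h and IStream.h; they are not copied from this translation's source.

```csharp
using System;
using System.Runtime.InteropServices;

// Assumption: 7-Zip interface IDs follow the pattern
// 23170F69-40C1-278A-0000-00gg00ii0000 (gg = group, ii = interface number).
// Check the values below against the 7-Zip headers before relying on them.
[ComImport]
[Guid("23170F69-40C1-278A-0000-000000050000")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IProgress
{
    // Reports the total amount of work, in bytes.
    void SetTotal(ulong total);

    // Reports the amount of work completed so far.
    void SetCompleted([In] ref ulong completeValue);
}

[ComImport]
[Guid("23170F69-40C1-278A-0000-000300010000")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface ISequentialInStream
{
    // The native method returns HRESULT and writes the byte count through a
    // final out parameter; COM interop surfaces that parameter as the return value.
    uint Read(
        [Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] byte[] data,
        uint size);
}
```

InterfaceIsIUnknown is used because the 7-Zip interfaces derive from IUnknown only and provide no IDispatch support.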
Every DLL exports functions for creating archive handler objects and for querying archive format properties. These functions are translated as .NET delegates:
- CreateObject: creates an object with the given class ID. Used mostly to create an IInArchive instance.
- GetHandlerProperty: gets the archive format description (implemented class IDs, default archive extension, etc.)
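The following sketch shows one way to bind such an exported function to a delegate. The class name SevenZipLibrary and the helper LoadCreateObject are hypothetical, and the native signatures in the comments are assumptions based on the 7-Zip sources. The code is Windows-only because it relies on LoadLibrary and GetProcAddress from kernel32.dll.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical helper class; the names are illustrative only.
public static class SevenZipLibrary
{
    // Assumed native signature:
    //   STDAPI CreateObject(const GUID *clsid, const GUID *iid, void **outObject);
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    public delegate int CreateObjectDelegate(
        [In] ref Guid classId,
        [In] ref Guid interfaceId,
        out IntPtr outObject);

    // Assumed native signature:
    //   STDAPI GetHandlerProperty(PROPID propID, PROPVARIANT *value);
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    public delegate int GetHandlerPropertyDelegate(int propId, IntPtr value);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr LoadLibrary(string fileName);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    private static extern IntPtr GetProcAddress(IntPtr module, string procName);

    // Loads the format DLL and returns a delegate bound to its CreateObject export.
    public static CreateObjectDelegate LoadCreateObject(string dllPath)
    {
        IntPtr module = LoadLibrary(dllPath);
        if (module == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        IntPtr proc = GetProcAddress(module, "CreateObject");
        if (proc == IntPtr.Zero)
            throw new EntryPointNotFoundException("CreateObject is not exported by " + dllPath);

        return (CreateObjectDelegate)Marshal.GetDelegateForFunctionPointer(
            proc, typeof(CreateObjectDelegate));
    }
}
```

The module handle is deliberately never freed in this sketch: the returned delegate stays valid only while the DLL remains loaded.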
Update (1.3): 7-Zip 4.45 changed the DLL interface. All archive formats and compression codecs are now implemented in one big DLL, so several new exported functions (and delegates for these functions in the translation) were added to handle several archive handler classes in one DLL.
Points of Interest
7-Zip interfaces use variants (PropVariant) for property values. C# does not support such variants as classes, so all such parameters are declared in C# as IntPtr. This is done for compatibility and because I prefer not to use unsafe code in my projects.
Fortunately, the managed class System.Runtime.InteropServices.Marshal has a method, GetObjectForNativeVariant, that you can use to convert such "pointers" to objects. However, this method does not handle all PropVariant types (for example, VT_FILETIME), so for these cases I added my own GetObjectForNativeVariant method to this translation.
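A minimal sketch of such a fallback might look like the following. The class name VariantHelper is hypothetical and this is not the method shipped with the translation. It assumes the standard PROPVARIANT layout, with the 16-bit type tag at offset 0 and the value at offset 8, on a little-endian machine.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; not the implementation from this translation.
public static class VariantHelper
{
    private const ushort VT_FILETIME = 64;

    public static object GetObjectForNativeVariant(IntPtr variant)
    {
        // The first two bytes of a PROPVARIANT hold the type tag (vt).
        ushort vt = (ushort)Marshal.ReadInt16(variant);

        if (vt == VT_FILETIME)
        {
            // A FILETIME is two 32-bit halves stored at offset 8; read together
            // they form the 64-bit tick count expected by FromFileTimeUtc.
            long fileTime = Marshal.ReadInt64(variant, 8);
            return DateTime.FromFileTimeUtc(fileTime);
        }

        // Everything else is delegated to the framework method (Windows only).
        return Marshal.GetObjectForNativeVariant(variant);
    }
}
```

Returning UTC leaves the choice of local-time conversion to the caller.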
7-Zip works with files through its own interfaces, so if you want to open a file on disk or in memory, you need to provide a class that implements one or more of the necessary interfaces. Several such wrapper classes are included in this translation (they wrap the standard .NET Stream class).
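To illustrate the idea, here is a minimal sketch of a read-only wrapper around Stream. The class name InStreamWrapper is an assumption, and ISequentialInStream is redeclared here without its COM attributes so that the example stands alone; the real interface carries ComImport, Guid and InterfaceType attributes.

```csharp
using System;
using System.IO;

// Simplified stand-in for the COM-like interface (COM attributes omitted).
public interface ISequentialInStream
{
    uint Read(byte[] data, uint size);
}

// Hypothetical wrapper that exposes a .NET Stream through the 7-Zip read interface.
public sealed class InStreamWrapper : ISequentialInStream, IDisposable
{
    private readonly Stream baseStream;

    public InStreamWrapper(Stream baseStream)
    {
        if (baseStream == null) throw new ArgumentNullException("baseStream");
        this.baseStream = baseStream;
    }

    // Reads up to 'size' bytes into 'data' and returns the number actually read.
    // A return value of 0 means the end of the stream was reached.
    public uint Read(byte[] data, uint size)
    {
        int count = (int)Math.Min(size, (uint)data.Length);
        return (uint)baseStream.Read(data, 0, count);
    }

    public void Dispose()
    {
        baseStream.Dispose();
    }
}
```

The same wrapper then works for a FileStream or a MemoryStream, which covers both the on-disk and the in-memory case.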
Update (1.2): Most of the complexity related to PropVariant processing is now hidden in a special PropVariant structure, and interface methods now return PropVariant instead of IntPtr.
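For readers unfamiliar with this technique, the sketch below shows how a variant can be mapped onto a managed structure with an explicit layout. It is not the PropVariant structure from this translation: the field names, the supported types and the Size value are assumptions chosen for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative only. Size = 24 is an assumption meant to cover the native
// PROPVARIANT on 64-bit Windows (the 32-bit layout needs only 16 bytes).
[StructLayout(LayoutKind.Explicit, Size = 24)]
public struct PropVariant
{
    [FieldOffset(0)] public ushort vt;            // type tag
    [FieldOffset(8)] public IntPtr pointerValue;  // BSTR and other pointer values
    [FieldOffset(8)] public uint uintValue;       // VT_UI4
    [FieldOffset(8)] public long longValue;       // VT_UI8, VT_FILETIME

    public VarEnum VarType
    {
        get { return (VarEnum)vt; }
    }

    // Converts the stored value to a managed object for a few common types.
    public object GetObject()
    {
        switch (VarType)
        {
            case VarEnum.VT_EMPTY:
                return null;
            case VarEnum.VT_UI4:
                return uintValue;
            case VarEnum.VT_UI8:
                return unchecked((ulong)longValue);
            case VarEnum.VT_FILETIME:
                return DateTime.FromFileTimeUtc(longValue);
            case VarEnum.VT_BSTR:
                return Marshal.PtrToStringBSTR(pointerValue);
            default:
                throw new NotSupportedException("Unhandled variant type: " + VarType);
        }
    }
}
```

The overlapping fields at offset 8 mirror the native union, so no unsafe code is needed. A complete version would also release owned values such as BSTRs, for example by calling the native PropVariantClear function.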
Known Issues
The first and most disappointing issue is that you cannot use 7-Zip DLLs directly: you cannot simply take the DLLs from the 7-Zip distribution and use them in your projects. The cause is the incomplete implementation of COM interfaces in the 7-Zip code. All of the problems are related to the IUnknown.QueryInterface implementation. 7-Zip's QueryInterface does not return the IUnknown interface when asked for it (this is the most critical part for working with COM interfaces in .NET), and some classes do not return any interface at all!
This is because 7-Zip is C++ code that works with pointers, and most functions return direct pointers to the interface implementation, so the 7-Zip code does not use QueryInterface at all. Sadly, .NET works differently: the first access to any interface always goes through QueryInterface and IUnknown.
So if we use the DLLs directly, we get constant InvalidCastException errors. We need to make several changes to the 7-Zip code and rebuild the DLLs, or ask Igor Pavlov to include such changes in the 7-Zip code itself :)
Important Update: Starting with 7-Zip 4.46 alpha, Igor made the necessary changes in the code. So, from this version forward, you can use the format DLLs directly, without applying any patch. Superb!
The second issue is much smaller and is related to multi-threading. If you plan to use 7-Zip interfaces in only one thread, you have no problem. The problem appears when you try to use one interface from several threads. In this case, every thread except the one where the interface was created throws an exception on any interface method call. This is because of RCW behavior. An RCW (runtime callable wrapper) is the object that wraps a COM interface in .NET. When you use the interface from a different thread, the RCW tries to marshal the interface and fails (because this implementation does not support ITypeInfo).
Fortunately, I found a simple solution. The main interface (IInArchive) is returned as an IntPtr, not as an RCW object. When you need to access this interface, call System.Runtime.InteropServices.Marshal.GetTypedObjectForIUnknown or a related method to get an RCW object. If you need to use the interface in another thread, simply call System.Runtime.InteropServices.Marshal.FinalReleaseComObject (or ReleaseComObject) and create another RCW wrapper around the original IntPtr. Of course, you can still use the interface in only one thread at a time, but this is better than being tied to a single thread, and any logic can be implemented with correct thread locking.
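The pattern described above can be sketched as a small helper that creates a fresh RCW for each use and releases it afterwards, all under a lock. The class ArchiveHandle and its Use method are hypothetical names, and the sketch assumes that the raw pointer keeps its own COM reference for as long as the handle is alive. It is Windows-only.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; T is expected to be a ComImport interface such as IInArchive.
public sealed class ArchiveHandle<T> where T : class
{
    private readonly IntPtr unknown;               // raw interface pointer from CreateObject
    private readonly object sync = new object();   // serializes access across threads

    public ArchiveHandle(IntPtr unknown)
    {
        if (unknown == IntPtr.Zero) throw new ArgumentException("Null interface pointer.", "unknown");
        this.unknown = unknown;
    }

    // Runs 'action' against a freshly created RCW on the calling thread.
    public void Use(Action<T> action)
    {
        lock (sync)
        {
            T rcw = (T)Marshal.GetTypedObjectForIUnknown(unknown, typeof(T));
            try
            {
                action(rcw);
            }
            finally
            {
                // Drop the wrapper so that another thread can create its own.
                Marshal.ReleaseComObject(rcw);
            }
        }
    }
}
```

Creating and releasing a wrapper on every call costs a little, but it keeps the rule of one thread at a time in a single place.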
The third issue is well known, but I think it must be noted here. It appears that the .NET runtime does not support inheritance between COM interfaces (interfaces marked with the ComImport attribute). This is definitely a .NET bug, but I don't know when Microsoft will fix it, or whether they will fix it at all.
There is a simple way to avoid this bug. The derived interface must be declared as a standalone interface whose first methods are the methods of the inherited interfaces, in their order of appearance. You can see an example of such "inheritance" in the source of this translation.
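As a sketch of this workaround, consider IInStream, which in the native headers derives from ISequentialInStream. The declaration below flattens the hierarchy by repeating the inherited Read method first. The GUID and signatures are assumptions written from memory of the 7-Zip headers and may differ from the declaration in this translation.

```csharp
using System;
using System.Runtime.InteropServices;

// Native shape: IInStream : ISequentialInStream, adding Seek.
// Declared standalone here, with the inherited method first so that the
// method order matches the native vtable. GUID value is an assumption.
[ComImport]
[Guid("23170F69-40C1-278A-0000-000300030000")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IInStream
{
    // Inherited from ISequentialInStream; must come first.
    uint Read(
        [Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] byte[] data,
        uint size);

    // IInStream's own method. newPosition is an optional out pointer on the
    // native side, so it is kept as IntPtr and may be IntPtr.Zero.
    void Seek(long offset, uint seekOrigin, IntPtr newPosition);
}
```

The order matters because COM calls are dispatched by vtable slot, not by method name.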
Demo
Due to many requests, I have spent some time writing a small demo program. The demo lacks proper error checking and support for different archive types (the zip format is hardcoded in the source, but this can easily be changed); it lacks almost everything, but it has two advantages: it's simple, and it works.
The demo has only two modes: the first lists all files in an archive, and the second extracts a single file from the archive. I think this is enough to understand how to use the 7-Zip interfaces and how to create something more complex.
If you want to run the demo, don't forget to put 7z.dll (available from the official 7-Zip site) in the same folder as the executable.
Version History
1.5 - Small demo added
1.3 - Added two new delegates for features added in 7-Zip 4.45
1.2 - Variant type changed from IntPtr to newly created PropVariant structure
1.1 - Stream wrappers added, minor interface translation changes for better usability
1.0 - initial release