License: The Code Project Open License (CPOL)

Cabinet File (*.CAB) Compression and Extraction

By Elmue

How to implement creation and extraction of Microsoft CAB files
C++, C++/CLI, C#, Windows, .NET, VS2005, Visual Studio, Dev

Posted: 29 Aug 2006
Updated: 2 Jul 2008

Introduction

With this project, C++ and .NET programmers get a very versatile library for compression and extraction of Microsoft CAB files.

.NET 1.1 does not offer compression functionality. .NET 2.0 offers the System.IO.Compression.GZipStream class. But this is awkward to use and very primitive; it can only compress a stream but it is not possible to compress folders containing files and subfolders.

If you search the internet for more comfortable compression libraries, you find, for example, ICSharpCode.SharpZipLib.dll which offers ZIP compression. But this library is awkward to use and buggy, and so is unusable. Although the bugs have been known for years, the author has not fixed them.

I asked myself why should I search for another open source library (which will again have other bugs) while Windows itself supports CAB archives since the first days? Microsoft's Cabinet.dll (in the System32 directory) is not buggy. Many Microsoft installers (like the installer for Internet Explorer or Windows patches) use it. Additionally, CAB reaches a much better compression ratio than ZIP. Finally I found the project "Microsoft Cabinet Templates" from Luuk Weltevreden on CodeProject. He created a very versatile wrapper around Microsoft's Cabinet.dll, consisting of C++ templates.

But he didn't even do half of the work. He wrote good extraction classes, but the compression class was completely missing. I worked on his code, fixed a serious bug, simplified some awkward code, removed the templates, and added the missing compression functionality, encryption, Unicode support, internet extraction and more. Additionally, I added all files you need to compile the project. There is no need to download anything from Microsoft anymore.

Features

  • This library is VERY easy to use.
  • This library is lightweight and VERY fast. (pure C++ code)
  • This library can be extended very easily.
  • One project is for C++ developers.
  • Another project is for .NET developers.
  • Both projects compile on Visual Studio .NET (7.0), .NET 2003 (7.1) and .NET 2005 (8.0)
  • The C++ project additionally compiles on Visual Studio 6
  • The C++ project runs on Windows 95, 98, ME, NT, 2000, XP, Vista and higher
  • The .NET project runs on all platforms where the .NET framework is installed
  • Optional Encryption / Decryption of CAB files.
  • CAB files can contain trees of subfolders and files.
  • File dates (in either UTC or local time) and file attributes are preserved when compressing / extracting
  • Extraction of CAB files which are embedded in the resources of your Win32 project or .NET project.
  • In .NET you can additionally extract CAB files from a stream.
  • The compression can split large CAB files into multiple pieces. (Pack1.cab, Pack2.cab, Pack3.cab etc)
  • An event handler lets you display the compression / extraction progress in the GUI of your application.
  • Many event handlers are called during compression and extraction, which lets you interact with the process (for example, filtering specific files)
  • Both projects come with a demo application which shows how to compress and extract files and embedded CAB resources. Encryption and Decryption is also included in the demo.
  • This project makes use of Microsoft's Cabinet.dll in your System(32) directory, which is part of the operating system since Windows NT/98.
  • Cabinet.dll will be loaded only when it is needed and unloaded afterwards.
  • No additional Microsoft downloads required to compile and run this project.
  • The C++ and the .NET project support Unicode in paths and filenames (e.g. Japanese) regardless of whether they are compiled as MBCS or UNICODE.
  • The .NET library is thread safe.
  • Since version Jan 2007 Msvcp70/71/80.DLL is not required anymore.
  • Since version Oct 2007 Msvcr70/71.DLL is not required anymore.
  • Since version Apr 2008 you can extract CAB files directly from a server, (URL extraction from HTTP(S) / FTP)
  • You can even extract only specific files out of a CAB on a server without downloading the entire CAB file. (partial download)
  • You can also abuse this library to only download a file (MP3, AVI,..) from the internet to disk without CAB extraction.
  • Since version May 2008 the .NET library also extracts files with the extensions .URL or .LNK.
  • You can download a Universal Installer (includes source code) which is based on the CabLib Library for software setups and updates.
  • Since version June 2008 the encryption/decryption of CAB files uses the Blowfish algorithm.

Limitations

  • The total size of the files which are to be packed into the CAB file (or into all split CAB files together) must not exceed 2 GB.
  • The resulting CAB file must not exceed 2 GB.
  • This project cannot compress or extract InstallShield CAB files. (see below)
  • You cannot add files to or delete files from an existing CAB archive.
  • For Windows 95 you have to deliver Microsoft's "Cabinet.dll" yourself; it is part of the operating system only since Windows NT/98.
  • Thanks to Microsoft's ever missing downward compatibility you cannot use the /MT compiler switch anymore since Visual Studio 2005. If you compile with Visual Studio 8.0 you have to deliver MSVCR80.DLL and MSVCM80.DLL with your project.
    Alternatively, deliver CabLib.DLL compiled with VS 2003 instead, which does not require any additional DLLs and also runs on Framework 2.0.

Source Code

You will find very clean source code with tidy error handling and plenty of comments, written by a very experienced programmer.

You get a high quality library and you will save several weeks of coding time. The code is reusable: for example, you can reuse the Internet class for downloads from FTP / HTTP(S) or the String class to encode / decode UTF strings.

You can study LibExtract.h to see how managed callbacks can be passed to unmanaged C++ code (which is not easy and requires gcroot or GCHandle).
All code is written in plain C++ / Managed C++ and relies neither on MFC nor on other libraries, to avoid problems with missing DLLs on the computers where your application will run.

Different CAB File Formats

There are two completely different types of CAB files: The ones which this project supports are the "Microsoft CAB" files (also called "MS-CAB"). The Microsoft pack format is also known as MSZIP.
Some years later, InstallShield created the "InstallShield CAB" files. But these are absolutely incompatible with the MS-CAB files although they use the same file extension!

If you open a MS-CAB file with a hex editor, you will notice that the first four bytes are "MSCF" (MicroSoft Cab File), while the first three bytes of an InstallShield-CAB file are "ISc". (InstallShield Cab). You cannot open or create InstallShield CAB files with this project. There exist only very few tools which are capable of managing InstallShield CAB files; for example, the tool WinPack which you can download from my homepage.
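The signature check described above can be sketched in a few lines of C++. This is only an illustration and not part of the Cabinet library: the names CabKind and DetectCabKind are hypothetical, and the function merely compares the first bytes of a buffer with the two signatures "MSCF" and "ISc" named in the text.

```cpp
#include <cstddef>
#include <cstring>

// Possible results of inspecting the first bytes of a *.cab file.
// (Hypothetical names, not part of the Cabinet library.)
enum class CabKind { MicrosoftCab, InstallShieldCab, Unknown };

// Classifies a buffer that holds the beginning of a file by its signature.
// p_Data: pointer to the first bytes of the file, u_Len: number of valid bytes.
inline CabKind DetectCabKind(const unsigned char* p_Data, std::size_t u_Len)
{
    if (p_Data == nullptr)
        return CabKind::Unknown;

    // MS-CAB files start with the four bytes "MSCF"
    if (u_Len >= 4 && std::memcmp(p_Data, "MSCF", 4) == 0)
        return CabKind::MicrosoftCab;

    // InstallShield CAB files start with the three bytes "ISc"
    if (u_Len >= 3 && std::memcmp(p_Data, "ISc", 3) == 0)
        return CabKind::InstallShieldCab;

    return CabKind::Unknown;
}
```

Reading the first four bytes of a file into a buffer and passing them to DetectCabKind() is enough to decide whether it makes sense to hand the file to this project at all.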

Compression Ratio

MS-CAB files have a very good compression ratio. To test this I packed a bunch of about a hundred text files. This is the result of my test:

Pack Format    Packed File Size
CAB            139 kB
TAR + GZ       142 kB
ARJ            174 kB
TAR + LZH      189 kB
RAR            197 kB
TAR + JAR      242 kB
ZIP            242 kB

Intelligent Installers

Microsoft intended CAB files to be used for installations:

  • They are used in the Internet Explorer 6 Setup.
  • You see plenty of CAB files on the Windows 95/98/ME setup CD.
  • All files ending with an underscore like "Kernel32.dl_" on your Windows 2000/XP setup CD are CAB files with the wrong file extension.

Many installers are stupid. If you start an unintelligent installer (like the one of Nero 6) you will see that:

  1. It first extracts ALL files from the packed EXE installer into a temp directory. This is slow and the user has to wait until the first dialog opens.
  2. It wastes disk space, and if the free space on drive C: is low the user will get a "no disk space" error.
  3. It carries the risk that temporary files remain on the disk when the installation is aborted.

In contrast, with this Cabinet library you can build an Intelligent Installer:

Scenario 1. You Deliver only Two Files to your Clients: A Tiny EXE File and a Huge CAB File

Put a huge CAB file on a local server or CD or DVD and let the user only start a tiny EXE setup file. The installer will start immediately and extract only the files from the CAB file which are really needed. This cabinet library obviously can extract the whole CAB file. But it is also possible to extract only specific files directly from the CAB file on server/CD/DVD to harddisk. The data transfer is compressed and — if you like — encrypted.

Scenario 2. You Deliver only one Huge EXE File to Your Clients

You can also embed the CAB file into the Setup.exe and directly extract specific files from the embedded resource in memory without creating temporary files. This scenario only makes sense for small setups; otherwise users with little RAM will run into problems if they start a 100 MB EXE file!

Scenario 3. You Deliver only a URL to your Clients

If you choose URL extraction you deliver only a tiny EXE file which downloads the CAB file from the internet (FTP or HTTP(S)).

If you use this for updates you can even configure the Cabinet library to download only the files from the CAB archive which require an update. Example: Your company sells an ASP server which consists of 500 files. You put a CAB file of 100 Megabyte on your Update server which contains all files. Let's say the client wants to update to the latest version and needs to replace only 15 files of 500 files. The Cabinet library downloads only 2 Megabyte instead of 100 Megabyte from your server! The data transfer is compressed and optionally encrypted.

If you need an installer/updater, download my project "An Intelligent .NET Multilanguage Installer".

The C++ Project

To add CAB support to your C++ project download the project Cabinet at the top of this page (a demo application is included) and copy the entire subfolder "Cabinet" to your project.

The .NET Project

The second project is for .NET developers. I wrote a wrapper in Managed C++ around this C++ project. The result compiles into a .NET DLL. You simply add the .NET assembly CabLib.dll to the references of your .NET project (C# or Visual Basic .NET or Managed C++) and you get CAB support. In the second download at the top of this page you will find CabLib.DLL already compiled and ready to use. (A demo application is included)

Cabinet.dll

Microsoft's tiny Cabinet.dll which is located in your System(32) directory since Windows NT/98 offers the following Compression API:
FciCreate FciAddFile FciFlushCabinet FciFlushFolder FciDestroy

And the Extraction API:
FdiCreate FdiIsCabinet FdiCopy FdiDestroy

You get a detailed description of these functions in the file Microsoft Cabinet.dll Doku.doc which you find in both projects. The files FCI.H and FDI.H also contain plenty of comments.

The API in Cabinet.Dll uses a bunch of Callbacks which are called while a CAB file is created or extracted. The C++ project wraps these callbacks and you can override each of the callback functions to modify the behaviour.

The .NET project offers events which you can use to handle these callbacks in your .NET application.

You can use these callbacks / events to filter specific files or you can read compression data from a stream or from memory instead of a file on disk. This makes the library extremely versatile. (see the examples below)

Unicode

The underlying Cabinet.DLL does not support Unicode, but this project allows Unicode paths and filenames to be compressed by encoding them as UTF7 in the CAB archive. This has the advantage over MBCS (MultiByte) that the encoding is independent of any codepage and avoids lots of problems. (e.g. UTF avoids using the buggy GetShortPathName API)

The C++ project offers all file functions in an ANSI version and a Unicode version.
(e.g. ANSI: AddFileA(), Unicode (Wide): AddFileW())

If your application is run on Windows 95/98/ME the library automatically detects the operating system and uses the appropriate API. You DON'T have to care about calling the ..A() or ..W() functions depending on the operating system. The ..W() functions will also work on Windows 95/98/ME. To use the Unicode functionality it is NOT required to compile the C++ project with the UNICODE compiler switch (#ifdef UNICODE). Although compiled as MBCS the Wide versions of the functions ...W() will work!

The .NET project uses the Unicode versions ..W() of the underlying C++ code.

Only the operating system limits the usage of Unicode: If you extract a CAB file which contains Unicode filenames on Windows 95/98/ME the Unicode files are skipped as it is impossible to store for example Japanese files on an English Windows 95/98/ME.

UTC Time

With the parameter b_UtcTime in CreateFCIContext() you decide whether the CAB archive stores file times as UTC time or as local time during compression. On extraction the library automatically uses the correct time so you don't have to worry about this. The Windows file system always stores all file times as UTC time! The file times you see in Explorer depend on your timezone and daylight saving. (Open a folder in Explorer, remember the file times, change your timezone in Control Panel and you will see that all files on your disk show another time now!)

b_UtcTime = false:

  • Compression: Read the UTC file times from harddisk, convert them to local time and store these in the CAB file.
  • Extraction: Read local times from the CAB file, convert them to UTC time and store these on harddisk.

b_UtcTime = true:

  • Compression: Read the UTC file times from harddisk and store these unchanged in the CAB file.
  • Extraction: Read UTC times from the CAB file and store these unchanged on harddisk.

It is recommended to compress using UTC time so after changing the PC's timezone or after daylight saving has changed the files in the CAB archive and on disk will still have the same time.

On the other hand if you compress/extract using local time a CAB file extracted in winter has a time shift of one hour compared with a CAB file extracted in summer.
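The one hour shift can be made concrete with a little arithmetic. The following sketch is not code from the library; the function names are hypothetical, all times are plain minute counts, and it assumes that the conversion between UTC and local time always uses the offset which is valid at the moment of the conversion.

```cpp
// All times are minutes since an arbitrary epoch.
// s32_Offset is the current local offset in minutes east of UTC
// (e.g. 60 in winter and 120 in summer for Central Europe).
// Hypothetical helpers, only meant to illustrate the two b_UtcTime modes.

// Compression: returns the time value which is written into the CAB file.
inline long long TimeToStore(long long s64_UtcOnDisk, int s32_Offset, bool b_UtcTime)
{
    // b_UtcTime = true : UTC is stored unchanged
    // b_UtcTime = false: UTC is converted to local time first
    return b_UtcTime ? s64_UtcOnDisk : s64_UtcOnDisk + s32_Offset;
}

// Extraction: returns the UTC time which is written to the harddisk.
inline long long TimeToRestore(long long s64_Stored, int s32_Offset, bool b_UtcTime)
{
    // b_UtcTime = true : stored value already is UTC
    // b_UtcTime = false: stored local time is converted back to UTC
    return b_UtcTime ? s64_Stored : s64_Stored - s32_Offset;
}
```

With local time, a file packed in winter (offset 60) and extracted in summer (offset 120) comes back 60 minutes earlier than the original, while with b_UtcTime = true the value passes through unchanged.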

Using the Compression Functions

File Compression

The following sample compresses into a file C:\Temp\Packed.cab.
The file C:\Windows\Explorer.exe will be packed into a subfolder FileManager in the CAB file.
The file C:\Windows\Notepad.exe will be packed into a subfolder TextManager in the CAB file.

C++

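Cabinet::CCompress i_Compress;

// Create the compression context for the target CAB file
if (!i_Compress.CreateFCIContextA("C:\\Temp\\Packed.cab"))
    { /* Error handling... */ }

// 1st parameter: file on disk, 2nd parameter: path inside the CAB file
if (!i_Compress.AddFileA("C:\\Windows\\Explorer.exe", "FileManager\\Explorer.exe", 0))
    { /* Error handling... */ }

if (!i_Compress.AddFileA("C:\\Windows\\Notepad.exe",  "TextManager\\Notepad.exe", 0))
    { /* Error handling... */ }

// Write all pending data and close the CAB file
if (!i_Compress.FlushCabinet(FALSE))
    { /* Error handling... */ }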

C#

ArrayList i_Files = new ArrayList();
i_Files.Add(new string[] { @"C:\Windows\Explorer.exe", @"FileManager\Explorer.exe" });
i_Files.Add(new string[] { @"C:\Windows\Notepad.exe",  @"TextManager\Notepad.exe"  });

CabLib.Compress i_Compress = new CabLib.Compress();
i_Compress.CompressFileList(i_Files, @"C:\Temp\Packed.cab", 0);

You can also easily compress all HTM files in the folder C:\Web and all its subfolders into a CAB file which will reflect the folder structure found on harddisk:

C#

CabLib.Compress i_Compress = new CabLib.Compress();
i_Compress.CompressFolder(@"C:\Web", @"C:\Temp\Packed.cab", "*.htm", 0);

Compression Splitting

If you want to deliver your data on a medium with limited size or for download on a webpage you can split the CAB file into pieces which the extraction functions will automatically put together afterwards.

The following sample will create CAB files of 200 KB.
In this case the file name MUST contain a %d at the end!
The minimum allowed split size is 20 KB.

C++

Cabinet::CCompress i_Compress;

// %d is replaced with the number of the split part, 200000 is the split size in bytes
if (!i_Compress.CreateFCIContextA("C:\\Temp\\Packed_%d.cab", TRUE, 200000))
    { /* Error handling... */ }

// etc.

C#

i_Compress.CompressFileList(i_Files, @"C:\Temp\Packed_%d.cab", 200000);
or
i_Compress.CompressFolder(@"C:\Web", @"C:\Temp\Packed_%d.cab", "*.htm", 200000);
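To illustrate how a file name pattern with %d maps to the individual parts, here is a small stand-alone C++ sketch. It is not the library's implementation: the names kMinSplitSize, IsValidSplitRequest and SplitPartName are hypothetical, and interpreting "20 KB" as 20000 bytes is an assumption.

```cpp
#include <string>

// Assumption: the minimum split size of "20 KB" is taken as 20000 bytes here.
const unsigned long kMinSplitSize = 20000;

// Returns true if the pattern contains "%d" and the split size
// is not below the minimum split size.
inline bool IsValidSplitRequest(const std::string& s_Pattern, unsigned long u32_SplitSize)
{
    return s_Pattern.find("%d") != std::string::npos &&
           u32_SplitSize >= kMinSplitSize;
}

// Replaces the first "%d" in the pattern with the given part number.
// A pattern without "%d" is returned unchanged.
inline std::string SplitPartName(const std::string& s_Pattern, int s32_Part)
{
    std::string s_Name = s_Pattern;
    const std::string::size_type u_Pos = s_Name.find("%d");
    if (u_Pos != std::string::npos)
        s_Name.replace(u_Pos, 2, std::to_string(s32_Part));
    return s_Name;
}
```

For the pattern "C:\\Temp\\Packed_%d.cab" and the part number 2 this yields "C:\\Temp\\Packed_2.cab". The placeholder is substituted manually instead of passing the pattern to a printf-style function, so a pattern that contains other format specifiers cannot cause trouble.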

Setting the Compression TEMP Directory

During compression Cabinet.DLL will create some temporary files which will be automatically deleted afterwards.

By default it uses the TEMP directory which Windows specifies. If you want to compress huge files and the space on drive C: is low you should specify a TEMP directory on another drive. It is possible to use the same directory as output folder for the CAB file and as TEMP directory.

C++ and C#

i_Compress.SetTempDirectory("E:\\Temp");

Encryption

You can encrypt the CAB file with a key. The C++ code encrypts/decrypts the CAB data "on the fly" in blocks of 8 Bytes using the Blowfish algorithm. Blowfish is a very fast, symmetrical, license-free algorithm.

As encryption key you can use any binary data up to 72 Byte length. If the key is longer than 72 Bytes the remaining bytes will be ignored. If the key is shorter than 72 Bytes, some bytes are reused.

It is possible but not recommended to use a plain text password directly for the Blowfish encryption. (see: KDF) Instead you should derive a binary hash from the plain text password.

The .NET code does this with a SHA 512 hash which always has a length of 64 Bytes.

In the .NET project you can set a plain text password (string) of any length, of which first a 64 Byte SHA hash is derived and then this hash is used as key for the Blowfish encryption of the CAB data.

You can also directly set your own binary data as key for Blowfish.
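The 72 Byte rule described above can be pictured with the following C++ sketch. It is not taken from the library: NormalizeKey is a hypothetical helper, and it assumes that "some bytes are reused" means the key bytes are repeated cyclically, which is how the standard Blowfish key setup treats short keys.

```cpp
#include <cstddef>
#include <vector>

// Number of key bytes which Blowfish can consume at most
const std::size_t kBlowfishMaxKey = 72;

// Returns exactly 72 key bytes:
// - a longer key is cut off after 72 bytes,
// - a shorter key is repeated cyclically until 72 bytes are filled.
// An empty or null key returns an empty vector.
inline std::vector<unsigned char> NormalizeKey(const unsigned char* p_Key, std::size_t u_Len)
{
    std::vector<unsigned char> i_Out;
    if (p_Key == nullptr || u_Len == 0)
        return i_Out;

    i_Out.reserve(kBlowfishMaxKey);
    for (std::size_t i = 0; i < kBlowfishMaxKey; ++i)
        i_Out.push_back(p_Key[i % u_Len]);

    return i_Out;
}
```

A 64 Byte SHA 512 hash therefore fills the first 64 bytes, and its first 8 bytes are used again for the remaining 8 positions.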

C#

i_Compress.SetEncryptionKey(String);   // SHA 512 + Blowfish

or

i_Compress.SetEncryptionKey(Byte[72]); // only Blowfish

C++

i_Compress.SetEncryptionKey(void* p_Key, DWORD u32_KeyLength); // only Blowfish

More Compression Functions

Normally you will not need the following C++ functions:
With i_Compress.AbortOperation() you can abort a lengthy compression. Obviously this must be called from another thread.

With i_Compress.FlushFolder() you can force that the current folder is finished.

With i_Compress.FlushCabinet() you force that the current CAB file is closed and any further files to be added will be written to the next CAB file in the split sequence.

For details see the file Microsoft Cabinet.dll Doku.doc and the many comments in the file FCI.H

With i_Compress.EnumFiles() you can create an ArrayList of the files of interest. You can call this function once for each file extension you want to compress and then pass the ArrayList to CompressFileList().

Compression Callbacks / Events

CCompress.OnFilePlaced() (C++ Callback)
CabLib.Compress.evFilePlaced (.NET event)
This is called whenever the compression has successfully placed a file into the cabinet.

CCompress.OnUpdateStatus() (C++ Callback)
CabLib.Compress.evUpdateStatus (.NET event)

This can be used to update your GUI to display the progress during a lengthy compression.

ATTENTION

It is recommended to start compression from another thread to avoid a dead GUI and to be able to call AbortOperation() and show the progress.

In C# you must call Control.BeginInvoke() in the event handler routine to asynchronously access GUI elements otherwise you will run into trouble!

For details see the file Microsoft Cabinet.dll Doku.doc and the many comments in the file FCI.H

Extensions / Modifications

If you want a different behaviour for compression, do NOT modify the existing compression class CCompress. Instead derive a new class from CCompress and override the functions you want to change.

Using the Extraction Functions

File Extraction

During extraction there will be NO temporary files created.

The following sample extracts a file C:\Temp\Packed.cab into the folder E:\ExtractFolder. The required subfolders will be created automatically if the CAB file contains subfolders.

C++

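Cabinet::CExtract i_Extract;

// Create the extraction context
if (!i_Extract.CreateFDIContext())
    { /* Error handling... */ }

// 1st parameter: CAB file, 2nd parameter: target folder
if (!i_Extract.ExtractFileA("C:\\Temp\\Packed.cab", "E:\\ExtractFolder"))
    { /* Error handling... */ }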

C#

CabLib.Extract i_Extract = new CabLib.Extract();
i_Extract.ExtractFile(@"C:\Temp\Packed.cab", @"E:\ExtractFolder");

The .NET library automatically detects files with the extension .LNK and resolves the shortcut (to the CAB file) inside. Files with the extension .URL are redirected to URL extraction.

Win32 Resource Extraction

The following sample extracts a Cabinet file which is stored in the Win32 resources of a DLL or EXE file. You can extract files DIRECTLY from a CAB file in memory!

There are some rules to respect when you add a CAB file to the resources of your project:

In the file Cabinet.rc of the C++ project and in CabLib.rc of the .NET project you find this line:

ID_CAB_TEST             CABFILE                 "Res\\Test.cab"

and in the file Resource.h you find this line:

#define ID_CAB_TEST                     101

IMPORTANT:
If you define ID_CAB_TEST in Resource.h, the resource will be stored under:
ResourceName = 101 (integer)
ResourceType = "CABFILE" (string)

If you do NOT define ID_CAB_TEST in Resource.h, the resource will be stored under:
ResourceName = "ID_CAB_TEST" (string)
ResourceType = "CABFILE" (string)


To extract the embedded resource Test.cab (which I added to both projects) write:

C++

Cabinet::CExtractResource i_Extract;

// Create the extraction context
if (!i_Extract.CreateFDIContext())
    { /* Error handling... */ }

// Module name, resource name, resource type, target folder
if (!i_Extract.ExtractResourceA("Cabinet.exe", ID_CAB_TEST, "CABFILE", "C:\\ExtractFolder"))
    { /* Error handling... */ }

C#

CabLib.Extract i_Extract = new CabLib.Extract();
i_Extract.ExtractResource("CabLib.dll", 101, "CABFILE", @"C:\ExtractFolder");

The first parameter specifies the filename (without path) of the module from which to extract the Win32 CAB resource. You can set it to 0 (null) if the resource is inside the EXE which has created the process.

You can use this functionality to extract a CAB file from ANY DLL currently loaded into the process or from the application EXE itself.

To explore the resources of files which are already compiled download the tool ResourceHacker.

Most of the Windows Update patches contain a CAB file inside.

.NET Resource Extraction / Stream Extraction

.NET stores resources in a completely different way so you cannot see them in the tool ResourceHacker.
Under the resource's properties you must set the Build Action = "Embedded Resource".

To extract a file Test.cab which is located in a project named MyProject in a subfolder named Resources write:

C#

System.Reflection.Assembly i_Ass  = System.Reflection.Assembly.GetExecutingAssembly();
System.IO.Stream           i_Strm = i_Ass.GetManifestResourceStream(
                           "MyProject.Resources.Test.cab");

CabLib.Extract i_Extract = new CabLib.Extract();
i_Extract.ExtractStream(i_Strm, @"E:\ExtractFolder");

URL Extraction

Let's say you want to build an updater which updates your software package to the latest version on the client's computers. If your software package consists of 500 files, of which only 15 files have changed in the latest version you could deliver an update patch which contains these 15 files.

A more intelligent solution is to build an updater which extracts only the files which require an update from a huge CAB archive on your update server. This CAB archive contains the latest version of your entire software package and no matter which version is currently installed on your client's computer, the updater will only download the files which are out of date:

The transferred data is compressed and optionally encrypted. (see CAB encryption)

Obviously you will have to pack a file list with the MD5 hashes of all files into the CAB archive so that the updater knows in advance which files it has to download and which files are up to date.

A complete installer project using this type of URL extraction (and more) can be found here: "An Intelligent .NET Multilanguage Installer"

Approximately the first 1% of the archive is an index which contains the filenames and pointers into the compressed data.

The Cabinet library reads this index and for each file it calls the callback function OnBeforeCopyFile. (see below)

If you return FALSE in this callback the file will neither be downloaded nor extracted.

To be able to download a file only partially from the server you have two options:

  1. An FTP server which permits resuming broken downloads (the server must support the FTP command "REST" (Restart))
  2. An HTTP(S) server that runs a little script which returns only the requested part of a file.
    The script must support the following GET command:
    www.server.com/Download.php?File=Setup_1.35.cab&Offset=20000&Length=50000
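On the client side such a request is just a string that is put together from the script address, the file name and the requested range. A minimal C++ sketch follows; BuildPartialUrl is a hypothetical helper and not a function of the Cabinet library, and it assumes that the file name needs no URL escaping.

```cpp
#include <string>

// Builds the GET request for a part of a file on the server.
// s_Script  : address of the download script, e.g. "www.server.com/Download.php"
// s_File    : name of the CAB file in the update folder on the server
// u32_Offset: position of the first requested byte
// u32_Length: count of requested bytes
// Assumption: s_File contains no characters which require URL escaping.
inline std::string BuildPartialUrl(const std::string& s_Script,
                                   const std::string& s_File,
                                   unsigned long      u32_Offset,
                                   unsigned long      u32_Length)
{
    return s_Script + "?File="   + s_File
                    + "&Offset=" + std::to_string(u32_Offset)
                    + "&Length=" + std::to_string(u32_Length);
}
```

Calling BuildPartialUrl("www.server.com/Download.php", "Setup_1.35.cab", 20000, 50000) produces exactly the GET command shown above.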

Here is an example of how such a script can be written in PHP: (Download.php)

if (strlen($_GET["File"]) > 0)
{
    DownloadFilePartially("Updates", $_GET["File"], $_GET["Offset"], $_GET["Length"]);
}


function DownloadFilePartially($sFolder, $sFile, $Offset, $Length)
{
    $sPath = getcwd() ."/";
    if (strlen($sFolder) > 0) $sPath .= $sFolder ."/";
    $sPath .= $sFile;

    // Block hacker attacks (restrict access to the given folder "Updates")
    // Details about hacker attacks see OwaspGuide.PDF on:  www.owasp.org

    if (substr($sFile, 0, 1) == '.' || strpos($sFile, "..") !== FALSE || 
        strpos($sFile, '/') !== FALSE || $Offset < 0 || $Length < 0 || 
       !is_file($sPath))
    {
        header("Status: 501 Invalid Parameters", true, 501);
        echo "Invalid Parameters";
        exit; // Do NOT return Status code 200 (OK) on errors !!!

    }

    $hFile = fopen($sPath, "rb");
    $Filesize = filesize($sPath);

    if ($Length == 0) $Length = $Filesize; // return entire file

        
    $Blocksize = 32 * 1024;
    $Length = min($Length, $Filesize - $Offset);
    
    header("Accept-Ranges: bytes");
    header("Content-Description: File Transfer");
    header("Content-Transfer-Encoding: binary");
    header("Content-Type: application/x-zip-compressed");
    header("Content-Disposition: attachment; filename=\"Cab.part\";");
    header("Content-Length: " .$Length);
    header("Last-Modified: " .gmdate ("D, j M Y H:i:s", filemtime($sPath)) ." GMT");
    
    fseek($hFile, $Offset);
    while (!feof($hFile) && connection_status() == 0 && $Length > 0)
    {
        $Blocksize = min ($Blocksize, $Length);
        set_time_limit(40);
        print(fread($hFile, $Blocksize));
        $Length -= $Blocksize;
        flush();
    }

    fclose($hFile);
    exit;
}

Microsoft's Cabinet.dll requests block sizes between 8 Bytes and 32000 Bytes in the callback FDIRead. It would be nonsense to request such tiny blocks from the server. Additionally, Cabinet.dll does not access the CAB data completely sequentially. For that reason the Cabinet library uses the class CCache between Cabinet.dll and the internet to ensure that no data block has to be read twice and to improve the performance.

An important factor is the size of the blocks which the cache reads from the server.

A too small blocksize results in a bad performance because for each block a new data connection is opened to the server. A too big blocksize will download more data than is really needed when extracting only specific files from a huge archive.
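The idea of a cache between small read requests and large block downloads can be sketched as follows. This is NOT the CCache class of the library, only a simplified illustration: the class BlockCache and its members are hypothetical, and the download itself is replaced by a callback which the caller supplies.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Serves small reads out of large blocks and fetches every block at most once.
class BlockCache
{
public:
    // The callback receives (offset, length) and returns the bytes of that
    // range. It may return fewer bytes at the end of the file.
    typedef std::function<std::vector<unsigned char>(std::size_t, std::size_t)> FetchFn;

    // A block size of zero is replaced with 1 to avoid a division by zero.
    BlockCache(std::size_t u_BlockSize, FetchFn f_Fetch)
        : mu_BlockSize(u_BlockSize ? u_BlockSize : 1), mf_Fetch(f_Fetch), mu_Fetches(0) {}

    // Returns up to u_Len bytes starting at u_Offset (fewer at end of file).
    std::vector<unsigned char> Read(std::size_t u_Offset, std::size_t u_Len)
    {
        std::vector<unsigned char> i_Result;
        while (u_Len > 0)
        {
            const std::size_t u_Block = u_Offset / mu_BlockSize;
            auto it = mi_Blocks.find(u_Block);
            if (it == mi_Blocks.end()) // block not yet downloaded
            {
                it = mi_Blocks.insert(std::make_pair(u_Block,
                         mf_Fetch(u_Block * mu_BlockSize, mu_BlockSize))).first;
                ++mu_Fetches;
            }
            const std::vector<unsigned char>& i_Data = it->second;
            const std::size_t u_Pos = u_Offset % mu_BlockSize;
            if (u_Pos >= i_Data.size())
                break; // end of file reached

            const std::size_t u_Copy = std::min(u_Len, i_Data.size() - u_Pos);
            i_Result.insert(i_Result.end(), i_Data.begin() + u_Pos,
                                            i_Data.begin() + u_Pos + u_Copy);
            u_Offset += u_Copy;
            u_Len    -= u_Copy;
        }
        return i_Result;
    }

    // Count of blocks which have been requested from the server so far
    std::size_t FetchCount() const { return mu_Fetches; }

private:
    std::size_t mu_BlockSize;
    FetchFn     mf_Fetch;
    std::size_t mu_Fetches;
    std::map<std::size_t, std::vector<unsigned char> > mi_Blocks;
};
```

With a block size of 32 Bytes a read of 10 Bytes at offset 30 touches two blocks and triggers two fetches; a later read inside one of these blocks triggers none. The same trade-off as described above is visible here: larger blocks mean fewer fetches, but more bytes are transferred that may never be read.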

I recommend enabling tracing in the file Trace.hpp and playing around with the blocksize. In the trace you will see the download speed in KB/s.

If you do not want to use the functionality of extracting only parts of a CAB file it is strongly recommended to download the entire CAB archive to a temporary file on disk and then extract it. The cabinet library will do that for you automatically and also delete the temporary file after extracting it. This results in the maximum download speed because there is a continuous data stream coming from the server.
You enable the download of the entire CAB file to disk by setting the blocksize = 0.

Blocksize = 50 kB
    Slow download speed, but no download of unnecessary data.
    For partial updates, not recommended for complete setups.
    URL:  http(s)://www.server.com:Port/Download.php?File=Setup_1.35.cab
          user:password@ftp://ftp.server.com:Port/Updates/Setup_1.35.cab
    Note: HTTP(S) server must run a script.
          FTP server must support the command "REST".

Blocksize = 1 MB
    Higher download speed, but downloading more unnecessary data.
    For partial updates, not recommended for complete setups.
    URL:  http(s)://www.server.com:Port/Download.php?File=Setup_1.35.cab
          user:password@ftp://ftp.server.com:Port/Updates/Setup_1.35.cab
    Note: Same server requirements as for Blocksize = 50 kB.

Blocksize = 0
    Highest download speed, download entire CAB, then extract it.
    For full setups, not recommended for partial updates.
    URL:  http(s)://www.server.com:Port/Updates/Setup_1.35.cab
          user:password@ftp://ftp.server.com:Port/Updates/Setup_1.35.cab
    Note: No special requirements for the servers.

You should not use blocksizes greater than 2 MB as you don't gain any advantage by doing that.
See my comments in the file ExtractUrl.hpp !
If the Blocksize is zero, the download uses only a little buffer in memory to copy the data to disk.

If you do not specify username and password inside the URL, FTP will use anonymous login.

The class CExtractUrl uses the internet functionality in Wininet.dll to download data from the server. Internet Explorer 5.0 or higher is required.

C++

Cabinet::CExtractUrl i_Extract;

// Create the extraction context
if (!i_Extract.CreateFDIContext())
    { /* Error handling... */ }

// The meaning of the parameters is explained below
if (!i_Extract.ExtractUrlA(URL, Blocksize, DownloadFile, ExtractFolder))
    { /* Error handling... */ }

i_Extract.CleanUp(); // see below

C#

CabLib.Extract i_Extract = new CabLib.Extract();
i_Extract.ExtractUrl(URL, Blocksize, DownloadFile, ExtractFolder);
i_Extract.CleanUp(); // see below

There are several options using this function:

Download the entire CAB file to "C:\Installer.cab", then extract it:
    Blocksize == 0, DownloadFile = "C:\Installer.cab", ExtractFolder = "C:\Extracted"

Download the entire CAB file to a temporary file, extract it, then delete it:
    Blocksize == 0, DownloadFile = "", ExtractFolder = "C:\Extracted"

Download ANY file from FTP / HTTP(S) to disk without CAB extraction:
    Blocksize == 0, DownloadFile = "C:\Metallica.mp3", ExtractFolder = ""

Download blocks of CAB file to memory and extract them:
    Blocksize > 0, DownloadFile = "", ExtractFolder = "C:\Extracted"
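The four parameter combinations listed above can also be written down as a small decision function. The sketch below is only a reading aid and not library code: UrlMode and ClassifyUrlMode are hypothetical names, and the value NotListed merely marks combinations which the list above does not mention; how the library reacts to them is not specified here.

```cpp
#include <string>

// The four documented ways to use URL extraction (hypothetical names)
enum class UrlMode
{
    DownloadToFileThenExtract, // Blocksize == 0, DownloadFile set,   ExtractFolder set
    DownloadToTempThenExtract, // Blocksize == 0, DownloadFile empty, ExtractFolder set
    DownloadOnly,              // Blocksize == 0, DownloadFile set,   ExtractFolder empty
    BlockwiseExtract,          // Blocksize >  0, DownloadFile empty, ExtractFolder set
    NotListed                  // any combination which is not in the list above
};

// Maps a parameter combination to one of the documented modes.
inline UrlMode ClassifyUrlMode(unsigned long      u32_Blocksize,
                               const std::string& s_DownloadFile,
                               const std::string& s_ExtractFolder)
{
    const bool b_File   = !s_DownloadFile.empty();
    const bool b_Folder = !s_ExtractFolder.empty();

    if (u32_Blocksize > 0)
        return (!b_File && b_Folder) ? UrlMode::BlockwiseExtract : UrlMode::NotListed;

    if ( b_File &&  b_Folder) return UrlMode::DownloadToFileThenExtract;
    if (!b_File &&  b_Folder) return UrlMode::DownloadToTempThenExtract;
    if ( b_File && !b_Folder) return UrlMode::DownloadOnly;
    return UrlMode::NotListed;
}
```

A check like this can be useful in your own installer code to validate settings from a configuration file before the download is started.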

You can extract from the same CAB file on the server multiple times, so the content of the internet cache can be reutilized. A downloaded temporary file can also be reutilized in this way:

i_Extract.SetSingleFile(FileName1);
i_Extract.ExtractUrl(Url, Blocksize, DownloadFile, ExtractFolder);
i_Extract.SetSingleFile(FileName2);
i_Extract.ExtractMoreUrl(ExtractFolder);
i_Extract.CleanUp(); // see below

More Internet Functions

With i_Extract.CleanUp() you free the memory of the internet cache and close downloaded files. Call this when you are finished with the CAB file.

IMPORTANT:
CleanUp() is not called automatically to allow reusing partial downloads to memory or full downloads to a temp file for later extractions from the same CAB file. So you MUST call CleanUp() manually after the last URL extraction as the library can not know when you are done.

With i_Extract.SetProxy() you can specify a CERN, TIS or SOCKS proxy for HTTP, HTTPS, FTP.

The string must be in the format "http=http://Proxy1.com:8000 https=https://Proxy2.com:443". If you pass an empty string or never call this function the default settings of Internet Explorer will be used. (stored in the Registry)

With i_Extract.SetPassiveFtpMode() you can switch between active and passive FTP mode. If you never call this function, passive mode will be used.

With i_Extract.SetHttpHeaders() you can specify additional HTTP headers which are sent to the server.

The headers consist of "Name: Value". Multiple headers can be separated by the pipe character. Example: "Referer: http://www.test.com|Accept-Language:en" There must be no space before or after the pipe character!
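
Because stray spaces around the pipe character are easy to introduce by accident, it may help to build the header string in code. The following sketch trims each header and joins them with "|"; JoinHttpHeaders() and Trim() are hypothetical helpers, not part of CabLib.

```cpp
#include <string>
#include <vector>

// Removes leading and trailing spaces/tabs from one header line.
static std::string Trim(const std::string& text)
{
    const std::string::size_type first = text.find_first_not_of(" \t");
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type last = text.find_last_not_of(" \t");
    return text.substr(first, last - first + 1);
}

// Hypothetical helper: joins "Name: Value" headers with '|' so that no
// space ends up before or after the pipe. Empty entries are skipped.
std::string JoinHttpHeaders(const std::vector<std::string>& headers)
{
    std::string result;
    for (const std::string& header : headers)
    {
        const std::string trimmed = Trim(header);
        if (trimmed.empty())
            continue;
        if (!result.empty())
            result += '|';
        result += trimmed;
    }
    return result;
}
```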

With i_Extract.InternetGetProgress() you can display the download progress in a progress bar. (This is not a callback!) The extraction must be started from another thread, and in the GUI thread you must set a timer (e.g. 500 ms) to poll the progress from this function. A progress bar does not make much sense for partial downloads: if you specified a blocksize > 0, the progress bar restarts at zero for each downloaded block.

Attention: Not all HTTP servers return "CONTENT-LENGTH" (e.g. AOL servers). In this case the progress bar will not work.
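
In the timer handler you have to deal with the case that the total size is unknown. The sketch below shows one possible way; it assumes that you already have the number of bytes received and the total size as two integers (how InternetGetProgress() delivers them is not shown here), and that an unknown size is reported as 0. ProgressPercent() is a hypothetical helper.

```cpp
// Hypothetical helper: converts byte counts into a percentage for a progress
// bar. Returns -1 when the total size is unknown (assumed to be reported as
// 0), so the caller can show an indeterminate bar instead.
int ProgressPercent(unsigned long long received, unsigned long long total)
{
    if (total == 0)
        return -1;      // server did not report a size
    if (received >= total)
        return 100;     // clamp, never show more than 100 %
    return static_cast<int>(received * 100 / total);
}
```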

More Extraction Functions

Normally you will not need the following C++ functions:
With i_Extract.AbortOperation() you can abort a lengthy extraction. Obviously this must be called from another thread.

With i_Extract.IsCabinet() you can check whether the specified CAB file is valid or corrupt. If you try to extract a corrupt file you will get an error anyway, so calling this function first is not necessary.

With i_Extract.SetSingleFile() you can extract only the specified file from the cabinet. The file must be in the root folder of the CAB. (see below)

Decryption

As with encryption, you set a key to decrypt an archive:

C++ and C#

i_Extract.SetDecryptionKey("KHzt/(90aresD$%§&UGjhgoh89äÖLÜnkjjkbIUH(I/H809z9z");

How to Extract Only One File from the CAB File/Resource/Stream

C#

CabLib.Extract i_Extract = new CabLib.Extract();
i_Extract.SetSingleFile("File_1.txt");
i_Extract.ExtractFile(@"C:\Temp\Packed.cab", @"E:\ExtractFolder");

This will create a file E:\ExtractFolder\File_1.txt. The file to be extracted MUST be located in the root folder of the CAB archive. If the file does not exist in the archive, nothing will happen (no error). If a single file is extracted, the event evBeforeCopyFile will not be fired. (see below)

Extraction Callbacks / Events

CExtract.OnBeforeCopyFile() (C++ Callback)
CabLib.Extract.evBeforeCopyFile (.NET event)

This is called before Cabinet.dll copies an extracted file to disk. You get detailed information about the file to be extracted. If you don't want this file to be copied to disk, you can return FALSE here and the file will be skipped. (See the examples below.)

You can use this callback to display progress information in your GUI.

CExtract.OnAfterCopyFile() (C++ Callback)
CabLib.Extract.evAfterCopyFile (.NET event)

This is called after Cabinet.dll has placed a new file onto disk. You can use this callback to display progress information in your GUI.

CExtract.OnCabinetInfo() (C++ Callback)
CabLib.Extract.evCabinetInfo (.NET event)

This function will be called exactly once for each cabinet when it is opened. It passes information about the CAB file.

CExtract.OnNextCabinet() (C++ Callback)
CabLib.Extract.evNextCabinet (.NET event)

This function will be called when the next cabinet file in a sequence of split cabinets needs to be opened.

Here you can display a message like "Please insert disk 2!"

ATTENTION:

It is recommended to start extraction from another thread to avoid a dead GUI and to be able to call AbortOperation() and show the progress.

In C# you must call Control.BeginInvoke() in the event handler routine to access GUI elements asynchronously; otherwise you will run into cross-thread problems!

For details see the file Microsoft Cabinet.dll Doku.doc and the many comments in the file FDI.H

Manipulating the Extraction Process

With the callback OnBeforeCopyFile() you can control exactly what you want to extract from the CAB file. The callback/event passes a structure kCabinetFileInfo which tells you details of the file to be extracted: file name, subfolder, full path, file size, file date/time and file attributes.

With this information you can decide if you want the file to be extracted and return false if not.

In C# you must attach an event handler first:

C#

CabLib.Extract.delBeforeCopyFile i_Delegate = new CabLib.Extract.delBeforeCopyFile(
    OnBeforeCopyFile);
i_Extract.evBeforeCopyFile += i_Delegate;

i_Extract.ExtractResource("CabLib.dll", 101, "CABFILE", @"E:\ExtractFolder");

i_Extract.evBeforeCopyFile -= i_Delegate;

How to Extract only Files with a Specific File Extension

If you want to extract only the files from the CAB which have the extension ".DLL" (including all subfolders) you can write:

C++

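BOOL OnBeforeCopyFile(kCabinetFileInfo &k_Info, void* p_Param)
{ 
    int Len = (int)strlen(k_Info.s8_File);   // length of filename
    if (Len < 4)
        return FALSE;                        // too short to end with ".dll"
    
    // case insensitive comparison of the last 4 characters
    return (stricmp(k_Info.s8_File +Len -4, ".Dll") == 0);
}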

C#

private bool OnBeforeCopyFile(CabLib.Extract.kCabinetFileInfo k_Info)
{
    return k_Info.s_File.ToUpper().EndsWith(".DLL");
}
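
If you want to let several extensions pass, the comparison can be generalised. The following is a portable sketch in standard C++ (C++11) that does not depend on CabLib; EndsWithNoCase() and HasAnyExtension() are hypothetical helpers which you would call from OnBeforeCopyFile() with the file name from k_Info.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Case-insensitive test whether 'name' ends with 'suffix'.
bool EndsWithNoCase(const std::string& name, const std::string& suffix)
{
    if (suffix.size() > name.size())
        return false;   // name is too short, nothing is read out of bounds

    const std::string::size_type offset = name.size() - suffix.size();
    for (std::string::size_type i = 0; i < suffix.size(); ++i)
    {
        const unsigned char a = static_cast<unsigned char>(name[offset + i]);
        const unsigned char b = static_cast<unsigned char>(suffix[i]);
        if (std::tolower(a) != std::tolower(b))
            return false;
    }
    return true;
}

// Hypothetical filter: true if the file name has any of the given extensions.
bool HasAnyExtension(const std::string& fileName,
                     const std::vector<std::string>& extensions)
{
    for (const std::string& ext : extensions)
    {
        if (EndsWithNoCase(fileName, ext))
            return true;
    }
    return false;
}
```

Note that std::tolower() only handles single-byte characters of the current C locale, which is sufficient for extensions like ".dll" or ".exe".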

How to Extract only Files within a Specific Subfolder in the CAB

If you want to extract only the folder "Setup\" and all its subfolders from a CAB, write:

C++

BOOL OnBeforeCopyFile(kCabinetFileInfo &k_Info, void* p_Param)
{ 
    return (strnicmp(k_Info.s8_SubFolder, "Setup\\", 6) == 0);
}

C#

private bool OnBeforeCopyFile(CabLib.Extract.kCabinetFileInfo k_Info)
{
    return k_Info.s_SubFolder.ToUpper().StartsWith(@"SETUP\");
}

How to Extract only Newer Files

If you want to update existing files and overwrite only those files on disk which are older than their counterparts in the CAB, you can write:

C++

BOOL OnBeforeCopyFile(kCabinetFileInfo &k_Info, void* p_Param)
{
    // try to open the file on disk
    HANDLE h_File = CreateFile(k_Info.s8_FullPath, GENERIC_READ,
                               FILE_SHARE_READ, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL,
                               0);
    
    // The file does not yet exist --> copy it!

    if (h_File == INVALID_HANDLE_VALUE)
        return TRUE;
    
    FILETIME k_FileTime, k_LocalTime;
    BOOL b_OK = GetFileTime(h_File, 0, 0, &k_FileTime);
    
    CloseHandle(h_File);
    
    if (!b_OK)
        return TRUE;
    
    // Last write time UTC --> Local time
    FileTimeToLocalFileTime(&k_FileTime,    &k_LocalTime);
    return (CompareFileTime(&k_Info.k_Time, &k_LocalTime) > 0);
}

C#

private bool OnBeforeCopyFile(CabLib.Extract.kCabinetFileInfo k_Info)
{
    if (!System.IO.File.Exists(k_Info.s_FullPath)) 
        return true;
    
    // retrieve local file time
    System.DateTime k_FileTime = System.IO.File.GetLastWriteTime(k_Info.s_FullPath);
    return (k_Info.k_Time.CompareTo(k_FileTime) > 0);
}

The Extraction Class Hierarchy

This diagram demonstrates the C++ classes which are used in both projects:

If you want a different behaviour, do NOT modify the existing classes. Instead derive a new class from the existing classes and override the functions you want to change.

CExtract contains the functions to extract a "real" CAB file from disk.

CExtract contains the following callbacks which are called from Cabinet.dll:

  • Open() to open a file
  • Read() to read from a file
  • Write() to write to a file
  • Seek() to set the file pointer or ask its position
  • Close() to close a file

IMPORTANT: These callbacks are called from Cabinet.dll to read the CAB file AND to write all the extracted files to disk.

CExtractMemory is a class which overrides the file access functions and replaces them with functions which read the CAB data from memory instead of disk. CExtractMemory itself cannot be instantiated. Other classes must be derived from it. It provides these additional callbacks:

  • OpenMem() to open the memory which represents the CAB file
  • ReadMem() to read from the memory of the CAB file
  • SeekMem() to set the memory pointer or ask its position
  • CloseMem() to release the memory which holds the CAB file

IMPORTANT: These callbacks are ONLY called when Cabinet.dll wants to read the CAB file.

CExtractResource is derived from CExtractMemory to read data from a Win32 resource. CExtractStream is derived from CExtractMemory to read data from a .NET stream. CExtractUrl is derived from CExtractMemory to read data from the internet.

You can easily derive your own classes for example to read data from a pipe or whatever you like. The data stream must be capable of seeking (random access).
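
To illustrate what "capable of seeking" means, here is a minimal sketch of a data source with random access over a byte buffer. It is NOT derived from CExtractMemory and its methods do not have the signatures of OpenMem(), ReadMem() or SeekMem(); MemorySource and LooksLikeCabinet() are hypothetical names which only show the read/seek behaviour that such a source must offer. The only CAB-specific fact used is that a cabinet file starts with the signature "MSCF".

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical seekable data source over a byte buffer.
class MemorySource
{
public:
    enum Origin { Begin, Current, End };

    explicit MemorySource(const std::vector<unsigned char>& data)
        : m_data(data), m_pos(0) {}

    // Copies up to 'count' bytes into 'buffer'; returns the bytes copied.
    std::size_t Read(void* buffer, std::size_t count)
    {
        const std::size_t available = m_data.size() - m_pos;
        const std::size_t n = std::min(count, available);
        if (n > 0)
            std::memcpy(buffer, m_data.data() + m_pos, n);
        m_pos += n;
        return n;
    }

    // Moves the read position; returns the new position or -1 on error.
    long Seek(long offset, Origin origin)
    {
        long base = 0;
        if (origin == Current) base = static_cast<long>(m_pos);
        if (origin == End)     base = static_cast<long>(m_data.size());

        const long target = base + offset;
        if (target < 0 || target > static_cast<long>(m_data.size()))
            return -1;
        m_pos = static_cast<std::size_t>(target);
        return target;
    }

private:
    std::vector<unsigned char> m_data;
    std::size_t m_pos;
};

// Reads the first four bytes and compares them with the CAB signature "MSCF".
bool LooksLikeCabinet(const std::vector<unsigned char>& bytes)
{
    MemorySource source(bytes);
    unsigned char header[4] = {0, 0, 0, 0};

    if (source.Seek(0, MemorySource::Begin) != 0)
        return false;
    if (source.Read(header, sizeof(header)) != sizeof(header))
        return false;
    return std::memcmp(header, "MSCF", 4) == 0;
}
```

A forward-only source like a pipe does not fulfil this requirement by itself; its data would have to be buffered first so that Seek() can jump back.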

Debugging

If you want to debug the whole compression/extraction process with a tool like DebugView from SysInternals you can modify the file Trace.hpp:

#define _TraceCompress (_DEBUG && TRUE) // CAB compression

and / or

#define _TraceExtract  (_DEBUG && TRUE) // CAB extraction

and / or

#define _TraceInternet (_DEBUG && TRUE) // communication with the server

and / or

#define _TraceCache    (_DEBUG && TRUE) // storage of downloaded blocks in cache

IMPORTANT:
To see anything in DebugView you must compile CabLib in DEBUG mode and start the compiled application in Visual Studio with CTRL + F5

An Installer / Updater based on CabLib

Check out my project "An Intelligent .NET Multilanguage Installer" which installs or updates a software package from a local CAB file, a CAB file on a fileserver or a CAB file on an FTP / HTTP(S) server.

P.S.
From my homepage you can download free C++ books in compiled HTML format.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Elmue



Location: Germany
