Introduction
With this project, C++ and .NET programmers get a very versatile library for compression and extraction of Microsoft CAB files.

.NET 1.1 does not offer compression functionality. .NET 2.0 offers the

If you search the internet for more comfortable compression libraries, you find, for example, ICSharpCode.SharpZipLib.dll, which offers ZIP compression. But this library is awkward to use and buggy, and so is unusable. Although the bugs have been known for years, the author has not fixed them.

I asked myself why I should search for another open source library (which will again have other bugs) when Windows itself has supported CAB archives since its first days. Microsoft's Cabinet.dll (in the System32 directory) is not buggy. Many Microsoft installers (like the installer for Internet Explorer or Windows patches) use it. Additionally, CAB reaches a much better compression ratio than ZIP.

Finally I found the project "Microsoft Cabinet Templates" from Luuk Weltevreden on CodeProject. He created a very versatile wrapper around Microsoft's Cabinet.dll, consisting of C++ templates. But he didn't even do half of the work. He wrote good extraction classes, but the compression class was completely missing.

I worked on his code, fixed a serious bug, simplified some awkward code, removed the templates, and added the missing compression functionality, encryption, Unicode support, internet extraction and more. Additionally, I added all files you need to compile the project. There is no need to download anything from Microsoft anymore.

Features
Limitations
Source Code
You will find very clean source code with tidy error handling and plenty of comments, written by a very experienced programmer. You get a high-quality library and will save several weeks of coding time. The code is reusable: for example, you can reuse the Internet class for downloads from FTP / HTTP(S) or the String class to encode / decode UTF strings. You can study LibExtract.h to see how managed callbacks can be passed to unmanaged C++ code (which is not easy and requires gcroot or GCHandle).

Different CAB File Formats
There are two completely different types of CAB files. This project supports the "Microsoft CAB" files (also called "MS-CAB"). The Microsoft pack format is also known as MSZIP. If you open an MS-CAB file with a hex editor, you will notice that the first four bytes are "MSCF" (MicroSoft Cab File), while the first three bytes of an InstallShield CAB file are "ISc" (InstallShield Cab). You cannot open or create InstallShield CAB files with this project. Very few tools are capable of managing InstallShield CAB files; one example is the tool WinPack, which you can download from my homepage.

Compression Ratio
MS-CAB files have a very good compression ratio. To test this, I packed about a hundred text files. This is the result of my test:
Intelligent Installers
Microsoft intended CAB files to be used for installations:
Many installers are stupid. If you start an unintelligent installer (like the one of Nero 6) you will see:
In contrast, with this Cabinet library you can build an intelligent installer:

Scenario 1. You Deliver only Two Files to Your Clients: A Tiny EXE File and a Huge CAB File
Put a huge CAB file on a local server, CD or DVD and let the user start only a tiny EXE setup file. The installer will start immediately and extract only the files from the CAB file which are really needed. This Cabinet library can of course extract the whole CAB file, but it can also extract only specific files directly from the CAB file on the server / CD / DVD to the hard disk. The data transfer is compressed and, if you like, encrypted.

Scenario 2. You Deliver only One Huge EXE File to Your Clients
You can also embed the CAB file into the Setup.exe and extract specific files directly from the embedded resource in memory without creating temporary files. This scenario only makes sense for small setups; otherwise users with little RAM will run into problems when they start a 100 MB EXE file!

Scenario 3. You Deliver only a URL to Your Clients
If you choose URL extraction, you deliver only a tiny EXE file which downloads the CAB file from the internet (FTP or HTTP(S)). If you use this for updates, you can even configure the Cabinet library to download only those files from the CAB archive which require an update.

Example: Your company sells an ASP server which consists of 500 files. You put a 100 Megabyte CAB file containing all files on your update server. Let's say the client wants to update to the latest version and needs to replace only 15 of the 500 files. The Cabinet library downloads only 2 Megabyte instead of 100 Megabyte from your server! The data transfer is compressed and optionally encrypted.

If you need an installer / updater, download my project "An Intelligent .NET Multilanguage Installer".

The C++ Project
To add CAB support to your C++ project, download the project Cabinet at the top of this page (a demo application is included) and copy the entire subfolder "Cabinet" to your project.
The .NET Project
The second project is for .NET developers. I wrote a wrapper in Managed C++ around this C++ project. The result compiles into a .NET DLL. You simply add the .NET assembly CabLib.dll to the references of your .NET project (C#, Visual Basic .NET or Managed C++) and you get CAB support. In the second download at the top of this page you will find CabLib.dll already compiled and ready to use (a demo application is included).

Cabinet.dll
Microsoft's tiny Cabinet.dll, which has been located in your System(32) directory since Windows NT/98, offers the following Compression API:
And the Extraction API:
You get a detailed description of these functions in the file Microsoft Cabinet.dll Doku.doc, which you find in both projects, and the files FCI.H and FDI.H contain plenty of comments.

The API in Cabinet.dll uses a bunch of callbacks which are called while a CAB file is created or extracted. The C++ project wraps these callbacks, and you can override each of the callback functions to modify the behaviour. The .NET project offers events which you can use to handle these callbacks in your .NET application. You can use these callbacks / events to filter specific files, or you can read compression data from a stream or from memory instead of a file on disk. This makes the library extremely versatile (see the examples below).

Unicode
The underlying Cabinet.dll does not support Unicode, but this project allows Unicode paths and filenames to be compressed by encoding them as UTF7 in the CAB archive. Compared with MBCS (MultiByte), this has the advantage that the encoding is independent of any codepage, which avoids lots of problems. (e.g. UTF avoids using the buggy

The C++ project offers all file functions in an ANSI version and a Unicode version. If your application is run on Windows 95/98/ME, the library automatically detects the operating system and uses the appropriate API. You DON'T have to care about calling the ..A() or ..W() functions depending on the operating system. The ..W() functions will also work on Windows 95/98/ME.

To use the Unicode functionality it is NOT required to compile the C++ project with the UNICODE compiler switch (#ifdef UNICODE). Although compiled as MBCS, the wide versions of the functions ...W() will work! The .NET project uses the Unicode versions ..W() of the underlying C++ code.

Only the operating system limits the usage of Unicode: if you extract a CAB file which contains Unicode filenames on Windows 95/98/ME, the Unicode files are skipped, as it is impossible to store, for example, Japanese files on an English Windows 95/98/ME.

UTC Time
With the parameter
It is recommended to compress using UTC time, so that the files in the CAB archive and on disk still have the same time after the PC's timezone or the daylight saving setting has changed. If, on the other hand, you compress / extract using local time, a CAB file extracted in winter has a time shift of one hour compared with a CAB file extracted in summer.

Using the Compression Functions
C++
Cabinet::CCompress i_Compress;
if (!i_Compress.CreateFCIContextA("C:\\Temp\\Packed.cab"))
{ /* Error handling... */ }
// Second parameter: path and name of the file inside the CAB archive
if (!i_Compress.AddFileA("C:\\Windows\\Explorer.exe", "FileManager\\Explorer.exe", 0))
{ /* Error handling... */ }
if (!i_Compress.AddFileA("C:\\Windows\\Notepad.exe", "TextManager\\Notepad.exe", 0))
{ /* Error handling... */ }
if (!i_Compress.FlushCabinet(FALSE))
{ /* Error handling... */ }
C#
ArrayList i_Files = new ArrayList();
// Each entry: { path on disk, path inside the CAB archive }
i_Files.Add(new string[] { @"C:\Windows\Explorer.exe", @"FileManager\Explorer.exe" });
i_Files.Add(new string[] { @"C:\Windows\Notepad.exe", @"TextManager\Notepad.exe" });
CabLib.Compress i_Compress = new CabLib.Compress();
i_Compress.CompressFileList(i_Files, @"C:\Temp\Packed.cab", 0);
You can also easily compress all HTM files in the folder C:\Web and all its subfolders into a CAB file which reflects the folder structure found on the hard disk:
C#
CabLib.Compress i_Compress = new CabLib.Compress();
i_Compress.CompressFolder(@"C:\Web", @"C:\Temp\Packed.cab", "*.htm", 0);
Compression Splitting
C++
Cabinet::CCompress i_Compress;
if (!i_Compress.CreateFCIContextA("C:\\Temp\\Packed_%d.cab", TRUE, 200000))
{ /* Error handling... */ }
// etc.
C#
i_Compress.CompressFileList(i_Files, @"C:\Temp\Packed_%d.cab", 200000);
// or
i_Compress.CompressFolder(@"C:\Web", @"C:\Temp\Packed_%d.cab", "*.htm", 200000);
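The %d in the file names above suggests that each part of a split archive receives a running number. The following is only a sketch of such a substitution, assuming a printf-like placeholder; the helper ExpandCabName is hypothetical and not part of the library, and the numbering the library actually starts with is not stated here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (NOT part of the Cabinet library):
// replaces the first "%d" in a cabinet name template with a part number.
std::string ExpandCabName(const std::string& s_Template, int s32_Index)
{
    assert(s32_Index >= 0); // part numbers are never negative

    std::string s_Name = s_Template;
    const std::string::size_type u_Pos = s_Name.find("%d");
    if (u_Pos != std::string::npos)
        s_Name.replace(u_Pos, 2, std::to_string(s32_Index));

    return s_Name; // unchanged if the template contains no "%d"
}
```

Replacing the placeholder by hand, instead of passing the template to a printf function, avoids problems with other percent signs in the path.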
Setting the Compression TEMP Directory
During compression, Cabinet.dll creates some temporary files which are automatically deleted afterwards. By default it uses the TEMP directory which Windows specifies. If you want to compress huge files and the space on drive C: is low, you should specify a TEMP directory on another drive. It is possible to use the same directory as output folder for the CAB file and as TEMP directory.
C++ and C#
i_Compress.SetTempDirectory("E:\\Temp");

Encryption
You can encrypt the CAB file with a key. The C++ code encrypts / decrypts the CAB data "on the fly" in blocks of 8 Bytes using the Blowfish algorithm. Blowfish is a very fast, symmetric, license-free algorithm.

As encryption key you can use any binary data of up to 72 Bytes in length. If the key is longer than 72 Bytes, the remaining bytes are ignored. If the key is shorter than 72 Bytes, some bytes are reused.

It is possible, but not recommended, to use a plain text password directly for the Blowfish encryption (see: KDF). Instead you should derive a binary hash from the plain text password. The .NET code does this with a SHA 512 hash, which always has a length of 64 Bytes. In the .NET project you can set a plain text password (string) of any length; a 64 Byte SHA hash is derived from it first, and this hash is then used as key for the Blowfish encryption of the CAB data. You can also set your own binary data directly as key for Blowfish.
C#
i_Compress.SetEncryptionKey(String); // SHA 512 + Blowfish
or
i_Compress.SetEncryptionKey(Byte[72]); // only Blowfish
C++
i_Compress.SetEncryptionKey(void* p_Key, DWORD u32_KeyLength); // only Blowfish
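The key length rule described above (longer keys are cut off, shorter keys are reused) matches the way the standard Blowfish key schedule cycles through the key bytes while it fills its 18 subkeys of 32 bit each, which is 72 Bytes in total. The following sketch shows only that rule in isolation. ExpandKeyTo72 is a hypothetical helper for illustration; how the library handles the key internally may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t KEY_BYTES = 72; // 18 subkeys x 4 Bytes

// Hypothetical helper (NOT the library's code): maps key material of any
// length to exactly 72 Bytes. Shorter keys are repeated cyclically,
// bytes beyond the 72nd are ignored.
std::array<std::uint8_t, KEY_BYTES> ExpandKeyTo72(const std::vector<std::uint8_t>& i_Key)
{
    assert(!i_Key.empty()); // an empty key cannot be expanded

    std::array<std::uint8_t, KEY_BYTES> i_Out{};
    for (std::size_t i = 0; i < KEY_BYTES; i++)
        i_Out[i] = i_Key[i % i_Key.size()];

    return i_Out;
}
```

A 64 Byte SHA 512 hash, for example, would fill the first 64 positions, and its first 8 Bytes would be used again for positions 64 to 71.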
More Compression Functions
Normally you will not need the following C++ functions: With With For details see the file Microsoft Cabinet.dll Doku.doc and the many comments in the file FCI.H With

Compression Callbacks / Events
This can be used to update your GUI to display the progress during a lengthy compression.

ATTENTION
It is recommended to start compression from another thread to avoid a dead GUI and to be able to call In C# you must call For details see the file Microsoft Cabinet.dll Doku.doc and the many comments in the file FCI.H

Extensions / Modifications
If you want a different behaviour for compression, do NOT modify the existing compression class

Using the Extraction Functions
File Extraction
During extraction NO temporary files are created. The following sample extracts the file C:\Temp\Packed.cab into the folder E:\ExtractFolder. The required subfolders are created automatically if the CAB file contains subfolders.
C++
Cabinet::CExtract i_Extract;
if (!i_Extract.CreateFDIContext())
{ /* Error handling... */ }
if (!i_Extract.ExtractFileA("C:\\Temp\\Packed.cab", "E:\\ExtractFolder"))
{ /* Error handling... */ }
C#
CabLib.Extract i_Extract = new CabLib.Extract();
i_Extract.ExtractFile(@"C:\Temp\Packed.cab", @"E:\ExtractFolder");
The .NET library automatically detects files with the extension .LNK and resolves the shortcut (to the CAB file) inside; files with the extension .URL are redirected to URL extraction.

Win32 Resource Extraction
The following sample extracts a Cabinet file which is stored in the Win32 resources of a DLL or EXE file. You can extract files DIRECTLY from a CAB file in memory! There are some rules to respect when you add a CAB file to the resources of your project:
In the file Cabinet.rc of the C++ project and in CabLib.rc of the .NET project you find this line:
ID_CAB_TEST CABFILE "Res\\Test.cab"
and in the file Resource.h you find this line:
#define ID_CAB_TEST 101
To extract the embedded resource Test.cab (which I added to both projects) write:
C++
Cabinet::CExtractResource i_Extract;
if (!i_Extract.CreateFDIContext())
{ /* Error handling... */ }
if (!i_Extract.ExtractResourceA("Cabinet.exe", ID_CAB_TEST, "CABFILE", "C:\\ExtractFolder"))
{ /* Error handling... */ }
C#
CabLib.Extract i_Extract = new CabLib.Extract();
i_Extract.ExtractResource("CabLib.dll", 101, "CABFILE", @"C:\ExtractFolder");
The first parameter specifies the filename (without path) of the module from which to extract the Win32 CAB resource. You can set it to 0 (null) if the resource is inside the EXE which created the process. You can use this functionality to extract a CAB file from ANY DLL currently loaded into the process or from the application EXE itself. To explore the resources of files which are already compiled, download the tool ResourceHacker. Most Windows Update patches contain a CAB file inside.

.NET Resource Extraction / Stream Extraction
C#
System.Reflection.Assembly i_Ass = System.Reflection.Assembly.GetExecutingAssembly();
System.IO.Stream i_Strm = i_Ass.GetManifestResourceStream(
"MyProject.Resources.Test.cab");
CabLib.Extract i_Extract = new CabLib.Extract();
i_Extract.ExtractStream(i_Strm, @"E:\ExtractFolder");
URL Extraction
Let's say you want to build an updater which updates your software package to the latest version on your clients' computers. If your software package consists of 500 files, of which only 15 have changed in the latest version, you could deliver an update patch which contains these 15 files. A more intelligent solution is to build an updater which extracts only the files which require an update from a huge CAB archive on your update server. This CAB archive contains the latest version of your entire software package, and no matter which version is currently installed on your client's computer, the updater will download only the files which are out of date:
The data transferred is compressed and optionally encrypted (see CAB encryption). Obviously you will have to pack a file list with the MD5 hashes of all files into the CAB archive, so that the updater knows in advance which files it has to download and which files are up to date. A complete installer project using this type of URL extraction (and more) can be found here: "An Intelligent .NET Multilanguage Installer".

Approximately the first 1% of the archive is an index which contains the filenames and pointers into the compressed data. The Cabinet library reads this index and for each file it calls the callback function If you return FALSE in this callback the file will neither be downloaded nor extracted.

To be able to download a file only partially from the server you have two options:
Here is an example of how such a script can be written in PHP (Download.php):
if (strlen($_GET["File"]) > 0)
{
DownloadFilePartially("Updates", $_GET["File"], $_GET["Offset"], $_GET["Length"]);
}
function DownloadFilePartially($sFolder, $sFile, $Offset, $Length)
{
$sPath = getcwd() ."/";
if (strlen($sFolder) > 0) $sPath .= $sFolder ."/";
$sPath .= $sFile;
// Block hacker attacks (restrict access to the given folder "Updates")
// Details about hacker attacks see OwaspGuide.PDF on: www.owasp.org
if (substr($sFile, 0, 1) == '.' || strpos($sFile, "..") !== FALSE ||
strpos($sFile, '/') !== FALSE || $Offset < 0 || $Length < 0 ||
!is_file($sPath))
{
header("Status: 501 Invalid Parameters", true, 501);
echo "Invalid Parameters";
exit; // Do NOT return Status code 200 (OK) on errors !!!
}
$hFile = fopen($sPath, "rb");
$Filesize = filesize($sPath);
if ($Length == 0) $Length = $Filesize; // return entire file
$Blocksize = 32 * 1024;
$Length = min($Length, $Filesize - $Offset);
header("Accept-Ranges: bytes");
header("Content-Description: File Transfer");
header("Content-Transfer-Encoding: binary");
header("Content-Type: application/x-zip-compressed");
header("Content-Disposition: attachment; filename=\"Cab.part\";");
header("Content-Length: " .$Length);
header("Last-Modified: " .gmdate ("D, j M Y H:i:s", filemtime($sPath)) ." GMT");
fseek($hFile, $Offset);
while (!feof($hFile) && connection_status() == 0 && $Length > 0)
{
$Blocksize = min ($Blocksize, $Length);
set_time_limit(40);
print(fread($hFile, $Blocksize));
$Length -= $Blocksize;
flush();
}
fclose($hFile);
exit;
}
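On the client side, a request to such a script has to carry the three parameters which the PHP code reads from $_GET (File, Offset and Length). The following is a minimal sketch of how such a URL could be assembled. The function names, the base URL and the percent-encoding step are assumptions for illustration and are not taken from the library.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstdio>
#include <string>

// Percent-encodes every byte that is not an "unreserved" URL character.
static std::string UrlEncode(const std::string& s_Text)
{
    std::string s_Out;
    for (unsigned char u8_Char : s_Text)
    {
        if (std::isalnum(u8_Char) || u8_Char == '-' || u8_Char == '_' ||
            u8_Char == '.' || u8_Char == '~')
        {
            s_Out += static_cast<char>(u8_Char);
        }
        else
        {
            char s8_Hex[4]; // '%' + two hex digits + terminating zero
            std::snprintf(s8_Hex, sizeof(s8_Hex), "%%%02X", static_cast<unsigned>(u8_Char));
            s_Out += s8_Hex;
        }
    }
    return s_Out;
}

// Hypothetical helper: builds the query for a script like Download.php.
std::string BuildPartialUrl(const std::string& s_Script, const std::string& s_File,
                            std::int64_t s64_Offset, std::int64_t s64_Length)
{
    assert(s64_Offset >= 0 && s64_Length >= 0); // the script rejects negative values

    return s_Script + "?File="   + UrlEncode(s_File)
                    + "&Offset=" + std::to_string(s64_Offset)
                    + "&Length=" + std::to_string(s64_Length);
}
```

With the script above, a Length of 0 requests everything from the given offset to the end of the file.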
Microsoft's Cabinet.dll requests block sizes between 8 Bytes and 32000 Bytes in the callback FDIRead. It would be nonsense to request such tiny blocks from the server. Additionally, Cabinet.dll does not access the CAB data completely sequentially. For that the Cabinet library uses the class

An important factor is the size of the blocks which the cache reads from the server. A blocksize that is too small results in bad performance, because a new data connection to the server is opened for each block. A blocksize that is too big downloads more data than is really needed when extracting only specific files from a huge archive. I recommend enabling tracing in the file Trace.hpp and playing around with the blocksize. In the trace you will see the download speed in KB/s.

If you do not want to extract only parts of a CAB file, it is strongly recommended to download the entire CAB archive to a temporary file on disk and then extract it. The Cabinet library does that for you automatically and also deletes the temporary file after extracting it. This results in the maximum download speed, because there is a continuous data stream coming from the server.
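To illustrate why the blocksize matters, the following sketch computes which fixed-size cache blocks a single read request from Cabinet.dll touches. It is a simplified model under stated assumptions (blocks aligned at multiples of the blocksize, hypothetical names) and not the library's cache class.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical model: returns the index of the first and the last cache
// block which must be present to serve a read of u32_Length bytes
// starting at u64_Offset.
std::pair<std::uint64_t, std::uint64_t> BlocksForRead(std::uint64_t u64_Offset,
                                                      std::uint32_t u32_Length,
                                                      std::uint32_t u32_Blocksize)
{
    assert(u32_Blocksize > 0 && u32_Length > 0);

    const std::uint64_t u64_First = u64_Offset / u32_Blocksize;
    const std::uint64_t u64_Last  = (u64_Offset + u32_Length - 1) / u32_Blocksize;
    return { u64_First, u64_Last };
}
```

In this model, a read which crosses a block boundary needs two blocks. With a small blocksize this happens more often, and every missing block would cost one more request to the server.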
You should not use blocksizes greater than 2 MB, as you don't gain any advantage by doing so. If you do not specify username and password inside the URL, FTP will use anonymous login. The class
C++
Cabinet::CExtractUrl i_Extract;
if (!i_Extract.CreateFDIContext())
{ /* Error handling... */ }
if (!i_Extract.ExtractUrlA(URL, Blocksize, DownloadFile, ExtractFolder))
{ /* Error handling... */ }
i_Extract.CleanUp(); // see below
C#
CabLib.Extract i_Extract = new CabLib.Extract();
i_Extract.ExtractUrl(URL, Blocksize, DownloadFile, ExtractFolder);
i_Extract.CleanUp(); // see below
There are several options using this function:
You can extract from the same CAB file on the server multiple times, so that the content of the internet cache can be reused. A downloaded temporary file can also be reused in this way:
i_Extract.SetSingleFile(FileName1);
i_Extract.ExtractUrl(Url, Blocksize, DownloadFile, ExtractFolder);
i_Extract.SetSingleFile(FileName2);
i_Extract.ExtractMoreUrl(ExtractFolder);
i_Extract.CleanUp(); // see below

More Internet Functions
With IMPORTANT: With The string must be in the format "http=http://Proxy1.com:8000 https=https://Proxy2.com:443". If you pass an empty string or never call this function, the default settings of Internet Explorer will be used (stored in the Registry). With With The headers consist of "Name: Value". Multiple headers can be separated by the pipe character. Example: "Referer: http://www.test.com|Accept-Language:en" There must be no space before or after the pipe character! With (If you specified a blocksize > 0, the progress bar will restart at zero for each downloaded block.) Attention: Not all HTTP servers return "CONTENT-LENGTH" (e.g. AOL servers); in this case the progress bar will not work.

More Extraction Functions
Normally you will not need the following C++ functions: With With

Decryption
Similar to the encryption, you can decrypt an archive:
C++ and C#
i_Extract.SetDecryptionKey("KHzt/(90aresD$%§&UGjhgoh89äÖLÜnkjjkbIUH(I/H809z9z");
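Coming back to the header format described under More Internet Functions ("Name: Value" pairs separated by the pipe character), the following sketch shows one way such a string could be split into its parts. SplitHeaders is a hypothetical helper for illustration; the library's own parsing is not shown in this article and may behave differently.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits "Name: Value|Name:Value" into (name, value)
// pairs. Only the FIRST colon of each item separates name and value, so a
// value like "http://www.test.com" stays intact. Items without a colon
// are skipped.
std::vector<std::pair<std::string, std::string>> SplitHeaders(const std::string& s_Headers)
{
    std::vector<std::pair<std::string, std::string>> i_List;
    std::string::size_type u_Start = 0;
    while (u_Start < s_Headers.size())
    {
        std::string::size_type u_End = s_Headers.find('|', u_Start);
        if (u_End == std::string::npos)
            u_End = s_Headers.size();

        const std::string s_Item = s_Headers.substr(u_Start, u_End - u_Start);
        const std::string::size_type u_Colon = s_Item.find(':');
        if (u_Colon != std::string::npos)
        {
            // skip the optional spaces between the colon and the value
            const std::string::size_type u_Value = s_Item.find_first_not_of(' ', u_Colon + 1);
            const std::string s_Value = (u_Value == std::string::npos) ? "" : s_Item.substr(u_Value);
            i_List.emplace_back(s_Item.substr(0, u_Colon), s_Value);
        }
        u_Start = u_End + 1;
    }
    return i_List;
}
```

Applied to the example string from the text, this yields the two pairs Referer / http://www.test.com and Accept-Language / en.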
How to Extract Only One File from the CAB File/Resource/Stream
C#
CabLib.Extract i_Extract = new CabLib.Extract();
i_Extract.SetSingleFile("File_1.txt");
i_Extract.ExtractFile(@"C:\Temp\Packed.cab", @"E:\ExtractFolder");
This will create a file E:\ExtractFolder\File_1.txt. The file to be extracted MUST be located in the root folder of the CAB archive. If the file does not exist in the archive, nothing happens (no error). If a single file is extracted the event

Extraction Callbacks / Events
This is called before Cabinet.dll copies an extracted file to disk. You get detailed information about the file to be extracted. If you don't want this file to be copied to disk you can return FALSE here and the file will be skipped. (Examples see below) You can use this callback to display progress information in your GUI.
This is called after Cabinet.dll has placed a new file onto disk. You can use this callback to display progress information in your GUI.
This function will be called exactly once for each cabinet when it is opened. It passes information about the CAB file.
This function will be called when the next cabinet file in the sequence of split cabinets needs to be opened. Here you can display a message like "Please insert disk 2!"

ATTENTION: It is recommended to start extraction from another thread to avoid a dead GUI and to be able to call In C# you must call For details see the file Microsoft Cabinet.dll Doku.doc and the many comments in the file FDI.H

Manipulating the Extraction Process
With the callback With this information you can decide whether you want the file to be extracted, and return false if not. In C# you must attach an event handler first:
C#
CabLib.Extract.delBeforeCopyFile i_Delegate = new CabLib.Extract.delBeforeCopyFile(OnBeforeCopyFile);
i_Extract.evBeforeCopyFile += i_Delegate;
i_Extract.ExtractResource("CabLib.dll", 101, "CABFILE", @"E:\ExtractFolder");
i_Extract.evBeforeCopyFile -= i_Delegate;
How to Extract only Files with a Specific File Extension
If you want to extract only the files from the CAB which have the extension ".DLL" (including all subfolders), you can write:
C++
BOOL OnBeforeCopyFile(kCabinetFileInfo &k_Info, void* p_Param)
{
    int Len = (int)strlen(k_Info.s8_File); // length of filename
    if (Len < 4) return FALSE;             // too short to end with ".dll", avoid reading before the string
    return (stricmp(k_Info.s8_File + Len - 4, ".dll") == 0); // case-insensitive compare
}
C#
private bool OnBeforeCopyFile(CabLib.Extract.kCabinetFileInfo k_Info)
{
    return k_Info.s_File.ToUpper().EndsWith(".DLL");
}
How to Extract only Files within a Specific Subfolder in the CAB
If you want to extract only one folder with the name "Setup\" and all its subfolders from a CAB, write:
C++
BOOL OnBeforeCopyFile(kCabinetFileInfo &k_Info, void* p_Param)
{
    return (strnicmp(k_Info.s8_SubFolder, "Setup\\", 6) == 0);
}
C#
private bool OnBeforeCopyFile(CabLib.Extract.kCabinetFileInfo k_Info)
{
    return k_Info.s_SubFolder.ToUpper().StartsWith(@"SETUP\");
}
How to Extract only Newer Files
If you want to update existing files and want only those files on disk to be overwritten which are older than the files in the CAB, you can write:
C++
BOOL OnBeforeCopyFile(kCabinetFileInfo &k_Info, void* p_Param)
{
// try to open the file on disk
HANDLE h_File = CreateFile(k_Info.s8_FullPath, GENERIC_READ,
FILE_SHARE_READ, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL,
0);
// The file does not yet exist --> copy it!
if (h_File == INVALID_HANDLE_VALUE)
return TRUE;
FILETIME k_FileTime, k_LocalTime;
BOOL b_OK = GetFileTime(h_File, 0, 0, &k_FileTime);
CloseHandle(h_File);
if (!b_OK)
return TRUE;
// Last write time UTC --> Local time
FileTimeToLocalFileTime(&k_FileTime, &k_LocalTime);
return (CompareFileTime(&k_Info.k_Time, &k_LocalTime) > 0);
}
C#
private bool OnBeforeCopyFile(CabLib.Extract.kCabinetFileInfo k_Info)
{
if (!System.IO.File.Exists(k_Info.s_FullPath))
return true;
// retrieve local file time
System.DateTime k_FileTime = System.IO.File.GetLastWriteTime(k_Info.s_FullPath);
return (k_Info.k_Time.CompareTo(k_FileTime) > 0);
}
The Extraction Class Hierarchy
This diagram shows the C++ classes which are used in both projects:
If you want a different behaviour, do NOT modify the existing classes. Instead derive a new class from the existing classes and override the functions you want to change.
IMPORTANT: These callbacks are called from Cabinet.dll to read the CAB file AND to write all the extracted files to disk.
IMPORTANT: These callbacks are ONLY called when Cabinet.dll wants to read the CAB file.
You can easily derive your own classes, for example to read data from a pipe or whatever you like. The data stream must be capable of seeking (random access).

Debugging
If you want to debug the whole compression / extraction process with a tool like DebugView from SysInternals, you can modify the file Trace.hpp:
#define _TraceCompress (_DEBUG && TRUE) // CAB compression
and / or
#define _TraceExtract (_DEBUG && TRUE) // CAB extraction
and / or
#define _TraceInternet (_DEBUG && TRUE) // communication with the server
and / or
#define _TraceCache (_DEBUG && TRUE) // storage of downloaded blocks in cache
IMPORTANT: An Installer / Updater based on CabLib
Check out my project "An Intelligent .NET Multilanguage Installer", which installs or updates a software package from a local CAB file, a CAB file on a file server or a CAB file on an FTP / HTTP(S) server.

P.S.