Posted 21 Sep 2003

How to Write a Simple Packer/Unpacker with a Self-Extractor (SFX)

Elias Bachaalany, 21 Sep 2003
An example of writing a self-extracting archive using pack and unpack routines.

Introduction

In this article I will show how to write a file packer/unpacker and how to make a self-extracting version of the archive (SFX).

Please note that this article and its code have been written for learning purposes and not for complex functionality, thus the following limitations apply:

  • Only packing of files (binding them into one file) and no compression
  • Packer doesn't pack files in subdirectories
  • Packer header is not really optimized - just enough for our purposes
  • All code presented here compiles as a console application and no GUI version is provided

The Archive File Format

The idea is to build a structure/format that will allow us to hold a file list and file contents in one file in such a way that we will be able to restore the files to their original state.

This leads to the following design of the pack header:

  • Signature - Offset 0x00/DWORD
    This occupies the first 4 bytes of the header. It contains a simple signature that allows us to identify our packed files.

  • NumOfFiles - Offset 0x04/DWORD
    Here we store a DWORD holding the number of files in the archive.

  • FilesInfo - Offset 0x08/NumOfFiles * sizeof(packdata_t)
    Here we start storing the file information in a sequence defined as the array packdata_t FilesInfo[NumOfFiles].

    The packdata_t structure is defined as:
    <PRE lang=c++>
    struct packdata_t
    {
      char filename[MAX_PATH];
      long filesize;
    };
    </PRE>

    As you can see, we simply save the file's name and size. The packdata_t structure is not the optimal way of storing file names or information: every entry reserves MAX_PATH bytes for the name, however short it is. We could instead have used a variable-length packdata_t struct defined as:
    <PRE lang=c++>
    struct packdata_t
    {
      long filesize;
      // Other file info, such as creation date, attributes, ...
      unsigned char filenameLength;
      char filename[1];
    };
    </PRE>

    But, of course, managing this last struct is beyond the scope of this article.
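Although the article does not use the variable-length layout, the idea can be illustrated with a small, portable sketch. Everything in it is an assumption made for illustration: the FileEntry type, the appendEntry() and readEntry() helpers, and the record layout (4-byte size, 1-byte name length, name bytes without a terminator) are hypothetical and are not part of the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory form of one variable-length entry.
struct FileEntry {
    std::uint32_t filesize;
    std::string   filename;   // at most 255 bytes, stored without a terminator
};

// Append one record as: filesize (4 bytes) | name length (1 byte) | name bytes.
void appendEntry(std::vector<unsigned char>& out, const FileEntry& e) {
    assert(e.filename.size() <= 255);
    unsigned char sizeBytes[sizeof(std::uint32_t)];
    std::memcpy(sizeBytes, &e.filesize, sizeof sizeBytes);   // host byte order
    out.insert(out.end(), sizeBytes, sizeBytes + sizeof sizeBytes);
    out.push_back(static_cast<unsigned char>(e.filename.size()));
    out.insert(out.end(), e.filename.begin(), e.filename.end());
}

// Read the record starting at 'pos' and advance 'pos' past it.
// Returns false if the buffer does not hold a complete record.
bool readEntry(const std::vector<unsigned char>& in, std::size_t& pos, FileEntry& e) {
    const std::size_t fixedPart = sizeof(std::uint32_t) + 1;
    if (pos > in.size() || in.size() - pos < fixedPart) return false;
    std::memcpy(&e.filesize, &in[pos], sizeof(std::uint32_t));
    const std::size_t nameLen   = in[pos + sizeof(std::uint32_t)];
    const std::size_t nameStart = pos + fixedPart;
    if (in.size() - nameStart < nameLen) return false;
    e.filename.assign(in.begin() + nameStart, in.begin() + nameStart + nameLen);
    pos = nameStart + nameLen;
    return true;
}
```

With this layout an entry for "a.txt" takes 10 bytes instead of MAX_PATH + 4, at the cost of having to walk the records one by one because they no longer have a fixed size.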

After the pack header we have the files' contents stored in sequence. So the whole archive file format will look like this:

Signature
NumOfFiles
packdata_t FilesInfo[NumOfFiles]
File1 content
File2 content
.
.
.
File(NumOfFiles) content
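Since the file contents follow the header back to back, the position of any file inside the archive can be computed from the header alone. The sketch below shows that arithmetic. PackEntry, headerSize() and contentOffset() are hypothetical names used only for illustration; PackEntry mirrors packdata_t under the assumption that MAX_PATH is 260 and that long is 32 bits wide, as it is on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t kMaxPath = 260;           // value of MAX_PATH in the Windows headers

struct PackEntry {                           // stand-in for packdata_t
    char          filename[kMaxPath];
    std::uint32_t filesize;                  // 'long' is 32 bits on Windows
};

// Size of the pack header: Signature + NumOfFiles + the entry array.
std::size_t headerSize(std::size_t numFiles) {
    return 2 * sizeof(std::uint32_t) + numFiles * sizeof(PackEntry);
}

// Offset of the contents of file 'index', relative to the start of the archive.
std::size_t contentOffset(const std::vector<PackEntry>& entries, std::size_t index) {
    assert(index < entries.size());
    std::size_t offset = headerSize(entries.size());
    for (std::size_t i = 0; i < index; ++i)
        offset += entries[i].filesize;       // skip the files stored before it
    return offset;
}
```

For an archive of three files of 100, 50 and 7 bytes, the header is 8 + 3 * 264 = 800 bytes long, so the first file starts at offset 800 and the third at offset 950.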

Writing the Packer

In order to make the code a little extensible, I have defined a structure that will hold callback functions triggered from inside the packer/unpacker routines. These callbacks are used for visual notifications and updates.

The callback struct is defined as:
<PRE lang=c++>
typedef struct
{
  void (*newfile)(char *name, long size);
  void (*fileprogress)(long pos);
} packcallbacks_t;
</PRE>

The newfile() callback is called whenever the packer/unpacker encounters or processes a new file. It will be passed the file's name and size.

The fileprogress() callback is called repeatedly while a file is being processed. It will be passed the current position, in bytes, within the file that the packer/unpacker is processing.
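As a rough illustration of how a console front end might use these callbacks, here is a small sketch. The packcallbacks_t type is the one defined above; onNewFile(), onProgress(), the g_currentSize variable and the simulateFile() driver are hypothetical names invented for this example. simulateFile() only imitates the order in which a pack or unpack loop fires the callbacks; it does not touch any file.

```cpp
#include <cassert>
#include <cstdio>

typedef struct {
    void (*newfile)(char *name, long size);
    void (*fileprogress)(long pos);
} packcallbacks_t;

static long g_currentSize = 0;   // size of the file being processed

// Called once per file: remember the size and print a heading.
void onNewFile(char *name, long size) {
    g_currentSize = size;
    std::printf("%s (%ld bytes)\n", name, size);
}

// Called after each chunk: turn the byte position into a percentage.
void onProgress(long pos) {
    long percent = 100;
    if (g_currentSize > 0)
        percent = static_cast<long>(static_cast<long long>(pos) * 100 / g_currentSize);
    std::printf("  %ld%%\n", percent);
}

// Hypothetical driver: fires the callbacks in the same order as a copy loop
// working in 'chunk'-sized steps. Returns the number of progress notifications.
int simulateFile(char *name, long size, long chunk, const packcallbacks_t *pcb) {
    assert(chunk > 0);
    int calls = 0;
    if (pcb && pcb->newfile) pcb->newfile(name, size);
    for (long pos = 0; pos < size; ) {
        pos += (size - pos > chunk) ? chunk : size - pos;
        if (pcb && pcb->fileprogress) { pcb->fileprogress(pos); ++calls; }
    }
    return calls;
}
```

Both members are optional: the routines test each pointer before calling it, so a caller can pass NULL for the whole struct or leave either member unset.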

Now, let us define the packfilesEx() function prototype:
<PRE lang=c++>
int packfilesEx(char *path, char *mask, char *archive, packcallbacks_t *pcb = NULL);
</PRE>

  • We need a path that will designate the source directory.
  • The mask which will tell us what files to search for and pack.
  • The archive which will hold the archive file name.
  • An optional pcb which will hold a list of callbacks used for visual notifications.

Before going to the code, here is the packfilesEx() code flow:

  1. Build packdata_t array of all files to be packed (storing their names and size)
  2. Create the archive file and write in it the Signature and file count
  3. Write the packdata_t array into the archive
  4. Start reading every file and write its content in the archive
  5. Loop (4) until all files are stored
  6. Close the archive file

This sequence is enough to pack all files into a single archive file. Now we go straight to the code:
<PRE class=c++>
int packfilesEx(char *path, char *mask, char *archive, packcallbacks_t *pcb)
{
  TCHAR szCurDir[MAX_PATH];

  // vector that will hold the packdata_t array
  // (STL vectors are stored in contiguous memory)
  std::vector<packdata_t> filesList;

  // save current directory
  GetCurrentDirectory(MAX_PATH, szCurDir);

  // make sure the source directory is valid
  // and change the working directory to it if so
  if (!SetCurrentDirectory(path))
    return packerrorPath;

  WIN32_FIND_DATA fd;
  HANDLE findHandle;
  packdata_t pdata;

  findHandle = FindFirstFile(mask, &fd);
  if (findHandle == INVALID_HANDLE_VALUE)
  {
    SetCurrentDirectory(szCurDir);
    return packerrorNoFiles;
  }

  long lTemp;

  // this loop is for storing the files' headers only
  // directories are omitted
  do
  {
    // skip directory entries
    if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) == FILE_ATTRIBUTE_DIRECTORY)
      continue;

    // clear record
    memset(&pdata, 0, sizeof(pdata));

    // fill packdata entry
    strcpy(pdata.filename, fd.cFileName);
    pdata.filesize = fd.nFileSizeLow;

    // save entry
    filesList.push_back(pdata);
  } while (FindNextFile(findHandle, &fd));

  FindClose(findHandle);

  // only directories matched the mask?
  if (filesList.empty())
  {
    SetCurrentDirectory(szCurDir);
    return packerrorNoFiles;
  }

  FILE *fpArchive = fopen(archive, "wb");
  if (!fpArchive)
  {
    SetCurrentDirectory(szCurDir);
    return packerrorCannotCreateArchive;
  }

  // write signature
  lTemp = 'KCPL'; // lallous pack! (L-PCK)
  fwrite(&lTemp, sizeof(lTemp), 1, fpArchive);

  // write entries count
  lTemp = (long)filesList.size();
  fwrite(&lTemp, sizeof(lTemp), 1, fpArchive);

  // store files entries (std::vector stores its elements contiguously)
  fwrite(&filesList[0], sizeof(pdata), filesList.size(), fpArchive);

  // process all files to copy
  for (unsigned int cnt = 0; cnt < filesList.size(); cnt++)
  {
    FILE *inFile = fopen(filesList[cnt].filename, "rb");
    if (!inFile)
    {
      // the header already promises this file's data, so give up
      fclose(fpArchive);
      SetCurrentDirectory(szCurDir);
      return packerrorCannotCreateArchive;
    }

    long size = filesList[cnt].filesize;

    // if callback assigned then trigger it
    if (pcb && pcb->newfile)
      pcb->newfile(filesList[cnt].filename, size);

    // copy file contents
    long pos = 0;
    while (size > 0)
    {
      char buffer[4096];
      long toread = size > (long)sizeof(buffer) ? (long)sizeof(buffer) : size;
      fread(buffer, toread, 1, inFile);
      fwrite(buffer, toread, 1, fpArchive);
      pos += toread;
      size -= toread;
      if (pcb && pcb->fileprogress)
        pcb->fileprogress(pos);
    }
    fclose(inFile);
  }

  // close archive and restore working directory
  fclose(fpArchive);
  SetCurrentDirectory(szCurDir);
  return packerrorSuccess;
}
</PRE>
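The copy loop in the packer ignores the return values of fread() and fwrite(), so a short read or a full disk goes unnoticed. A possible way to tighten this is sketched below as a standalone helper. copyBytes() is a hypothetical function, not part of the article's sources; it uses only the C standard library, so it can be tried outside Windows as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Copy exactly 'size' bytes from 'in' to 'out' in 4 KB chunks.
// Returns true only if every byte was both read and written.
bool copyBytes(std::FILE *in, std::FILE *out, long size) {
    assert(in != NULL && out != NULL);
    char buffer[4096];
    while (size > 0) {
        const std::size_t chunk = size > static_cast<long>(sizeof buffer)
                                ? sizeof buffer
                                : static_cast<std::size_t>(size);
        if (std::fread(buffer, 1, chunk, in) != chunk)
            return false;                    // source ended early or read error
        if (std::fwrite(buffer, 1, chunk, out) != chunk)
            return false;                    // write error (for example, disk full)
        size -= static_cast<long>(chunk);
    }
    return true;
}
```

Passing an element size of 1 and the chunk length as the count makes fread() and fwrite() report the number of bytes transferred, which is what allows the comparison against chunk. The same helper would fit the unpacker loop, with the archive as the source.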

Writing the Unpacker

As the packing process has been explained in detail, the unpacking part becomes more obvious; therefore, only the code flow will be presented:

  1. Open archive file
  2. Read pack header
  3. Verify signature - if not valid - report and exit
  4. Having read the pack header (Signature, NumOfFiles, packdata_t array) start extracting the files
  5. Create a new file named after the current entry's filename and write its contents from the archive file
  6. Process next file
  7. Close archive file and exit

<PRE class=c++>
int unpackfileEx(char *archive, char *dest, packcallbacks_t *pcb, long startPos)
{
  FILE *fpArchive = fopen(archive, "rb");

  // failed to open archive?
  if (!fpArchive)
    return packerrorCouldNotOpenArchive;

  long signature = 0, nFiles = 0;

  if (startPos)
    fseek(fpArchive, startPos, SEEK_SET);

  // read signature
  fread(&signature, sizeof(signature), 1, fpArchive);
  if (signature != 'KCPL')
    return (fclose(fpArchive), packerrorNotAPackedFile);

  // read files entries count
  fread(&nFiles, sizeof(nFiles), 1, fpArchive);

  // no files?
  if (nFiles <= 0)
    return (fclose(fpArchive), packerrorNoFiles);

  // read all files entries
  std::vector<packdata_t> filesList(nFiles);
  if (fread(&filesList[0], sizeof(packdata_t), nFiles, fpArchive) != (size_t)nFiles)
    return (fclose(fpArchive), packerrorNotAPackedFile);

  // loop in all files
  for (unsigned int i = 0; i < filesList.size(); i++)
  {
    FILE *fpOut;
    char Buffer[4096];
    packdata_t *pdata = &filesList[i];

    // trigger callback
    if (pcb && pcb->newfile)
      pcb->newfile(pdata->filename, pdata->filesize);

    // build the output name, making sure the destination
    // directory ends with a path separator
    strcpy(Buffer, dest);
    size_t len = strlen(Buffer);
    if (len > 0 && Buffer[len - 1] != '\\')
      strcat(Buffer, "\\");
    strcat(Buffer, pdata->filename);

    fpOut = fopen(Buffer, "wb");
    if (!fpOut)
      return (fclose(fpArchive), packerrorExtractError);

    // copy the file contents in Buffer-sized chunks
    long size = pdata->filesize;
    long pos = 0;
    while (size > 0)
    {
      long toread = size > (long)sizeof(Buffer) ? (long)sizeof(Buffer) : size;
      fread(Buffer, toread, 1, fpArchive);
      fwrite(Buffer, toread, 1, fpOut);
      pos += toread;
      size -= toread;
      if (pcb && pcb->fileprogress)
        pcb->fileprogress(pos);
    }
    fclose(fpOut);
  }
  fclose(fpArchive);
  return packerrorSuccess;
}
</PRE>
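The first steps of the flow (read the pack header, verify the signature) can also be written with fixed-width types and checked reads. The sketch below is one possible shape, under stated assumptions: readPackHeader(), HeaderStatus and kSignature are hypothetical names, and 0x4B43504C is the value that common compilers such as MSVC and GCC assign to the multi-character constant 'KCPL' (the standard leaves that value implementation-defined). On a little-endian machine this value is stored as the bytes "LPCK".

```cpp
#include <cstdint>
#include <cstdio>

const std::uint32_t kSignature = 0x4B43504C;   // assumed value of 'KCPL'

enum HeaderStatus { headerOk, headerTruncated, headerBadSignature, headerEmpty };

// Read Signature and NumOfFiles from the current position of 'fp'.
// On success the file position is left at the first packdata_t entry.
HeaderStatus readPackHeader(std::FILE *fp, std::uint32_t *numFiles) {
    std::uint32_t fields[2];
    if (std::fread(fields, sizeof fields[0], 2, fp) != 2)
        return headerTruncated;                 // fewer than 8 bytes available
    if (fields[0] != kSignature)
        return headerBadSignature;              // not one of our archives
    if (fields[1] == 0)
        return headerEmpty;                     // valid archive, but no files
    *numFiles = fields[1];
    return headerOk;
}
```

Using std::uint32_t instead of long keeps the header at 8 bytes on platforms where long is 64 bits wide. Since the count comes from the file, a caller may also want to compare it against the archive size before allocating the entry array.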

Writing the Self-Extractor (SFX)

The SFX is simply a special version of the unpacker (we will call it UnpackerStub) that, instead of taking the archive file name from the command line, looks for an archive that is embedded in itself.
If you are a math geek, you can think of an SFX as "UnpackerStub.exe + Archive.bin = UnpackerArchive.exe".

Now, how do we embed the archive file into the unpacker to form an SFX?

In order to do that we need to write some information in the UnpackerStub that will help it locate the Archive.bin body.

For this purpose I use the e_res2 field in the IMAGE_DOS_HEADER to store a pointer to the archive data inside the unpacker stub.
Every executable has a well-documented format that tells the OS how to load and run it. The IMAGE_DOS_HEADER (defined in WINNT.H) is located at offset zero of every executable and has the following fields:
<PRE lang=c++>
typedef struct _IMAGE_DOS_HEADER {  // DOS .EXE header
  WORD e_magic;      // Magic number
  WORD e_cblp;       // Bytes on last page of file
  WORD e_cp;         // Pages in file
  WORD e_crlc;       // Relocations
  WORD e_cparhdr;    // Size of header in paragraphs
  WORD e_minalloc;   // Minimum extra paragraphs needed
  WORD e_maxalloc;   // Maximum extra paragraphs needed
  WORD e_ss;         // Initial (relative) SS value
  WORD e_sp;         // Initial SP value
  WORD e_csum;       // Checksum
  WORD e_ip;         // Initial IP value
  WORD e_cs;         // Initial (relative) CS value
  WORD e_lfarlc;     // File address of relocation table
  WORD e_ovno;       // Overlay number
  WORD e_res[4];     // Reserved words
  WORD e_oemid;      // OEM identifier (for e_oeminfo)
  WORD e_oeminfo;    // OEM information; e_oemid specific
  WORD e_res2[10];   // Reserved words
  LONG e_lfanew;     // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
</PRE>

I store the file offset of the archive in the e_res2 field, which is large enough to hold a DWORD. The archive content itself is appended to the UnpackerStub so that it begins exactly at that offset.
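The whole idea can be tried out in memory before touching any executable. The following sketch is only an illustration: buildSfx(), archiveOffset() and the Bytes typedef are hypothetical names, and the constants come from the header layout (20 WORD fields, that is 40 bytes, precede e_res2, and the whole header is 64 bytes long).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

typedef std::vector<unsigned char> Bytes;

const std::size_t kRes2Offset    = 40;   // offset of e_res2 in IMAGE_DOS_HEADER
const std::size_t kDosHeaderSize = 64;   // sizeof(IMAGE_DOS_HEADER)

// "UnpackerStub.exe + Archive.bin": append the archive to a copy of the stub
// and record in e_res2 where the archive starts.
Bytes buildSfx(const Bytes& stub, const Bytes& archive) {
    assert(stub.size() >= kDosHeaderSize);
    Bytes sfx(stub);
    const std::uint32_t pos = static_cast<std::uint32_t>(stub.size());
    std::memcpy(&sfx[kRes2Offset], &pos, sizeof pos);   // fills e_res2[0] and e_res2[1]
    sfx.insert(sfx.end(), archive.begin(), archive.end());
    return sfx;
}

// Recover the archive offset; returns 0 if no usable offset is recorded.
std::uint32_t archiveOffset(const Bytes& sfx) {
    if (sfx.size() < kDosHeaderSize) return 0;
    std::uint32_t pos = 0;
    std::memcpy(&pos, &sfx[kRes2Offset], sizeof pos);
    return pos <= sfx.size() ? pos : 0;
}
```

A stub that has not been patched keeps zeros in its reserved words, so archiveOffset() returning 0 is a cheap hint that a file carries no embedded archive; checking for the pack signature at the recorded offset would make that test more reliable.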

Two functions have been written to get/store the pointer of the archive data:
<PRE class=c++>
int SfxSetInsertPos(char *filename, long pos)
{
  FILE *fp = fopen(filename, "rb+");
  if (fp == NULL)
    return packerrorCouldNotOpenArchive;

  IMAGE_DOS_HEADER idh;

  // read dos header
  if (fread((void *)&idh, sizeof(idh), 1, fp) != 1)
    return (fclose(fp), packerrorCouldNotOpenArchive);

  // store the position value in an unused MZ field
  memcpy(&idh.e_res2[0], &pos, sizeof(pos));

  // update header
  rewind(fp);
  fwrite((void *)&idh, sizeof(idh), 1, fp);
  fclose(fp);
  return packerrorSuccess;
}
</PRE>

This function stores the pointer: it reads the header, updates the e_res2 field, then writes the header back again.
<PRE class=c++>
int SfxGetInsertPos(char *filename, long *pos)
{
  FILE *fp = fopen(filename, "rb");
  if (fp == NULL)
    return packerrorCouldNotOpenArchive;

  IMAGE_DOS_HEADER idh;

  // read dos header
  if (fread((void *)&idh, sizeof(idh), 1, fp) != 1)
    return (fclose(fp), packerrorCouldNotOpenArchive);
  fclose(fp);

  // extract the position value from the unused MZ field
  memcpy(pos, &idh.e_res2[0], sizeof(*pos));
  return packerrorSuccess;
}
</PRE>

This function will read the header and extract the value from the e_res2 field.

In short, the unpacker stub works like this:

  1. Call SfxGetInsertPos() to get the position of the archive file
  2. Call unpackfileEx(), passing the position of the embedded archive (the start of Archive.bin) and the archive filename, which is the executable itself (obtained by calling GetModuleFileName(NULL, ...))

Now I continue by describing how the Packer builds the SFX:
<PRE class=c++>
// check if unpackerstub.exe exists
if (GetFileAttributes(sfxStubFile) == INVALID_FILE_ATTRIBUTES)
{
  printf("SFX stub file not found!\n");
  return 1;
}

// open archive file
FILE *fpArc = fopen(argv[3], "rb");
if (!fpArc)
{
  printf("Failed to open archive!\n");
  return 1;
}

// get archive size
fseek(fpArc, 0, SEEK_END);
long arcSize = ftell(fpArc);
rewind(fpArc);

// form output sfx file name
char sfxName[MAX_PATH];
strcpy(sfxName, argv[3]);
strcat(sfxName, ".sfx.exe");

// take a copy from SFX
if (!CopyFile(sfxStubFile, sfxName, FALSE))
{
  fclose(fpArc);
  printf("Could not create SFX file!\n");
  return 1;
}

// append data to SFX
FILE *fpSfx = fopen(sfxName, "rb+");
if (!fpSfx)
{
  fclose(fpArc);
  printf("Could not open SFX file!\n");
  return 1;
}
fseek(fpSfx, 0, SEEK_END);

// get SFX size before archive appending
long sfxSize = ftell(fpSfx);

// start appending from archive file to the end of SFX file
char Buffer[4096 * 2];
while (arcSize > 0)
{
  long rw = arcSize > (long)sizeof(Buffer) ? (long)sizeof(Buffer) : arcSize;
  fread(Buffer, rw, 1, fpArc);
  fwrite(Buffer, rw, 1, fpSfx);
  arcSize -= rw;
}

fclose(fpArc);
fclose(fpSfx);

// mark archive data position inside SFX
SfxSetInsertPos(sfxName, sfxSize);

// delete archive file while keeping only the SFX
DeleteFile(argv[3]);

printf("SFX created: %s\n", sfxName);
</PRE>

That's all!

Using the Code and Binaries

The article comes with Packer.cpp and Unpacker.cpp, two examples demonstrating how to use the pack and unpack functionality.

Packer.exe usage

You should always specify full paths because relative paths are not currently supported.

c:\>packer e:\temp\bc *.* e:\test.bin

This will pack the contents of e:\temp\bc\*.* to e:\test.bin (the archive).

If you add 'sfx' as a fourth argument:

c:\>packer e:\temp\bc *.* e:\test.bin sfx

an SFX named e:\test.bin.sfx.exe will be created.

Unpacker.exe usage

Make sure you specify a valid output directory:

c:\>unpacker e:\test.bin e:\out

This will unpack contents of e:\test.bin to e:\out\

Sfx.exe usage

The sfx takes only one parameter which is the destination directory.

c:\>sfx.exe e:\out

This will extract to e:\out\

Final Notes

I hope you enjoyed reading this article and learned something new.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Elias Bachaalany
Web Developer
United States
Elias (aka lallousx86, @0xeb) has always been interested in the making of things and their inner workings.

His computer interests include system programming, reverse engineering, writing libraries, tutorials and articles.

In his free time, and apart from researching, his favorite reading topics include: dreams, metaphysics, philosophy, psychology and any other human/mystical science.

A former employee of Hex-Rays (the creators of IDA Pro), he was responsible for many debugger plugins and owned the IDAPython project, among other things.

Elias currently works at Microsoft as a software security engineer.

More articles and blog posts can be found here:

- http://lallousx86.wordpress.com/
- http://0xeb.wordpress.com/
- http://www.hexblog.com/?author=3

Last Updated 22 Sep 2003
Article Copyright 2003 by Elias Bachaalany