How to Write a Simple Packer/Unpacker with a Self-Extractor (SFX)

21 Sep 2003
An example of writing a self-extracting archive using pack and unpack routines.

Introduction

In this article I will show how to write a file packer/unpacker and how to make a self-extracting version of the archive (SFX).

Please note that this article and its code have been written for learning purposes rather than for complete functionality, so the following limitations apply:

  • Only packing of files (binding them into one file) and no compression
  • Packer doesn't pack files in subdirectories
  • Packer header is not really optimized - just enough for our purposes
  • All code presented here compiles as a console application and no GUI version is provided

The Archive File Format

The idea is to build a structure/format that will allow us to hold a file list and file contents in one file in such a way that we will be able to restore the files to their original state.

This leads to the following design for the pack header:

  • Signature - Offset 0x00/DWORD
    This occupies the first 4 bytes of the header. It contains a simple signature that allows us to identify our packed files.

  • NumOfFiles - Offset 0x04/DWORD
    Here we store a DWORD holding the number of files in the archive.

  • FilesInfo - Offset 0x08/NumOfFiles * sizeof(packdata_t)
    Here we start storing the file information in a sequence defined as the array packdata_t FilesInfo[NumOfFiles].

    The packdata_t structure is defined as:

    C++
    struct packdata_t
    {
      char filename[MAX_PATH];
      long filesize;
    };

    As you can see, we simply save the file's name and size. This packdata_t structure is not the optimal way of storing file names or other information, because every entry reserves MAX_PATH bytes for the name. We could instead have used a variable-length packdata_t struct defined as

    C++
    struct packdata_t
    {
      long filesize;
      // Other file info, such as creation date , attributes, ...
      unsigned char filenameLength;
      char FileName[1];
    };

    But, of course, managing this last struct is beyond the scope of this article.
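
For readers curious about what a variable-length entry could look like in practice, here is a minimal sketch. It is not part of the article's sources: appendEntry and readEntry are hypothetical helpers, the size field is assumed to be a 32-bit integer written in host byte order, and the name length is assumed to fit in one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Append one variable-length entry to 'out':
// 4-byte file size, 1-byte name length, then the name bytes (no terminator).
void appendEntry(std::vector<unsigned char> &out, const std::string &name,
                 std::int32_t filesize)
{
  assert(name.size() <= 255); // the length must fit in a single byte
  unsigned char raw[sizeof(filesize)];
  std::memcpy(raw, &filesize, sizeof(filesize)); // host byte order
  out.insert(out.end(), raw, raw + sizeof(raw));
  out.push_back(static_cast<unsigned char>(name.size()));
  out.insert(out.end(), name.begin(), name.end());
}

// Read the entry that starts at 'pos' and return the position of the next one.
std::size_t readEntry(const std::vector<unsigned char> &in, std::size_t pos,
                      std::string &name, std::int32_t &filesize)
{
  assert(pos + sizeof(filesize) + 1 <= in.size());
  std::memcpy(&filesize, &in[pos], sizeof(filesize));
  pos += sizeof(filesize);
  std::size_t len = in[pos++];
  assert(pos + len <= in.size());
  name.assign(reinterpret_cast<const char *>(in.data() + pos), len);
  return pos + len;
}
```

With this layout an entry for "readme.txt" takes 15 bytes instead of MAX_PATH + 4, at the cost of having to walk the entries one by one to find a given file.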

After the pack header we have the files' contents stored in sequence. So the whole archive file format will look like this:

Signature
NumOfFiles
packdata_t Files[NumOfFiles]
File1 content
File2 content
.
.
.
File(NumOfFiles) content
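
Because every header field has a fixed size, the position of any file's content can be computed from the header alone. The sketch below illustrates that arithmetic. It is an illustration rather than code from the article: headerSize and contentOffset are hypothetical helpers, kMaxPath stands in for MAX_PATH (260) so the snippet builds without windows.h, and the size field is assumed to be 4 bytes wide, as long is on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the Windows MAX_PATH constant (assumed to be 260).
const std::size_t kMaxPath = 260;

// Same layout idea as the article's packdata_t, with a fixed-width size field.
struct packdata_t
{
  char filename[kMaxPath];
  std::int32_t filesize;
};

// Size of the pack header: Signature + NumOfFiles + the packdata_t array.
std::size_t headerSize(std::size_t numOfFiles)
{
  return 2 * sizeof(std::int32_t) + numOfFiles * sizeof(packdata_t);
}

// Offset of the index-th file's content, measured from the start of the
// archive: the header size plus the sizes of all preceding files.
std::size_t contentOffset(const std::vector<packdata_t> &files,
                          std::size_t index)
{
  assert(index < files.size());
  std::size_t offset = headerSize(files.size());
  for (std::size_t i = 0; i < index; ++i)
    offset += static_cast<std::size_t>(files[i].filesize);
  return offset;
}
```

For an archive holding three files, the first file's content therefore starts at 8 + 3 * sizeof(packdata_t) bytes from the beginning of the archive.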

Writing the Packer

In order to make the code a little extensible, I have defined a structure that will hold callback functions triggered from inside the packer/unpacker routines. These callbacks are used for visual notifications and updates.

The callback struct is defined as:

C++
typedef struct
{
  void (*newfile)(char *name, long size);
  void (*fileprogress)(long pos);
} packcallbacks_t;

The newfile() callback is called whenever the packer/unpacker encounters or processes a new file. It will be passed the file's name and size.

The fileprogress() callback is called whenever an operation is in progress. It will be passed the current position that the packer/unpacker is currently processing.
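
As an illustration of how a console front end might use these callbacks, here is a minimal sketch. The names onNewFile, onFileProgress, percentDone and makeConsoleCallbacks are hypothetical and do not come from the article's sources; the struct is repeated so the snippet is self-contained.

```cpp
#include <cassert>
#include <cstdio>

typedef struct
{
  void (*newfile)(char *name, long size);
  void (*fileprogress)(long pos);
} packcallbacks_t;

// Size of the file currently being processed, remembered by onNewFile()
// so that onFileProgress() can turn a position into a percentage.
static long g_currentSize = 0;

// Called once per file: announce its name and size.
void onNewFile(char *name, long size)
{
  g_currentSize = size;
  std::printf("%s (%ld bytes)\n", name, size);
}

// Convert a position within a file into a whole percentage.
int percentDone(long pos, long total)
{
  if (total <= 0)
    return 100; // nothing to copy counts as finished
  return static_cast<int>(pos * 100.0 / total);
}

// Called after each copied chunk: redraw the percentage on the same line.
void onFileProgress(long pos)
{
  assert(pos >= 0);
  std::printf("\r  %3d%%", percentDone(pos, g_currentSize));
  if (pos >= g_currentSize)
    std::printf("\n");
}

// Build the callback table that would be passed as the pcb argument.
packcallbacks_t makeConsoleCallbacks()
{
  packcallbacks_t cb;
  cb.newfile = onNewFile;
  cb.fileprogress = onFileProgress;
  return cb;
}
```

A caller would create the table once with makeConsoleCallbacks() and pass its address to the pack or unpack routine. A GUI front end could supply different functions without touching the packer itself.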

Now, let us define the packfilesEx function prototype:

C++
int packfilesEx(char *path, char *mask, char *archive,
  packcallbacks_t * pcb = NULL);

  • path designates the source directory.
  • mask tells us which files to search for and pack.
  • archive holds the archive file name.
  • pcb (optional) holds the callbacks used for visual notifications.

Before going to the code, here is the packfilesEx() code flow:

  1. Build packdata_t array of all files to be packed (storing their names and size)
  2. Create the archive file and write in it the Signature and file count
  3. Write the packdata_t array into the archive
  4. Start reading every file and write its content in the archive
  5. Loop (4) until all files are stored
  6. Close the archive file

This operation is enough to pack all files into one single archive file. Now we go straight to the code:

int packfilesEx(char *path, char *mask, char *archive, packcallbacks_t *pcb)
{
  TCHAR szCurDir[MAX_PATH];

  // define a vector that will hold the packdata_t array.
  // STL vectors are stored in contiguous memory.
  std::vector<packdata_t> filesList;
  
  // make sure the current source directory is valid 
  // and change working directory to it if so.

  // save current directory
  GetCurrentDirectory(MAX_PATH, szCurDir);

  // go to new working directory
  if (!SetCurrentDirectory(path))
    return packerrorPath;
    
  WIN32_FIND_DATA fd;
  HANDLE findHandle;
  packdata_t pdata;

  findHandle = FindFirstFile(mask, &fd);
  if (findHandle == INVALID_HANDLE_VALUE)
  {
    // restore the working directory before bailing out
    SetCurrentDirectory(szCurDir);
    return packerrorNoFiles;
  }

  long lTemp;

  // this loop is for storing file's headers only
  // directories are omitted
  do
  {
    // skip directory entries
    if ((fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
      == FILE_ATTRIBUTE_DIRECTORY)
      continue;

    // clear record
    memset(&pdata, 0, sizeof(pdata));

    // fill packdata entry
    strcpy(pdata.filename, fd.cFileName);
    pdata.filesize = fd.nFileSizeLow;

    // save entry
    filesList.push_back(pdata);
  } while(FindNextFile(findHandle, &fd));
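  FindClose(findHandle);

  // only directories matched the mask: nothing to pack
  if (filesList.empty())
  {
    SetCurrentDirectory(szCurDir);
    return packerrorNoFiles;
  }

  FILE *fpArchive = fopen(archive, "wb");
  if (!fpArchive)
  {
    SetCurrentDirectory(szCurDir);
    return packerrorCannotCreateArchive;
  }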

  // write signature
  lTemp = 'KCPL'; // lallous pack! (L-PCK)
  fwrite(&lTemp, sizeof(lTemp), 1, fpArchive);

  // write entries count
  lTemp = filesList.size();
  fwrite(&lTemp, sizeof(lTemp), 1, fpArchive);

  // store files entries (since std::vector stores elements
  // in a linear manner)
  fwrite(&filesList[0], sizeof(pdata), filesList.size(), fpArchive);

  // process all files to copy
  for (unsigned int cnt=0;cnt<filesList.size();cnt++)
  {
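    FILE *inFile = fopen(filesList[cnt].filename, "rb");
    long size = filesList[cnt].filesize;

    // the header already records this file's size, so a file that can no
    // longer be opened would corrupt the archive; give up instead
    // (packerrorPath is reused here, as there is no dedicated error code)
    if (!inFile)
    {
      fclose(fpArchive);
      SetCurrentDirectory(szCurDir);
      return packerrorPath;
    }

    // if callback assigned then trigger it
    if (pcb && pcb->newfile)
      pcb->newfile(filesList[cnt].filename, size);

    // copy file contents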
    long pos = 0;
    while (size > 0)
    {
      char buffer[4096];
      long toread = size > sizeof(buffer) ? sizeof(buffer) : size;
      fread(buffer, toread, 1, inFile);
      fwrite(buffer, toread, 1, fpArchive);
      pos += toread;
      size -= toread;
      if (pcb && pcb->fileprogress)
        pcb->fileprogress(pos);
    }
    fclose(inFile);
  }

  // close archive and restore working directory
  fclose(fpArchive);

  SetCurrentDirectory(szCurDir);
  return packerrorSuccess;
}

Writing the Unpacker

As the packing process has been explained in detail, the unpacking part becomes more obvious; therefore, only the code flow will be presented:

  1. Open archive file
  2. Read pack header
  3. Verify signature - if not valid - report and exit
  4. Having read the pack header (Signature, NumOfFiles, packdata_t array) start extracting the files
  5. Create a new file named after the entry's filename field and write its contents from the archive file
  6. Process the next file
  7. Close the archive file and exit

int unpackfileEx(char *archive, char *dest, packcallbacks_t * pcb,
  long startPos)
{
  FILE *fpArchive = fopen(archive, "rb");

  // failed to open archive?
  if (!fpArchive)
    return packerrorCouldNotOpenArchive;

  long nFiles;

  if (startPos)
    fseek(fpArchive, startPos, SEEK_SET);

  // read signature
  fread(&nFiles, sizeof(nFiles), 1, fpArchive);
  if (nFiles != 'KCPL')
    return (fclose(fpArchive), packerrorNotAPackedFile);

  // read files entries count
  fread(&nFiles, sizeof(nFiles), 1, fpArchive);

  // no files?
  if (!nFiles)
    return (fclose(fpArchive), packerrorNoFiles);

  // read all files entries
  std::vector<packdata_t> filesList(nFiles);
  fread(&filesList[0], sizeof(packdata_t), nFiles, fpArchive);

  // loop in all files
  for (unsigned int i=0;i<filesList.size();i++)
  {
    FILE *fpOut;
    char Buffer[4096];
    packdata_t *pdata = &filesList[i];

    // trigger callback
    if (pcb && pcb->newfile)
      pcb->newfile(pdata->filename, pdata->filesize);

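    // build the output path, adding a backslash if dest lacks one
    strcpy(Buffer, dest);
    size_t destLen = strlen(Buffer);
    if (destLen > 0 && Buffer[destLen - 1] != '\\')
      strcat(Buffer, "\\");
    strcat(Buffer, pdata->filename);
    fpOut = fopen(Buffer, "wb");
    if (!fpOut)
      return (fclose(fpArchive), packerrorExtractError);

    // copy filesize bytes from the archive in Buffer-sized chunks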
    long size = pdata->filesize;
    long pos = 0;
    while (size > 0)
    {
      long toread =  size > sizeof(Buffer) ? sizeof(Buffer) : size;
      fread(Buffer, toread, 1, fpArchive);
      fwrite(Buffer, toread, 1, fpOut);
      pos += toread;
      size -= toread;
      if (pcb && pcb->fileprogress)
        pcb->fileprogress(pos);
    }
    fclose(fpOut);
    nFiles--;
  }
  fclose(fpArchive);
  return packerrorSuccess;
}
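
The unpacker above trusts NumOfFiles and every filesize it reads from the header. A damaged or truncated archive could therefore make it read past the end of the file. The following is a minimal sketch of a sanity check that could run after the header has been read. archiveLooksConsistent is a hypothetical helper, not part of the article's sources, and the entry size assumes MAX_PATH is 260 and long is 4 bytes, as in a 32-bit Windows build.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes per packdata_t entry: 260 name bytes plus a 4-byte size (assumed).
const long long kEntrySize = 260 + 4;

// Returns true when the header and all file contents fit inside the bytes
// that are available after startPos.
bool archiveLooksConsistent(long long archiveSize, long long startPos,
                            const std::vector<long> &fileSizes)
{
  if (startPos < 0 || startPos > archiveSize)
    return false;

  // Signature (4) + NumOfFiles (4) + one entry per file
  long long needed = 8 + kEntrySize * static_cast<long long>(fileSizes.size());
  for (std::size_t i = 0; i < fileSizes.size(); ++i)
  {
    if (fileSizes[i] < 0)
      return false; // a negative size can only come from a damaged header
    needed += fileSizes[i];
  }
  return needed <= archiveSize - startPos;
}
```

The archive size can be obtained with fseek() and ftell(), the same way the packer measures the archive before building the SFX.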

Writing the Self-Extractor (SFX)

The SFX is simply a special version of the unpacker (we will call it UnpackerStub) that, instead of taking the archive file name as a command-line argument, looks for an archive that is embedded in itself.
If you are a math geek you can think of an SFX as "UnpackerStub.exe + Archive.bin = UnpackerArchive.exe".

Now how to embed the archive file into the unpacker to form an SFX?

In order to do that we need to write some information in the UnpackerStub that will help it locate the Archive.bin body.

For this purpose I use the e_res2 field in the IMAGE_DOS_HEADER to store a pointer to the archive data inside the unpacker stub.
Every executable has a well-documented format that tells the OS how to load and run it. The IMAGE_DOS_HEADER (defined in WINNT.H) is located at offset zero of every executable and has the following fields:

C++
typedef struct _IMAGE_DOS_HEADER {    // DOS .EXE header
  WORD   e_magic;                     // Magic number
  WORD   e_cblp;                      // Bytes on last page of file
  WORD   e_cp;                        // Pages in file
  WORD   e_crlc;                      // Relocations
  WORD   e_cparhdr;                   // Size of header in paragraphs
  WORD   e_minalloc;                  // Minimum extra paragraphs needed
  WORD   e_maxalloc;                  // Maximum extra paragraphs needed
  WORD   e_ss;                        // Initial (relative) SS value
  WORD   e_sp;                        // Initial SP value
  WORD   e_csum;                      // Checksum
  WORD   e_ip;                        // Initial IP value
  WORD   e_cs;                        // Initial (relative) CS value
  WORD   e_lfarlc;                    // File address of relocation table
  WORD   e_ovno;                      // Overlay number
  WORD   e_res[4];                    // Reserved words
  WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
  WORD   e_oeminfo;                   // OEM information; e_oemid specific
  WORD   e_res2[10];                  // Reserved words
  LONG   e_lfanew;                    // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

I store the file offset of the archive data in the e_res2 field, which is large enough to hold a DWORD. After storing this offset, I make sure to append the archive content to the UnpackerStub at exactly that position.

Two functions have been written to get/store the pointer of the archive data:

int SfxSetInsertPos(char *filename, long pos)
{
  FILE *fp = fopen(filename, "rb+");
  if (fp == NULL)               
    return packerrorCouldNotOpenArchive;

  IMAGE_DOS_HEADER idh;

  // read dos header
  fread((void *)&idh, sizeof(idh), 1, fp);

  // adjust position value in an unused MZ field
  *(long *)&idh.e_res2[0] = pos;

  // update header
  rewind(fp);
  fwrite((void *)&idh, sizeof(idh), 1, fp);
  fclose(fp);
  return packerrorSuccess;
}

This function will store the pointer. First it reads the header, updates the e_res2 field then writes the header back again.

int SfxGetInsertPos(char *filename, long *pos)
{
  FILE *fp = fopen(filename, "rb");
  if (fp == NULL)
    return packerrorCouldNotOpenArchive;

  IMAGE_DOS_HEADER idh;

  fread((void *)&idh, sizeof(idh), 1, fp);
  fclose(fp);
  *pos = *(long *)&idh.e_res2[0];
  return packerrorSuccess;
}

This function will read the header and extract the value from the e_res2 field.
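
Both functions access the field through a cast of the form *(long *)&idh.e_res2[0], which relies on long being 4 bytes wide. As a hedged alternative, the sketch below copies a fixed-width 32-bit value with memcpy instead. ReservedWords is a stand-in for the e_res2 member of IMAGE_DOS_HEADER so the snippet builds without windows.h, and storeInsertPos and loadInsertPos are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for the e_res2 member of IMAGE_DOS_HEADER (ten 16-bit words).
struct ReservedWords
{
  std::uint16_t e_res2[10];
};

// Store a 32-bit archive offset in the first two reserved words.
void storeInsertPos(ReservedWords &hdr, std::uint32_t pos)
{
  static_assert(sizeof(ReservedWords::e_res2) >= sizeof(std::uint32_t),
                "e_res2 is too small to hold the offset");
  // Offset 0 is where the MZ header itself lives, so it can never be a
  // valid archive position.
  assert(pos != 0);
  std::memcpy(hdr.e_res2, &pos, sizeof(pos));
}

// Read the 32-bit archive offset back; 0 means no archive is attached.
std::uint32_t loadInsertPos(const ReservedWords &hdr)
{
  std::uint32_t pos = 0;
  std::memcpy(&pos, hdr.e_res2, sizeof(pos));
  return pos;
}
```

Since a stub without an archive has zeros in its reserved words, a returned offset of 0 can double as a simple "no embedded archive" test.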

In short, the unpacker stub works like this:

  1. Call SfxGetInsertPos() to get the position of the archive file
  2. Call unpackfileEx(), passing it that position (the start of the embedded archive.bin) and the archive file name, which is the stub's own executable path (obtained by calling GetModuleFileName(NULL, ...))

Now I continue to describe how the Packer builds the SFX:

    // check if unpackerstub.exe exists
    if (GetFileAttributes(sfxStubFile) == (DWORD)-1)
    {
      printf("SFX stub file not found!\n");
      return 1;
    }

    // open archive file
    FILE *fpArc = fopen(argv[3], "rb");
    if (!fpArc)
    {
      printf("Failed to open archive!\n");
      return 1;
    }
    // get archive size
    fseek(fpArc, 0, SEEK_END);
    long arcSize = ftell(fpArc);
    rewind(fpArc);

    // form output sfx file name
    char sfxName[MAX_PATH];
    strcpy(sfxName, argv[3]);
    strcat(sfxName, ".sfx.exe");

    // take a copy from SFX
    if (!CopyFile(sfxStubFile, sfxName, FALSE))
    {
      fclose(fpArc);
      printf("Could not create SFX file!\n");
      return 1;
    }

    // append data to SFX
    FILE *fpSfx = fopen(sfxName, "rb+");
    if (!fpSfx)
    {
      fclose(fpArc);
      printf("Could not open SFX file!\n");
      return 1;
    }
    fseek(fpSfx, 0, SEEK_END);

    // get SFX size before archive appending
    long sfxSize = ftell(fpSfx);

    // start appending from archive file to the end of SFX file
    char Buffer[4096 * 2];
    while (arcSize > 0)
    {
      long rw = arcSize > sizeof(Buffer) ? sizeof(Buffer) : arcSize;
      fread(Buffer, rw, 1, fpArc);
      fwrite(Buffer, rw, 1, fpSfx);
      arcSize -= rw;
    }
    fclose(fpArc);
    fclose(fpSfx);

    // mark archive data position inside SFX
    SfxSetInsertPos(sfxName, sfxSize);

    // delete archive file while keeping only the SFX
    DeleteFile(argv[3]);

    printf("SFX created: %s\n", sfxName);
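
The append step above ignores the results of fread() and fwrite(). The following is a minimal sketch of the same "append and remember the offset" idea with basic error checking, written with portable stdio calls only. appendFile is a hypothetical helper and is not part of the article's sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Append the whole of srcName to the end of dstName (which must exist).
// Returns the offset in dstName at which the appended data starts,
// or -1 on any I/O failure.
long appendFile(const char *dstName, const char *srcName)
{
  std::FILE *src = std::fopen(srcName, "rb");
  if (!src)
    return -1;

  std::FILE *dst = std::fopen(dstName, "rb+");
  if (!dst)
  {
    std::fclose(src);
    return -1;
  }

  // the current end of dst is where the appended data will begin
  std::fseek(dst, 0, SEEK_END);
  long startPos = std::ftell(dst);
  bool ok = startPos >= 0;

  char buffer[4096];
  std::size_t n = 0;
  while (ok && (n = std::fread(buffer, 1, sizeof(buffer), src)) > 0)
    ok = std::fwrite(buffer, 1, n, dst) == n;
  ok = ok && !std::ferror(src);

  std::fclose(src);
  if (std::fclose(dst) != 0)
    ok = false;
  return ok ? startPos : -1;
}
```

The value returned on success is the position that would then be handed to SfxSetInsertPos().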

That's all!

Using the Code and Binaries

The article comes with Packer.cpp and Unpacker.cpp, two examples demonstrating how to use the pack and unpack functionality.

Packer.exe usage

You should always specify full paths because relative paths are not currently supported.

c:\>packer e:\temp\bc *.* e:\test.bin

This will pack contents of e:\temp\bc\*.* to e:\test.bin (archive)

If you add 'sfx' as:

c:\>packer e:\temp\bc *.* e:\test.bin sfx

an SFX of name e:\test.bin.sfx.exe will be created

Unpacker.exe usage

Make sure you specify a valid output directory:

c:\>unpacker e:\test.bin e:\out

This will unpack contents of e:\test.bin to e:\out\

Sfx.exe usage

The sfx takes only one parameter which is the destination directory.

c:\>sfx.exe e:\out

This will extract to e:\out\

Final Notes

I hope you enjoyed reading this article and learned something new.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
Web Developer
United States
Elias (aka lallousx86, @0xeb) has always been interested in the making of things and their inner workings.

His computer interests include system programming, reverse engineering, writing libraries, tutorials and articles.

In his free time, and apart from researching, his favorite reading topics include: dreams, metaphysics, philosophy, psychology and any other human/mystical science.

A former employee of Microsoft and Hex-Rays (the creators of IDA Pro), he was responsible for many debugger plugins and owned the IDAPython project.

Elias currently works as an Anticheat engineer at Blizzard Entertainment.

Elias co-authored 2 books and authored one book:

- Practical Reverse Engineering
- The Antivirus Hacker's Handbook
- The Art of Batch Files Programming
