
Streaming ZIP File in PHP Without Temp File

7 Jun 2019, CPOL
On-the-fly streaming of multiple files or directories as a ZIP file without a temp file

Introduction

When I was trying to implement a "download directory" function in my custom web application, all solutions I could find were based on creating the ZIP file first and then sending it. In my case, this could result in large temporary files which, consisting mostly of images, could hardly be compressed anyway.

So I came up with the idea to just create an uncompressed ZIP archive on-the-fly around the raw data - and as I found out, this is quite easy.

Background

For this purpose, it's enough to consider the minimal structure of a ZIP archive: we don't need multi-part files, we don't need extra information stored for each file and, of course, we don't need any knowledge of compression algorithms.

The basic structure of a ZIP archive makes it easy to just assemble it on-the-fly:

File Entry 1
  File Header 1
  File Data 1
File Entry 2
  File Header 2
  File Data 2
...
File Entry n
  File Header n
  File Data n
Directory Entry 1
Directory Entry 2
...
Directory Entry n
End of Directory

In detail, let's dive into some bytes. This is a ZIP file containing an uncompressed text file "test.txt" with the text "The quick brown fox jumps over the lazy dog." I colored the regions listed above as well as each value, and included a summary of each value's meaning:

[Image 1: color-coded hex dump of the example ZIP file]

General Data Types

  • UInt16 - A 2 byte, 16 bit number in little endian byte order (e.g. 0x1234 = [34, 12])
  • UInt32 - A 4 byte, 32 bit number in little endian byte order (e.g. 0x12345678 = [78, 56, 34, 12])
  • DateTime - A timestamp with two-second accuracy, bit format YYYYYYYmmmmddddd HHHHHiiiiiisssss in little endian byte order, e.g. 2019-01-23 22:33:44 (brackets mark the bits each field occupies in the 32 bit value):

    Field | Value | Binary | In place
    Year - 1980 | 39 | 0b00100111 | 0b[0100111]0001101111011010000110110
    Month | 1 | 0b00000001 | 0b0100111[0001]101111011010000110110
    Day | 23 | 0b00010111 | 0b01001110001[10111]1011010000110110
    Hour | 22 | 0b00010110 | 0b0100111000110111[10110]10000110110
    Minute | 33 | 0b00100001 | 0b010011100011011110110[100001]10110
    Second / 2 | 22 | 0b00010110 | 0b010011100011011110110100001[10110]

    0b01001110001101111011010000110110 = 0x4E37B436 => [36, B4, 37, 4E]

  • CRC-32 - A 4 byte CRC-32 checksum over the file data using the magic number 0xdebb20e3. In PHP, this is the hash algorithm named "crc32b".
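
The data types above map quite directly onto PHP's pack() and hash() functions. The following is a minimal sketch of how they could be encoded; the function names are hypothetical and are not taken from BjSZipper.

```php
<?php
// Hypothetical helper names; pack(), getdate() and hash() are PHP built-ins.

// UInt16 / UInt32 in little endian byte order.
function zipUInt16(int $value): string
{
    return pack('v', $value);
}

function zipUInt32(int $value): string
{
    return pack('V', $value);
}

// DateTime: packs a Unix timestamp into the 32 bit layout
// YYYYYYYmmmmddddd HHHHHiiiiiisssss, using the script's default time zone.
function zipDateTime(int $timestamp): string
{
    $d = getdate($timestamp);
    $value = (($d['year'] - 1980) << 25)
           | ($d['mon'] << 21)
           | ($d['mday'] << 16)
           | ($d['hours'] << 11)
           | ($d['minutes'] << 5)
           | ($d['seconds'] >> 1); // two-second accuracy
    return pack('V', $value);
}

// CRC-32: hash('crc32b') returns 8 hex digits, which are converted
// to a number and written as a little endian UInt32.
function zipCrc32(string $data): string
{
    return pack('V', hexdec(hash('crc32b', $data)));
}
```

With these helpers, bin2hex(zipUInt16(0x1234)) gives "3412", and the timestamp 2019-01-23 22:33:44 encodes to the bytes 36 B4 37 4E from the table above.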

File Entry

A file entry is a part describing a file and containing its data. File entries are stacked one after another.

Name | Length | Data type | Description
Signature | 4 | Signature | A file entry signature consisting of "PK" followed by the bytes 03 and 04
Version | 2 | UInt16 | The minimum version required for extracting - for this purpose, I just use 0x000A (version 1.0), but it really doesn't matter that much
Flags | 2 | UInt16 | Options as to how to read this file - for this purpose, I use 0x0800, meaning UTF-8 encoded filename and comments and nothing else
Compression method | 2 | UInt16 | The method the data was compressed with - for this purpose, 0x0000 is used, meaning "uncompressed"
Filetime | 4 | DateTime | The last modification time of the file, no other time is saved, format see above
Checksum | 4 | CRC-32 | The CRC-32 checksum of the file data, format see above
Compressed size | 4 | UInt32 | The size of the compressed file data - for this purpose, the same as the file size
Uncompressed size | 4 | UInt32 | The size of the uncompressed file
Filename length | 2 | UInt16 | The length of the filename in bytes
Extra data length | 2 | UInt16 | The length of the extra data - for this purpose, no extra data is used, so this is always 0x0000
Filename | * | String | The filename in UTF-8 encoding
Extra data | * | Special | Extra data, e.g., for creation time, attributes and more - for this purpose, not used
File data | * | Bytes | The file data - usually compressed, but in this case, just the raw data
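
As an illustration of the table above, a file entry header could be assembled with pack() roughly as follows. This is only a sketch: buildFileEntryHeader() and its parameters are hypothetical names, and the CRC-32 and DateTime values are assumed to be available as plain integers already.

```php
<?php
// Hypothetical helper, not part of BjSZipper: builds the header of one
// file entry for an uncompressed file. The raw file data follows it directly.
function buildFileEntryHeader(string $name, int $size, int $crc, int $dosTime): string
{
    return "PK\x03\x04"              // signature
        . pack('v', 0x000A)          // version
        . pack('v', 0x0800)          // flags: UTF-8 filename and comments
        . pack('v', 0x0000)          // compression method: uncompressed
        . pack('V', $dosTime)        // filetime (DateTime as a 32 bit value)
        . pack('V', $crc)            // checksum (CRC-32)
        . pack('V', $size)           // compressed size = file size
        . pack('V', $size)           // uncompressed size
        . pack('v', strlen($name))   // filename length in bytes
        . pack('v', 0)               // extra data length: none
        . $name;                     // filename in UTF-8
}
```

The fixed part of the header is 30 bytes long, so a complete file entry occupies 30 bytes plus the filename length plus the file size.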

Central Directory Entry

A central directory entry contains more detailed data about a file entry. The central directory entries are stacked one after another and form a kind of table of contents.

Name | Length | Data type | Description
Signature | 4 | Signature | A central directory entry signature consisting of "PK" followed by the bytes 01 and 02
OS version | 2 | UInt16 | The version the archive was made by - for this purpose, I just use 0x003F
Version | 2 | UInt16 | The minimum required version for extracting - for this purpose, I just use 0x000A
Flags | 2 | UInt16 | Options as to how to read this file - for this purpose, I use 0x0800, meaning UTF-8 encoded filename and comments and nothing else
Compression method | 2 | UInt16 | The method the data was compressed with - for this purpose, 0x0000 is used, meaning "uncompressed"
Filetime | 4 | DateTime | The last modification time of the file, no other time is saved, format see above
Checksum | 4 | CRC-32 | The CRC-32 checksum of the file data, format see above
Compressed size | 4 | UInt32 | The size of the compressed file data - for this purpose, the same as the file size
Uncompressed size | 4 | UInt32 | The size of the uncompressed file
Filename length | 2 | UInt16 | The length of the filename in bytes
Extra data length | 2 | UInt16 | The length of the extra data - for this purpose, no extra data is used, so this is always 0x0000
Comment length | 2 | UInt16 | The length of the file comment in bytes
Disk | 2 | UInt16 | The disk number the file is on - for this purpose, the archive is a single file, so this is always 0x0000
Internal attributes | 2 | UInt16 | Attributes for internal usage - for this purpose, this is not used and always 0x0000
External attributes | 4 | UInt32 | Attributes for external usage - for this purpose, this is not used and always 0x00000000
Offset of file entry | 4 | UInt32 | The offset inside the file where the file entry belonging to this central directory entry starts
Filename | * | String | The filename in UTF-8 encoding
Extra data | * | Special | Extra data, e.g., for creation time, attributes and more - for this purpose, not used
Comment | * | String | A comment for the described file
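
A central directory entry can be packed in the same way as a file entry. The sketch below follows the field order of the table above; buildCentralDirectoryEntry() is a hypothetical name, and $offset is assumed to be the number of bytes sent before the matching file entry started.

```php
<?php
// Hypothetical helper, not part of BjSZipper: builds one central
// directory entry for an uncompressed file.
function buildCentralDirectoryEntry(
    string $name,
    int $size,
    int $crc,
    int $dosTime,
    int $offset,
    string $comment = ''
): string {
    return "PK\x01\x02"               // signature
        . pack('v', 0x003F)           // version the archive was made by
        . pack('v', 0x000A)           // minimum version for extracting
        . pack('v', 0x0800)           // flags: UTF-8 filename and comments
        . pack('v', 0x0000)           // compression method: uncompressed
        . pack('V', $dosTime)         // filetime
        . pack('V', $crc)             // checksum
        . pack('V', $size)            // compressed size
        . pack('V', $size)            // uncompressed size
        . pack('v', strlen($name))    // filename length
        . pack('v', 0)                // extra data length: none
        . pack('v', strlen($comment)) // comment length
        . pack('v', 0)                // disk
        . pack('v', 0)                // internal attributes
        . pack('V', 0)                // external attributes
        . pack('V', $offset)          // offset of the file entry
        . $name
        . $comment;
}
```

The fixed part is 46 bytes long. Because the offset has to be known for every file, a streaming implementation needs to count the bytes it has already sent.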

End of Central Directory Entry

This entry only occurs once - at least for this purpose - and directly follows the last central directory entry.

Name | Length | Data type | Description
Signature | 4 | Signature | An end of central directory signature consisting of "PK" followed by the bytes 05 and 06
Disk index | 2 | UInt16 | The index of this disk - for this purpose, I do not use multiple disks, so this is always 0x0000
Start disk | 2 | UInt16 | The disk index this central directory starts on - for this purpose, I do not use multiple disks, so this is always 0x0000
File count, disk | 2 | UInt16 | The number of files on this disk - for this purpose, this is always the total count of included files
File count, central dir | 2 | UInt16 | The number of files in this central directory - for this purpose, this is always the total count of included files
Size | 4 | UInt32 | The size of the central directory, excluding this entry
Offset | 4 | UInt32 | The offset of the first central directory entry on this disk - for this purpose, this is always the offset of the first central directory entry in this file
Comment length | 2 | UInt16 | The length of the archive comment in bytes
Comment | * | String | The archive comment
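
The closing entry could be sketched like this. buildEndOfCentralDirectory() is again a hypothetical name; $dirSize and $dirOffset are assumed to have been tracked while the file entries and central directory entries were written.

```php
<?php
// Hypothetical helper, not part of BjSZipper: builds the end of
// central directory entry for a single-file (single-disk) archive.
function buildEndOfCentralDirectory(
    int $fileCount,
    int $dirSize,
    int $dirOffset,
    string $comment = ''
): string {
    return "PK\x05\x06"               // signature
        . pack('v', 0)                // disk index
        . pack('v', 0)                // start disk
        . pack('v', $fileCount)       // file count on this disk
        . pack('v', $fileCount)       // file count in the central directory
        . pack('V', $dirSize)         // size of the central directory
        . pack('V', $dirOffset)       // offset of the first central directory entry
        . pack('v', strlen($comment)) // comment length
        . $comment;
}
```

Note that the field widths set hard limits for this simple format: the UInt16 file counts allow at most 65,535 files, and the UInt32 sizes and offsets cannot address more than 4 GiB.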

Using the Code

The code is a PHP class named BjSZipper which includes static and instance functionality depending on the method you choose to use. In both cases, only file information is stored in memory; the file data is streamed just in time.

1. Collect Information Then Send (Instance)

This method uses an instance of the class, collects information for each file to send (including calculating the CRC-32 checksums) and then starts sending the archive. The benefit for the user is a progress bar, because the client learns the archive size in advance. The downside is that the download starts slightly later after the request - especially if there are many or large files to process.
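
Since nothing is compressed, the final archive size follows directly from the fixed structure lengths (30 bytes per file entry header, 46 bytes per central directory entry, 22 bytes for the end entry) plus names, comments and file sizes. The sketch below shows this calculation; zipArchiveSize() and the array keys are hypothetical, and it is an assumption that the size is announced to the client via a Content-Length header.

```php
<?php
// Hypothetical helper, not part of BjSZipper: predicts the size of an
// uncompressed ZIP archive before any byte has been sent.
// Each element of $files: ['name' => string, 'size' => int, 'comment' => string (optional)]
function zipArchiveSize(array $files, string $archiveComment = ''): int
{
    $total = 22 + strlen($archiveComment);           // end of central directory
    foreach ($files as $file) {
        $nameLength = strlen($file['name']);
        $commentLength = strlen($file['comment'] ?? '');
        $total += 30 + $nameLength + $file['size'];  // file entry: header + raw data
        $total += 46 + $nameLength + $commentLength; // central directory entry
    }
    return $total;
}

// Usage sketch (assumption, not executed here):
// header('Content-Length: ' . zipArchiveSize($files));
```

For the example archive from the Background section (one file "test.txt" with 44 bytes of text), this yields 82 + 54 + 22 = 158 bytes.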

Methods

__construct($zipName = "download.zip", $comment = "")

The constructor of the BjSZipper. Takes two parameters:

  • $zipName - the filename of the ZIP archive sent to the client, optional, default is "download.zip"
  • $comment - an archive comment, optional, default is empty

AddDir($path, $recursive = true, $filter = null)

Prepares a directory and its content for inclusion in the ZIP archive. File paths are stored relative to $path, which becomes the archive root. Takes three parameters:

  • $path - a directory path to take the files from
  • $recursive - a bool, if true the directory is scanned recursively, optional, default is true
  • $filter - a Regular Expression for files to include, optional, by default all files are included
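
To illustrate what such a directory scan could look like, here is a minimal sketch based on PHP's SPL iterators. It is not the class's actual implementation: collectFiles() is a hypothetical name, and it is an assumption that the filter is matched against the path relative to $path.

```php
<?php
// Hypothetical helper, not part of BjSZipper: returns a map of
// archive-relative names (with '/' separators) to full file paths.
function collectFiles(string $path, bool $recursive = true, ?string $filter = null): array
{
    $root = rtrim($path, '/\\');
    $flags = FilesystemIterator::SKIP_DOTS;
    $iterator = $recursive
        ? new RecursiveIteratorIterator(new RecursiveDirectoryIterator($root, $flags))
        : new FilesystemIterator($root, $flags);

    $files = [];
    foreach ($iterator as $info) {
        if (!$info->isFile()) {
            continue; // skip directories and other non-files
        }
        $full = $info->getPathname();
        // Path relative to the scanned directory, normalized to '/'
        $relative = str_replace('\\', '/', substr($full, strlen($root) + 1));
        if ($filter !== null && !preg_match($filter, $relative)) {
            continue; // does not match the Regular Expression
        }
        $files[$relative] = $full;
    }
    ksort($files); // deterministic order
    return $files;
}
```

Each returned pair could then be handed to a method like AddFile, with the key split into the relative path and the file name.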
AddFile($file, $name = null, $relativePath = "", $comment = "")

Prepares a single file to be included into the archive. Takes four arguments:

  • $file - a full file path
  • $name - the name of the file in the archive, optional, default is the base name of the file
  • $relativePath - the path of the file inside the archive, optional, default is the archive root, use slash '/' as path separator
  • $comment - a file comment, optional, default is empty

AddData($data, $name, $relativePath = '', $comment = '', $filetime = null)

Prepares a single file to be sent from raw data. Takes five parameters:

  • $data - the raw data of the file, stored in memory
  • $name - the name of the file in the archive
  • $relativePath - the path of the file inside the archive, optional, default is the archive root, use slash '/' as path separator
  • $comment - a file comment, optional, default is empty
  • $filetime - the last modification time of the file, optional, default is current time

Clear()

Resets the instance to start from scratch.

Send()

Sends the collected files in an assembled ZIP archive to the client.

Example

require_once('BjSZipper.php');

// Create a new instance
$zip = new BjSZipper('images.zip');

// Add files and data to send
$zip->AddDir(dirname(__FILE__), true, '/\.(jpg|jpeg)$/i'); // All JPEGs recursively
$zip->AddFile('/var/www/html/testdata.bin');               // Just a normal file
$zip->AddData('All the JPEG images.', 'desc.txt');         // A raw text file

// Start sending the archive
$zip->Send();

2. Immediately Start Sending (Static)

This method uses a static approach. Each file is sent directly after its information has been collected; only the file information is kept in memory for the final central directory. The benefit is a faster response for the client, because the download starts immediately after the first file has been processed. Memory usage is also slightly better, as only archive-relevant data is stored and raw data that is added is not kept for later sending. The downside is that the script cannot know the resulting archive size, so the client gets no progress display.
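
Because the file entry header carries the CRC-32 in front of the data, each file effectively has to be read twice in this format: once to calculate the checksum and once to stream the bytes. The following sketch shows both steps under that assumption; fileCrc32(), streamFileData() and CHUNK_SIZE are hypothetical names, not the class's actual members.

```php
<?php
// Hypothetical names, not part of BjSZipper.
const CHUNK_SIZE = 8192; // assumed buffer size in bytes

// First pass: CRC-32 of a file without loading it into a PHP string.
function fileCrc32(string $file): int
{
    return (int) hexdec(hash_file('crc32b', $file));
}

// Second pass: send the raw file data to the client in small chunks.
// Returns the number of bytes sent so the caller can track offsets.
function streamFileData(string $file): int
{
    $handle = fopen($file, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open file: $file");
    }
    $sent = 0;
    while (!feof($handle)) {
        $buffer = fread($handle, CHUNK_SIZE);
        if ($buffer === false) {
            break; // read error, stop streaming this file
        }
        echo $buffer;
        flush(); // push the chunk towards the client
        $sent += strlen($buffer);
    }
    fclose($handle);
    return $sent;
}
```

Only one chunk is held in memory at a time, which is what keeps the memory footprint independent of the file size.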

Methods

static Begin($zipName = 'download.zip', $unlimitedTime = true)

Sends the download header to the client. Takes two parameters:

  • $zipName - the filename of the archive presented to the client, optional, default is 'download.zip'
  • $unlimitedTime - if true, set_time_limit(0) is used to disable the PHP execution time limit, optional, default is true

static SendFile($file, $name = null, $relativePath = '', $comment = '')

Appends a single file to the archive stream to the client. Takes four parameters:

  • $file - the full path of the file
  • $name - the name of the file in the archive, optional, default is the file's base name
  • $relativePath - the path of the file relative to the archive root, delimiter is a slash '/', optional, default is the archive root
  • $comment - a comment for this file, optional, default is empty

static SendDir($path, $recursive, $filter = null)

Appends all matching files from a directory to the archive stream to the client. File paths are stored relative to $path, which becomes the archive root. Takes three parameters:

  • $path - the full path of the directory to get the files from
  • $recursive - if true, subdirectories are searched as well, optional, default is true
  • $filter - a Regular Expression filtering files to add, optional, default is all files found

static SendData($data, $name, $relativePath = '', $comment = '', $filetime = null)

Appends a file from raw data to the archive stream to the client. Takes five parameters:

  • $data - the raw data of the file to append
  • $name - the name of the file in the archive
  • $relativePath - the path of the file in the archive relative to the archive root, optional, default is the archive root
  • $comment - a comment for this file, optional, default is empty
  • $filetime - the file modification time for the file in the archive, optional, default is the current time

static End($comment = '')

Sends the central directory and end part to the client and thus ends the archive. Takes one parameter:

  • $comment - a comment for the archive

Example

require_once('BjSZipper.php');

// Send the HTTP headers
BjSZipper::Begin('images.zip');

// Add files and data to send
BjSZipper::SendDir(dirname(__FILE__), true, '/\.(jpg|jpeg)$/i'); // All JPEGs recursively
BjSZipper::SendFile('/var/www/html/testdata.bin');               // Just a normal file
BjSZipper::SendData('All the JPEG images.', 'desc.txt');         // A raw text file

// Send the archive directory and end the archive
BjSZipper::End();

Points of Interest

I wrote this code with the aim of getting it to work - there are basically no security measures included and almost no exception handling. Please be aware of that when using it.

History

  • Version 1.0: Instance and static functionality

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Bjørn
Software Developer
Germany
I'm working mainly on .NET Compact Framework C# on mobile devices at work. At home it's .NET Full Framework C# and a bit of JavaScript.

Comments and Discussions

 
Question: New title suggestion for the article
Vladimir Vissoultchev, 10-Jun-19 1:59
New title: Streaming Uncompressed ZIP File in PHP Without Temp File

When I peeked at the source at first I thought these lines were some kind of a mistake:
while (!feof($handle)) {
    $buffer = fread($handle, self::BUFFER_SIZE);
    echo($buffer);
    flush();
}
Then I read the article and found it's actually coded as intended. . .

cheers,
</wqw>
