Streaming ZIP File in PHP Without Temp File






On-the-fly streaming of multiple files or directories as a ZIP file without a temp file
Introduction
When I was trying to implement a "download directory" function in my custom web application, all the solutions I could find were based on creating the ZIP file first and then sending it. In my case, this could result in large temporary files which (being mostly images) couldn't be compressed much anyway.
So I came up with the idea to just create an uncompressed ZIP archive on-the-fly around the raw data - and as I found out, this is quite easy.
Background
For this purpose, it's enough to consider the minimum necessary structure of a ZIP archive: We won't need multi-part files and we won't need extra information stored for each file - and of course, we don't need any knowledge of compression algorithms.
The basic structure of a ZIP archive makes it easy to just assemble it on-the-fly:
- File entry 1
- File entry 2
- ...
- Central directory entry 1
- Central directory entry 2
- ...
- End of central directory entry
In detail, let's dive into some bytes. This is a ZIP file containing an uncompressed text file "test.txt" with the text "The quick brown fox jumps over the lazy dog." I colored the regions from above and each value, and included a summary of each value's meaning:
General Data Types
- UInt16 - A 2-byte, 16-bit number in little-endian byte order (e.g. 0x1234 = [34, 12])
- UInt32 - A 4-byte, 32-bit number in little-endian byte order (e.g. 0x12345678 = [78, 56, 34, 12])
- DateTime - A timestamp with two-second accuracy, bit format YYYYYYYmmmmddddd HHHHHiiiiiisssss, stored in little-endian byte order, e.g. 2019-01-23 22:33:44:

Field | Value | Binary | Bits |
---|---|---|---|
Year - 1980 | 39 | 0b00100111 | 31-25 |
Month | 1 | 0b00000001 | 24-21 |
Day | 23 | 0b00010111 | 20-16 |
Hour | 22 | 0b00010110 | 15-11 |
Minute | 33 | 0b00100001 | 10-5 |
Second / 2 | 22 | 0b00010110 | 4-0 |

Combined: 0b01001110001101111011010000110110 = 0x4E37B436 => [36, B4, 37, 4E]
- CRC-32 - A 4-byte CRC-32 checksum over the file data using the magic number 0xdebb20e3. In PHP, this is the hash algorithm named "crc32b".
File Entry
A file entry is a part describing a file and containing its data. File entries are stacked one after another.
Name | Length | Data type | Description |
---|---|---|---|
Signature | 4 | Signature | A file entry signature consisting of "PK" followed by the bytes 03 and 04 |
Version | 2 | UInt16 | The minimum version needed to extract the file - for this purpose, I just use 0x000A (version 1.0) but it really doesn't matter that much |
Flags | 2 | UInt16 | Options as to how to read this file - for this purpose, I use 0x0800, meaning UTF-8 encoded filename and comments and nothing else |
Compression method | 2 | UInt16 | The method the data was compressed with - for this purpose, 0x0000 is used, meaning "uncompressed" |
Filetime | 4 | DateTime | The last modification time of the file, no other time is saved, format see above |
Checksum | 4 | CRC-32 | The CRC-32 checksum of the file data, format see above |
Compressed size | 4 | UInt32 | The size of the compressed file data - for this purpose, the same as the file size |
Uncompressed size | 4 | UInt32 | The size of the uncompressed file |
Filename length | 2 | UInt16 | The length of the filename |
Extra data length | 2 | UInt16 | The length of the extra data - for this purpose, no extra data is used, so this is always 0x0000 |
Filename | * | String | The filename in UTF-8 encoding |
Extra data | * | Special | Extra data, e.g., for creation time, attributes and more - for this purpose not used |
File data | * | Bytes | The file data - usually compressed but in this case, just the raw data |
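As a rough sketch, the fixed part of a file entry can be assembled with `pack()` as shown here. The function name and parameters are hypothetical (not the BjSZipper API), and the date/time and checksum are assumed to be already encoded as integers. Since no extra data is used, the raw file data would be written directly after the returned header.

```php
<?php
// Hypothetical sketch of a file entry header builder; not the BjSZipper API.
function buildFileEntryHeader(string $name, int $size, int $crc32, int $dosDateTime): string
{
    return "PK\x03\x04"                 // Signature
        . pack('v', 0x000A)             // Version
        . pack('v', 0x0800)             // Flags: UTF-8 filename and comments
        . pack('v', 0x0000)             // Compression method: uncompressed
        . pack('V', $dosDateTime)       // Filetime
        . pack('V', $crc32)             // Checksum
        . pack('V', $size)              // Compressed size (same as file size)
        . pack('V', $size)              // Uncompressed size
        . pack('v', strlen($name))      // Filename length in bytes
        . pack('v', 0)                  // Extra data length: none
        . $name;                        // Filename (UTF-8)
}

$header = buildFileEntryHeader('test.txt', 44, crc32('example'), 0x4E37B436);
echo strlen($header), "\n"; // 30 fixed bytes + 8 filename bytes = 38
```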
Central Directory Entry
A central directory entry contains more detailed data about a file entry. The central directory entries are stacked one after another and form a kind of table of contents.
Name | Length | Data type | Description |
---|---|---|---|
Signature | 4 | Signature | A central directory entry signature consisting of "PK" followed by the bytes 01 and 02 |
OS version | 2 | UInt16 | The version the archive was made by - for this purpose, I just use 0x003F |
Version | 2 | UInt16 | The minimum required version for extracting - for this purpose, I just use 0x000A |
Flags | 2 | UInt16 | Options as to how to read this file - for this purpose, I use 0x0800, meaning UTF-8 encoded filename and comments and nothing else |
Compression method | 2 | UInt16 | The method the data was compressed with - for this purpose, 0x0000 is used, meaning "uncompressed" |
Filetime | 4 | DateTime | The last modification time of the file, no other time is saved, format see above |
Checksum | 4 | CRC-32 | The CRC-32 checksum of the file data, format see above |
Compressed size | 4 | UInt32 | The size of the compressed file data - for this purpose, the same as the file size |
Uncompressed size | 4 | UInt32 | The size of the uncompressed file |
Filename length | 2 | UInt16 | The length of the filename |
Extra data length | 2 | UInt16 | The length of the extra data - for this purpose, no extra data is used, so this is always 0x0000 |
Comment length | 2 | UInt16 | The length of the file comment |
Disk | 2 | UInt16 | The disk number the file is on - for this purpose, the archive is a single file (not split across disks) so this is always 0x0000 |
Internal attributes | 2 | UInt16 | Attributes for internal usage - for this purpose, this is not used and always 0x0000 |
External attributes | 4 | UInt32 | Attributes for external usage - for this purpose, this is not used and always 0x00000000 |
Offset of file entry | 4 | UInt32 | The offset inside the file where the file entry belonging to this central directory entry starts |
Filename | * | String | The filename in UTF-8 encoding |
Extra data | * | Special | Extra data, e.g., for creation time, attributes and more - for this purpose, not used |
Comment | * | String | A comment for the described file |
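The central directory entry can be sketched the same way. Again, the function name and parameters are hypothetical and only mirror the table above; `$offset` is the byte position at which the matching file entry started in the output stream.

```php
<?php
// Hypothetical sketch of a central directory entry builder; not the BjSZipper API.
function buildCentralDirectoryEntry(
    string $name,
    int $size,
    int $crc32,
    int $dosDateTime,
    int $offset,
    string $comment = ''
): string {
    return "PK\x01\x02"                 // Signature
        . pack('v', 0x003F)             // OS version (version made by)
        . pack('v', 0x000A)             // Version needed to extract
        . pack('v', 0x0800)             // Flags: UTF-8 filename and comments
        . pack('v', 0x0000)             // Compression method: uncompressed
        . pack('V', $dosDateTime)       // Filetime
        . pack('V', $crc32)             // Checksum
        . pack('V', $size)              // Compressed size
        . pack('V', $size)              // Uncompressed size
        . pack('v', strlen($name))      // Filename length
        . pack('v', 0)                  // Extra data length: none
        . pack('v', strlen($comment))   // Comment length
        . pack('v', 0)                  // Disk
        . pack('v', 0)                  // Internal attributes
        . pack('V', 0)                  // External attributes
        . pack('V', $offset)            // Offset of file entry
        . $name
        . $comment;
}
```

Without extra data, each entry is 46 bytes plus the filename and comment lengths, which makes the total central directory size easy to add up while the file entries are being sent.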
End of Central Directory Entry
This entry only occurs once - at least for this purpose - directly after the last central directory entry.
Name | Length | Data type | Description |
---|---|---|---|
Signature | 4 | Signature | An end of central directory signature consisting of "PK" followed by the bytes 05 and 06 |
Disk index | 2 | UInt16 | The index of this disk - for this purpose, I do not use multiple disks so this is always 0x0000 |
Start disk | 2 | UInt16 | The disk index this central directory starts on - for this purpose, I do not use multiple disks so this is always 0x0000 |
File count, disk | 2 | UInt16 | The number of files on this disk - for this purpose, this is always the total count of included files |
File count, central dir | 2 | UInt16 | The number of files in this central directory - for this purpose, this is always the total count of included files |
Size | 4 | UInt32 | The size of the central directory, excluding this entry |
Offset | 4 | UInt32 | The offset of the first central directory entry on this disk - for this purpose, this is always the offset of the first central directory entry in this file |
Comment length | 2 | UInt16 | The length of the archive comment |
Comment | * | String | The archive comment |
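A sketch of the closing record, following the table above. The function name and parameters are hypothetical; `$directorySize` and `$directoryOffset` are assumed to have been tracked by the caller while writing the file entries and central directory entries.

```php
<?php
// Hypothetical sketch of an end of central directory builder; not the BjSZipper API.
function buildEndOfCentralDirectory(
    int $fileCount,
    int $directorySize,
    int $directoryOffset,
    string $comment = ''
): string {
    return "PK\x05\x06"                 // Signature
        . pack('v', 0)                  // Disk index
        . pack('v', 0)                  // Start disk
        . pack('v', $fileCount)         // File count on this disk
        . pack('v', $fileCount)         // File count in the central directory
        . pack('V', $directorySize)     // Size of the central directory
        . pack('V', $directoryOffset)   // Offset of the first central directory entry
        . pack('v', strlen($comment))   // Comment length
        . $comment;                     // Archive comment
}
```

Note that the 16-bit file counts and 32-bit sizes limit this simple layout to 65,535 files and archives below 4 GiB.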
Using the Code
The code is a PHP class named BjSZipper which includes static and instance functionality depending on the method you choose to use. In both cases, only file information is stored in memory; the file data is streamed just-in-time.
1. Collect Information Then Send (Instance)
This method uses an instance of the class, collects information for each file to send (including calculating CRC-32 checksums) and then starts to send the archive. The benefit for the user is a progress bar, because the client gets to know the archive size in advance. The downside is a slightly later start of the download after requesting it - especially if there are many or large files to process.
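Because nothing is compressed and no extra data is written, the final archive size follows from the field lengths in the tables above. The sketch below shows one way such a size could be predicted; the function name and the array layout (`name`, `size`, `comment`) are assumptions for illustration, not how BjSZipper stores its file list.

```php
<?php
// Hypothetical helper: predicts the size of an uncompressed ZIP archive
// that uses no extra data fields. Not taken from BjSZipper.
function predictArchiveSize(array $files, string $archiveComment = ''): int
{
    // End of central directory: 22 fixed bytes plus the archive comment.
    $total = 22 + strlen($archiveComment);

    foreach ($files as $file) {
        $nameLength = strlen($file['name']);
        $commentLength = strlen($file['comment'] ?? '');

        // File entry: 30 fixed bytes, the filename and the raw data.
        $total += 30 + $nameLength + $file['size'];
        // Central directory entry: 46 fixed bytes, the filename and the comment.
        $total += 46 + $nameLength + $commentLength;
    }

    return $total;
}

// The single-file example from above: "test.txt" with 44 bytes of text.
echo predictArchiveSize([['name' => 'test.txt', 'size' => 44]]), "\n"; // 158
```

A value like this could be sent in a `Content-Length` header before the first byte of the archive, which is what lets the browser show a progress bar.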
Methods
__construct($zipName = "download.zip", $comment = "")
The constructor of the BjSZipper. Takes two parameters:
- $zipName - the filename of the ZIP archive sent to the client, optional, default is "download.zip"
- $comment - an archive comment, optional, default is empty
AddDir($path, $recursive = true, $filter = null)
Prepares a path and its content for inclusion in the ZIP archive. Paths are stored relative to $path, which corresponds to the archive root. Takes three parameters:
- $path - a directory path to take the files from
- $recursive - a bool, if true the directory is scanned recursively, optional, default is true
- $filter - a Regular Expression for files to include, optional, by default all files are included
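To illustrate what such a directory scan could look like, here is a minimal sketch using PHP's SPL iterators. The function `collectFiles` is hypothetical, and matching the filter against the bare filename (rather than the full path) is an assumption; the actual BjSZipper implementation may differ.

```php
<?php
// Hypothetical sketch of a directory scan; not the BjSZipper implementation.
// Returns an array mapping archive-relative paths (with '/') to full file paths.
function collectFiles(string $path, bool $recursive = true, ?string $filter = null): array
{
    $root = rtrim($path, '/\\');
    $flags = FilesystemIterator::SKIP_DOTS;
    $iterator = $recursive
        ? new RecursiveIteratorIterator(new RecursiveDirectoryIterator($root, $flags))
        : new FilesystemIterator($root, $flags);

    $files = [];
    foreach ($iterator as $file) {
        if (!$file->isFile()) {
            continue; // skip directories and other non-file entries
        }
        // Assumption: the filter is applied to the filename only.
        if ($filter !== null && !preg_match($filter, $file->getFilename())) {
            continue;
        }
        // Strip the root directory and normalise separators for the archive.
        $relative = substr($file->getPathname(), strlen($root) + 1);
        $files[str_replace(DIRECTORY_SEPARATOR, '/', $relative)] = $file->getPathname();
    }

    ksort($files);
    return $files;
}
```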
AddFile($file, $name = null, $relativePath = "", $comment = "")
Prepares a single file to be included in the archive. Takes four parameters:
- $file - a full file path
- $name - the name of the file in the archive, optional, default is the base name of the file
- $relativePath - the path of the file inside the archive, optional, default is the archive root, use slash '/' as path separator
- $comment - a file comment, optional, default is empty
AddData($data, $name, $relativePath = '', $comment = '', $filetime = null)
Prepares a single file to be sent from raw data. Takes five parameters:
- $data - the raw data of the file, stored in memory
- $name - the name of the file in the archive
- $relativePath - the path of the file inside the archive, optional, default is the archive root, use slash '/' as path separator
- $comment - a file comment, optional, default is empty
- $filetime - the last modification time of the file, optional, default is current time
Clear()
Resets the instance to start from scratch.
Send()
Sends the collected files in an assembled ZIP archive to the client.
Example
```php
require_once('BjSZipper.php');

// Create a new instance
$zip = new BjSZipper('images.zip');

// Add files and data to send
$zip->AddDir(dirname(__FILE__), true, '/\.(jpg|jpeg)$/i'); // All JPEGs recursively
$zip->AddFile('/var/www/html/testdata.bin');               // Just a normal file
$zip->AddData('All the JPEG images.', 'desc.txt');         // A raw text file

// Start sending the archive
$zip->Send();
```
2. Immediately Start Sending (Static)
This method uses a static approach. Each file is sent directly after its data has been collected; only the file information is kept in memory for the final central directory. The benefit is a faster response for the client because the download starts immediately after the first file is processed. Memory usage is also slightly better: only archive-relevant data is stored, and raw data that is added is not kept for later sending. The downside is that the script cannot know the resulting archive size, so there will be no progress display for the client.
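In the layout described earlier, the checksum sits in the file entry header, ahead of the data. A file therefore presumably has to be read once for its CRC-32 (for example with `hash_file('crc32b', $path)`) and a second time for sending. The sending step can be sketched as a chunked copy that never holds the whole file in memory; `streamFileData` is a hypothetical name, not a BjSZipper method.

```php
<?php
// Hypothetical helper: copies a file to an output stream in fixed-size chunks
// and returns the number of bytes written. Not taken from BjSZipper.
function streamFileData(string $path, $output, int $chunkSize = 65536): int
{
    $input = fopen($path, 'rb');
    if ($input === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $sent = 0;
    try {
        while (!feof($input)) {
            $chunk = fread($input, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Cannot read $path");
            }
            $sent += fwrite($output, $chunk);
        }
    } finally {
        fclose($input);
    }

    return $sent;
}
```

In a web request, `$output` could be a handle from `fopen('php://output', 'wb')`; the returned byte count can be added to a running offset for the central directory.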
Methods
static Begin($zipName = 'download.zip', $unlimitedTime = true)
Sends the download header to the client. Takes two parameters:
- $zipName - the filename of the archive presented to the client, optional, default is 'download.zip'
- $unlimitedTime - if true, set_time_limit(0) is used to disable the PHP execution time limit, optional, default is true
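As an illustration of what a download header for a ZIP archive typically contains, the sketch below builds the header lines as plain strings. The function `buildDownloadHeaders` is hypothetical and the exact headers sent by BjSZipper are not shown in this article, so treat the list as an assumption.

```php
<?php
// Hypothetical sketch of typical download headers; not the BjSZipper implementation.
function buildDownloadHeaders(string $zipName, ?int $contentLength = null): array
{
    // Remove characters that could break out of the quoted filename or the header line.
    $safeName = str_replace(['"', '\\', "\r", "\n"], '', $zipName);

    $headers = [
        'Content-Type: application/zip',
        'Content-Disposition: attachment; filename="' . $safeName . '"',
    ];

    // Only known in advance when all file sizes were collected before sending.
    if ($contentLength !== null) {
        $headers[] = 'Content-Length: ' . $contentLength;
    }

    return $headers;
}

// In a web request, each line would be passed to header():
// foreach (buildDownloadHeaders('images.zip') as $line) { header($line); }
```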
static SendFile($file, $name = null, $relativePath = '', $comment = '')
Appends a single file to the archive stream to the client. Takes four parameters:
- $file - the full path of the file
- $name - the name of the file in the archive, optional, default is the file's base name
- $relativePath - the path of the file relative to the archive root, delimiter is a slash '/', optional, default is the archive root
- $comment - a comment for this file, optional, default is empty
static SendDir($path, $recursive, $filter = null)
Appends all specified files from a directory to the archive stream to the client. All files are added relative to $path, which corresponds to the archive root. Takes three parameters:
- $path - the full path of the directory to get the files from
- $recursive - if true, subdirectories are searched also, optional, default is true
- $filter - a Regular Expression filtering files to add, optional, default is all files found
static SendData($data, $name, $relativePath = '', $comment = '', $filetime = null)
Appends a file from raw data to the archive stream to the client. Takes five parameters:
- $data - the raw data of the file to append
- $name - the name of the file in the archive
- $relativePath - the path of the file in the archive relative to the archive root, optional, default is the archive root
- $comment - a comment for this file, optional, default is empty
- $filetime - the file modification time for the file in the archive, optional, default is the current time
static End($comment = '')
Sends the central directory and end part to the client and thus ends the archive. Takes one parameter:
- $comment - a comment for the archive, optional, default is empty
Example
```php
require_once('BjSZipper.php');

// Send the HTTP headers
BjSZipper::Begin('images.zip');

// Add files and data to send
BjSZipper::SendDir(dirname(__FILE__), true, '/\.(jpg|jpeg)$/i'); // All JPEGs recursively
BjSZipper::SendFile('/var/www/html/testdata.bin');               // Just a normal file
BjSZipper::SendData('All the JPEG images.', 'desc.txt');         // A raw text file

// Send the archive directory and end the archive
BjSZipper::End();
```
Points of Interest
I wrote this code with the aim to get it to work - there are basically no security measures included and almost no exception handling. Please be aware of that when using this.
History
- Version 1.0: Instance and static functionality