
Re: Why do TAR files always need to be decompressed twice?
kalberts, 17-Jun-18 22:12
enhzflep wrote:
Formats like ZIP employ compression on a file-by-file basis. This is obviously prone to poor rates of compression as compared to a scheme that can compress the entire contents of an archive
If you get significantly better compression by merging a lot of small files into one, then either your average file size is extremely small (one classic Unix study showed that, for the system as a whole, more than 80% of the files were smaller than 5 kbytes), or you are misinterpreting the data.
What you see is not poorer compression, but more metadata, more administrative information. One large file requires one descriptor; five thousand tiny files require five thousand descriptors. That is not a poorer rate of compression. It is the same saving you get by gathering the five thousand files into one even without compression: you save the space of 4999 inodes, as well as the internal fragmentation loss, which, if file sizes are evenly distributed, averages half an allocation unit (disk block) per file. You save space by making this huge file, but it has nothing to do with data compression.
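
To put rough numbers on the fragmentation part, here is a minimal Python sketch. The 4096-byte allocation unit, the file sizes and the helper names (`allocated_size`, `total_slack`) are assumptions made up for illustration, not measurements from any particular file system:

```python
BLOCK_SIZE = 4096  # assumed allocation unit; real file systems vary


def allocated_size(file_size, block_size=BLOCK_SIZE):
    """Bytes occupied on disk when space is handed out in whole blocks."""
    blocks = (file_size + block_size - 1) // block_size  # round up
    return blocks * block_size


def total_slack(file_sizes, block_size=BLOCK_SIZE):
    """Internal fragmentation: allocated bytes minus bytes actually used."""
    return sum(allocated_size(size, block_size) - size for size in file_sizes)


if __name__ == "__main__":
    # Hypothetical workload: 5000 small files, sizes spread over 1..5000 bytes.
    sizes = [1 + i for i in range(5000)]
    separate = total_slack(sizes)
    merged = total_slack([sum(sizes)])  # the same bytes stored as one file
    print(f"slack as 5000 separate files: {separate} bytes")
    print(f"slack as one merged file:     {merged} bytes")
```

The merged file can waste at most one partial block in total, whatever the compression does or does not do. Inode usage is not modelled here at all; it comes on top of this.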

If you want to make an exact comparison, you cannot simply compare the size of the .tar file to the size of the .tar.gz file. That would certainly give you the compression rate of the .tar file, but to create the .tar file you had to add a noticeable amount of metadata. So what you save by having only one file/compression descriptor, you partially lose to .tar administrative information.
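
If you want to see where the bytes go, one way is to build the same set of files as a plain .tar, a .tar.gz and a .zip, and compare all three to the raw data size. A minimal sketch using Python's standard `tarfile` and `zipfile` modules; the function name `archive_sizes` and the sample files are made up for illustration:

```python
import io
import tarfile
import zipfile


def _tar_bytes(files, mode):
    """Build a tar archive in memory; mode is "w" (plain) or "w:gz"."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def _zip_bytes(files):
    """Build a zip archive in memory, each member deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def archive_sizes(files):
    """Return sizes in bytes of the raw data and of each archive format."""
    return {
        "raw": sum(len(data) for data in files.values()),
        "tar": len(_tar_bytes(files, "w")),
        "tar.gz": len(_tar_bytes(files, "w:gz")),
        "zip": len(_zip_bytes(files)),
    }


if __name__ == "__main__":
    # Hypothetical sample: 2000 tiny text files.
    sample = {
        f"notes/item_{i:04d}.txt": f"item {i}\nstatus=ok\n".encode()
        for i in range(2000)
    }
    for label, size in archive_sizes(sample).items():
        print(f"{label:>7}: {size} bytes")
```

The gap between "raw" and "tar" is pure .tar metadata and padding, so the honest baseline for the compression rate is the raw size, not the .tar size.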

I keep a number of 'archives' of many small files in .zip format. That saves space through the compression, of course, but a lot is also saved by not wasting 2 Kbyte on each file to internal fragmentation.

Another advantage of zipping up these file groups: I frequently move the files between machines on USB sticks. Writing a few thousand files to a USB stick takes a long time, most of it spent creating the files. I guess that is because USB stick writes are not cached, at least not to the same degree, and file creation requires lots of writes, even if the file contents are written in one single write. Writing a single .zip archive to a USB stick is several times faster than writing two thousand tiny files.

A similar situation: We run a fairly large build system, with about a hundred build agents. A build may produce dozens, in some cases hundreds, of individual artifacts. On the central server that distributes these artifacts, the inode table exploded when each artifact was treated separately. We were forced to modify the builds to pack related files into archives (usually a single one), which are then saved centrally as the artifact of the build.

Most of these advantages come from the archive file, whether compressed or not. Compression comes as an additional benefit.

When you use .tar.gz as a distribution format, having to untar and ungzip the entire collection is perfectly fine. When you use a .zip file as an (often mostly or fully read-only) 'working' archive, extracting a single file quickly is essential. For my use, .tar.gz would be very cumbersome. It is also great that the file system can retrieve zipped files for applications that do not have unzipping built into the code. Of course, a self-explanatory user interface that doesn't require you to memorize a zillion options and command words, and that can display the directory structure in the archive and preview files, is also nice. The ability to encrypt files is valuable as well.

I haven't discovered any real disadvantage of .zip even as a distribution format, but for that purpose, .tar.gz is also fine. However, for daily work, I most certainly prefer a format that lets me access individual files in the archive without having to decrypt, untar and ungzip the entire archive.
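
For what it's worth, the difference in access pattern shows up even in a few lines of Python. This is a sketch with the standard `zipfile` and `tarfile` modules; the helper names and the sample data are hypothetical. The zip reader looks the member up in the central directory and goes straight to it, while the tar.gz reader has to decompress its way through the gzip stream to find the member:

```python
import io
import tarfile
import zipfile


def make_zip(files):
    """Pack files (name -> bytes) into an in-memory .zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def make_tgz(files):
    """Pack files (name -> bytes) into an in-memory .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_from_zip(archive, member):
    """Central directory lookup, then only this member is inflated."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(member)


def read_from_tgz(archive, member):
    """No index: the compressed stream is scanned to locate the member."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        extracted = tar.extractfile(member)
        if extracted is None:  # member exists but is not a regular file
            raise KeyError(member)
        return extracted.read()


if __name__ == "__main__":
    # Hypothetical working archive of 1000 small documents.
    docs = {f"doc_{i:04d}.txt": f"document {i}\n".encode() for i in range(1000)}
    wanted = "doc_0999.txt"
    print(read_from_zip(make_zip(docs), wanted))
    print(read_from_tgz(make_tgz(docs), wanted))
```

Both calls return the same bytes; what differs is how much of the archive has to be read and decompressed to get there, and that difference grows with the size of the archive.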
