I wonder how much slang has been influenced by Unix (or *nix, if you prefer). Before *nix, slang made some degree of sense to me, but then we got absurdities like 'less' for displaying a file (yes, I know its history!), GNU, and a thousand 'funny' but made-up names and terms that are totally meaningless in the way they are used. I see more and more of that creeping into non-computer slang as well: terms with no etymological connection to what they are applied to, carrying a completely unrelated meaning that is absurd in the context.
Controlling the development of a natural language makes herding cats look like a task for five-year-olds.
Formats like ZIP employ compression on a file-by-file basis. This obviously tends to give poorer compression than a scheme that compresses the entire contents of an archive as one stream, especially when the data is spread across lots of small files.
The solution is to slap all of the files together first into one monolithic chunk. You then compress that chunk in the (almost always fulfilled) hope that the output will be smaller than what you would get by compressing each contained file separately and gluing the compressed outputs together into a single chunk.
TAR - turn a bunch of files into one.
GZ - compress a file.
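To see that effect, here is a minimal Python sketch that compresses a batch of small, similar files two ways: each file on its own (roughly what ZIP does per entry) and all of them concatenated into one stream (roughly what .tar.gz does). It assumes Python's `zlib` deflate as a stand-in for both formats, and the sample files produced by the hypothetical `make_small_files` helper are made-up data, not a benchmark.

```python
import random
import zlib

def make_small_files(count: int, seed: int = 0) -> list[bytes]:
    """Generate `count` small, similar text files (hypothetical sample data)."""
    rng = random.Random(seed)
    words = [b"alpha", b"beta", b"gamma", b"delta", b"config", b"value", b"name"]
    files = []
    for i in range(count):
        lines = [b"# file %d" % i]
        for _ in range(rng.randint(5, 15)):
            lines.append(b"%s = %s" % (rng.choice(words), rng.choice(words)))
        files.append(b"\n".join(lines) + b"\n")
    return files

def per_file_size(files: list[bytes]) -> int:
    """ZIP-style: compress each file on its own and add up the results."""
    return sum(len(zlib.compress(data, 9)) for data in files)

def solid_size(files: list[bytes]) -> int:
    """tar.gz-style: glue the files together first, then compress once."""
    return len(zlib.compress(b"".join(files), 9))

if __name__ == "__main__":
    files = make_small_files(500)
    print("raw bytes:      ", sum(len(f) for f in files))
    print("per-file total: ", per_file_size(files))
    print("solid stream:   ", solid_size(files))
```

The gap arises because each separately compressed file starts with an empty history, so repetition across files is never exploited; in one stream, later files can refer back to earlier ones within deflate's 32 KB window.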
Formats like ZIP employ compression on a file-by-file basis. This is obviously prone to poor rates of compression as compared to a scheme that can compress the entire contents of an archive
If you experience significantly better compression by merging a lot of small files into one, either your average file size is extremely small (as in one classic Unix study showing that, for the system as a whole, more than 80% of the files were smaller than 5 KB).
Or you are misinterpreting the data: it is not poorer compression, but more metadata, that is, administrative information. One large file requires one descriptor; five thousand tiny files require five thousand descriptors. That is not a poorer rate of compression. It is similar to gathering the five thousand files into one without any compression at all: that would save the space of 4999 inodes, as well as the internal fragmentation loss, which, if file sizes are evenly distributed, averages half an allocation unit (disk block) per file. You save space by making this huge file, but it has nothing to do with data compression.
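The fragmentation part of that argument is simple arithmetic. A minimal Python sketch, assuming a fixed 4 KB allocation unit and ignoring inode/descriptor space as well as file systems that pack small files or file tails specially: every file is rounded up to whole blocks, so many separate files waste on average about half a block each, while one combined file is rounded up only once. The names `BLOCK`, `allocated_bytes` and `slack` are illustrative.

```python
import math
import random

BLOCK = 4096  # assumed allocation unit; real file systems vary

def allocated_bytes(size: int, block: int = BLOCK) -> int:
    """Disk space a file occupies when space is handed out in whole blocks."""
    return math.ceil(size / block) * block

def slack(sizes: list[int], block: int = BLOCK) -> int:
    """Internal fragmentation: space allocated but not holding file content."""
    return sum(allocated_bytes(s, block) - s for s in sizes)

if __name__ == "__main__":
    rng = random.Random(1)
    sizes = [rng.randint(1, 100_000) for _ in range(5000)]
    separate = slack(sizes)         # five thousand files, each rounded up
    combined = slack([sum(sizes)])  # one big file, rounded up once
    print(f"average slack per file: {separate / len(sizes):.0f} bytes")
    print(f"slack for one combined file: {combined} bytes")
```

For example, files of 1000, 4096 and 5000 bytes waste 3096 + 0 + 3192 = 6288 bytes as separate files, but only 2192 bytes as one 10096-byte file.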
If you want to make an exact comparison, you cannot just compare the size of the .tar file to the size of the .tar.gz file. That would certainly give you the compression rate of the .tar file, but to create the .tar file you had to add a noticeable amount of metadata. So what you save by having only one file/compression descriptor, you partially lose to .tar administrative information.
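The .tar metadata can be measured directly. In the ustar/POSIX layout that Python's `tarfile` writes, each member gets a 512-byte header, its data is padded to a multiple of 512 bytes, and the archive ends with zero-filled blocks. This sketch (the `tar_size` helper and the 1000 files of 100 bytes each are hypothetical) builds an uncompressed tar in memory and compares it with the raw payload:

```python
import io
import tarfile

def tar_size(files: dict[str, bytes]) -> int:
    """Build an uncompressed tar archive in memory and return its length in bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())

if __name__ == "__main__":
    files = {f"note{i}.txt": b"x" * 100 for i in range(1000)}
    payload = sum(len(d) for d in files.values())
    total = tar_size(files)
    print(f"payload {payload} bytes, tar {total} bytes, overhead {total - payload} bytes")
```

With files this small, most of the .tar file is headers and padding; gzip largely squeezes that back out, which is exactly the overhead an exact comparison has to account for.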
I keep a number of 'archives' of many small files in .zip format. They save space through compression, of course, but a lot is also saved by not wasting about 2 KB per file on internal fragmentation.
Another advantage of zipping up these file groups: I frequently move the files between machines on USB sticks, and creating a few thousand files on a USB stick takes a lot of time. I guess this is because USB stick writes are not cached, at least not to the same degree, and file creation requires lots of writes, even if the file contents are written in one single write. Writing a single .zip archive to a USB stick is several times faster than writing two thousand tiny files.
A similar situation: we run a fairly large build system with about a hundred build agents. A build may produce dozens, in some cases hundreds, of individual artifacts. On the central server that distributes these artifacts, the inode table exploded when each artifact was treated separately. We were forced to modify the builds to pack related files into archives (usually a single one) to be saved centrally as an artifact of the build.
Most of these advantages come from the archive file itself, whether compressed or not. Compression is an additional benefit.
When you use .tar.gz as a distribution format, having to untar and ungzip the entire collection is perfectly fine. When using a .zip file as an (often mostly or fully read-only) 'working' archive, being able to extract a single file quickly is essential. For my use, .tar.gz would be very cumbersome. It is also great to have the file system retrieve zipped files for applications that don't have unzipping built into their code. And of course, a self-explanatory user interface that doesn't require you to memorize a zillion options and command words, can display the directory structure in the archive, and can preview files is also nice. The ability to encrypt files is valuable as well.
I haven't discovered any real disadvantage of .zip even as a distribution format, but for that purpose, .tar.gz is also fine. However, for daily work, I most certainly prefer a format that lets me access individual files in the archive without having to decrypt, untar and ungzip the entire archive.
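Single-file access is what ZIP's central directory makes cheap: it records where each entry starts, so a reader can seek straight to one entry and decompress only that one. A .tar.gz, by contrast, is a single gzip stream, so reaching a member means decompressing everything stored before it. A minimal Python sketch using the standard `zipfile` module (the helper names and the 2000 generated 'pages' are hypothetical):

```python
import io
import zipfile

def build_zip(files: dict[str, bytes]) -> bytes:
    """Create a deflate-compressed .zip archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def read_one(archive: bytes, name: str) -> bytes:
    """Fetch a single member: the central directory gives its offset,
    so only this entry is decompressed and the others are never touched."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)

if __name__ == "__main__":
    files = {f"docs/page{i:04d}.txt": f"page {i}\n".encode() * 50 for i in range(2000)}
    archive = build_zip(files)
    print(read_one(archive, "docs/page1234.txt")[:9])  # prints b'page 1234'
```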
What are you downloading? Most source code bundles are just compressed with the .gz extension.
Maybe if there were a bunch of videos, JPEG or PDF files, which have built-in compression, the uncompressed .tar file might be as small as, if not smaller than, the .tar.gz file. But muscle memory will automatically add the z option to tar to invoke gzip to compress the output. Similarly, there's little point in adding the -C option to scp when transferring a .tar.gz file.
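The point about already-compressed media is easy to check: such data looks essentially random to gzip, so compressing it again gains nothing, and the output ends up marginally larger because of headers. A small Python sketch, assuming seeded random bytes as a stand-in for JPEG/PDF/video payloads (`random.Random.randbytes` needs Python 3.9 or later; `gzip_ratio` is a hypothetical helper):

```python
import gzip
import random

def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (above 1.0 means it grew)."""
    return len(gzip.compress(data)) / len(data)

if __name__ == "__main__":
    rng = random.Random(42)
    # Random bytes stand in for media files that are already compressed.
    already_compressed = rng.randbytes(1_000_000)
    text = b"the quick brown fox jumps over the lazy dog\n" * 20_000
    print(f"random-like data: {gzip_ratio(already_compressed):.3f}")
    print(f"repetitive text:  {gzip_ratio(text):.3f}")
```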
is an elaborate (albeit inefficient) way of finding all files in the current directory and all subdirectories whose names end in ".c" and contain "draw" anywhere in the name, then creating a tar file, which is gzipped to stdout and written to a file named "temp.tar.gz".
The point here is to demonstrate hooking small programs together through piping.
So, probably because so many tar files ended up being gzipped, someone had the idea of building gzip into the tar command itself: "tar -czvf temp.tar.gz filelist". A good idea.
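For comparison, Python's standard `tarfile` module offers the same tar-plus-gzip combination as a single open mode, `"w:gz"`. The sketch below is a Python analogue of the task described above (collect `.c` files whose names contain "draw" from a directory tree and archive them into one compressed file); it is not a reconstruction of the original shell pipeline, and `archive_draw_sources` is a hypothetical name.

```python
import pathlib
import tarfile

def archive_draw_sources(root: str, out_path: str) -> list[str]:
    """Collect *.c files whose names contain "draw" under `root` (recursively)
    and write them into one gzip-compressed tar archive in a single step."""
    base = pathlib.Path(root)
    matches = sorted(
        p for p in base.rglob("*.c") if p.is_file() and "draw" in p.name
    )
    # Mode "w:gz" performs the tar step and the gzip step together,
    # much like the combined z option on the tar command line.
    with tarfile.open(out_path, "w:gz") as tar:
        for path in matches:
            tar.add(str(path), arcname=path.relative_to(base).as_posix())
    return [p.relative_to(base).as_posix() for p in matches]
```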
This was intended to be a humorous discussion, but I clearly failed in that regard. Based on the responses, I seem to have unintentionally touched a nerve. Since I can't simply delete the post, and I do believe in transparency, I'll leave it unedited below. Needless to say, I won't be collating responses. I'll just take my lumps and move on. Lesson learned.
--- Original Message ---
I've been lurking around CodeProject's Q&A forums a little bit lately and noticing a few trends. I thought it might be fun to share pet peeves...I'm coming down with a case of them lately.
I truly don't want to discourage anyone from posting questions, so please be nice, keep it general, and don't call out anyone by name. Remember we all started out needing to learn.
So, to get it started, here are some of my pet peeves. What are yours?
"it didn't work" - Help us. What was "it"? And, how did it fail to meet your expectations?
"nothing happened" - Really, nothing? Could you please be less specific?
"help urgently needed" - What makes it urgent? Should I rush to help?
Also...no quote here, but the purity police rush in, leave an opinion, and utterly fail to answer the question.
I am curious to hear your experiences. In a few days, if there is any interest, I'll try to collate and rank similar observations.
Also, what's the consensus? Should I simply edit this post to include the results? Or, should I make a new post with the results?
I would like to respond to you, Eric, but I am not clear what you are talking about, and whether you are describing your own personal reactions, or describing what you have actually observed.
Rather than "lurking," why not lend a hand, and see how that goes?
«... thank the gods that they have made you superior to those events which they have not placed within your own control, rendered you accountable for that only which is within your own control. For what, then, have they made you responsible? For that which is alone in your own power: a right use of things as they appear.» Discourses of Epictetus Book I:12
I just left a comment on a week-old question for which I had offered a solution and received no feedback at all from the OP...and it's not the first time. The only time I go to Q&A is usually on the weekends when it gets boring here.
An interesting question might be 'how many questions have you personally asked in a forum/qa?' I've been here for 11 years and have posted (without looking) maybe 2 questions. Why so few? It's rare that I can't find what I'm looking for without bothering a bunch of strangers who probably have better things to do. I also wouldn't want to discourage anyone from posting questions, but it should be done as a last resort. C'mon, everyone should know how google works in this day and age.
I don't know about others, but I stay away from homework questions. That's what instructors/aides/tutors are paid for.
"Go forth into the source" - Neal Morse