From: william E Davidsen (davidsen@ariel.crd.GE.COM)
Date: 10/19/92


From: davidsen@ariel.crd.GE.COM (william E Davidsen)
Subject: Re: Use of zip instead of compress
Date: 19 Oct 1992 17:56:04 GMT

In article <sheldon.719268323@pv1413.vincent.iastate.edu>, sheldon@iastate.edu (Steve Sheldon) writes:
| In <SUOPANKI.92Oct15215459@zombie.oulu.fi> suopanki@zombie.oulu.fi (Heikki Suopanki) writes:
|
|
| >I can't see any good reason why we shouldn't use a new and better
| >compressor for Linux. We have to choose one and make it a Linux standard.
| >If it's a part of standard distribution there will be no problems.
|
| I can see plenty. First off, which one are you going to choose? What
| determining factor are you going to use to choose one?
|
| >I personally never touch 'compress' in my Linux system, I always use zoo.
| >When I get new stuff I always convert everything to zoo at university
| >before I download or move stuff with floppies.
|
| So you go to all the trouble of uncompressing, untarring, and then zooing
| everything? Why?

  I had this argument with one of Richard Stallman's hangers-on at a
presentation a year or so ago. He contended that a compressed tar was
"the UNIX way" (said in a tone of reverence). It was obvious that he had
no idea that an archive was not the same type of thing as a compressed
tar. Let's review why:

1. Compression

  Given the same compression algorithm, compressing a collection of
related files as one stream, the way a compressed tar does, is almost
always faster and gives better compression than compressing each file
separately. Unfortunately no one seems to use anything but compress
these days, which is reasonably fast but does a mediocre job. Therefore
in almost all cases the archive is smaller, and if an archiver is used
to compress a tar file, every current archiver (zip, zoo, arj, lha, sqz)
will do a better job.
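
  To make the first sentence concrete, here is a minimal sketch in
Python. It assumes zlib as a stand-in compressor (it is not the LZW
algorithm that compress uses), and the "related files" are synthetic:
each one shares a block of boilerplate and differs only in a short
tail. The function names are made up for the illustration.

```python
import random
import zlib


def make_related_files(count=10, shared_size=1500, seed=0):
    """Build synthetic files that share one block of content (an assumption)."""
    rng = random.Random(seed)
    shared = bytes(rng.randrange(256) for _ in range(shared_size))
    return [shared + f"unique part of file {i}\n".encode() for i in range(count)]


def per_file_size(files):
    """Total size when every file is compressed on its own."""
    return sum(len(zlib.compress(data)) for data in files)


def single_stream_size(files):
    """Size when all files are concatenated and compressed as one stream."""
    return len(zlib.compress(b"".join(files)))


if __name__ == "__main__":
    files = make_related_files()
    print("compressed per file:     ", per_file_size(files))
    print("compressed as one stream:", single_stream_size(files))
```

  The single stream wins here because the compressor can refer back to
the shared block in earlier files. That is the advantage a compressed
tar has when the algorithm is held equal.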

2. Speed

  An ongoing battle. Until recently compress was nearly the fastest,
with only DWC being significantly faster. However, the recent versions
of zip have been almost as fast. Now that compress has been updated
(ncompress from c.s.reviewed) compress is fastest again.

3. Listings

  An archive has an uncompressed directory. A table of contents takes a
few ms. A compressed tar requires uncompression of the entire archive,
and takes orders of magnitude more CPU.
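
  The difference can be sketched with Python's standard zipfile and
tarfile modules. This assumes zip as the example archiver (zoo and the
others lay out their directories differently), and the helper names are
invented for the illustration. Listing the zip reads its directory;
listing the .tar.gz has to decompress the stream to walk the tar
headers.

```python
import io
import tarfile
import zipfile


def build_zip(files):
    """Pack a {name: bytes} mapping into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def build_tar_gz(files):
    """Pack the same mapping into an in-memory gzip-compressed tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_zip(blob):
    """Table of contents from the zip's uncompressed directory."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.namelist()


def list_tar_gz(blob):
    """Table of contents from a compressed tar: the stream is decompressed."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tf:
        return tf.getnames()
```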

4. Adding files

  Even if you have a tar which will add to a save set, the file must be
uncompressed, modified, and recompressed. That is slow, and takes a lot
of disk space.
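
  For contrast, a sketch of adding a file to an archive in place, again
assuming zip via Python's zipfile module as the example archiver. The
helper names are hypothetical. Append mode writes the new member after
the existing data and rewrites only the directory; the members already
there are not decompressed. Python's tarfile, by comparison, documents
that appending to a compressed tar ("a:gz") is not possible.

```python
import io
import zipfile


def new_zip(files):
    """Create an in-memory zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf


def append_member(buf, name, data):
    """Add one member to an existing archive without touching the others."""
    with zipfile.ZipFile(buf, "a", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return buf
```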

5. Extracting files

  Just like listings, the archiver wins. The directory is searched and
only the needed file is extracted.
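
  A rough way to see this is to count how many bytes of the archive are
actually read when one member is pulled out. The sketch below assumes
zip through Python's zipfile module; CountingBytesIO and the helper
functions are invented for the illustration, and the member contents
are random filler.

```python
import io
import os
import zipfile


class CountingBytesIO(io.BytesIO):
    """In-memory file that records how many bytes have been read from it."""

    def __init__(self, data=b""):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_zip(members):
    """Pack a {name: bytes} mapping into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def extract_one(blob, name):
    """Return (member contents, bytes of the archive read to get them)."""
    source = CountingBytesIO(blob)
    with zipfile.ZipFile(source) as zf:
        data = zf.read(name)
    return data, source.bytes_read


if __name__ == "__main__":
    members = {f"file{i:02d}.bin": os.urandom(4000) for i in range(40)}
    blob = build_zip(members)
    data, touched = extract_one(blob, "file20.bin")
    print("archive size:", len(blob), "bytes read:", touched)
```

  Only the directory and the one member are read. With a compressed tar
the reader would have to decompress everything ahead of the wanted file.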

6. Deleting or Updating Files

  Another win for the archiver: a delete takes only the time to copy the
compressed data, which is never expanded. And some archivers, like zoo, allow you
to mark files as deleted and then pack the archive later. This allows
very fast operation if speed is more important than disk space.

7. Documentation

  Many archivers allow a comment on the archive indicating the contents,
and also comments on the individual files showing what the files contain
or do. Again, this is uncompressed and adds size as well as convenience.
That's also why an archive can often be compressed again with the same
archiver, reducing the directory and comment sizes.

  Since the objective of ftp archives is to use little disk space and
give fast downloads, I really think changing to an archive, or at least
using a better compressor to compress the tars, is technically
desirable. On an individual system the intended use of the data
determines whether an archive is desirable or not.

  Sorry if I injected facts into this exchange of opinions, but there
are real benefits to be gained from archives, and comments of the "you
do all that work to save 10% on the file size" sort are not productive.

-- 
bill davidsen, GE Corp. R&D Center; Box 8; Schenectady NY 12345
    Keyboard controller has been disabled, press F1 to continue.