UnZip is a free utility to process zipfiles, as these things are generally called. Zipfiles are actually archives of one or more other files, almost always compressed to save disk space and/or transmission time over the World Wide Web or your never-quite-fast-enough modem. In this regard they are quite similar to compressed tar archives, which are those files ending in .tar.Z, .tar.gz or .tgz that one finds on most Linux ftp sites and many CD-ROM distributions. But there is one major difference: whereas compressed tar archives bundle all of the files together and then compress the result as a single entity, zipfiles do just the opposite, compressing individual files before storing them in the archive. This isn't quite as efficient in terms of achieving the maximal overall compression, but it does allow you to list the archive's contents and to extract individual files without decompressing the whole mess.
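To make the per-file layout concrete, here is a minimal sketch that uses Python's standard zipfile module rather than UnZip itself. The member names and contents are made up for illustration, and the helper names (build_sample_archive, list_members, read_member) are our own. Because each member is compressed separately, the archive can be listed, and one member read, without inflating the others.

```python
import io
import zipfile


def build_sample_archive() -> bytes:
    """Create a small in-memory zipfile; names and contents are illustrative."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.TXT", "Read me first.\r\n" * 50)
        zf.writestr("INSTALL.BAT", "@echo off\r\n")
    return buf.getvalue()


def list_members(archive: bytes):
    """Return (name, uncompressed size) pairs read from the archive's directory."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


def read_member(archive: bytes, name: str) -> bytes:
    """Decompress a single member; the other members are not inflated."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)


data = build_sample_archive()
for name, size in list_members(data):
    print(f"{size:8d}  {name}")
print(read_member(data, "INSTALL.BAT"))
```

Doing the same with a compressed tar archive would mean decompressing the stream from the start until the wanted file turns up.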
$ unzip -l quake92p.zip
Archive:  quake92p.zip
 Length    Date    Time    Name
 ------    ----    ----    ----
  36064  06-25-96  13:18   DEICE.EXE
 369135  06-27-96  03:51   QUAKE92P.1
   2618  06-27-96  03:34   README.TXT
    177  06-25-96  20:07   INSTALL.BAT
    206  06-27-96  03:54   QUAKE92P.DAT
 ------                    -------
 408200                    5 files
There you have it, short and sweet: each file's name (on the right), its uncompressed size, and the date and time of its last modification. For many of us, however, especially those long steeped in the terse intricacies of ls, this is a little too short and sweet. For fans of ls, or for anyone wishing to know more about the details of the archive, UnZip has an entire mode devoted to listing both useful and obscure zipfile information: ZipInfo mode, triggered via the -Z option. (On some systems the zipinfo command exists as a link to unzip and is synonymous with unzip -Z, but this is not true of Slackware distributions as of this writing.) We'll limit ourselves to a description of the default ZipInfo listing format:
$ unzip -Z quake92p.zip
Archive:  quake92p.zip   406075 bytes   5 files
-rwxa--  2.0 fat    36064 b- defN 25-Jun-96 13:18 DEICE.EXE
-rw-a--  2.0 fat   369135 b- stor 27-Jun-96 03:51 QUAKE92P.1
-rw-a--  2.0 fat     2618 t- defN 27-Jun-96 03:34 README.TXT
-rwxa--  2.0 fat      177 t- defN 25-Jun-96 20:07 INSTALL.BAT
-rw-a--  2.0 fat      206 t- defN 27-Jun-96 03:54 QUAKE92P.DAT
5 files, 408200 bytes uncompressed, 405569 bytes compressed:  0.6%
The more astute amongst you will immediately recognize a certain resemblance to ls -l output. The differences stem both from the need to include different information and, in this case, from the fact that the files originated on a FAT file system (the one usually associated with DOS). On the left are the file permissions; all of them are read-write (as opposed to read-only) and have the archive bit set (sometimes used for DOS backups), and two of the files are listed as executables. The next column indicates the version of the archiver, probably PKZIP 2.04g or Zip 2.0.1, and the one after that is what tells us the files came from the (DOS) FAT file system. Next come the uncompressed file sizes; a column indicating, among other things, which files are most likely to be binary and which are probably text; the compression method used on each one; the timestamps; and the full filenames. In addition, the header line gives the archive name, its total size, and the total number of files in it; the trailer gives the number of files listed (in this case all of them), the total uncompressed and compressed data size of the listed files (not counting internal zipfile headers), and the compression ratio. Here the ratio is quite poor, mostly because the largest file (QUAKE92P.1) is stored without any compression.
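The trailer's ratio is simply the fraction of space saved, and it is easy to check by hand. The sketch below, again in Python's zipfile module and not taken from UnZip's source, reproduces the 0.6% figure from the two totals in the listing and then reads the same kind of per-member fields from a made-up archive. The labels "stor" and "defl" and the sample member names are our own choices for illustration.

```python
import io
import zipfile


def compression_ratio(uncompressed: int, compressed: int) -> float:
    """Percentage of space saved by compression."""
    if uncompressed == 0:
        return 0.0
    return 100.0 * (uncompressed - compressed) / uncompressed


# Totals copied from the quake92p.zip trailer: (408200 - 405569) / 408200
print(f"{compression_ratio(408200, 405569):.1f}%")  # prints 0.6%

# Our own short labels for the two methods used in this sketch.
METHOD_NAMES = {zipfile.ZIP_STORED: "stor", zipfile.ZIP_DEFLATED: "defl"}


def describe(archive: bytes):
    """Return (name, size, compressed size, method label) for each member."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [
            (i.filename, i.file_size, i.compress_size,
             METHOD_NAMES.get(i.compress_type, "other"))
            for i in zf.infolist()
        ]


# A made-up archive: one member stored as-is, one deflated.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("DATA.BIN", bytes(range(256)) * 4,
                compress_type=zipfile.ZIP_STORED)
    zf.writestr("README.TXT", "Read me first.\r\n" * 50,
                compress_type=zipfile.ZIP_DEFLATED)
sample = buf.getvalue()

for row in describe(sample):
    print(row)
```

A stored member always has equal compressed and uncompressed sizes, which is why one large stored file drags the overall ratio toward zero.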
$ unzip quake92p
Archive:  quake92p.zip
  inflating: DEICE.EXE
 extracting: QUAKE92P.1
  inflating: README.TXT
  inflating: INSTALL.BAT
  inflating: QUAKE92P.DAT
Here we've omitted the .zip suffix; UnZip first looks for the file quake92p and, not finding it, checks for quake92p.zip instead. But suppose we only wanted the README.TXT file? No problemo; anything (well, almost) after the zipfile name is taken to be the name of one of the enclosed files:
$ unzip quake92p README.TXT
Archive:  quake92p.zip
  inflating: README.TXT
Ah, but here you may notice a little snag. If you now edit this file in Linux with an editor like vi, you'll see what looks like ^M at the end of each and every line. Or, if you view the file with a pager like more, you'll discover that any line uncovered by the --More-- prompt gets erased immediately. These annoyances are due to the fact that DOS and its successors store text files with two end-of-line characters, CR and LF (a.k.a. carriage return and linefeed, respectively, or ^M and ^J, or ctrl-M and ctrl-J), rather than the more efficient single character (LF) used on all Unix systems. So when a Unix utility--like an editor or a pager or a compiler--looks at a DOS text file, it may behave a little oddly or die altogether.
Fortunately there's a simple solution: UnZip's -a option. Originally a mnemonic for ASCII conversion, the option these days is used for all sorts of text-file conversions. As a single-letter option it does its best to automatically convert files that are supposedly text, while leaving alone those that are marked binary. Be careful! Zip and PKZIP don't always guess correctly when creating the archive, particularly for certain classes of Windows files, and UnZip's ``text'' conversions are almost always irreversible. In other words, don't extract with auto-conversion and then delete the original zipfile without first making sure everything is OK. UnZip does indicate which files it thinks are text when auto-converting, however:
$ unzip -a quake92p
Archive:  quake92p.zip
  inflating: DEICE.EXE     [binary]
 extracting: QUAKE92P.1    [binary]
  inflating: README.TXT    [text]
  inflating: INSTALL.BAT   [text]
  inflating: QUAKE92P.DAT  [text]
In this case everything worked as intended. If, for some reason, Zip marked a text file as binary and you want to force text conversion, simply double the option: -aa.
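Why are text conversions almost always irreversible, and why does a wrong guess hurt? A simplified model in Python shows both effects. This is only the bare CR LF to LF replacement, not UnZip's actual conversion code, which handles more cases; the function names and sample bytes are ours.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Simplified DOS-to-Unix text conversion: CR LF becomes LF."""
    return data.replace(b"\r\n", b"\n")


def lf_to_crlf(data: bytes) -> bytes:
    """Naive attempt to undo the conversion: every LF becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


# A text file with mixed line endings: the second line ends in a bare LF.
original = b"line one\r\nline two\nline three\r\n"
converted = crlf_to_lf(original)
restored = lf_to_crlf(converted)

print(converted)             # all three lines now end in LF
print(restored == original)  # False: the bare LF cannot be told apart

# Binary data that happens to contain the bytes 0x0D 0x0A gets shortened.
binary = bytes([0x89, 0x0D, 0x0A, 0x1A])
print(len(binary), len(crlf_to_lf(binary)))  # 4 3
```

Once the CR bytes are gone there is no record of where they were, so the only safe way back is the original zipfile.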
But wait, there's more! The discriminating Linux user, happily accustomed to a file system that not only preserves the case of filenames but also distinguishes between names differing only in case, is not going to settle for a bunch of all-uppercase DOS filenames in his or her directories. Enter the -L option. If (and only if) the file came from a single-case file system like DOS FAT or VMS, unzip -L will convert it to lowercase upon extraction, thusly:
$ unzip -aL quake92p
Archive:  quake92p.zip
  inflating: deice.exe     [binary]
 extracting: quake92p.1    [binary]
  inflating: readme.txt    [text]
  inflating: install.bat   [text]
  inflating: quake92p.dat  [text]
``Ah,'' says you, ``much better. The world is a better place because of you.''
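The "if (and only if)" rule for -L can be sketched in a few lines. Each zipfile entry records the system it was created on, and Python's zipfile module exposes that as ZipInfo.create_system (0 for MS-DOS/FAT, 2 for VMS, 3 for Unix). The set of systems treated as single-case below is an assumption built from the two examples named above; UnZip's real list may well be longer. The function name extracted_name is ours.

```python
import zipfile

# Host-system codes from the zipfile format: 0 = MS-DOS/FAT, 2 = VMS, 3 = Unix.
# Assumption: only the two systems named in the text count as single-case here.
SINGLE_CASE_SYSTEMS = {0, 2}


def extracted_name(info: zipfile.ZipInfo) -> str:
    """Lowercase the name only if the entry came from a single-case system."""
    if info.create_system in SINGLE_CASE_SYSTEMS:
        return info.filename.lower()
    return info.filename


# Two hand-built entries, one from DOS and one from Unix.
dos_entry = zipfile.ZipInfo("README.TXT")
dos_entry.create_system = 0
unix_entry = zipfile.ZipInfo("Makefile")
unix_entry.create_system = 3

print(extracted_name(dos_entry))   # readme.txt
print(extracted_name(unix_entry))  # Makefile
```

Names that came from a case-preserving system are left alone, since their capitalization was presumably deliberate.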
$ unzip -t quake92p
Archive: quake92p.zip
testing: DEICE.EXE OK
testing: QUAKE92P.1 OK
testing: README.TXT OK
testing: INSTALL.BAT OK
testing: QUAKE92P.DAT OK
No errors detected in compressed data of quake92p.zip.
OK, but here we only tested one, and the output is a little too verbose--we really only want the one-line summary for each archive. UnZip supports both a -q option for various levels of quietness (the more q's, the quieter) and the concept of wildcards, both for the internal files and for the zipfiles themselves:
$ unzip -tq \*.zip
No errors detected in compressed data of arena2b-grr.zip.
No errors detected in compressed data of PngSuite.zip.
No errors detected in compressed data of libgr2-elf-install.zip.
No errors detected in compressed data of ppmz-7.3.zip.
arithc.c                bad CRC e220fe9c  (should be 1c24998c)
At least one error was detected in macm.zip.
No errors detected in compressed data of xfer-zip151.zip.
No errors detected in compressed data of quake091.zip.
No errors detected in compressed data of quake92p.zip.
No errors detected in compressed data of p93b2200.zip.
8 archives were successfully processed.
1 archive had fatal errors.
First note that the wildcard character (`*') is escaped with a backslash (`\'). Most shells expand wildcards themselves, and if we allowed that, UnZip would see the command line as a list of archives; it would treat the first one as the zipfile name and the rest as files to be tested within the first one. By escaping the wildcard, we allow UnZip to do its own directory search and wildcard-matching--which, incidentally, has the advantage that Unix-style regular expressions (very powerful wildcards) can be used not only under Linux but under all of the operating systems for which UnZip ports exist, even plain old DOS. But we digress.
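The difference between the escaped and unescaped forms is easier to see with the argument lists written out. The following is a toy model in Python, not UnZip's argument parser: split_args and glob_match are hypothetical helpers, the directory listing is made up, and fnmatch stands in for both the shell's expansion and UnZip's own matching.

```python
import fnmatch


def split_args(args):
    """Model of the rule above: first argument is the zipfile, the rest are members."""
    return args[0], args[1:]


def glob_match(names, pattern):
    """Case-sensitive wildcard match, standing in for shell or UnZip matching."""
    return sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))


directory = ["macm.zip", "quake091.zip", "quake92p.zip", "notes.txt"]

# Unescaped: the shell expands *.zip before the program ever runs, so the
# second and third archives are mistaken for members of the first.
archive, members = split_args(glob_match(directory, "*.zip"))
print(archive, members)  # macm.zip ['quake091.zip', 'quake92p.zip']

# Escaped: the program receives the literal pattern and expands it itself,
# so every match is treated as an archive to test.
pattern, members = split_args(["*.zip"])
archives = glob_match(directory, pattern)
print(archives, members)  # ['macm.zip', 'quake091.zip', 'quake92p.zip'] []
```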
The other thing to notice is that--egad!--one of the archives has an error in it. Perhaps there was a transmission error, or maybe the original was damaged when it was created; either way, the file arithc.c in macm.zip is probably not going to be usable. It's always good to know these sorts of things sooner rather than later.
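A CRC failure like this one is easy to reproduce. The sketch below uses Python's zipfile module, whose testzip method reads every member and returns the name of the first one that fails its check (or None). It is an analogue of unzip -t, not its implementation; the member contents are made up, and the damage is simulated by flipping a single byte of stored data.

```python
import io
import zipfile


def build_archive() -> bytes:
    """Create a one-member archive; the member is stored uncompressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("arithc.c", "int main(void) { return 0; }\n")
    return buf.getvalue()


def first_bad_member(archive: bytes):
    """Return the name of the first member that fails its CRC check, or None."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.testzip()


good = build_archive()

# Simulate a transmission error: flip one byte inside the member's data.
damaged = bytearray(good)
damaged[good.index(b"int main")] ^= 0xFF

print(first_bad_member(good))            # None
print(first_bad_member(bytes(damaged)))  # arithc.c
```

The archive's directory is untouched, so the damaged file still lists normally; only reading the data and comparing checksums reveals the problem.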
There are quite a few other options and modifiers not covered here; a full tutorial would occupy most of this magazine. Fortunately, the UnZip and ZipInfo man pages (man unzip and man zipinfo) contain a complete listing of all of the options and examples of the use of many of them. Unfortunately, Slackware 3.0 and earlier don't include the ZipInfo man page. But an abbreviated summary of ZipInfo's options is available by typing unzip -Z. Similarly, a summary of most of UnZip's options can be had simply by typing unzip with no parameters.
Note also that while Zip and gzip (sometimes called ``GNU zip'') have similar names, a similar heritage--Jean-loup Gailly and Mark Adler are the co-authors of the latter and are also long-standing members of the Info-ZIP group--and the same compression engine, the two programs are basically incompatible. The same goes for UnZip and gunzip. Jean-loup never foresaw the confusion that would arise from the similarity, and yours truly was too late in suggesting the obvious, sick alternative (feather*) to get the name changed. Sigh.
But on a more serious note...as of this writing, the current version of UnZip is 5.2, and 5.21 will be out by the time you read this. While everything discussed above works equally well with the previous version (5.12), there are various new features and other improvements that make 5.2 worth getting. You can find the latest public releases of source code and executables at UUNET's anonymous ftp site:
ftp://ftp.uu.net/pub/archiving/zip/
ftp://ftp.uu.net/pub/archiving/zip/UNIX/LINUX/
You can also find news, history, descriptions of certain weirdos, and pointers to other ftp sites around the world at the following web site:
http://quest.jpl.nasa.gov/Info-ZIP/
Finally, let us conclude with a quick look at the future. UnZip currently has two major deficiencies: no true support for multi-part archives, and no support for reading a zipfile from standard input (that is, as part of a pipe, like you can do with tar). The former is at the top of the to-do list for UnZip 6.0, which will be released by the end of this year or early in 1997 (we hope). In the meantime you can extract multi-part archives by concatenating the pieces together, using zip -F on the result, and then running UnZip on it as with any normal archive. Reading archives from standard input is probably further off due to the large structural changes required, but we recognize that this is a very useful feature to have in any Unix environment. Perhaps in version 6.1.
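One reason pipes are awkward for zipfiles is that the archive's directory sits at the very end of the file, so a reader that starts from the directory needs to seek. A common workaround, sketched here with Python's zipfile module and not a description of anything planned for UnZip, is to buffer the whole stream first. The names open_zip_from_stream and OneWayStream are hypothetical, and the approach assumes the archive fits in memory.

```python
import io
import zipfile


def open_zip_from_stream(stream) -> zipfile.ZipFile:
    """Read a non-seekable byte stream to the end, then open it as a zipfile."""
    return zipfile.ZipFile(io.BytesIO(stream.read()))


class OneWayStream:
    """Stand-in for a pipe: it can be read, but offers no seek or tell."""

    def __init__(self, data: bytes):
        self._data = data

    def read(self) -> bytes:
        data, self._data = self._data, b""
        return data


# Build a tiny archive and feed it through the one-way stream.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello\n")

with open_zip_from_stream(OneWayStream(buf.getvalue())) as piped:
    print(piped.namelist())        # ['hello.txt']
    print(piped.read("hello.txt"))
```

In a real pipeline the stream would be sys.stdin.buffer. The cost is memory proportional to the archive size.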