Zip Attacks
Michael Stay
AccessData Corporation
2500 N. University Ave. Ste. 200
Provo, UT 84606
staym@accessdata.com
Abstract. Biham and Kocher demonstrated that the PKZIP stream cipher
was weak and presented an attack requiring thirteen bytes of plaintext.
The deflate algorithm “zippers” now use to compress the plaintext
before encryption makes it difficult to get known plaintext. We consider
the problem of reducing the amount of known plaintext by finding
other ways to filter key guesses. In most cases we can reduce the amount
of known plaintext from the archived file to two or three bytes,
depending on the zipper used and the number of files in the archive.
For the most popular zippers on the Internet, there is a fast attack
that does not require any information about the files in the archive;
instead, it gets doubly-encrypted plaintext by exploiting a weakness in
the pseudorandom-number generator.
1 Introduction
PKZIP is a compression / archival program created by Phil Katz. Katz had the
foresight to document his file format completely in the file APPNOTE.TXT,
distributed with every copy of PKZIP; there are now literally hundreds of “zip-
per” programs available, and the ZIP file format has become a de facto standard
on the Internet.
In [BK94] Biham and Kocher demonstrated that the PKZIP stream cipher
was weak and presented an attack requiring thirteen bytes of plaintext. Eight
bytes of the plaintext must be contiguous, and all of the bytes must be the
text that was encrypted, which is usually compressed data. [K92] shows that
the compression method used at the time, implode, produces many predictable
bytes suitable for mounting the attack.
Most zippers available today implement only one of the compression methods
defined in APPNOTE.TXT, called deflate. Deflate uses a variant of Lempel-Ziv
followed by Huffman coding. Once the dictionary reaches a certain size, the process
starts over. Since the Huffman codes for any of the data depend on a great deal of
surrounding data, one is forced to guess the plaintext unless one has the original
data. The difficulty of getting known plaintext was one reason Phil Zimmermann
decided to use deflate in PGP [PGP98]. Practically speaking, if one has enough
of the original file to get the thirteen bytes of plaintext required for the attack
in [BK94], one has enough to break the encryption almost instantly.
Without the original file, all is not lost; we have the file’s type as indicated by
its extension, and we have its size. The ZIP file format requires at least one byte
of known plaintext for filtering incorrect passwords. Most zippers also encrypt
output from a pseudorandom number generator that is vulnerable to attack.
It is the author’s opinion that the only reason the PKZIP cipher has held
up so well in light of [BK94] is the high entropy of the data produced by the
deflate algorithm and the related difficulty of getting enough plaintext. This
paper treats the question of how far we can reduce the plaintext requirement
and still break the cipher with a practical amount of work.
Fig. 1. The PKZIP stream cipher. CRC-32, TLCG, CRC-32, truncated pseudo-square.
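For readers without the figure, the cipher it depicts can be sketched in a few lines of Python. This is a minimal sketch based on the publicly documented algorithm in APPNOTE.TXT, not code from this paper; the function names (make_crctab, update_keys, stream_byte, init_keys, encrypt, decrypt) are illustrative choices, and the twelve-byte random header that zippers prepend to each file is deliberately left out.

```python
def make_crctab():
    """Build the standard reflected CRC-32 table (polynomial 0xEDB88320)."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

CRCTAB = make_crctab()
MULT = 0x08088405  # multiplier of the truncated linear congruential generator

def crc32(key, byte):
    """One CRC-32 step as used inside the cipher (no pre/post inversion)."""
    return (key >> 8) ^ CRCTAB[(key ^ byte) & 0xFF]

def update_keys(keys, p):
    """Advance (key0, key1, key2) by one plaintext byte p."""
    key0, key1, key2 = keys
    key0 = crc32(key0, p)
    key1 = ((key1 + (key0 & 0xFF)) * MULT + 1) & 0xFFFFFFFF
    key2 = crc32(key2, key1 >> 24)
    return key0, key1, key2

def stream_byte(key2):
    """Truncated pseudo-square of the low 16 bits of key2."""
    temp = (key2 | 3) & 0xFFFF
    return ((temp * (temp ^ 1)) >> 8) & 0xFF

def init_keys(password):
    """Process the password bytes starting from the fixed initial keys."""
    keys = (0x12345678, 0x23456789, 0x34567890)
    for p in password:
        keys = update_keys(keys, p)
    return keys

def encrypt(password, plaintext):
    keys = init_keys(password)
    out = bytearray()
    for p in plaintext:
        out.append(p ^ stream_byte(keys[2]))
        keys = update_keys(keys, p)  # the state absorbs the plaintext byte
    return bytes(out)

def decrypt(password, ciphertext):
    keys = init_keys(password)
    out = bytearray()
    for c in ciphertext:
        p = c ^ stream_byte(keys[2])
        out.append(p)
        keys = update_keys(keys, p)
    return bytes(out)
```

Note that each stream byte depends only on fourteen bits of key2, which is the limited diffusion the attacks in this paper exploit.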
We can break a file with work equivalent to encrypting around 2^38 bytes and
negligible memory. We need a total of thirteen bytes of known plaintext: eight
for the attack, and five to filter the 2^38 keys that remain. This is an upper bound;
each additional byte of plaintext eliminates approximately one list (see [BK94],
fig. 1).
[BK94] throws away six bits in key1_7. By using them, we can reduce the plaintext
requirement to twelve bytes at the cost of increasing the work factor by four.
If we have more than one file in the archive, we can make the reasonable as-
sumption that they were encrypted with the same password. Zippers encrypt at
least one check byte into every encrypted file to verify that the user entered the
correct password. Once we have the complete internal state of the cipher, we
can run it backwards to the beginning of the file and read out key0, key1, and
key2. Since this state is the same at the beginning of each file (it only depends
on the password), we can decrypt the check byte in each file and use it to filter
with instead of known plaintext from a single file. This also works if the files are
in different archives, but have the same password.
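Running the cipher backwards, as described above, relies on each of the three update steps being invertible. The following Python sketch shows one way to undo a single step; it is an illustration under stated assumptions rather than the author's implementation. The names crc32_inv, step_forward and step_back are hypothetical, and the modular inverse is computed with the three-argument pow(), which requires Python 3.8 or later.

```python
MASK = 0xFFFFFFFF
MULT = 0x08088405
MULT_INV = pow(MULT, -1, 1 << 32)  # inverse of the multiplier modulo 2^32

def make_crctab():
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

CRCTAB = make_crctab()
# The top bytes of the 256 table entries are all distinct, so the top byte
# of a CRC value identifies which table entry was XORed in.
CRC_INDEX = {entry >> 24: i for i, entry in enumerate(CRCTAB)}

def crc32(key, byte):
    return (key >> 8) ^ CRCTAB[(key ^ byte) & 0xFF]

def crc32_inv(key, byte):
    """Undo crc32(): recover the previous value given the byte mixed in."""
    i = CRC_INDEX[key >> 24]
    return (((key ^ CRCTAB[i]) << 8) & MASK) | (i ^ byte)

def stream_byte(key2):
    temp = (key2 | 3) & 0xFFFF
    return ((temp * (temp ^ 1)) >> 8) & 0xFF

def step_forward(keys, p):
    key0, key1, key2 = keys
    key0 = crc32(key0, p)
    key1 = ((key1 + (key0 & 0xFF)) * MULT + 1) & MASK
    key2 = crc32(key2, key1 >> 24)
    return key0, key1, key2

def step_back(keys, c):
    """Undo one step, given the state after it and the ciphertext byte c."""
    key0, key1, key2 = keys
    prev_key2 = crc32_inv(key2, key1 >> 24)
    prev_key1 = ((key1 - 1) * MULT_INV - (key0 & 0xFF)) & MASK
    p = c ^ stream_byte(prev_key2)  # plaintext byte absorbed in this step
    prev_key0 = crc32_inv(key0, p)
    return prev_key0, prev_key1, prev_key2
```

Applying step_back over the ciphertext in reverse order, from the recovered internal state to the first byte of the file, yields the three keys that the password alone determines; those keys can then be used to decrypt the check byte of every other file encrypted under the same password.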
If the file was created in a zipper with two checksum bytes, we can break
the file with work equivalent to encrypting 11 ∗ 2^40 ≈ 2^43 bytes. We need two
checksum bytes followed by only four more known plaintext bytes in one file, and
three other files in the archive (six check bytes) to filter the 2^40 possible keys.
The factor of eleven in the estimate above is due to the fact that to decrypt the
checksum byte, we must decrypt the first eleven bytes of the random header.
If there is only one checksum byte per archived file, we can break the cipher
with nearly the same amount of work, but we need seven files in the archive
and five bytes of known plaintext in addition to the checksum byte in the first
file.
The limited diffusion of the internal state prompts us to ask how much of the
state we need to guess to process one byte. If it is small enough, we can guess it
and filter out keys that won’t work with our known stream bytes, then proceed
to the next part.
It turns out that we can get by with as few as 23 bits (see Figure 2). Note
that we don’t need to guess 16 bits of key0_0 to calculate the low byte of key0_1:
if we distribute the XOR in the definition of crc32(), we see that we only need
to guess 8 bits of crc32(key0_0, 0):
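The identity behind this step can be stated as crc32(k, p) = crc32(k, 0) XOR crctab[p], which holds because crctab is linear over XOR. This form is reconstructed from the standard definition of crc32() rather than copied from the paper, and p is assumed to be a known plaintext byte. A minimal Python check follows; the helper name low_byte_of_next_key0 is hypothetical.

```python
def make_crctab():
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

CRCTAB = make_crctab()

def crc32(key, byte):
    return (key >> 8) ^ CRCTAB[(key ^ byte) & 0xFF]

def low_byte_of_next_key0(guess, p0):
    """LSB(key0_1) from an 8-bit guess of LSB(crc32(key0_0, 0)) and known p0."""
    return guess ^ (CRCTAB[p0] & 0xFF)

# Compare the shortcut against the full computation for one arbitrary key.
key0_0 = 0x1234ABCD
p0 = 0x50
guess = crc32(key0_0, 0) & 0xFF          # the 8 bits an attacker would guess
full = crc32(key0_0, p0) & 0xFF          # what the cipher actually computes
print(low_byte_of_next_key0(guess, p0) == full)
```

The guess covers a single byte of crc32(key0_0, 0) instead of the two low bytes of key0_0 itself, which is where the saving of eight bits comes from.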
Now we distribute the multiplication across the addition in the next step:
We separate the equation into parts we know (A) and parts we need to guess
(B), and find we need to guess nine bits, including a possible carry bit. Note
that since we know the low bits of (LSB(key0_1) ∗ 0x08088405), the carry bit
will usually give us more than one bit of information in the form of an upper or
lower bound on the rest of (key1_0 ∗ 0x08088405) that we haven’t guessed yet.
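The nine guessed bits can be made concrete with a short Python sketch. It assumes the usual update key1_1 = (key1_0 + LSB(key0_1)) ∗ 0x08088405 + 1 and splits it into a known summand (A) and an unknown summand (B); the function names are hypothetical and the split is an interpretation of the text, since the annotated equation did not survive in this copy.

```python
MASK = 0xFFFFFFFF
MULT = 0x08088405

def msb(x):
    return (x >> 24) & 0xFF

def msb_key1_next(known_lsb, guessed_msb, carry):
    """MSB(key1_1) from the known part (A) and the nine guessed bits of (B)."""
    a = (known_lsb * MULT + 1) & MASK            # (A): fully known
    return (msb(a) + guessed_msb + carry) & 0xFF

def true_guess_bits(key1_0, known_lsb):
    """For checking only: the nine bits an attacker would have to guess."""
    a = (known_lsb * MULT + 1) & MASK
    b = (key1_0 * MULT) & MASK                   # (B): unknown to the attacker
    carry = ((a & 0xFFFFFF) + (b & 0xFFFFFF)) >> 24
    return msb(b), carry

# One arbitrary example: the shortcut agrees with the full update.
key1_0, lsb = 0x23456789, 0x5A
guessed_msb, carry = true_guess_bits(key1_0, lsb)
full = ((key1_0 + lsb) * MULT + 1) & MASK
print(msb_key1_next(lsb, guessed_msb, carry) == msb(full))
```

A carry of 1 means the low 24 bits of (B) are at least 2^24 minus the low 24 bits of (A), and a carry of 0 means they are smaller; this is the upper or lower bound the paragraph refers to.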
Given a stream byte s_{i+1}, we can find sixty-four values for bits 2..15 of key2_i.
It’s easy to see why: fourteen bits of key2_i produce eight bits of s_{i+1}, so there
are six left over. We can create a table of 256 x 64 bytes such that given s_{i+1} and
bits 10..15 of key2_i, we can look up bits 2..9 of key2_i. We call this the preimage
table.
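A preimage table of this kind can be built by brute force over the fourteen relevant bits. The Python sketch below is illustrative only: the names are hypothetical, and it stores a list of candidates per (stream byte, bits 10..15) cell instead of assuming exactly one entry per cell as the fixed 256 x 64 layout does.

```python
from collections import defaultdict

def stream_byte_from_bits(v):
    """Stream byte produced when bits 2..15 of key2 equal the 14-bit value v."""
    temp = (v << 2) | 3                  # bits 0 and 1 are forced to 1
    return ((temp * (temp ^ 1)) >> 8) & 0xFF

def build_preimage_table():
    """Map (stream byte, bits 10..15 of key2) -> candidates for bits 2..9."""
    table = defaultdict(list)
    for v in range(1 << 14):
        s = stream_byte_from_bits(v)
        table[(s, v >> 8)].append(v & 0xFF)
    return table

PREIMAGES = build_preimage_table()

def candidates(s, hi6):
    """Candidate values for bits 2..9 of key2, given s and bits 10..15."""
    return PREIMAGES.get((s, hi6), [])

# Every stream byte has sixty-four preimages in total.
print(sum(len(candidates(0x00, hi6)) for hi6 in range(64)))
```

The table is computed once and reused for every stream byte, so its construction cost is negligible next to the key search.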
We guess bits 10..15 of crc32(key2_0, 0) and use s_2, the preimage table, and
crctab[MSB(key1_1)] to find bits 2..9 of crc32(key2_0, 0). We end up with 2^23 key
guesses.
To find the next part of the internal state, we have to guess about the same
amount. This guess is not illustrated, but we basically guess about eight more bits
of information in each of the three keys. The only complicated part is separating
what we know about key1 from what we don’t.
We guess bits 8..15 of crc32(key0_0, 0) directly; the next guess involving key1
is a little more complicated:
MSB(key1_2) = MSB( (key1_1 + LSB(key0_2)) * 0x08088405 + 1 )
Again, (A) is known and we have to guess (B): nine bits, including a possible carry
bit. The carry bit establishes an upper or lower bound on (key1_0 ∗ 0xD4652819).
We end this filter by guessing bits 16..23 and bits 0..1 of crc32(key2_0, 0) and
calculating a stream byte. We have guessed 27 more bits, but the output byte
has to match s_3, so we expect 2^(23+27−8) = 2^42 key guesses to pass this filter.
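The constant 0xD4652819 in the carry-bit bound above is the square of 0x08088405 modulo 2^32, which appears once the key1 update is applied twice and expanded in terms of key1_0. The Python sketch below checks that expansion; the grouping into known and unknown terms is an interpretation of the (A)/(B) labels, and the function names are hypothetical.

```python
MASK = 0xFFFFFFFF
MULT = 0x08088405
MULT_SQ = (MULT * MULT) & MASK           # 0xD4652819

def key1_step(key1, lsb_key0):
    """One update of key1 as performed by the cipher."""
    return ((key1 + lsb_key0) * MULT + 1) & MASK

def key1_2_expanded(key1_0, lsb_key0_1, lsb_key0_2):
    """key1_2 written directly in terms of key1_0."""
    known = (lsb_key0_1 * MULT_SQ + lsb_key0_2 * MULT + MULT + 1) & MASK  # (A)
    unknown = (key1_0 * MULT_SQ) & MASK                                   # (B)
    return (known + unknown) & MASK

# Two ordinary steps agree with the expanded form for an arbitrary example.
k, a, b = 0x23456789, 0x11, 0xE7
print(key1_step(key1_step(k, a), b) == key1_2_expanded(k, a, b))
```

As with the first stage, only the top byte of the unknown term and one carry bit need to be guessed to obtain MSB(key1_2).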
At this point, we have guessed 24 bits of crc32(key2_0, 0) and we know s_1.
From this we can calculate, on average, one full value of key2_0. There are also
only around 2^13 possibilities for key1_0 due to the restrictions from the carry
bits. So the third stage consists of guessing bits 16..23 of crc32(key0_0, 0) and
running through the 2^13 possible values for key1_0. We expect 2^(42+13+8−8) = 2^55
key pieces to pass this filter.
Finally, we guess the last eight bits of key0_0 and we have a complete internal
state. We will have 2^63 complete keys to filter with other bytes, whether they
are in the archived file or in checksum bytes in other files. The cost is
approximately the same as encrypting 2^63 bytes under the stream cipher. The plaintext
requirement is four bytes total; at least one of these may come from the file’s
own check byte(s).
This is 128 times faster than guessing three stream bytes and using [BK94].
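The stage-by-stage counts quoted in this section can be tallied in a few lines of Python. Each stage multiplies the surviving guesses by 2 to the number of bits guessed and divides by 2^8 whenever a computed byte must match a known stream byte; the helper name surviving_log2 is hypothetical, and the bit counts are taken directly from the text above.

```python
def surviving_log2(prev_log2, guessed_bits, filtered_bits):
    """log2 of the number of guesses surviving a stage."""
    return prev_log2 + guessed_bits - filtered_bits

stage1 = 23                                   # initial guess
stage2 = surviving_log2(stage1, 27, 8)        # 27 more bits, filtered by s_3
stage3 = surviving_log2(stage2, 13 + 8, 8)    # key1_0 candidates plus 8 bits
final = surviving_log2(stage3, 8, 0)          # last eight bits of key0_0

print(stage1, stage2, stage3, final)
```

The final figure of 2^63 is the number of complete states that still have to be filtered with additional bytes.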
determine uniquely the seed that was used in the random number generator, and
therefore every ri,j .
Let us emphasize that we do not have known plaintext at this point, in the
sense that [BK94] requires. The random bytes were encrypted twice, so we do
not know the actual output of the stream cipher during the first and second
encryption. What we can derive is the XOR of these stream bytes.
there are five files in the archive that were encrypted consecutively as described
above. Decrypting a file created with this kind of weak PRNG usually takes
under two hours on a 500 MHz Pentium II. One can then take the three keys
and use [BK94]’s second algorithm to derive a password, if one desires, although
the three keys suffice to decrypt the files.
Table 1. PKZip Attack Complexity. Files are assumed to have been archived with two
checksum bytes.
5 Conclusion
The PKZIP stream cipher is very weak. The deflate algorithm makes it harder
to get plaintext, but in most cases we can reduce the plaintext requirement to
the point where one can guess enough plaintext based on file type and size alone.
The most popular zippers on the Internet are also susceptible to an attack that
runs in two hours on a single PC based on known plaintext provided by the
application and independent of the archived files themselves.
References
[BK94] Biham, Eli and Paul Kocher. “A Known Plaintext Attack on the PKZIP
Stream Cipher.” Fast Software Encryption 2, Proceedings of the Leuven
Workshop, LNCS 1008, December 1994.
[DL] http://download.cnet.com/downloads/0,10151,0-10097-106-0-1-5,00.
html?tag=st.dl.10097 106 1.lst.lst&
[IZ] ftp://ftp.freesoftware.com/pub/infozip/
[K92] Kocher, Paul. ZIPCRACK 2.00 Documentation. 1992.
http://www.bokler.com/bokler/zipcrack.txt
[PKZ] http://www.pkware.com
[PGP98] User’s Guide, Version 6.0. Network Associates, Inc., 1998. p.145.
http://www.nai.com
[WZ] http://www.winzip.com