Here's a <100 KB file that the `gzip` module decompresses in ~200 ms. But if we use `urllib3.response.GzipDecoder` to decompress it, it burns CPU for >10 seconds.
```
In [51]: evil = gzip.compress(b"\x00" * 1032 * 40) * 1350

In [52]: len(evil)
Out[52]: 99900

In [53]: %time x = gzip.decompress(evil)
CPU times: user 230 ms, sys: 11.9 ms, total: 242 ms
Wall time: 240 ms

In [54]: %time x = urllib3.response.GzipDecoder().decompress(evil)
CPU times: user 5.87 s, sys: 7.73 s, total: 13.6 s
Wall time: 13.6 s
```
Since urllib3 decodes gzip-encoded responses by default, a malicious server can easily cause urllib3-based clients to waste large amounts of CPU time.
The problem is that this is a gzip file with many members concatenated together. When urllib3 encounters such a file, it decodes each member in sequence and accumulates the result into a `bytes` object via repeated calls to `+=`. On a `bytes` object, each call to `+=` is O(n), so this loop is accidentally quadratic.
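The quadratic blow-up is easy to demonstrate in isolation. Here's a standalone sketch (names are mine, not urllib3's) that times the same append loop against a `bytes` accumulator and a `bytearray` accumulator:

```python
import time

def append_loop(buf, n_chunks=10000, chunk=b"\x00" * 64):
    # Append n_chunks small chunks with +=. On a bytes object, every +=
    # copies the entire accumulated buffer (O(n) per call, O(n^2) total);
    # on a bytearray, += resizes in place with amortized O(1) appends.
    start = time.perf_counter()
    for _ in range(n_chunks):
        buf += chunk
    return time.perf_counter() - start, len(buf)

bytes_time, bytes_len = append_loop(b"")
ba_time, ba_len = append_loop(bytearray())
print(f"bytes:     {bytes_time:.3f}s for {bytes_len} bytes")
print(f"bytearray: {ba_time:.3f}s for {ba_len} bytes")
```

On my reading of the loop, both versions produce the same data, but the `bytes` version does on the order of gigabytes of redundant copying.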
If we make `ret` a `bytearray` instead, that fixes the problem:

```
In [62]: %time x = MyGzipDecoder().decompress(evil)
CPU times: user 167 ms, sys: 8.41 ms, total: 175 ms
Wall time: 174 ms
```
In this test, the only thing I changed is to replace the line `ret = b""` with `ret = bytearray()`. A real fix would probably want to avoid returning `bytearray` objects to the user, so I guess you'd either want to accumulate a list of `bytes` and call `join` at the end, or else accumulate in a `bytearray` and then convert back to `bytes` at the end?
Even after this fix, I think there's technically still some quadratic behavior in the way we pass `.unused_data` from one decompression object to the next, but at least that's quadratic in the size of the compressed file rather than the uncompressed file? I'm not sure whether this is triggerable in practice. If we want to be extra careful, we could put an upper bound on how much data we feed into `self._obj.decompress` on each pass through the loop.
I haven't hit this in the real world; I just noticed it by accident when looking at the code.
I don't think this is a particularly serious vulnerability – gzip decompression inherently allows some amount of DoS (e.g. by sending a file that expands by a factor of 1000 to use up lots of memory). But it is a real issue, and if someone wants to go get a CVE, I guess it probably qualifies.