urllib3.response.GzipDecoder is accidentally quadratic, which allows a malicious server to DoS urllib3 clients #1467

@njsmith

Description

Here's a <100 KB file that the gzip module decompresses in ~200 ms. But if we use urllib3.response.GzipDecoder to decompress it, it burns CPU for >10 seconds.

In [51]: evil = gzip.compress(b"\x00" * 1032 * 40) * 1350                                      

In [52]: len(evil)                                                                             
Out[52]: 99900

In [53]: %time x = gzip.decompress(evil)                                                       
CPU times: user 230 ms, sys: 11.9 ms, total: 242 ms
Wall time: 240 ms

In [54]: %time x = urllib3.response.GzipDecoder().decompress(evil)                             
CPU times: user 5.87 s, sys: 7.73 s, total: 13.6 s
Wall time: 13.6 s

Since urllib3 attempts to decode gzip files by default, this means a malicious server can easily cause urllib3-based clients to waste tons of CPU time.

The problem is that this is a gzip file with lots and lots of members concatenated together. When urllib3 encounters such a file, it decodes each member in sequence, and accumulates the result into a bytes object via repeated calls to +=.

On a bytes object, each call to += is O(n), so this loop is accidentally-quadratic.
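For reference, the loop looks roughly like this (a simplified sketch of the multi-member decoding pattern, not urllib3's exact code, which also tracks decoder state and does error handling; the function name is just for illustration):

import zlib

def decompress_all_members(data):
    # Decode each gzip member with a fresh decompressobj and append the
    # output to an immutable bytes object. Every += copies the whole
    # accumulated result, so total work is quadratic in the output size.
    obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    ret = b""
    while True:
        ret += obj.decompress(data)
        data = obj.unused_data
        if not data:
            return ret
        # Another member follows: start a new decompressor on the leftovers.
        obj = zlib.decompressobj(16 + zlib.MAX_WBITS)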

If we make ret a bytearray instead, it fixes the problem:

In [62]: %time x = MyGzipDecoder().decompress(evil)                                            
CPU times: user 167 ms, sys: 8.41 ms, total: 175 ms
Wall time: 174 ms

In this test, the only thing I changed is to replace the line ret = b"" with ret = bytearray(). A real fix would probably want to avoid returning bytearray objects to the user, so I guess you'd either want to accumulate a list-of-bytes and call join at the end, or else accumulate in a bytearray and then convert back to bytes at the end?
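As a minimal sketch of that second option, applied to the simplified loop above (illustrative function name, not the real GzipDecoder):

import zlib

def decompress_all_members_fixed(data):
    # Same loop, but accumulate into a bytearray (amortized O(1) appends)
    # and convert back to bytes once at the end, so callers still get bytes.
    obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    ret = bytearray()
    while True:
        ret += obj.decompress(data)
        data = obj.unused_data
        if not data:
            return bytes(ret)
        obj = zlib.decompressobj(16 + zlib.MAX_WBITS)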

Even after this fix I think there's technically still some quadratic behavior in the way we pass .unused_data from one decompression object to the next, but at least that's quadratic in the size of the compressed file, rather than the uncompressed file? I'm not sure if this is triggerable in practice. If we want to be extra careful, we could put an upper bound on how much data we feed into self._obj.decompress on each pass through the loop.
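A rough sketch of that last idea, again against the simplified loop rather than the real GzipDecoder (MAX_CHUNK is an arbitrary illustrative cap, not something urllib3 defines):

import zlib

MAX_CHUNK = 64 * 1024  # hypothetical cap on how much we hand to zlib per call

def decompress_all_members_capped(data):
    # Feed at most MAX_CHUNK bytes into decompress() at a time, so any
    # unused_data copied when a member ends is bounded by MAX_CHUNK instead
    # of the whole remaining compressed payload.
    obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    ret = bytearray()
    view = memoryview(data)
    offset = 0
    carry = b""  # small leftover (<= MAX_CHUNK) from a member that just ended
    while carry or offset < len(view):
        if carry:
            chunk, carry = carry, b""
        else:
            chunk = view[offset:offset + MAX_CHUNK]
            offset += len(chunk)
        ret += obj.decompress(chunk)
        if obj.unused_data:
            # A member ended inside this chunk: restart on the leftover only.
            carry = obj.unused_data
            obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return bytes(ret)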

I haven't hit this in the real world; I just noticed it by accident when looking at the code.

I don't think this is a particularly serious vulnerability, since gzip decompression inherently allows some amount of DoS (e.g. by sending a file that expands by a factor of 1000 to use up lots of memory). But it is a real issue, and if someone wants to go get a CVE, I guess it probably qualifies.
