urllib3.response.GzipDecoder is accidentally quadratic, which allows a malicious server to DoS urllib3 clients #1467

@njsmith

Description

Here's a <100 KB file that the gzip module decompresses in ~200 ms. But if we use urllib3.response.GzipDecoder to decompress it, it burns CPU for >10 seconds.

In [51]: evil = gzip.compress(b"\x00" * 1032 * 40) * 1350                                      

In [52]: len(evil)                                                                             
Out[52]: 99900

In [53]: %time x = gzip.decompress(evil)                                                       
CPU times: user 230 ms, sys: 11.9 ms, total: 242 ms
Wall time: 240 ms

In [54]: %time x = urllib3.response.GzipDecoder().decompress(evil)                             
CPU times: user 5.87 s, sys: 7.73 s, total: 13.6 s
Wall time: 13.6 s

Since urllib3 attempts to decode gzip files by default, this means a malicious server can easily cause urllib3-based clients to waste tons of CPU time.

The problem is that this is a gzip file with lots and lots of members concatenated together. When urllib3 encounters such a file, it decodes each member in sequence, and accumulates the result into a bytes object via repeated calls to +=.

On a bytes object, each call to += is O(n), so this loop is accidentally-quadratic.
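For reference, the loop looks roughly like this (a simplified sketch of the multi-member decoding pattern, not urllib3's exact code, which also tracks decoder state and does error handling; the function name is just for illustration):

import zlib

def decompress_all_members(data):
    # Decode each gzip member with a fresh decompressobj and append the
    # output to an immutable bytes object. Every += copies the whole
    # accumulated result, so total work is quadratic in the output size.
    obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    ret = b""
    while True:
        ret += obj.decompress(data)
        data = obj.unused_data
        if not data:
            return ret
        # Another member follows: start a new decompressor on the leftovers.
        obj = zlib.decompressobj(16 + zlib.MAX_WBITS)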

If we make ret a bytearray instead, it fixes the problem:

In [62]: %time x = MyGzipDecoder().decompress(evil)                                            
CPU times: user 167 ms, sys: 8.41 ms, total: 175 ms
Wall time: 174 ms

In this test, the only thing I changed is to replace the line ret = b"" with ret = bytearray(). A real fix would probably want to avoid returning bytearray objects to the user, so I guess you'd either want to accumulate a list-of-bytes and call join at the end, or else accumulate in a bytearray and then convert back to bytes at the end?
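As a minimal sketch of that second option, applied to the simplified loop above (illustrative function name, not the real GzipDecoder):

import zlib

def decompress_all_members_fixed(data):
    # Same loop, but accumulate into a bytearray (amortized O(1) appends)
    # and convert back to bytes once at the end, so callers still get bytes.
    obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    ret = bytearray()
    while True:
        ret += obj.decompress(data)
        data = obj.unused_data
        if not data:
            return bytes(ret)
        obj = zlib.decompressobj(16 + zlib.MAX_WBITS)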

Even after this fix I think there's technically still some quadratic behavior in the way we pass .unused_data from one decompression object to the next, but at least that's quadratic in the size of the compressed file, rather than the uncompressed file? I'm not sure if this is triggerable in practice. If we want to be extra careful, we could put an upper bound on how much data we feed into self._obj.decompress on each pass through the loop.
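A rough sketch of that last idea, again against the simplified loop rather than the real GzipDecoder (MAX_CHUNK is an arbitrary illustrative cap, not something urllib3 defines):

import zlib

MAX_CHUNK = 64 * 1024  # hypothetical cap on how much we hand to zlib per call

def decompress_all_members_capped(data):
    # Feed at most MAX_CHUNK bytes into decompress() at a time, so any
    # unused_data copied when a member ends is bounded by MAX_CHUNK instead
    # of the whole remaining compressed payload.
    obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    ret = bytearray()
    view = memoryview(data)
    offset = 0
    carry = b""  # small leftover (<= MAX_CHUNK) from a member that just ended
    while carry or offset < len(view):
        if carry:
            chunk, carry = carry, b""
        else:
            chunk = view[offset:offset + MAX_CHUNK]
            offset += len(chunk)
        ret += obj.decompress(chunk)
        if obj.unused_data:
            # A member ended inside this chunk: restart on the leftover only.
            carry = obj.unused_data
            obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return bytes(ret)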

I haven't hit this in the real world; I just noticed it by accident when looking at the code.

I don't think this is a particularly serious vulnerability, since gzip decompression inherently allows some amount of DoS (e.g. by sending a file that expands by a factor of 1000 to use up lots of memory). But it is a real issue, and if someone wants to go get a CVE, I guess it probably qualifies.
