py/objarray: Fix use-after-free if extending a slice from itself. #14029

Merged 2 commits on Apr 22, 2024
20 changes: 16 additions & 4 deletions py/objarray.c
@@ -424,6 +424,13 @@ static mp_obj_t array_extend(mp_obj_t self_in, mp_obj_t arg_in) {
     if (self->free < len) {
         self->items = m_renew(byte, self->items, (self->len + self->free) * sz, (self->len + len) * sz);
         self->free = 0;
+
+        if (self_in == arg_in) {
+            // Get arg_bufinfo again in case self->items has moved
+            //
+            // (Note not possible to handle case that arg_in is a memoryview into self)
+            mp_get_buffer_raise(arg_in, &arg_bufinfo, MP_BUFFER_READ);
+        }
     } else {
         self->free -= len;
     }
@@ -456,7 +463,8 @@ static mp_obj_t array_subscr(mp_obj_t self_in, mp_obj_t index_in, mp_obj_t value
 #if MICROPY_PY_ARRAY_SLICE_ASSIGN
 // Assign
 size_t src_len;
-void *src_items;
+uint8_t *src_items;
+size_t src_offs = 0;
 size_t item_sz = mp_binary_get_size('@', o->typecode & TYPECODE_MASK, NULL);
 if (mp_obj_is_obj(value) && MP_OBJ_TYPE_GET_SLOT_OR_NULL(((mp_obj_base_t *)MP_OBJ_TO_PTR(value))->type, subscr) == array_subscr) {
     // value is array, bytearray or memoryview
@@ -469,7 +477,7 @@ static mp_obj_t array_subscr(mp_obj_t self_in, mp_obj_t index_in, mp_obj_t value
     src_items = src_slice->items;
     #if MICROPY_PY_BUILTINS_MEMORYVIEW
     if (mp_obj_is_type(value, &mp_type_memoryview)) {
-        src_items = (uint8_t *)src_items + (src_slice->memview_offset * item_sz);
+        src_offs = src_slice->memview_offset * item_sz;
     }
     #endif
 } else if (mp_obj_is_type(value, &mp_type_bytes)) {
@@ -504,13 +512,17 @@ static mp_obj_t array_subscr(mp_obj_t self_in, mp_obj_t index_in, mp_obj_t value
         // TODO: alloc policy; at the moment we go conservative
         o->items = m_renew(byte, o->items, (o->len + o->free) * item_sz, (o->len + len_adj) * item_sz);
         o->free = len_adj;
+        // m_renew may have moved o->items
+        if (src_items == dest_items) {
+            src_items = o->items;
+        }
         dest_items = o->items;
     }
     mp_seq_replace_slice_grow_inplace(dest_items, o->len,
-        slice.start, slice.stop, src_items, src_len, len_adj, item_sz);
+        slice.start, slice.stop, src_items + src_offs, src_len, len_adj, item_sz);
 } else {
     mp_seq_replace_slice_no_grow(dest_items, o->len,
-        slice.start, slice.stop, src_items, src_len, item_sz);
+        slice.start, slice.stop, src_items + src_offs, src_len, item_sz);
     // Clear "freed" elements at the end of list
     // TODO: This is actually only needed for typecode=='O'
     mp_seq_clear(dest_items, o->len + len_adj, o->len, item_sz);
9 changes: 8 additions & 1 deletion tests/basics/bytearray_add.py
@@ -15,4 +15,11 @@

 # this inplace add tests the code when the buffer doesn't need to be increased
 b = bytearray()
-b += b''
+b += b""
+
+# extend a bytearray from itself
+b = bytearray(b"abcdefgh")
+for _ in range(4):
+    c = bytearray(b)  # extra allocation, as above
+    b.extend(b)
+print(b)
8 changes: 8 additions & 0 deletions tests/basics/bytearray_add_self.py
@@ -0,0 +1,8 @@
+# add a bytearray to itself
+# This is not supported by CPython as of 3.11.18.
+
+b = bytearray(b"123456789")
+for _ in range(4):
+    c = bytearray(b)  # extra allocation increases chance 'b' has to relocate
+    b += b
+print(b)
1 change: 1 addition & 0 deletions tests/basics/bytearray_add_self.py.exp
@@ -0,0 +1 @@
+bytearray(b'123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789')
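The expected output follows from simple arithmetic: each pass doubles the buffer, so four passes turn the 9-byte seed into 9 × 2^4 = 144 bytes. A minimal sketch of that check, assuming an interpreter that rejects `b += b` (as the test comment says of CPython) and therefore appending a `bytes(b)` snapshot instead. The helper name `self_append` is hypothetical and is not part of the PR.

```python
def self_append(seed, rounds):
    """Double a bytearray `rounds` times by appending a snapshot of itself."""
    b = bytearray(seed)
    for _ in range(rounds):
        # bytes(b) is an independent copy, so the source data cannot move
        # while b is being grown.
        b += bytes(b)
    return b


result = self_append(b"123456789", 4)
print(len(result))
```

The snapshot costs one temporary copy per pass, which is the price of not relying on the interpreter to handle an aliased source.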
18 changes: 12 additions & 6 deletions tests/basics/bytearray_slice_assign.py
@@ -18,7 +18,7 @@
 l[1:3] = bytearray()
 print(l)
 l = bytearray(x)
-#del l[1:3]
+# del l[1:3]
 print(l)

 l = bytearray(x)
@@ -28,7 +28,7 @@
 l[:3] = bytearray()
 print(l)
 l = bytearray(x)
-#del l[:3]
+# del l[:3]
 print(l)

 l = bytearray(x)
@@ -38,7 +38,7 @@
 l[:-3] = bytearray()
 print(l)
 l = bytearray(x)
-#del l[:-3]
+# del l[:-3]
 print(l)

 # slice assignment that extends the array
@@ -61,8 +61,14 @@
 print(b)

 # Growth of bytearray via slice extension
-b = bytearray(b'12345678')
-b.append(57) # expand and add a bit of unused space at end of the bytearray
+b = bytearray(b"12345678")
+b.append(57)  # expand and add a bit of unused space at end of the bytearray
 for i in range(400):
-    b[-1:] = b'ab' # grow slowly into the unused space
+    b[-1:] = b"ab"  # grow slowly into the unused space
 print(len(b), b)
+
+# Growth of bytearray via slice extension from itself
+b = bytearray(b"1234567")
+for i in range(3):
+    b[-1:] = b
Member

I think there needs to be an additional test that extends a bytearray by a memoryview of itself, where that memoryview is offset by non-zero. That will test the case where src_items was realloc'd by the GC and src_offs is non-zero.

Contributor Author

Good catch, I'd missed there's no existing coverage for this.

Interestingly, this turns up a twist: CPython doesn't support resizing any buffer object which has an active memoryview into it. This code:

# Growth of bytearray via slice extension of a memoryview to itself
b = bytearray(b"1234567")
m = memoryview(b)[2:5]
for _ in range(3):
    b[-1:] = m
print(len(b), b)

Triggers:

Traceback (most recent call last):
  File "/home/gus/ry/george/micropython/tests/basics/bytearray_slice_assign.py", line 80, in <module>
    b[-1:] = m
    ~^^^^^
BufferError: Existing exports of data: object cannot be re-sized

Some explanation here.

You're no doubt more familiar with this than me, but AFAIK the related memory safety issue also exists in MicroPython: if there's a memoryview into a buffer object and an unrelated resize moves the buffer, then the memoryview will point to invalid memory. My understanding is that this is a known (but undocumented?) limitation of memoryviews in MicroPython.

In this particular case (buffer extended by memoryview to itself) this patch makes it memory safe, so perhaps I can make a separate test file for this with its own .exp? Plus add some documentation around the difference to CPython. What do you recommend?
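The CPython behaviour described in this comment can be reproduced, and avoided, with a short script. This is a sketch written for CPython only: it relies on `memoryview.release()`, which may not be available on other implementations, and the function names `try_resize_with_view` and `resize_from_snapshot` are hypothetical.

```python
def try_resize_with_view():
    """Return True if a bytearray can be resized while a memoryview exists."""
    b = bytearray(b"1234567")
    m = memoryview(b)[2:5]
    try:
        b[-1:] = m  # 3 bytes replace 1 byte, so b has to grow
    except BufferError:
        return False  # CPython: "Existing exports of data"
    return True


def resize_from_snapshot():
    """Copy the viewed bytes out and release the view before resizing."""
    b = bytearray(b"1234567")
    m = memoryview(b)[2:5]
    snapshot = bytes(m)  # independent copy of b"345"
    m.release()  # no live view into b from here on
    b[-1:] = snapshot
    return b


print(try_resize_with_view(), resize_from_snapshot())
```

Taking the snapshot first means the resize never reads from memory that the resize itself might move.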

Member

> if there's a memoryview into a buffer object and an unrelated resize moves the buffer then the memoryview will point to invalid memory.

The memoryview will still point to "valid" memory, and importantly memory that won't be GC'd because the memoryview object retains a pointer to the head of the buffer.

> In this particular case (buffer extended by memoryview to itself) this patch makes it memory safe, so perhaps I can make a separate test file for this with its own .exp? Plus add some documentation around the difference to CPython. What do you recommend?

Yes, that all sounds good: separate test with .exp, and cpydiff documentation.

Contributor Author

> The memoryview will still point to "valid" memory, and importantly memory that won't be GC'd because the memoryview object retains a pointer to the head of the buffer.

That's a pretty elegant solution, but I don't think this is how it works at the moment. As per the bug this PR is fixing, resizing calls m_renew which calls through to gc_realloc. If resizing in place fails, gc_realloc allocates a new buffer, copies into it, and then frees the old one with gc_free. So any remaining pointers into that buffer from other memoryviews will become pointers to freed blocks.

Maybe it should work in the way you describe, where the old buffer is left for the GC to pick up once it's no longer used?
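The sequence described above (allocate a new block, copy, free the old one) can be illustrated with a toy model. This is only a simulation under stated assumptions: `ToyHeap` and its methods are hypothetical, integers stand in for addresses, and `realloc` always moves the block, which is the worst case rather than how `gc_realloc` always behaves.

```python
class ToyHeap:
    """Toy allocator: realloc always moves the block and frees the old one."""

    def __init__(self):
        self._blocks = {}
        self._next_addr = 0

    def alloc(self, data):
        addr = self._next_addr
        self._next_addr += 1
        self._blocks[addr] = bytearray(data)
        return addr

    def read(self, addr, offset, length):
        if addr not in self._blocks:
            raise RuntimeError("use-after-free: block %d" % addr)
        return bytes(self._blocks[addr][offset:offset + length])

    def realloc(self, addr, new_size):
        old = self._blocks.pop(addr)  # the old block is freed explicitly
        return self.alloc(old[:new_size].ljust(new_size, b"\0"))


heap = ToyHeap()
items = heap.alloc(b"1234567")
src_addr, src_offs = items, 2  # the source aliases the destination

items = heap.realloc(items, 16)  # block moves; src_addr is now stale

try:
    heap.read(src_addr, src_offs, 3)
    stale_ok = True
except RuntimeError:
    stale_ok = False

# Mirror the patch: keep the offset separate and re-derive the base
# address after the move.
src_addr = items
fresh = heap.read(src_addr, src_offs, 3)
print(stale_ok, fresh)
```

Keeping the offset apart from the base is what lets the source be recomputed after a move. A view held elsewhere has no such hook, which is the case the thread leaves open.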

Member

> If resizing in place fails, gc_realloc allocates a new buffer, copies into it, and then frees the old one with gc_free. So any remaining pointers into that buffer from other memoryviews will become pointers to freed blocks.

Ahh, indeed! That's a problem. But that's kind of separate to this PR, so maybe we can think about that and fix that separately?

Contributor Author

Up to you. If we were to change resizing to not explicitly free the old buffer then this PR wouldn't be necessary (no use-after-free would exist here, as the old buffer wouldn't have been freed yet).

Member

> If we were to change resizing to not explicitly free the old buffer then this PR wouldn't be necessary

Yes, but I feel like that's quite a big change that doesn't benefit many cases. In most cases the code won't be resizing an object that has a memoryview into it, and it's definitely beneficial (??) to explicitly free memory when doing a realloc.

Also, are there other places in the code (probably confined to py/objarray.c) where we'd also need to be careful not to free memory that may be pointed to by a memoryview?

Contributor Author

I've updated the PR to address the cases with bytearrays. The cases with memoryviews are much more complex, as we discussed offline. Have added a commit here to document the requirement of not resizing a bytearray that has a memoryview pointing to it, at least until we have a better fix.

+print(len(b), b)
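Each pass of `b[-1:] = b` replaces the final byte with the whole previous buffer, so a length n becomes 2n - 1 and the new test should go 7, 13, 25, 49. A small sketch that checks the arithmetic, assuming the interpreter accepts a slice assignment whose source is the target itself (which this test relies on). The helper `grow_from_self` is hypothetical.

```python
def grow_from_self(seed, rounds):
    """Repeatedly replace the last byte of a bytearray with the bytearray."""
    b = bytearray(seed)
    lengths = [len(b)]
    for _ in range(rounds):
        b[-1:] = b  # drop the last byte, then append the previous contents
        lengths.append(len(b))
    return b, lengths


b, lengths = grow_from_self(b"1234567", 3)
print(lengths)
```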
12 changes: 12 additions & 0 deletions tests/cpydiff/types_memoryview_invalid.py
@@ -0,0 +1,12 @@
+"""
+categories: Types,memoryview
+description: memoryview can become invalid if its target is resized
+cause: CPython prevents a ``bytearray`` or ``io.bytesIO`` object from changing size while there is a ``memoryview`` object that references it. MicroPython requires the programmer to manually ensure that an object is not resized while any ``memoryview`` references it.
+
+In the worst case scenario, resizing an object which is the target of a memoryview can cause the memoryview(s) to reference invalid freed memory (a use-after-free bug) and corrupt the MicroPython runtime.
+workaround: Do not change the size of any ``bytearray`` or ``io.bytesIO`` object that has a ``memoryview`` assigned to it.
+"""
+b = bytearray(b"abcdefg")
+m = memoryview(b)
+b.extend(b"hijklmnop")
+print(b, bytes(m))
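One way to follow the workaround is to allocate the backing storage once and track the used length separately, so the object that views point into is never resized. A minimal sketch under that assumption: the class `FixedBuffer` and its methods are hypothetical and are not part of MicroPython or this PR.

```python
class FixedBuffer:
    """Append-only buffer that never resizes its backing bytearray."""

    def __init__(self, capacity):
        self._data = bytearray(capacity)  # allocated once, never resized
        self._view = memoryview(self._data)
        self._len = 0

    def append(self, chunk):
        end = self._len + len(chunk)
        if end > len(self._data):
            raise ValueError("buffer full")
        self._view[self._len:end] = chunk  # same-size slice write, no resize
        self._len = end

    def contents(self):
        return self._view[:self._len]  # view of the used part, no copy


buf = FixedBuffer(16)
buf.append(b"abcdefg")
buf.append(b"hijklmnop")
print(bytes(buf.contents()))
```

The trade-off is a fixed capacity chosen up front; in exchange, every view handed out stays valid for the lifetime of the buffer.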