py/objarray: Fix use-after-free if extending a slice from itself. #14029
Merged: dpgeorge merged 2 commits into micropython:master from projectgus:bugfix/realloc_array_subscr on Apr 22, 2024
@@ -0,0 +1,8 @@
# add a bytearray to itself
# This is not supported by CPython as of 3.11.18.

b = bytearray(b"123456789")
for _ in range(4):
    c = bytearray(b)  # extra allocation increases chance 'b' has to relocate
    b += b
print(b)
@@ -0,0 +1 @@
bytearray(b'123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789')
@@ -0,0 +1,12 @@
"""
categories: Types,memoryview
description: memoryview can become invalid if its target is resized
cause: CPython prevents a ``bytearray`` or ``io.BytesIO`` object from changing size while there is a ``memoryview`` object that references it. MicroPython requires the programmer to manually ensure that an object is not resized while any ``memoryview`` references it.

In the worst case scenario, resizing an object which is the target of a memoryview can cause the memoryview(s) to reference invalid freed memory (a use-after-free bug) and corrupt the MicroPython runtime.
workaround: Do not change the size of any ``bytearray`` or ``io.BytesIO`` object that has a ``memoryview`` assigned to it.
"""
b = bytearray(b"abcdefg")
m = memoryview(b)
b.extend(b"hijklmnop")
print(b, bytes(m))
I think there needs to be an additional test that extends a bytearray by a memoryview of itself, where that memoryview is offset by non-zero. That will test the case where `src_items` was realloc'd by the GC and `src_offs` is non-zero.
Good catch, I'd missed there's no existing coverage for this.
Interestingly, this turns up a twist: CPython doesn't support resizing any buffer object which has an active memoryview into it. This code:
Triggers:
Some explanation here.
You're no doubt more familiar with this than me, but AFAIK the related memory safety issue also exists in MicroPython: if there's a memoryview into a buffer object and an unrelated resize moves the buffer, then the memoryview will point to invalid memory. My understanding is that this is a known (but undocumented?) limitation of memoryviews in MicroPython.
In this particular case (buffer extended by memoryview to itself) this patch makes it memory safe, so perhaps I can make a separate test file for this with its own .exp? Plus add some documentation around the difference to CPython. What do you recommend?
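For readers following along, here is a minimal sketch of the CPython behaviour described in this comment. It is not the snippet from the original comment; the variable names and the `outcome` flag are illustrative assumptions. It extends a bytearray by a memoryview of itself at a non-zero offset, which CPython refuses while the view is alive.

```python
# Assumed illustration, intended to be run under CPython.
b = bytearray(b"abcdefg")
m = memoryview(b)  # an active view: CPython now treats 'b' as having an export

try:
    b.extend(m[2:])  # source is a view of 'b' itself, offset by 2
    outcome = "resized"
except BufferError:
    # CPython refuses to resize an object that has live buffer exports.
    outcome = "BufferError"

print(outcome)  # BufferError (on CPython)
```

On CPython the bytearray is left unchanged. What MicroPython prints for the same code depends on the build, which is why a separate `.exp` file is proposed above.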
The memoryview will still point to "valid" memory, and importantly memory that won't be GC'd because the memoryview object retains a pointer to the head of the buffer.
Yes, that all sounds good: separate test with .exp, and cpydiff documentation.
That's a pretty elegant solution, but I don't think this is how it works at the moment. As per the bug this PR is fixing, resizing calls `m_renew` which calls through to `gc_realloc`. If resizing in place fails, `gc_realloc` allocates a new buffer, copies into it, and then frees the old one with `gc_free`. So any remaining pointers into that buffer from other memoryviews will become pointers to freed blocks.

Maybe it should work in the way you describe, where the old buffer is left for the GC to pick up once it's no longer used?
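The allocate, copy, free sequence described above can be modelled with a toy heap. This is only a sketch: `ToyHeap` and all of its methods are hypothetical names invented for illustration, and it does not reflect how MicroPython's GC is actually implemented. It shows only why a reference taken before a moving realloc ends up pointing at a freed block.

```python
class ToyHeap:
    """Hypothetical stand-in for a heap; not MicroPython's GC."""

    def __init__(self):
        self.blocks = {}  # block id -> bytearray (live) or None (freed)
        self.next_id = 0

    def alloc(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = bytearray(data)
        return block_id

    def free(self, block_id):
        self.blocks[block_id] = None

    def realloc_moving(self, block_id, new_size):
        # Model the "cannot grow in place" path: allocate a new block,
        # copy the old contents into it, then free the old block.
        old = self.blocks[block_id]
        new_id = self.alloc(old + bytes(new_size - len(old)))
        self.free(block_id)
        return new_id

    def is_live(self, block_id):
        return self.blocks[block_id] is not None


heap = ToyHeap()
buf = heap.alloc(b"abcdefg")
view_target = buf                    # a "memoryview" remembering the old block
buf = heap.realloc_moving(buf, 16)   # the owner follows the move
stale = not heap.is_live(view_target)
print(stale)  # True: the view still refers to the freed block
```

In this model, leaving the old block for the collector instead of calling `free` would keep `view_target` valid, which is the alternative raised in the comment.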
Ahh, indeed! That's a problem. But that's kind of separate to this PR, so maybe we can think about that and fix that separately?
Up to you. If we were to change resizing to not explicitly free the old buffer then this PR wouldn't be necessary (no use after free would exist here, as the old buffer hadn't been freed yet).
Yes, but I feel like that's quite a big change that doesn't benefit many cases. In most cases the code won't be resizing a memoryview and it's definitely beneficial (??) to explicitly free memory when doing a realloc.
Also, are there other places in the code (probably confined to `py/objarray.c`) where we'd also need to be careful not to free memory that may be pointed to by a memoryview?
I've updated the PR to address the cases with bytearrays. The cases with memoryviews are much more complex, as we discussed offline. Have added a commit here to document the requirement of not resizing a bytearray that has a memoryview pointing to it, at least until we have a better fix.
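One way to follow the documented requirement is sketched below, under assumptions: the helper name `append_safely` is hypothetical and not part of this PR. The idea is to copy out whatever is needed from the view, drop the view, resize, and only then create a fresh view.

```python
def append_safely(buf, extra):
    """Extend 'buf' only once no memoryview of it is still in use."""
    m = memoryview(buf)
    head = bytes(m[:3])   # copy out what we need while the view is valid
    del m                 # drop the view before the resize
    buf.extend(extra)     # safe: nothing refers to the old storage
    m = memoryview(buf)   # re-create the view against the resized buffer
    return head, bytes(m)


head, full = append_safely(bytearray(b"abcdefg"), b"hijklmnop")
print(head, full)  # b'abc' b'abcdefghijklmnop'
```

The same ordering also satisfies CPython, which raises `BufferError` if the resize is attempted while a view is still alive.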