Skip to content

gh-132461: Fix crash in OrderedDict.setdefault when key has unstable hash #132462

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dura0ok
Copy link
Contributor

@dura0ok dura0ok commented Apr 13, 2025

Replaced assertion with a runtime check in OrderedDict_setdefault_impl to gracefully handle keys with changing hash values. If such a key is detected, a TypeError is raised instead of aborting the interpreter.

@bedevere-app
Copy link

bedevere-app bot commented Apr 13, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@dura0ok dura0ok changed the title Fix crash in OrderedDict.setdefault when key has unstable hash gh-132461: Fix crash in OrderedDict.setdefault when key has unstable hash Apr 13, 2025
@picnixz picnixz requested a review from rhettinger April 13, 2025 07:08
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a regression test and a NEWS entry as well.

@bedevere-app
Copy link

bedevere-app bot commented Apr 13, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Replaced assertion with a runtime check in OrderedDict_setdefault_impl
to gracefully handle keys with changing hash values. If such a key is
detected, a TypeError is raised instead of aborting the interpreter.
@bedevere-app
Copy link

bedevere-app bot commented Apr 13, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

- Add test_unstable_hash_should_not_abort_issue132461 to ensure that
  OrderedDict handles keys with unstable hashes gracefully.
- Simulates randomized hashes using a custom metaclass and verifies that
  an exception is raised rather than a crash occurring.
- References pythongh-132461 to track and validate regression coverage.
@bedevere-app
Copy link

bedevere-app bot commented Apr 13, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@dura0ok
Copy link
Contributor Author

dura0ok commented Apr 13, 2025

Please add a regression test and a NEWS entry as well.

Added.

Comment on lines +735 to +747
large_num = 2**64

class WeirdBase(abc.ABCMeta):
def __hash__(self):
return randint(0, large_num)

class weird_bytes(bytes, metaclass=WeirdBase):
pass

obj = self.OrderedDict()
with self.assertRaises(Exception):
for x in range(100):
obj.setdefault(weird_bytes, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
large_num = 2**64
class WeirdBase(abc.ABCMeta):
def __hash__(self):
return randint(0, large_num)
class weird_bytes(bytes, metaclass=WeirdBase):
pass
obj = self.OrderedDict()
with self.assertRaises(Exception):
for x in range(100):
obj.setdefault(weird_bytes, None)
class A:
c = 0
def __hash__(self):
self.c += 1
return self.c
a = A()
obj = self.OrderedDict()
msg = "OrderedDict key appears to change its hash value over time"
with self.assertRaisesRegex(RuntimeError, msg):
for _ in range(100):
obj.setdefault(a, None)

We can also remove abc dependency and randint import.

Copy link
Contributor Author

@dura0ok dura0ok Apr 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of test

        with self.assertRaises(KeyError):
            for x in range(100):
                obj.setdefault(weird_bytes, None)

error

ERROR: test_unstable_hash_should_not_abort_issue132461 (test.test_ordered_dict.CPythonOrderedDictTests.test_unstable_hash_should_not_abort_issue132461)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/learning/cpython/Lib/test/test_ordered_dict.py", line 747, in test_unstable_hash_should_not_abort_issue132461
    obj.setdefault(weird_bytes, None)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
RuntimeError: OrderedDict key appears to change its hash value over time

fixed part of test

        with self.assertRaises(RuntimeError):
            for x in range(100):
                obj.setdefault(weird_bytes, None)
ERROR: test_unstable_hash_should_not_abort_issue132461 (test.test_ordered_dict.CPythonOrderedDictSubclassTests.test_unstable_hash_should_not_abort_issue132461)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/learning/cpython/Lib/test/test_ordered_dict.py", line 747, in test_unstable_hash_should_not_abort_issue132461
    obj.setdefault(weird_bytes, None)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: <class 'test.test_ordered_dict.OrderedDictTests.test_unstable_hash_should_not_abort_issue132461.<locals>.weird_bytes'>

======================================================================
ERROR: test_unstable_hash_should_not_abort_issue132461 (test.test_ordered_dict.PurePythonOrderedDictSubclassTests.test_unstable_hash_should_not_abort_issue132461)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/learning/cpython/Lib/test/test_ordered_dict.py", line 747, in test_unstable_hash_should_not_abort_issue132461
    obj.setdefault(weird_bytes, None)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/learning/cpython/Lib/collections/__init__.py", line 274, in setdefault
    return self[key]
           ~~~~^^^^^
KeyError: <class 'test.test_ordered_dict.OrderedDictTests.test_unstable_hash_should_not_abort_issue132461.<locals>.weird_bytes'>

======================================================================
ERROR: test_unstable_hash_should_not_abort_issue132461 (test.test_ordered_dict.PurePythonOrderedDictTests.test_unstable_hash_should_not_abort_issue132461)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/learning/cpython/Lib/test/test_ordered_dict.py", line 747, in test_unstable_hash_should_not_abort_issue132461
    obj.setdefault(weird_bytes, None)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/learning/cpython/Lib/collections/__init__.py", line 274, in setdefault
    return self[key]
           ~~~~^^^^^
KeyError: <class 'test.test_ordered_dict.OrderedDictTests.test_unstable_hash_should_not_abort_issue132461.<locals>.weird_bytes'>

Very strange

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, we should only catch RuntimeError and not arbitrary exceptions. The assertion failing may be an indication that something else is wrong.

@picnixz
Copy link
Member

picnixz commented Apr 13, 2025

Please, add the NEWS entry mentioning the bugfix.

@picnixz
Copy link
Member

picnixz commented Apr 13, 2025

Note that the Python implementation would not crash so the test should be only meant for the C implementation.

@dura0ok
Copy link
Contributor Author

dura0ok commented Apr 13, 2025

Note that the Python implementation would not crash so the test should be only meant for the C implementation.

I believe that the behavior should be the same. Why only for the C interface?

@dura0ok
Copy link
Contributor Author

dura0ok commented Apr 13, 2025

Please, add the NEWS entry mentioning the bugfix.

How i can do it? Please, help.

@picnixz
Copy link
Member

picnixz commented Apr 13, 2025

I believe that the behavior should be the same. Why only for the C interface?

The behavior is not the same. The pure python implementation does not use the same implementation and does not have the same construction (but the functionalities are identical; it's just that some bugs will only appear in the C version)

How i can do it? Please, help.

Using https://pypi.org/project/blurb/. See #132462 (comment) for instance.

@rhettinger rhettinger removed their request for review April 22, 2025 04:51
@emmatyping
Copy link
Member

@dura0ok would you like to continue this PR?

@dura0ok
Copy link
Contributor Author

dura0ok commented May 26, 2025

@emmatyping yes.

@dura0ok
Copy link
Contributor Author

dura0ok commented Jun 2, 2025

Hi! While the added assertion and error check prevent crashes when keys have unstable hash values, the current solution still leaves us in a difficult position.

To correctly and reliably detect that a key with the same equality (eq) but a different hash (hash) has already been inserted, we would need to store the original hash for each key in the dict. This introduces overhead — both in terms of memory and performance — just to protect against a relatively rare "misbehavior" (i.e., keys with unstable hashes).

It feels like this adds non-trivial cost to all users just to catch a developer error. Is there a better trade-off here? Maybe this should just remain undefined behavior, or at most be a debug-only check?

I’m unsure how to proceed — any suggestions would be appreciated.


class WeirdBase(abc.ABCMeta):
def __hash__(self):
return randint(0, large_num)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to return a new one on each call, otherwise this test will sometimes be flaky, when random will produce the same number twice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy