bpo-30038: fix race condition in signal delivery + wakeup fd #1082
@njsmith, thanks for your PR! By analyzing the history of the files in this pull request, we identified @rosslagerwall, @taleinat and @tiran to be potential reviewers.
@Haypo may also be relevant, as AFAICT he was the main author of
3 week ping. This sounds complicated, but it's pretty straightforward really: if you're using a condition variable, e.g. to implement a queue, you set the new value and then signal the condition to wake up anyone who was waiting. If you do it in the other order, your queue might deadlock. Similarly, here we need to set the variable letting the main thread know that a signal has arrived before we wake it up. The patch just swaps the order of these two things. Unfortunately, like any race condition, it's difficult to test, but if anyone has a Windows build setup then there's a reproducer in the linked bpo issue. And even if not, the current approach is obviously broken by inspection.
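To make the queue analogy concrete, here is a minimal sketch of the "set the value, then signal" pattern using Python's `threading.Condition`. `TinyQueue` and `demo` are hypothetical names made up for this illustration; they are not part of the patch or of CPython.

```python
import threading
from collections import deque


class TinyQueue:
    """Hypothetical minimal queue, for illustration only."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            self._items.append(item)  # 1. publish the new state first
            self._cond.notify()       # 2. only then wake up a waiter

    def get(self):
        with self._cond:
            while not self._items:    # re-check the state after every wakeup
                self._cond.wait()
            return self._items.popleft()


def demo():
    q = TinyQueue()
    results = []
    consumer = threading.Thread(target=lambda: results.append(q.get()))
    consumer.start()
    q.put("hello")
    consumer.join(timeout=5)
    return results
```

In this sketch the condition's lock also protects both steps. A C-level signal handler cannot safely take such a lock, so in `trip_signal` the order of the two operations is the only protection.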
Modules/signalmodule.c
    }

    /* And then write to the wakeup fd *after* setting all the globals and
       doing the Py_AddPendingCall (bpo-30038) */
I would like to see here the rationale for setting is_tripped before writing into the wakeup fd. You already wrote it in http://bugs.python.org/issue30038 and in the PR description. Just copy it here please ;-)
Done
Before, it was possible to get the following sequence of events (especially on Windows, where the C-level signal handler for SIGINT is run in a separate thread):

- SIGINT arrives
- trip_signal is called
- trip_signal writes to the wakeup fd
- the main thread wakes up from select()-or-equivalent
- the main thread checks for pending signals, but doesn't see any
- the main thread drains the wakeup fd
- the main thread goes back to sleep
- trip_signal sets is_tripped=1 and calls Py_AddPendingCall to notify the main thread that it should run the Python-level signal handler
- the main thread doesn't notice because it's asleep

This has been causing repeated failures in the Trio test suite: python-trio/trio#119
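The interleaving described above can be replayed deterministically in a single thread. The following is only a rough model, not the actual C code: `simulate`, the `state` dict, and the helper functions are hypothetical stand-ins for `trip_signal`, `is_tripped`, and the main loop, and a socket pair stands in for the wakeup fd. The "main thread" gets a chance to run after each step of the handler, which is the worst case for the old ordering.

```python
import select
import socket


def simulate(write_fd_first):
    """Return True if the (modelled) Python-level handler ends up running."""
    wake_r, wake_w = socket.socketpair()  # stand-in for the wakeup fd
    wake_r.setblocking(False)
    state = {"is_tripped": False, "handler_ran": False}

    def write_wakeup():
        wake_w.send(b"\x02")  # arbitrary byte standing in for the signal number

    def set_tripped():
        state["is_tripped"] = True  # models is_tripped=1 + Py_AddPendingCall

    def main_thread_iteration():
        # The main thread only wakes up if the wakeup fd is readable.
        readable, _, _ = select.select([wake_r], [], [], 0)
        if not readable:
            return  # still asleep
        # Check for pending signals...
        if state["is_tripped"]:
            state["is_tripped"] = False
            state["handler_ran"] = True
        # ...then drain the wakeup fd and go back to sleep.
        wake_r.recv(4096)

    if write_fd_first:
        steps = [write_wakeup, set_tripped]  # old order
    else:
        steps = [set_tripped, write_wakeup]  # order after the patch

    try:
        for step in steps:
            step()
            main_thread_iteration()  # main thread runs between handler steps
        return state["handler_ran"]
    finally:
        wake_r.close()
        wake_w.close()
```

Under this model, `simulate(write_fd_first=True)` loses the signal, while `simulate(write_fd_first=False)` runs the handler, because the flag is already set by the time the main thread can be woken.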
aabe7f1
to
e357d5d
Compare
Codecov Report
@@ Coverage Diff @@
## master #1082 +/- ##
==========================================
- Coverage 83.7% 82.69% -1.02%
==========================================
Files 1371 1432 +61
Lines 346665 353018 +6353
==========================================
+ Hits 290179 291920 +1741
- Misses 56486 61098 +4612
…2075)

Before, it was possible to get the following sequence of events (especially on Windows, where the C-level signal handler for SIGINT is run in a separate thread):

- SIGINT arrives
- trip_signal is called
- trip_signal writes to the wakeup fd
- the main thread wakes up from select()-or-equivalent
- the main thread checks for pending signals, but doesn't see any
- the main thread drains the wakeup fd
- the main thread goes back to sleep
- trip_signal sets is_tripped=1 and calls Py_AddPendingCall to notify the main thread that it should run the Python-level signal handler
- the main thread doesn't notice because it's asleep

This has been causing repeated failures in the Trio test suite: python-trio/trio#119

(cherry picked from commit 4ae0149)