Skip to content

Deadlock in threaded application when using sys._current_frames #106883

@mmohrhard

Description

@mmohrhard

Bug report

When using sys._current_frames in a threaded application there is a high risk of a deadlock in the interpreter. Below is my analysis from one such hang. We have never seen this problem with Python 3.8 but experience it regularly with Python 3.11. I know that the sys._current_frames documentation warns about the function and one solution to this bug report could be to document that the function is not thread safe.

We have a huge number of threads in this application but I'll limit the output to the two threads that cause the deadlock.

We have a thread calling sys._current_frames with a Python stack trace:

Thread 118868 (idle): "Profile"
    _gc_callback (distributed/utils_perf.py:184)
    _watch (distributed/profile.py:261)
    run (threading.py:975)
    _bootstrap_inner (threading.py:1038)
    _bootstrap (threading.py:995)

and a corresponding C stack trace:

Thread 325 (Thread 0x7f8b992d6700 (LWP 118868) ""):
#0  0x00007f8ce83adde2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004eae32 in PyCOND_TIMEDWAIT (us=<optimized out>, mut=<optimized out>, cond=0x879128 <_PyRuntime+392>) at /usr/local/src/conda/python-3.11.4/Python/condvar.h:73
#2  take_gil (tstate=0x7f8b992d54e0) at /usr/local/src/conda/python-3.11.4/Python/ceval_gil.h:250
#3  0x00000000005134bc in eval_frame_handle_pending (tstate=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:1211
#4  _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:1754
#5  0x000000000055451f in _PyEval_EvalFrame (throwflag=0, frame=0x7f8bb645e248, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_ceval.h:73                                                                    #6  _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, func=0x7f8cdc1d2340, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:6439
#7  _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=<optimized out>, func=<function at remote 0x7f8cdc1d2340>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:393
#8  _PyObject_VectorcallTstate (tstate=0x5c60c20, callable=<function at remote 0x7f8cdc1d2340>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92#9  0x0000000000554172 in method_vectorcall (method=method@entry=<method at remote 0x7f8bce070c00>, args=args@entry=0x7f8b992d57d0, nargsf=<optimized out>, kwnames=kwnames@entry=0x0) at /usr/local/src/conda/python-3.11.4/Objects/classobject.c:89
#10 0x00000000004fe378 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f8b992d57d0, callable=<method at remote 0x7f8bce070c00>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92
#11 _PyObject_CallFunctionVa (is_size_t=0, va=0x7f8b992d57b0, format=<optimized out>, callable=<method at remote 0x7f8bce070c00>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Objects/call.c:536
#12 PyObject_CallFunction (callable=<method at remote 0x7f8bce070c00>, format=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:558
#13 0x00000000005c23d2 in invoke_gc_callback (phase=0x654129 "stop", generation=<optimized out>, collected=<optimized out>, uncollectable=<optimized out>, tstate=<optimized out>, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:1378
#14 0x00000000005c230a in gc_collect_with_callback (tstate=0x5c60c20, generation=0) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:1401
#15 0x000000000053b418 in _PyObject_GC_Link (op=set()) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:2270
#16 gc_alloc (presize=<optimized out>, basicsize=<optimized out>) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:2290
#17 _PyObject_GC_NewVar (nitems=<optimized out>, tp=0x869fe0 <PyFrame_Type>) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:2318
#18 _PyFrame_New_NoTrack (code=<optimized out>, code=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/frameobject.c:1024
#19 _PyFrame_MakeAndSetFrameObject (frame=0x7f8b90e95a68) at /usr/local/src/conda/python-3.11.4/Python/frame.c:33
#20 0x00000000004e19df in _PyFrame_GetFrameObject (frame=0x7f8b90e95a68) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_frame.h:171                                                                                            #21 _PyThread_CurrentFrames () at /usr/local/src/conda/python-3.11.4/Python/pystate.c:1420
#22 0x00000000005061c1 in cfunction_vectorcall_NOARGS (func=<built-in method _current_frames of module object at remote 0x7f8ce8746ca0>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Include/cpython/methodobject.h:52
#23 0x000000000051bde1 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<built-in method _current_frames of module object at remote 0x7f8ce8746ca0>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92
#24 PyObject_Vectorcall (callable=<built-in method _current_frames of module object at remote 0x7f8ce8746ca0>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:299
#25 0x000000000050ed83 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:4774
#26 0x0000000000534f13 in _PyEval_EvalFrame (throwflag=0, frame=0x7f8bb645e188, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_ceval.h:73
#27 _PyEval_Vector (kwnames=<optimized out>, argcount=0, args=0x7f8bce07e978, locals=0x0, func=<optimized out>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:6439 
#28 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7f8bce07e978, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:393
#29 0x000000000053ef0d in _PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=<function at remote 0x7f8cdc121800>, func=0x534da0 <_PyFunction_Vectorcall>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Objects/call.c:257
#30 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=<function at remote 0x7f8cdc121800>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Objects/call.c:328
#31 PyObject_Call (callable=<function at remote 0x7f8cdc121800>, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:355
#32 0x000000000051290a in do_call_core (use_tracing=<optimized out>, kwdict={'thread_id': 140243172636480, 'interval': '10ms', 'cycle': '1000ms', 'log': <collections.deque at remote 0x7f8bc008b1f0>, 'omit': ('profile.py', 'selectors.py'), 'stop': <function at remote 0x7f8bc0058220>}, callargs=(), func=<function at remote 0x7f8cdc121800>, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:7357
#33 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:5381
#34 0x000000000055451f in _PyEval_EvalFrame (throwflag=0, frame=0x7f8bb645e020, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_ceval.h:73
#35 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, func=0x7f8ce0e41b20, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:6439
#36 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=<optimized out>, func=<function at remote 0x7f8ce0e41b20>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:393
#37 _PyObject_VectorcallTstate (tstate=0x5c60c20, callable=<function at remote 0x7f8ce0e41b20>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92
#38 0x00000000005541f8 in method_vectorcall (method=<optimized out>, args=0x8875d0 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0) at /usr/local/src/conda/python-3.11.4/Objects/classobject.c:67 
#39 0x0000000000627f6f in thread_run (boot_raw=0x7f8cc9c987b0) at /usr/local/src/conda/python-3.11.4/Modules/_threadmodule.c:1092
#40 0x00000000005f63d4 in pythread_wrapper (arg=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:241
#41 0x00007f8ce83a9ea5 in start_thread () from /lib64/libpthread.so.0
#42 0x00007f8ce79c9b0d in clone () from /lib64/libc.so.6

So, this thread does not hold the GIL but seems to hold the runtime->interpreters.mutex

We have another thread that seems to hold the GIL:

Thread 388 (Thread 0x7f8b62f7f700 (LWP 120217) ""):
#0  0x00007f8ce83afb3b in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f8ce83afbcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00007f8ce83afc6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0                                                                                                                                                                #3  0x00000000004ea969 in PyThread_acquire_lock_timed (intr_flag=<optimized out>, microseconds=<optimized out>, lock=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:497
#4  PyThread_acquire_lock (lock=0x18bd040, waitflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:751
#5  0x00000000005f7266 in tstate_delete_common (tstate=0x7f8cc417c250, gilstate=0x8791d8 <_PyRuntime+568>) at /usr/local/src/conda/python-3.11.4/Python/pystate.c:1080
#6  0x00000000005f6d93 in _PyThreadState_DeleteCurrent (tstate=0x7f8cc417c250) at /usr/local/src/conda/python-3.11.4/Python/pystate.c:1132
#7  0x00000000005f63d4 in pythread_wrapper (arg=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:241
#8  0x00007f8ce83a9ea5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f8ce79c9b0d in clone () from /lib64/libc.so.6

See the native_thread_id in the following gdb output.

(gdb) p *PyThreadState_Get()
$10 = {prev = 0x7f8cc4269680, next = 0x7f8c94483340, interp = 0x8875d8 <_PyRuntime+58936>, _initialized = 1, _static = 0, recursion_remaining = 1000, recursion_limit = 1000, recursion_headroom = 0, tracing = 0, tracing_what = 0, 
  cframe = 0x7f8cc417c3a0, c_profilefunc = 0x0, c_tracefunc = 0x0, c_profileobj = 0x0, c_traceobj = 0x0, curexc_type = 0x0, curexc_value = 0x0, curexc_traceback = 0x0, exc_info = 0x7f8cc417c390, dict = 0x0, gilstate_counter = 1, 
  async_exc = 0x0, thread_id = 140236637599488, native_thread_id = 120217, trash_delete_nesting = 0, trash_delete_later = 0x0, on_delete = 0x628f10 <release_sentinel>, on_delete_data = 0x7f8b974b1ee0, 
  coroutine_origin_tracking_depth = 0, async_gen_firstiter = 0x0, async_gen_finalizer = 0x0, context = 0x0, context_ver = 1, id = 1085, trace_info = {code = 0x0, bounds = {ar_start = 0, ar_end = 0, ar_line = 0, opaque = {
        computed_line = 0, lo_next = 0x0, limit = 0x0}}}, datastack_chunk = 0x7f8b99522000, datastack_top = 0x7f8b99522020, datastack_limit = 0x7f8b99526000, exc_state = {exc_value = 0x0, previous_item = 0x0}, root_cframe = {
    use_tracing = 0 '\000', current_frame = 0x0, previous = 0x0}}

Your environment

Python 3.11.4 on a CentOS 7 machine with conda-forge binaries

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    3.11only security fixesinterpreter-core(Objects, Python, Grammar, and Parser dirs)type-bugAn unexpected behavior, bug, or error

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      pFad - Phonifier reborn

      Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

      Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


      Alternative Proxies:

      Alternative Proxy

      pFad Proxy

      pFad v3 Proxy

      pFad v4 Proxy