gh-131253: free-threaded build support for pystats #137189
base: main
Allow the --enable-pystats build option to be used with free-threading. For the free-threaded builds, the stats structure is allocated per-thread and then periodically merged into a global stats structure (on thread exit or when the reporting function is called).

Summary of changes:

* introduce _Py_tss_stats thread-local variable. This is set when stats are on, replacing the _Py_stats global that's used in the non-free-threaded build.
* replace _Py_stats references with _PyStats_GET()
* move pystats logic from Python/specialize.c into Python/pystats.c
* add some free-threaded specific stat counters
The management of the thread-local variable seems complex. Can we keep the thread-local stats on PyThreadState and have _PyStats_GET() return something like _PyThreadState_GET()->stats? How much would the extra indirection slow down the stats build?
#define _Py_STATS_COND_EXPR(cond, expr) \
    do { \
        PyStats *s = _PyStats_GET(); \
        if (s != NULL && cond) { \
Suggested change:
-        if (s != NULL && cond) { \
+        if (s != NULL && (cond)) { \
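The reason for the extra parentheses can be shown with a small, self-contained sketch. The names below (`GUARD_BAD`, `GUARD_GOOD`, `guard_bad_passes`, `guard_good_passes`, `demo_check`) are hypothetical and only mimic the shape of the guard in `_Py_STATS_COND_EXPR`; they are not part of CPython. When the macro argument contains `||`, the unparenthesized form groups as `(s != NULL && a) || b`, so the body can run even though `s` is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guards modeled on the condition in the reviewed macro. */
#define GUARD_BAD(s, cond)  ((s) != NULL && cond)    /* cond pasted as-is */
#define GUARD_GOOD(s, cond) ((s) != NULL && (cond))  /* cond parenthesized */

/* Expands to: s != NULL && a || b, i.e. (s != NULL && a) || b.
   Compilers may warn about the missing parentheses; that is the point. */
int guard_bad_passes(const void *s, int a, int b)
{
    return GUARD_BAD(s, a || b);
}

/* Expands to: s != NULL && (a || b). */
int guard_good_passes(const void *s, int a, int b)
{
    return GUARD_GOOD(s, a || b);
}

/* Usage: with a NULL stats pointer, only the parenthesized guard
   reliably keeps the body from running. */
void demo_check(void)
{
    assert(guard_bad_passes(NULL, 0, 1) == 1);   /* NULL check bypassed */
    assert(guard_good_passes(NULL, 0, 1) == 0);  /* NULL check respected */
    assert(guard_good_passes("x", 0, 1) == 1);   /* non-NULL, cond true */
}
```

In the real macro the guarded body dereferences `s`, so a bypassed NULL check would be a crash rather than just a wrong count.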
Yeah, it's maybe overly complex. I don't think there would be much performance difference since [...]
If you are concerned about exposing it publicly, you can add to it [...]
Need to do a merge before reporting (I lost that bit of code on a re-factor). Fix various issues with data races. When merging from all threads, we need to stop-the-world to avoid races. When toggling on or off, also need to stop-the-world. Remove the need for locking for _PyStats_Attach().
After some profiling, it seems that having the [...]
Allow the --enable-pystats build option to be used with free-threading. For the free-threaded builds, the stats structure is allocated per-thread and then periodically merged into a per-interpreter stats structure (on thread exit or when the reporting function is called).

Summary of changes:

* introduce _Py_tss_stats thread-local variable. This is set when stats are on, replacing the _Py_stats global.
* replace _Py_stats references with _PyStats_GET()
* move pystats logic from Python/specialize.c into Python/pystats.c
* move the pystats global state into the interpreter structure
* add some free-threaded specific stat counters
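The per-thread allocation and merge-on-exit pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `DemoStats`, `demo_tss_stats`, `DEMO_STATS_GET`, and the `demo_*` functions are hypothetical stand-ins, the structure is reduced to two counters, and atomic adds are used for the merge only to keep the sketch short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for the real stats structure. */
typedef struct {
    unsigned long increfs;
    unsigned long decrefs;
} DemoStats;

/* Merged totals (assumption: one set per interpreter). Atomic so that
   several exiting threads can merge at the same time. */
static atomic_ulong demo_merged_increfs;
static atomic_ulong demo_merged_decrefs;

/* Per-thread pointer; NULL means stats are off for this thread. */
static _Thread_local DemoStats *demo_tss_stats = NULL;

#define DEMO_STATS_GET() (demo_tss_stats)

/* Increment a counter only when this thread is recording stats. */
#define DEMO_STAT_INC(field) \
    do { \
        DemoStats *s = DEMO_STATS_GET(); \
        if (s != NULL) { \
            s->field++; \
        } \
    } while (0)

/* Allocate zeroed per-thread counters; returns 0 on success, -1 on error. */
int demo_stats_attach(void)
{
    if (demo_tss_stats == NULL) {
        demo_tss_stats = calloc(1, sizeof(DemoStats));
    }
    return demo_tss_stats != NULL ? 0 : -1;
}

/* Fold this thread's counters into the merged totals, then release them. */
void demo_stats_detach(void)
{
    DemoStats *s = demo_tss_stats;
    if (s == NULL) {
        return;
    }
    atomic_fetch_add(&demo_merged_increfs, s->increfs);
    atomic_fetch_add(&demo_merged_decrefs, s->decrefs);
    free(s);
    demo_tss_stats = NULL;
}

void demo_incref(void) { DEMO_STAT_INC(increfs); }
void demo_decref(void) { DEMO_STAT_INC(decrefs); }

unsigned long demo_total_increfs(void) { return atomic_load(&demo_merged_increfs); }
unsigned long demo_total_decrefs(void) { return atomic_load(&demo_merged_decrefs); }

/* Usage on a single thread: counts become visible only after detach. */
void demo_single_thread_example(void)
{
    unsigned long before = demo_total_increfs();
    assert(demo_stats_attach() == 0);
    demo_incref();
    assert(demo_total_increfs() == before);      /* not merged yet */
    demo_stats_detach();
    assert(demo_total_increfs() == before + 1);  /* merged on detach */
}
```

The hot path touches only thread-local memory, which is the design choice the description is arguing for; shared state is written once per thread at merge time.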
Notes and potential issues:

* Adding a new thread-local variable is not strictly necessary. I did so for two reasons. First, I think it should be slightly faster than using tstate. Second, if we did add it to the tstate structure, I think it should properly go into _PyThreadStateImpl. That seems tricky to do without some major header file re-organization. E.g. the PyStats structure is needed by Py_INCREF() and Py_DECREF() and those don't have access to that structure. At least, I couldn't figure out a simple way to do it.
* The FTStats counts will need review. I wasn't sure about the naming or even how useful these might be. For mutex_sleeps, we need to determine what is the most useful thing to measure. I increment that count whenever we have to "spin" on a mutex.
* The logic related to _PyStats_Attach(), _PyStats_Detach(), _Py_StatsOn() and _Py_StatsOff() is intricate and I fear there could be bugs there. Trying to match the behavior of the default build, calling _Py_StatsOn() will enable pystats recording for all threads immediately. I think calling _Py_StatsOff() should keep the current counts (not clear them).
* The verbose code for print_* and merge_* is not ideal. I considered making this data driven instead. However, I think that actually would add complexity, not reduce it.
* I plan on adding two levels of pystats recording: default and extended. The "default" level would be enabled by default (or at least could be) and would only include stats that are relatively cheap to record. The "extended" level would be like what the current --enable-pystats does (counting things like INCREF/DECREF, which is expensive).
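One way the planned two-level recording could look is sketched below. Everything here is an assumption made for illustration: the `DemoLevel` enum, the `DEMO_STAT_INC_AT` macro, the `demo_*` functions, and the choice of `gc_collections` as a "cheap" counter are all hypothetical and do not come from the PR. The sketch also shows counts being kept, not cleared, when recording is switched off.

```c
#include <assert.h>

/* Hypothetical recording levels, ordered from cheapest to most expensive. */
typedef enum {
    DEMO_LEVEL_OFF = 0,
    DEMO_LEVEL_DEFAULT = 1,   /* cheap counters only */
    DEMO_LEVEL_EXTENDED = 2   /* also per-INCREF/DECREF style counters */
} DemoLevel;

typedef struct {
    unsigned long gc_collections;  /* assumed cheap: "default" level */
    unsigned long increfs;         /* expensive: "extended" level */
} DemoCounters;

static DemoLevel demo_level = DEMO_LEVEL_OFF;
static DemoCounters demo_counters;

/* Increment a counter only if the current level is at least min_level. */
#define DEMO_STAT_INC_AT(min_level, field) \
    do { \
        if (demo_level >= (min_level)) { \
            demo_counters.field++; \
        } \
    } while (0)

/* Changing the level never clears the counters collected so far. */
void demo_set_level(DemoLevel level)
{
    assert(level >= DEMO_LEVEL_OFF && level <= DEMO_LEVEL_EXTENDED);
    demo_level = level;
}

void demo_on_collection(void) { DEMO_STAT_INC_AT(DEMO_LEVEL_DEFAULT, gc_collections); }
void demo_on_incref(void)     { DEMO_STAT_INC_AT(DEMO_LEVEL_EXTENDED, increfs); }

unsigned long demo_collections(void) { return demo_counters.gc_collections; }
unsigned long demo_increfs(void)     { return demo_counters.increfs; }
```

A runtime check like this adds a branch to every counter update. A real implementation might instead compile the extended counters out entirely at the default level, which this sketch does not attempt.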