gh-131253: free-threaded build support for pystats #137189

Draft: wants to merge 5 commits into base: main

nascheme (Member) commented Jul 28, 2025

Allow the --enable-pystats build option to be used with free-threading. For the free-threaded builds, the stats structure is allocated per-thread and then periodically merged into a per-interpreter stats structure (on thread exit or when the reporting function is called).

Summary of changes:

  • introduce _Py_tss_stats thread-local variable. This is set when stats are on, replacing the _Py_stats global.

  • replace _Py_stats references with _PyStats_GET()

  • move pystats logic from Python/specialize.c into Python/pystats.c

  • move the pystats global state into the interpreter structure

  • add some free-threaded specific stat counters
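
The per-thread-then-merge idea in this summary can be illustrated with a minimal sketch. It is not the PR's code: `DemoStats`, `demo_tss_stats`, `demo_stats_get()` and the other `demo_*` names are hypothetical stand-ins for `PyStats`, `_Py_tss_stats` and `_PyStats_GET()`, and the structure is reduced to two counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the real PyStats structure. */
typedef struct {
    uint64_t increfs;
    uint64_t decrefs;
} DemoStats;

/* Stand-in for the per-interpreter totals that per-thread stats merge into. */
static DemoStats demo_interp_stats;

/* Per-thread storage, plus a pointer that is non-NULL only while stats are
   on (the role this sketch assumes for _Py_tss_stats). */
static _Thread_local DemoStats demo_thread_storage;
static _Thread_local DemoStats *demo_tss_stats = NULL;

/* Analogue of _PyStats_GET(): NULL means recording is off for this thread. */
static inline DemoStats *demo_stats_get(void) { return demo_tss_stats; }

#define DEMO_STAT_INC(field)              \
    do {                                  \
        DemoStats *s = demo_stats_get();  \
        if (s != NULL) {                  \
            s->field++;                   \
        }                                 \
    } while (0)

/* Turning recording off keeps the counts collected so far. */
static void demo_stats_on(void)  { demo_tss_stats = &demo_thread_storage; }
static void demo_stats_off(void) { demo_tss_stats = NULL; }

/* Fold the calling thread's counts into the shared totals, then zero the
   thread copy. This sketch assumes no other thread merges concurrently;
   a real implementation has to rule that out. */
static void demo_stats_merge(void)
{
    demo_interp_stats.increfs += demo_thread_storage.increfs;
    demo_interp_stats.decrefs += demo_thread_storage.decrefs;
    demo_thread_storage = (DemoStats){0};
}
```

A thread would call `demo_stats_merge()` when it exits or just before a report is printed; increments made while recording is off are simply dropped.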

Notes and potential issues:

  • Adding a new thread-local variable is not strictly necessary. I did so for two reasons. First, I think it should be slightly faster than using tstate. Second, if we did add it to the tstate structure, I think it should properly go into _PyThreadStateImpl. That seems tricky to do without some major header file reorganization: for example, the PyStats structure is needed by Py_INCREF() and Py_DECREF(), and those don't have access to _PyThreadStateImpl. At least, I couldn't figure out a simple way to do it.

  • The FTStats counts will need review. I wasn't sure about the naming or even how useful these might be. For mutex_sleeps, we need to determine what is the most useful thing to measure. I increment that count whenever we have to "spin" on a mutex.

  • The logic related to _PyStats_Attach(), _PyStats_Detach(), _Py_StatsOn() and _Py_StatsOff() is intricate and I fear there could be bugs there. To match the behavior of the default build, calling _Py_StatsOn() enables pystats recording for all threads immediately. I think calling _Py_StatsOff() should keep the current counts (not clear them).

  • The verbose code for print_* and merge_* is not ideal. I considered making this data driven instead. However, I think that actually would add complexity, not reduce it.

  • I plan on adding two levels of pystats recording: default and extended. The "default" level would be enabled by default (or at least could be) and would only include stats that are relatively cheap to record. The "extended" level would be like what the current --enable-pystats does (counting things like INCREF/DECREF, which is expensive).
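
One way to picture the planned "default" and "extended" levels is the sketch below. Everything in it is an assumption made for illustration: the `DemoStatsLevel` enum, the `demo_*` names, and the choice of a GC collection counter as the cheap statistic are not from the PR, and the PR does not say whether the level would be fixed at build time or switchable at run time (a runtime variable is used here only to keep the example small).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical recording levels; the names are not taken from the PR. */
typedef enum {
    DEMO_STATS_OFF = 0,
    DEMO_STATS_DEFAULT = 1,   /* only counters that are cheap to maintain */
    DEMO_STATS_EXTENDED = 2   /* also counters on very hot paths */
} DemoStatsLevel;

typedef struct {
    uint64_t gc_collections;  /* assumed cheap: a comparatively rare event */
    uint64_t increfs;         /* expensive: bumped on every incref */
} DemoStats;

static DemoStatsLevel demo_level = DEMO_STATS_DEFAULT;
static DemoStats demo_stats;

/* Record the counter only when the active level is at least `level`. */
#define DEMO_STAT_INC(level, field)     \
    do {                                \
        if (demo_level >= (level)) {    \
            demo_stats.field++;         \
        }                               \
    } while (0)

/* Stand-ins for call sites in the interpreter. */
static void demo_collect(void) { DEMO_STAT_INC(DEMO_STATS_DEFAULT, gc_collections); }
static void demo_incref(void)  { DEMO_STAT_INC(DEMO_STATS_EXTENDED, increfs); }
```

At the default level only `gc_collections` advances; raising `demo_level` to `DEMO_STATS_EXTENDED` makes `demo_incref()` count as well.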

Allow the --enable-pystats build option to be used with
free-threading.  For the free-threaded builds, the stats structure is
allocated per-thread and then periodically merged into a global stats
structure (on thread exit or when the reporting function is called).

Summary of changes:

* introduce _Py_tss_stats thread-local variable.  This is set
  when stats are on, replacing the _Py_stats global that's used
  in the non-free-threaded build.

* replace _Py_stats references with _PyStats_GET()

* move pystats logic from Python/specialize.c into Python/pystats.c

* add some free-threaded specific stat counters
@nascheme nascheme requested a review from colesbury July 30, 2025 19:17
colesbury (Contributor) left a comment:

The management of the thread local variable seems complex. Can we keep the thread-local stats on PyThreadState and have _PyStats_GET() return something like _PyThreadState_GET()->stats? How much would the extra indirection slow down the stats build?

#define _Py_STATS_COND_EXPR(cond, expr) \
do { \
PyStats *s = _PyStats_GET(); \
if (s != NULL && cond) { \

Suggested change
- if (s != NULL && cond) { \
+ if (s != NULL && (cond)) { \
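
The reason for parenthesizing the macro argument can be shown with a small sketch. The `DEMO_*` macros and helper functions are hypothetical and only mirror the shape of the condition under review; the body counts executions instead of touching `s`, so the unsafe variant can be exercised without dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyStats. */
typedef struct { int hits; } DemoStats;

static int body_runs;  /* how many times a guarded body has executed */

/* Unparenthesized argument: with cond = `a || b` the test expands to
   `(s) != NULL && a || b`, which parses as `((s) != NULL && a) || b`. */
#define DEMO_COND_UNSAFE(s, cond)         \
    do {                                  \
        if ((s) != NULL && cond) {        \
            body_runs++;                  \
        }                                 \
    } while (0)

/* Parenthesized argument: the NULL check always guards the whole condition. */
#define DEMO_COND_SAFE(s, cond)           \
    do {                                  \
        if ((s) != NULL && (cond)) {      \
            body_runs++;                  \
        }                                 \
    } while (0)

static int unsafe_runs_with_null(int a, int b)
{
    DemoStats *s = NULL;
    body_runs = 0;
    DEMO_COND_UNSAFE(s, a || b);
    return body_runs;
}

static int safe_runs_with_null(int a, int b)
{
    DemoStats *s = NULL;
    body_runs = 0;
    DEMO_COND_SAFE(s, a || b);
    return body_runs;
}
```

With `a == 0` and `b == 1`, the unsafe form runs its body even though `s` is NULL, while the safe form does not. In a stats macro whose body writes through `s`, that would be a NULL dereference. Compilers may warn about the mixed `&&`/`||` in the unsafe expansion, which is the same hazard being flagged.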

nascheme (Member, Author) replied:

> The management of the thread local variable seems complex. Can we keep the thread-local stats on PyThreadState and have _PyStats_GET() return something like _PyThreadState_GET()->stats? How much would the extra indirection slow down the stats build?

Yeah, it's maybe overly complex. I don't think there would be much performance difference since tstate is already used so heavily. Is there a problem with adding a pystats entry to the public PyThreadState structure?

colesbury (Contributor) replied:

If you are concerned about exposing it publicly, you can add it to _PyThreadStateImpl. Every PyThreadState* is actually a _PyThreadStateImpl, and the extra fields are placed after the publicly visible PyThreadState fields.
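
The layout described here, a private structure that embeds the public one as its first member, can be sketched as follows. The `Demo*` types are simplified stand-ins for `PyThreadState` and `_PyThreadStateImpl`, and the `stats` field and `demo_stats_get()` helper are assumptions about how a stats pointer could be reached, not the PR's final code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyStats. */
typedef struct { unsigned long increfs; } DemoStats;

/* Stand-in for the public PyThreadState: what extension code can see. */
typedef struct {
    int id;
} DemoThreadState;

/* Stand-in for _PyThreadStateImpl: the public part comes first and the
   private fields follow it. */
typedef struct {
    DemoThreadState base;  /* must stay the first member */
    DemoStats *stats;      /* private field; NULL while recording is off */
} DemoThreadStateImpl;

/* Reach the private field from a public pointer. The cast is valid only
   because every DemoThreadState in this sketch lives at offset 0 of a
   DemoThreadStateImpl. */
static inline DemoStats *demo_stats_get(DemoThreadState *tstate)
{
    return ((DemoThreadStateImpl *)tstate)->stats;
}
```

Code that only knows the public type keeps working unchanged, while internal code pays one extra pointer load through the thread state to reach the stats.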

Need to do a merge before reporting (I lost that bit of code on a
re-factor).

Fix various issues with data races.  When merging from all threads, we
need to stop-the-world to avoid races.  When toggling on or off, also
need to stop-the-world.  Remove the need for locking for
_PyStats_Attach().
nascheme (Member, Author) replied:

> I don't think there would be much performance difference since tstate is already used so heavily.

After some profiling, it seems that having the pystats pointer in tstate, rather than in its own thread-local, adds about 5% extra overhead. Obviously that would depend on a lot of factors. However, given that the full pystats recording is very heavyweight (it increments stat counts on incref/decref, for example), I think that's a totally acceptable performance cost in exchange for some simpler code.
