gh-135953: Implement sampling tool under profile.sample #135998
Merged
Commits
41 commits
82092dd Move core profiling module into a package (lkollar)
cd5c814 Add sampling profiler (lkollar)
b26d1fa Add pstats.SampledStats to display sample results (lkollar)
2e8e0a9 Add collapsed sample profiler output format (lkollar)
bf9e3fa Add tests for sampling profiler (lkollar)
aeca768 fixup! Add tests for sampling profiler (lkollar)
7a76f68 Improve CLI (pablogsal)
97ba97e Add more tests (pablogsal)
543b13d Format files (pablogsal)
a8f1bdd Formatting improvementts (pablogsal)
0440856 Fix small issues (pablogsal)
65d60e9 Add news entry (pablogsal)
bf15570 Moar fixes (pablogsal)
219670e Use the new gil sampling (pablogsal)
f3dc377 Correct NEWS entry (pablogsal)
3002ab8 Add docs (pablogsal)
59137e4 Add what's new (pablogsal)
fe12677 Add what's new 2 (pablogsal)
c28b0e0 Add what's new 2 (pablogsal)
0c7b1f1 fixup! Add docs (lkollar)
d7706e6 fixup! fixup! Add docs (lkollar)
5c8d939 fixup! fixup! fixup! Add docs (lkollar)
b75aee1 fixup! Add what's new 2 (lkollar)
acace5b Fix free threading (pablogsal)
f50cfd7 Add sample.profile module reference (lkollar)
8d5dc18 Merge branch 'main' into sampling-profiler (lkollar)
d534f1e Fix whatsnew (lkollar)
0360a72 Change profile.stack_collectors to singular (lkollar)
4fa0832 Remove redundant len check (lkollar)
b42812e Handle process exiting during sampling (lkollar)
8d9cbfe fixup! Handle process exiting during sampling (lkollar)
260d934 Protect against permission errors (pablogsal)
f0de45a Sync processes in test_sample_profiler (pablogsal)
dbe2c0a More fixes (pablogsal)
bc43ec7 Skip unsupported platforms in profiler tests (lkollar)
85f12d0 Add profile directory to Makefile.pre.in (lkollar)
a33d166 Raise on unsupported platforms in GetPyRuntimeAddress (lkollar)
0235127 fixup! Skip unsupported platforms in profiler tests (lkollar)
90260a6 Require subprocess support (pablogsal)
5683b76 fixup! fixup! Skip unsupported platforms in profiler tests (lkollar)
5a83439 Skip test on Android (lkollar)
Format files
commit 543b13de14e22ac42c5d1c5e0e9bd6af2ccf70b8
Apologies if these comments are not 100% accurate; I have only had a quick scan of this source and of the unwinder implementation. My initial reaction is that it is not clear what the collected data actually represents. Clearly this is not a CPU-time profile, because we don't know whether the collected stacks were on CPU or not. (The profiler is just counting stacks, although with a rough conversion factor that depends on the sampling rate one could turn those counts into physical time estimates, if one really wanted to.) In the build of CPython with the GIL, it looks like we're only sampling the thread that is holding the GIL. One might assume that a thread that holds the GIL is on CPU, but this need not be the case (indeed, the application might have a "bug" whereby it holds the GIL when it could release it), so the profiles one gets are on-GIL profiles, which in general are neither wall-time nor CPU-time profiles.
I think it would be beneficial if the unwinder returned extra information, such as whether the stack was (likely) on CPU and whether its thread was (likely) holding the GIL. And if one wants a wall-time mode for the profiler, sampling just the on-GIL thread won't provide an accurate picture of where each thread is spending its wall time.
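To make the "rough conversion factor" concrete, here is a minimal sketch, assuming a fixed sampling rate and that each sample stands for one full sampling interval. The function name `estimate_seconds` and the sample data are hypothetical, not part of the PR.

```python
from collections import Counter


def estimate_seconds(stack_counts, sampling_rate_hz):
    """Turn per-stack sample counts into rough time estimates.

    Assumes every sample represents one sampling interval
    (1 / sampling_rate_hz seconds). Whether that time is CPU time,
    wall time, or on-GIL time depends on which threads were sampled.
    """
    interval = 1.0 / sampling_rate_hz
    return {stack: count * interval for stack, count in stack_counts.items()}


# Hypothetical counts collected at 1 kHz.
counts = Counter({("main", "a"): 750, ("main", "b"): 250})
estimates = estimate_seconds(counts, sampling_rate_hz=1000)
```

The arithmetic is trivial; the open question raised above is what kind of time the resulting numbers estimate.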
Thanks for the detailed feedback - you raise very important points about the semantic clarity of what we’re actually measuring here. You’re absolutely right that the current state creates “on-GIL profiles” rather than true CPU-time or wall-time profiles, and that holding the GIL doesn’t guarantee CPU execution.
The missing point here is that the profiler is not finished yet and there are plenty of things we still need to finalize (we are also waiting for the PEP to be approved to put it into its final namespace). Right now we’re focused on getting the base infrastructure in place and working reliably across platforms (we don't yet even have the mode to run scripts or modules, only attaching).
What we had in mind here is something close to what you’re hinting at - in GIL mode, we want to avoid re-sampling threads that aren’t actually moving (i.e., would produce identical stack traces), so the idea is that the profiler only samples the thread with the GIL and signals to the frontend that the other stacks are unchanged. The frontend will then use the last samples to calculate the stats. For now we don’t have the signaling infrastructure for that, but it’s on the roadmap.
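A minimal sketch of how a frontend could reuse the last samples, assuming each tick delivers at most one fresh stack (for the thread holding the GIL) and every other known thread is credited with its most recent stack. The function and the data shapes are hypothetical; the signaling infrastructure described above does not exist yet.

```python
from collections import Counter


def aggregate_with_carry_forward(ticks, thread_ids):
    """Count (thread, stack) pairs, reusing the last stack of unsampled threads.

    ticks: iterable of (thread_id, stack) for the GIL-holding thread,
           or None when no thread was sampled on that tick.
    thread_ids: all threads the frontend wants to account for.
    """
    last_stack = {}
    counts = Counter()
    for tick in ticks:
        if tick is not None:
            tid, stack = tick
            last_stack[tid] = stack  # fresh sample for the on-GIL thread
        for tid in thread_ids:
            stack = last_stack.get(tid)
            if stack is not None:  # threads never seen on-GIL get nothing
                counts[(tid, stack)] += 1
    return counts


ticks = [(1, ("main", "a")), (2, ("worker", "f")), (1, ("main", "a")), None]
counts = aggregate_with_carry_forward(ticks, thread_ids=[1, 2])
```

Note that in this sketch a thread only enters the statistics once it has been observed holding the GIL at least once.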
I’m particularly intrigued by your point about CPU detection. Do you have any concrete plan in mind for what you propose? Unless I am missing something, there’s no good portable way to reliably determine if a thread is actually on-CPU from a remote process without sacrificing significant performance. On Linux we could theoretically examine kernel stacks from /proc or check the stat pseudo-file, but that’s racy, likely slow, and doesn’t work on macOS or Windows. And I don't look forward to calling NtQuerySystemInformation all over the place.
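For reference, the Linux `/proc` check could look roughly like the sketch below. It reads the state field of `/proc/<pid>/task/<tid>/stat`; the helper names are hypothetical, the check is Linux-only, and it is racy by construction. State `R` means running or runnable, so it is only a hint that the thread may be on CPU.

```python
from pathlib import Path


def parse_stat_state(stat_line: str) -> str:
    """Extract the one-letter state from a /proc stat line.

    The line looks like 'pid (comm) state ppid ...'. The comm field may
    contain spaces and parentheses, so split after the last ')'.
    """
    _, _, rest = stat_line.rpartition(")")
    return rest.split()[0]


def thread_state(pid: int, tid: int) -> str:
    """Read the scheduler state of one thread (Linux only, racy)."""
    return parse_stat_state(Path(f"/proc/{pid}/task/{tid}/stat").read_text())


def is_probably_runnable(state: str) -> bool:
    # 'R' covers both "on CPU" and "waiting in the run queue".
    return state == "R"
```

Each call costs a file open and read per thread per sample, which is where the performance concern comes from.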
Do you have thoughts on practical approaches for this? What’s your take on the best path forward for providing more semantic clarity about what the samples actually represent?
Also, PRs welcomed! 😉
Ah fair enough.
That's an interesting idea, but I struggle to see how this could work 🤔 If a thread is essentially always off-GIL then it will never be sampled. Or it could switch between idle functions but the stack would never get re-sampled by the profiler. For example, consider this case
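The snippet for this case is not preserved here. As a purely illustrative stand-in (an assumption, not the commenter's code), picture a thread that alternates between a function `a` that computes while holding the GIL and a function `b` that blocks with the GIL released:

```python
import threading
import time


def a(n=200_000):
    """CPU-bound work: runs Python bytecode, so the thread holds the GIL."""
    total = 0
    for i in range(n):
        total += i
    return total


def b(seconds=0.05):
    """Blocking wait: time.sleep releases the GIL while the thread is idle."""
    time.sleep(seconds)


def worker(rounds=3):
    # The thread spends wall time in both a and b, but only a is on-GIL.
    for _ in range(rounds):
        a()
        b()


if __name__ == "__main__":
    t = threading.Thread(target=worker)
    t.start()
    t.join()
```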
The profiler will likely see `a` on the stack, but might miss the samples where `b` is on the stack because the thread will be off-GIL. Then in wall-time profiles it will look as if `a` was the only function running on that thread.
Well, this approach is already racy by nature since threads are not being stopped before taking samples, so there could be all sorts of inconsistencies already. In Austin we have platform-dependent implementations of `py_thread__is_idle` that determine whether a thread is idle or not (and yes, it uses `NtQuerySystemInformation` on Windows). Surely it is racy, but I don't know of other ways of finding out about the thread status. In all my experiments the accuracy is pretty good, and the overhead not too bad. With simple stacks Austin can still sample at over 100 kHz, with the main overhead coming from the remote reads of datastack chunks.
I think samples would have to include the CPU state of threads at the very least to provide both wall- and CPU-time modes, which are pretty common for profilers. The GIL state might be an added bonus for, e.g., GIL contention analysis, figuring out if there is a lot of idle time spent with the GIL held, etc.
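As a sketch of what such samples could carry, assuming the unwinder could report a (likely) CPU state and GIL state per thread: the `Sample` record and helper functions below are hypothetical and do not reflect the unwinder's actual return type.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    thread_id: int
    stack: tuple      # function names, outermost first
    on_cpu: bool      # best-effort: thread was (likely) running
    holds_gil: bool   # best-effort: thread was (likely) holding the GIL


def build_profile(samples, mode):
    """Aggregate stack counts for a 'wall' or 'cpu' profile."""
    if mode not in ("wall", "cpu"):
        raise ValueError(f"unknown mode: {mode!r}")
    counts = Counter()
    for s in samples:
        if mode == "cpu" and not s.on_cpu:
            continue  # CPU-time mode drops idle samples
        counts[s.stack] += 1
    return counts


def idle_with_gil(samples):
    """Count samples where a thread held the GIL while off CPU."""
    return sum(1 for s in samples if s.holds_gil and not s.on_cpu)


samples = [
    Sample(1, ("worker", "a"), on_cpu=True, holds_gil=True),
    Sample(1, ("worker", "b"), on_cpu=False, holds_gil=False),
    Sample(2, ("main", "wait"), on_cpu=False, holds_gil=True),
]
wall = build_profile(samples, "wall")
cpu = build_profile(samples, "cpu")
```

With both flags recorded, wall-time and CPU-time views come from the same sample stream, and GIL contention questions become simple filters.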
I'm currently short on bandwidth so I don't think I'll be able to contribute much in the short term, but I'm more than happy to review PRs if needed and share more of my experience with developing Austin. Also, I wonder if there is a better place to take this discussion, so as to have all the details in one place?