gh-135953: Implement sampling tool under profile.sample #135998


Merged: 41 commits, merged on Jul 10, 2025
Changes from 1 commit
82092dd
Move core profiling module into a package
lkollar Jun 4, 2025
cd5c814
Add sampling profiler
lkollar Jul 3, 2025
b26d1fa
Add pstats.SampledStats to display sample results
lkollar Jun 22, 2025
2e8e0a9
Add collapsed sample profiler output format
lkollar Jun 22, 2025
bf9e3fa
Add tests for sampling profiler
lkollar Jun 19, 2025
aeca768
fixup! Add tests for sampling profiler
lkollar Jul 3, 2025
7a76f68
Improve CLI
pablogsal Jul 6, 2025
97ba97e
Add more tests
pablogsal Jul 6, 2025
543b13d
Format files
pablogsal Jul 6, 2025
a8f1bdd
Formatting improvementts
pablogsal Jul 6, 2025
0440856
Fix small issues
pablogsal Jul 6, 2025
65d60e9
Add news entry
pablogsal Jul 6, 2025
bf15570
Moar fixes
pablogsal Jul 6, 2025
219670e
Use the new gil sampling
pablogsal Jul 6, 2025
f3dc377
Correct NEWS entry
pablogsal Jul 6, 2025
3002ab8
Add docs
pablogsal Jul 6, 2025
59137e4
Add what's new
pablogsal Jul 6, 2025
fe12677
Add what's new 2
pablogsal Jul 6, 2025
c28b0e0
Add what's new 2
pablogsal Jul 6, 2025
0c7b1f1
fixup! Add docs
lkollar Jul 6, 2025
d7706e6
fixup! fixup! Add docs
lkollar Jul 6, 2025
5c8d939
fixup! fixup! fixup! Add docs
lkollar Jul 6, 2025
b75aee1
fixup! Add what's new 2
lkollar Jul 6, 2025
acace5b
Fix free threading
pablogsal Jul 7, 2025
f50cfd7
Add sample.profile module reference
lkollar Jul 7, 2025
8d5dc18
Merge branch 'main' into sampling-profiler
lkollar Jul 7, 2025
d534f1e
Fix whatsnew
lkollar Jul 9, 2025
0360a72
Change profile.stack_collectors to singular
lkollar Jul 9, 2025
4fa0832
Remove redundant len check
lkollar Jul 9, 2025
b42812e
Handle process exiting during sampling
lkollar Jul 9, 2025
8d9cbfe
fixup! Handle process exiting during sampling
lkollar Jul 9, 2025
260d934
Protect against permission errors
pablogsal Jul 9, 2025
f0de45a
Sync processes in test_sample_profiler
pablogsal Jul 9, 2025
dbe2c0a
More fixes
pablogsal Jul 9, 2025
bc43ec7
Skip unsupported platforms in profiler tests
lkollar Jul 10, 2025
85f12d0
Add profile directory to Makefile.pre.in
lkollar Jul 10, 2025
a33d166
Raise on unsupported platforms in GetPyRuntimeAddress
lkollar Jul 10, 2025
0235127
fixup! Skip unsupported platforms in profiler tests
lkollar Jul 10, 2025
90260a6
Require subprocess support
pablogsal Jul 10, 2025
5683b76
fixup! fixup! Skip unsupported platforms in profiler tests
lkollar Jul 10, 2025
5a83439
Skip test on Android
lkollar Jul 10, 2025
Format files
pablogsal committed Jul 6, 2025
commit 543b13de14e22ac42c5d1c5e0e9bd6af2ccf70b8
114 changes: 71 additions & 43 deletions Lib/profile/sample.py
Contributor
Apologies if these comments are not 100% accurate; I just had a quick scan of this source and of the unwinder implementation. My initial reaction is that it is not clear what the collected data actually represents.

Clearly this is not a CPU-time profile, because we don't know whether the collected stacks were on CPU or not. (The profiler is just counting stacks, but with a rough conversion factor that depends on the sampling rate one could turn those into physical-time estimates, if one really wants to.) In the GIL build of CPython, it looks like we're only sampling the thread that is holding the GIL. One may assume that a thread holding the GIL is on CPU, but this need not be the case (indeed, the application might have a "bug" whereby it holds the GIL when it could actually release it), so the profiles one gets are on-GIL profiles, which in general are neither wall-time nor CPU-time profiles.

I think it would be beneficial if the unwinder returned extra information, such as whether the stack was (likely) on CPU and whether its thread was (likely) holding the GIL. And if one wants a wall-time mode for the profiler, sampling just the on-GIL thread won't provide an accurate picture of where each thread spends its wall time.

Member

@pablogsal pablogsal Aug 3, 2025
Thanks for the detailed feedback - you raise very important points about the semantic clarity of what we’re actually measuring here. You’re absolutely right that the current state creates “on-GIL profiles” rather than true CPU-time or wall-time profiles, and that holding the GIL doesn’t guarantee CPU execution.

The missing point here is that the profiler is not finished yet and there are plenty of things we still need to finalize (we are also waiting for the PEP to be approved before putting it into its final namespace). Right now we're focused on getting the base infrastructure in place and working reliably across platforms (we don't even have the mode to run scripts or modules yet, only attaching).

What we had in mind here is something close to what you're hinting at: in GIL mode, we want to avoid re-sampling threads that aren't actually moving (i.e., that would produce identical stack traces), so the idea is that the profiler only samples the thread with the GIL and signals to the frontend that the other stacks are unchanged. The frontend will then use the last samples to calculate the stats. For now we don't have the signaling infrastructure for that, but it's on the roadmap.
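A minimal sketch of what that frontend-side reuse could look like. All names here are hypothetical, including the `unchanged_tids` signal; as noted above, the actual signaling infrastructure does not exist yet:

```python
from collections import Counter


class ReuseLastSampleCollector:
    """Counts stack samples; when the sampler signals that some threads
    have not moved since the last sample, re-counts their previously
    observed stacks instead of re-unwinding them."""

    def __init__(self):
        self.counts = Counter()  # stack tuple -> number of samples
        self._last = {}          # thread id -> last observed stack

    def collect(self, samples, unchanged_tids=()):
        # samples: {tid: stack_tuple} for threads that were actually unwound
        for tid, stack in samples.items():
            self.counts[stack] += 1
            self._last[tid] = stack
        # Threads flagged as unchanged reuse their last known stack.
        for tid in unchanged_tids:
            stack = self._last.get(tid)
            if stack is not None:
                self.counts[stack] += 1
```

The design choice is that the sampler only pays the unwinding cost for the GIL-holding thread, while the collector keeps per-thread totals consistent.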

I'm particularly intrigued by your point about CPU detection. Do you have any concrete plan in mind for what you propose? Unless I am missing something, there's no good portable way to reliably determine whether a thread is actually on-CPU from a remote process without sacrificing significant performance. On Linux we could theoretically examine kernel stacks from /proc or check the stat pseudo-file, but that's racy, likely slow, and doesn't work on macOS or Windows. And I don't look forward to starting to call NtQuerySystemInformation all over the place.
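For reference, on Linux the per-thread scheduler state can be read from the `/proc` stat pseudo-file mentioned above. A minimal, inherently racy sketch, with hypothetical helper names:

```python
import os


def thread_state(pid, tid):
    """Return the scheduler state character for a thread on Linux
    ('R' runnable, 'S' sleeping, 'D' uninterruptible sleep, ...),
    read from /proc/<pid>/task/<tid>/stat.  Racy by nature: the state
    can change between the read and any use of the result."""
    with open(f"/proc/{pid}/task/{tid}/stat", "rb") as f:
        data = f.read()
    # comm (field 2) may contain spaces or parentheses, so split after
    # the *last* closing parenthesis; the state is the next field.
    return data.rsplit(b")", 1)[1].split()[0].decode()


def is_on_cpu(pid, tid):
    # 'R' means runnable (running or on the run queue), which is only
    # an approximation of "on CPU".
    return thread_state(pid, tid) == "R"
```

Note that 'R' includes threads queued but not currently executing, so even this check only approximates on-CPU status, and it has no equivalent on macOS or Windows without platform-specific APIs.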

Do you have thoughts on practical approaches for this? What’s your take on the best path forward for providing more semantic clarity about what the samples actually represent?

Also, PRs welcomed! 😉

Contributor

> The missing point here is that the profiler is not finished yet and there are plenty of things we still need to finalize (we are also waiting for the PEP to be approved to put it into its final namespace).

Ah fair enough.

> in GIL mode, we want to avoid re-sampling threads that aren’t actually moving

That's an interesting idea, but I struggle to see how this could work 🤔 If a thread is essentially always off-GIL, it will never be sampled. Or it could switch between idle functions, but the stack would never get re-sampled by the profiler. For example, consider this case:

```python
from threading import Thread

def foo():
    a()  # on-CPU
    b()  # I/O-bound, off-GIL

thread = Thread(target=foo)
```

The profiler will likely see `a` on the stack, but might miss the samples where `b` is on the stack because the thread will be off-GIL. Then in wall-time profiles it would look as if `a` were the only function running on that thread.
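A tiny simulation of this bias (the timeline and the 20/80 split are made up for illustration):

```python
# Simulated timeline for the thread running `foo`: each tick records the
# current function and whether the thread holds the GIL at that moment.
timeline = [("a", True)] * 20 + [("b", False)] * 80  # 20% on-GIL, 80% off

# A sampler that only unwinds the GIL-holding thread:
gil_only = [fn for fn, holds_gil in timeline if holds_gil]
# A wall-time sampler that unwinds regardless of GIL state:
wall = [fn for fn, _ in timeline]

print(gil_only.count("a") / len(gil_only))  # 1.0 -- `b` never appears
print(wall.count("b") / len(wall))          # 0.8 -- the real picture
```

The on-GIL sampler attributes 100% of the thread's time to `a`, even though the thread spends most of its wall time in `b`.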

> I'm particularly intrigued by your point about CPU detection. Do you have any concrete plan in mind for what you propose? Unless I am missing something, there's no good portable way to reliably determine whether a thread is actually on-CPU from a remote process without sacrificing significant performance. On Linux we could theoretically examine kernel stacks from /proc or check the stat pseudo-file, but that's racy, likely slow, and doesn't work on macOS or Windows. And I don't look forward to starting to call NtQuerySystemInformation all over the place.

Well, this approach is already racy by nature, since threads are not stopped before taking samples, so there could be all sorts of inconsistencies already. In Austin we have platform-dependent implementations of py_thread__is_idle that determine whether a thread is idle or not (and yes, it uses NtQuerySystemInformation on Windows). Surely it is racy, but I don't know of other ways of finding out the thread status. In all my experiments the accuracy is pretty good, and the overhead is not too bad. With simple stacks Austin can still sample at over 100 kHz, with the main overhead coming from the remote reads of data stack chunks.

> Do you have thoughts on practical approaches for this? What’s your take on the best path forward for providing more semantic clarity about what the samples actually represent?

I think samples would have to include the CPU state of threads at the very least, to provide both wall- and CPU-time modes, which are pretty common for profilers. The GIL state might be an added bonus, e.g. for GIL contention analysis, or for figuring out whether a lot of idle time is spent with the GIL held.
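A rough sketch of what such enriched samples and mode-dependent aggregation could look like. All names are hypothetical, and the `on_cpu` flag would come from a platform-specific, best-effort check like the ones discussed above:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    stack: tuple       # function names, innermost last
    on_cpu: bool       # best-effort: was the thread running?
    holds_gil: bool    # was the thread holding the GIL?


def aggregate(samples, mode="wall"):
    """Count stacks, filtered by the requested profiling mode."""
    counts = Counter()
    for s in samples:
        if mode == "cpu" and not s.on_cpu:
            continue
        if mode == "gil" and not s.holds_gil:
            continue
        counts[s.stack] += 1
    return counts
```

With this shape, one stream of samples can back wall-time, CPU-time, and on-GIL views, and comparing the "gil" and "cpu" aggregates would surface idle time spent holding the GIL.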

> Also, PRs welcomed! 😉

I'm currently short on bandwidth, so I don't think I'll be able to contribute much in the short term, but I'm more than happy to review PRs if needed and to share more of my experience developing Austin. Also, I wonder whether there is a better place to take this discussion, so as to have all the details in one place?

@@ -168,27 +168,53 @@ def _print_top_functions(stats_list, title, key_func, format_line, n=3):

# Aggregate stats by fully qualified function name (ignoring line numbers)
func_aggregated = {}
for func, prim_calls, total_calls, total_time, cumulative_time, callers in stats_list:
for (
func,
prim_calls,
total_calls,
total_time,
cumulative_time,
callers,
) in stats_list:
# Use filename:function_name as the key to get fully qualified name
qualified_name = f"{func[0]}:{func[2]}"
if qualified_name not in func_aggregated:
func_aggregated[qualified_name] = [0, 0, 0, 0] # prim_calls, total_calls, total_time, cumulative_time
func_aggregated[qualified_name] = [
0,
0,
0,
0,
] # prim_calls, total_calls, total_time, cumulative_time
func_aggregated[qualified_name][0] += prim_calls
func_aggregated[qualified_name][1] += total_calls
func_aggregated[qualified_name][2] += total_time
func_aggregated[qualified_name][3] += cumulative_time

# Convert aggregated data back to list format for processing
aggregated_stats = []
for qualified_name, (prim_calls, total_calls, total_time, cumulative_time) in func_aggregated.items():
for qualified_name, (
prim_calls,
total_calls,
total_time,
cumulative_time,
) in func_aggregated.items():
# Parse the qualified name back to filename and function name
if ":" in qualified_name:
filename, func_name = qualified_name.rsplit(":", 1)
else:
filename, func_name = "", qualified_name
# Create a dummy func tuple with filename and function name for display
dummy_func = (filename, "", func_name)
aggregated_stats.append((dummy_func, prim_calls, total_calls, total_time, cumulative_time, {}))
aggregated_stats.append(
(
dummy_func,
prim_calls,
total_calls,
total_time,
cumulative_time,
{},
)
)

# Most time-consuming functions (by total time)
def format_time_consuming(stat):
@@ -294,30 +320,29 @@ def sample(
else:
collector.export(filename)


def _validate_collapsed_format_args(args, parser):
# Check for incompatible pstats options
invalid_opts = []

# Get list of pstats-specific options
pstats_options = {
'sort': None,
'limit': None,
'no_summary': False
}
pstats_options = {"sort": None, "limit": None, "no_summary": False}

# Find the default values from the argument definitions
for action in parser._actions:
if action.dest in pstats_options and hasattr(action, 'default'):
if action.dest in pstats_options and hasattr(action, "default"):
pstats_options[action.dest] = action.default

# Check if any pstats-specific options were provided by comparing with defaults
for opt, default in pstats_options.items():
if getattr(args, opt) != default:
invalid_opts.append(opt.replace('no_', ''))
invalid_opts.append(opt.replace("no_", ""))

if invalid_opts:
parser.error(f"The following options are only valid with --pstats format: {', '.join(invalid_opts)}")

parser.error(
f"The following options are only valid with --pstats format: {', '.join(invalid_opts)}"
)

# Set default output filename for collapsed format
if not args.outfile:
args.outfile = f"collapsed.{args.pid}.txt"
@@ -329,14 +354,14 @@ def main():
description=(
"Sample a process's stack frames and generate profiling data.\n"
"Supports two output formats:\n"
" - pstats: Detailed profiling statistics with sorting options\n"
" - pstats: Detailed profiling statistics with sorting options\n"
" - collapsed: Stack traces for generating flamegraphs\n"
"\n"
"Examples:\n"
" # Profile process 1234 for 10 seconds with default settings\n"
" python -m profile.sample 1234\n"
"\n"
" # Profile with custom interval and duration, save to file\n"
" # Profile with custom interval and duration, save to file\n"
" python -m profile.sample -i 50 -d 30 -o profile.stats 1234\n"
"\n"
" # Generate collapsed stacks for flamegraph\n"
Expand All @@ -354,34 +379,33 @@ def main():
" # Profile all threads and save collapsed stacks\n"
" python -m profile.sample -a --collapsed -o stacks.txt 1234"
),
formatter_class=argparse.RawDescriptionHelpFormatter
formatter_class=argparse.RawDescriptionHelpFormatter,
)

# Required arguments
parser.add_argument(
"pid",
type=int,
help="Process ID to sample"
)
parser.add_argument("pid", type=int, help="Process ID to sample")

# Sampling options
sampling_group = parser.add_argument_group("Sampling configuration")
sampling_group.add_argument(
"-i", "--interval",
"-i",
"--interval",
type=int,
default=100,
help="Sampling interval in microseconds (default: 100)"
help="Sampling interval in microseconds (default: 100)",
)
sampling_group.add_argument(
"-d", "--duration",
"-d",
"--duration",
type=int,
default=10,
help="Sampling duration in seconds (default: 10)"
help="Sampling duration in seconds (default: 10)",
)
sampling_group.add_argument(
"-a", "--all-threads",
"-a",
"--all-threads",
action="store_true",
help="Sample all threads in the process instead of just the main thread"
help="Sample all threads in the process instead of just the main thread",
)

# Output format selection
@@ -393,20 +417,21 @@ def main():
const="pstats",
dest="format",
default="pstats",
help="Generate pstats output (default)"
help="Generate pstats output (default)",
)
output_format.add_argument(
"--collapsed",
action="store_const",
action="store_const",
const="collapsed",
dest="format",
help="Generate collapsed stack traces for flamegraphs"
help="Generate collapsed stack traces for flamegraphs",
)

output_group.add_argument(
"-o", "--outfile",
"-o",
"--outfile",
help="Save output to a file (if omitted, prints to stdout for pstats, "
"or saves to collapsed.<pid>.txt for collapsed format)"
"or saves to collapsed.<pid>.txt for collapsed format)",
)

# pstats-specific options
@@ -417,55 +442,56 @@ def main():
action="store_const",
const=0,
dest="sort",
help="Sort by number of calls"
help="Sort by number of calls",
)
sort_group.add_argument(
"--sort-time",
action="store_const",
const=1,
dest="sort",
help="Sort by total time"
help="Sort by total time",
)
sort_group.add_argument(
"--sort-cumulative",
action="store_const",
const=2,
dest="sort",
default=2,
help="Sort by cumulative time (default)"
help="Sort by cumulative time (default)",
)
sort_group.add_argument(
"--sort-percall",
action="store_const",
const=3,
dest="sort",
help="Sort by time per call"
help="Sort by time per call",
)
sort_group.add_argument(
"--sort-cumpercall",
action="store_const",
const=4,
dest="sort",
help="Sort by cumulative time per call"
help="Sort by cumulative time per call",
)
sort_group.add_argument(
"--sort-name",
action="store_const",
const=5,
dest="sort",
help="Sort by function name"
help="Sort by function name",
)

pstats_group.add_argument(
"-l", "--limit",
"-l",
"--limit",
type=int,
help="Limit the number of rows in the output",
default=15,
)
pstats_group.add_argument(
"--no-summary",
action="store_true",
help="Disable the summary section in the output"
help="Disable the summary section in the output",
)

args = parser.parse_args()
@@ -485,5 +511,7 @@ def main():
show_summary=not args.no_summary,
output_format=args.format,
)


if __name__ == "__main__":
main()