Commit a667800

gh-136459: Add perf trampoline support for macOS (#136461)
1 parent b6d3242 commit a667800

10 files changed: 351 additions and 27 deletions

Doc/c-api/perfmaps.rst

Lines changed: 5 additions & 4 deletions
@@ -5,11 +5,12 @@
 Support for Perf Maps
 ----------------------

-On supported platforms (as of this writing, only Linux), the runtime can take
+On supported platforms (Linux and macOS), the runtime can take
 advantage of *perf map files* to make Python functions visible to an external
-profiling tool (such as `perf <https://perf.wiki.kernel.org/index.php/Main_Page>`_).
-A running process may create a file in the ``/tmp`` directory, which contains entries
-that can map a section of executable code to a name. This interface is described in the
+profiling tool (such as `perf <https://perf.wiki.kernel.org/index.php/Main_Page>`_ or
+`samply <https://github.com/mstange/samply/>`_). A running process may create a
+file in the ``/tmp`` directory, which contains entries that can map a section
+of executable code to a name. This interface is described in the
 `documentation of the Linux Perf tool <https://git.kernel.org/pub/scm/linux/
 kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/jit-interface.txt>`_.
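For illustration, each entry in a perf map file is a line of the form `START SIZE symbolname`, with both numbers written in hexadecimal, as described in the jit-interface document linked above. The following is a minimal sketch of reading one such line. `PerfMapEntry`, `parse_perf_map_line`, and the sample address, size, and path are hypothetical and not part of this commit.

```python
from typing import NamedTuple


class PerfMapEntry(NamedTuple):
    start: int  # start address of the executable region
    size: int   # size of the region in bytes
    name: str   # symbol name; it is the rest of the line and may contain spaces


def parse_perf_map_line(line: str) -> PerfMapEntry:
    # Split on the first two whitespace runs only, so that a symbol
    # name containing spaces is kept in one piece.
    start_hex, size_hex, name = line.strip().split(maxsplit=2)
    return PerfMapEntry(int(start_hex, 16), int(size_hex, 16), name)


# Hypothetical entry: the address, size, and script path are made up.
entry = parse_perf_map_line("7f3a1c000000 1c py::foo:/tmp/example.py")
print(entry.name)
```

The `py::<function>:<file>` naming matches what the tests in this commit look for in the generated map files.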

Doc/howto/perf_profiling.rst

Lines changed: 41 additions & 15 deletions
@@ -2,34 +2,35 @@

 .. _perf_profiling:

-==============================================
-Python support for the Linux ``perf`` profiler
-==============================================
+========================================================
+Python support for the ``perf map`` compatible profilers
+========================================================

 :author: Pablo Galindo

-`The Linux perf profiler <https://perf.wiki.kernel.org>`_
-is a very powerful tool that allows you to profile and obtain
-information about the performance of your application.
-``perf`` also has a very vibrant ecosystem of tools
-that aid with the analysis of the data that it produces.
+`The Linux perf profiler <https://perf.wiki.kernel.org>`_ and
+`samply <https://github.com/mstange/samply>`_ are powerful tools that allow you to
+profile and obtain information about the performance of your application.
+Both tools have vibrant ecosystems that aid with the analysis of the data they produce.

-The main problem with using the ``perf`` profiler with Python applications is that
-``perf`` only gets information about native symbols, that is, the names of
+The main problem with using these profilers with Python applications is that
+they only get information about native symbols, that is, the names of
 functions and procedures written in C. This means that the names and file names
-of Python functions in your code will not appear in the output of ``perf``.
+of Python functions in your code will not appear in the profiler output.

 Since Python 3.12, the interpreter can run in a special mode that allows Python
-functions to appear in the output of the ``perf`` profiler. When this mode is
+functions to appear in the output of compatible profilers. When this mode is
 enabled, the interpreter will interpose a small piece of code compiled on the
-fly before the execution of every Python function and it will teach ``perf`` the
+fly before the execution of every Python function and it will teach the profiler the
 relationship between this piece of code and the associated Python function using
 :doc:`perf map files <../c-api/perfmaps>`.

 .. note::

-    Support for the ``perf`` profiler is currently only available for Linux on
-    select architectures. Check the output of the ``configure`` build step or
+    Support for profiling is available on Linux and macOS on select architectures.
+    Perf is available on Linux, while samply can be used on both Linux and macOS.
+    samply support on macOS is available starting from Python 3.15.
+    Check the output of the ``configure`` build step or
     check the output of ``python -m sysconfig | grep HAVE_PERF_TRAMPOLINE``
     to see if your system is supported.
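The build-time check mentioned in the note can also be made from Python, and the trampoline can be switched on at runtime with `sys.activate_stack_trampoline()`, which exists since Python 3.12. The sketch below is an illustration under those assumptions and is not part of this commit. `trampoline_supported` and `call_with_trampoline` are hypothetical helper names.

```python
import sys
import sysconfig


def trampoline_supported():
    # Same build-time flag that the tests in this commit consult.
    value = sysconfig.get_config_var("PY_HAVE_PERF_TRAMPOLINE")
    return bool(value) and int(value) == 1


def call_with_trampoline(func, *args, **kwargs):
    """Call func with the perf trampoline active when the build allows it."""
    activated = False
    if (
        trampoline_supported()
        and hasattr(sys, "activate_stack_trampoline")
        and not sys.is_stack_trampoline_active()
    ):
        try:
            sys.activate_stack_trampoline("perf")
            activated = True
        except (ValueError, OSError):
            # Activation can still fail at runtime; which exception is
            # raised in that case is an assumption of this sketch.
            activated = False
    try:
        return func(*args, **kwargs)
    finally:
        # Only undo what this helper turned on.
        if activated:
            sys.deactivate_stack_trampoline()
```

Scoping the trampoline to one call keeps its overhead away from code that is not being profiled. Running the interpreter with `-X perf` or `PYTHONPERFSUPPORT=1` instead enables it for the whole process.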

@@ -148,6 +149,31 @@ Instead, if we run the same experiment with ``perf`` support enabled we get:



+Using the samply profiler
+-------------------------
+
+samply is a modern profiler that can be used as an alternative to perf.
+It uses the same perf map files that Python generates, making it compatible
+with Python's profiling support. samply is particularly useful on macOS
+where perf is not available.
+
+To use samply with Python, first install it following the instructions at
+https://github.com/mstange/samply, then run::
+
+    $ samply record PYTHONPERFSUPPORT=1 python my_script.py
+
+This will open a web interface where you can analyze the profiling data
+interactively. The advantage of samply is that it provides a modern
+web-based interface for analyzing profiling data and works on both Linux
+and macOS.
+
+On macOS, samply support requires Python 3.15 or later. Also on macOS, samply
+can't profile signed Python executables due to restrictions by macOS. You can
+profile with Python binaries that you've compiled yourself, or which are
+unsigned or locally-signed (such as anything installed by Homebrew). In
+order to attach to running processes on macOS, run ``samply setup`` once (and
+every time samply is updated) to self-sign the samply binary.
+
 How to enable ``perf`` profiling support
 ----------------------------------------
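As a rough sketch of driving the same workflow from a script, the helper below builds a `samply record` command and runs it with `PYTHONPERFSUPPORT=1` in the environment. The `--save-only` and `-o` flags are the ones used by the test helper in this commit. `build_samply_command`, `record_profile`, and the default output name are hypothetical.

```python
import os
import shutil
import subprocess
import sys


def build_samply_command(script, output_file):
    # "--save-only" writes the profile to disk instead of opening the
    # web interface; "-o" selects the output path.
    return [
        "samply", "record", "--save-only", "-o", output_file,
        sys.executable, script,
    ]


def record_profile(script, output_file="profile.json.gz"):
    if shutil.which("samply") is None:
        raise RuntimeError("samply is not installed or not on PATH")
    # The profiled interpreter inherits this environment, so the
    # trampoline is enabled in the child process.
    env = {**os.environ, "PYTHONPERFSUPPORT": "1"}
    subprocess.run(build_samply_command(script, output_file), env=env, check=True)
    return output_file
```

Calling `record_profile("my_script.py")` would leave a gzip-compressed profile next to the script, which can then be opened with samply later.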

Lib/test/test_perfmaps.py

Lines changed: 8 additions & 4 deletions
@@ -1,16 +1,20 @@
 import os
-import sys
+import sysconfig
 import unittest

 try:
     from _testinternalcapi import perf_map_state_teardown, write_perf_map_entry
 except ImportError:
     raise unittest.SkipTest("requires _testinternalcapi")

+def supports_trampoline_profiling():
+    perf_trampoline = sysconfig.get_config_var("PY_HAVE_PERF_TRAMPOLINE")
+    if not perf_trampoline:
+        return False
+    return int(perf_trampoline) == 1

-if sys.platform != 'linux':
-    raise unittest.SkipTest('Linux only')
-
+if not supports_trampoline_profiling():
+    raise unittest.SkipTest("perf trampoline profiling not supported")

 class TestPerfMapWriting(unittest.TestCase):
     def test_write_perf_map_entry(self):

Lib/test/test_samply_profiler.py

Lines changed: 244 additions & 0 deletions
@@ -0,0 +1,244 @@
+import unittest
+import subprocess
+import sys
+import sysconfig
+import os
+import pathlib
+from test import support
+from test.support.script_helper import (
+    make_script,
+)
+from test.support.os_helper import temp_dir
+
+
+if not support.has_subprocess_support:
+    raise unittest.SkipTest("test module requires subprocess")
+
+if support.check_sanitizer(address=True, memory=True, ub=True, function=True):
+    # gh-109580: Skip the test because it does crash randomly if Python is
+    # built with ASAN.
+    raise unittest.SkipTest("test crash randomly on ASAN/MSAN/UBSAN build")
+
+
+def supports_trampoline_profiling():
+    perf_trampoline = sysconfig.get_config_var("PY_HAVE_PERF_TRAMPOLINE")
+    if not perf_trampoline:
+        return False
+    return int(perf_trampoline) == 1
+
+
+if not supports_trampoline_profiling():
+    raise unittest.SkipTest("perf trampoline profiling not supported")
+
+
+def samply_command_works():
+    try:
+        subprocess.check_output(["samply", "--help"], stderr=subprocess.STDOUT)
+    except (subprocess.SubprocessError, OSError):
+        return False
+
+    # Check that we can run a simple samply run
+    with temp_dir() as script_dir:
+        try:
+            output_file = script_dir + "/profile.json.gz"
+            cmd = (
+                "samply",
+                "record",
+                "--save-only",
+                "--output",
+                output_file,
+                sys.executable,
+                "-c",
+                'print("hello")',
+            )
+            env = {**os.environ, "PYTHON_JIT": "0"}
+            stdout = subprocess.check_output(
+                cmd, cwd=script_dir, text=True, stderr=subprocess.STDOUT, env=env
+            )
+        except (subprocess.SubprocessError, OSError):
+            return False
+
+        if "hello" not in stdout:
+            return False
+
+    return True
+
+
+def run_samply(cwd, *args, **env_vars):
+    env = os.environ.copy()
+    if env_vars:
+        env.update(env_vars)
+    env["PYTHON_JIT"] = "0"
+    output_file = cwd + "/profile.json.gz"
+    base_cmd = (
+        "samply",
+        "record",
+        "--save-only",
+        "-o", output_file,
+    )
+    proc = subprocess.run(
+        base_cmd + args,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.PIPE,
+        env=env,
+    )
+    if proc.returncode:
+        print(proc.stderr, file=sys.stderr)
+        raise ValueError(f"Samply failed with return code {proc.returncode}")
+
+    import gzip
+    with gzip.open(output_file, mode="rt", encoding="utf-8") as f:
+        return f.read()
+
+
+@unittest.skipUnless(samply_command_works(), "samply command doesn't work")
+class TestSamplyProfilerMixin:
+    def run_samply(self, script_dir, script, activate_trampoline=True):
+        raise NotImplementedError()
+
+    def test_python_calls_appear_in_the_stack_if_perf_activated(self):
+        with temp_dir() as script_dir:
+            code = """if 1:
+                def foo(n):
+                    x = 0
+                    for i in range(n):
+                        x += i
+
+                def bar(n):
+                    foo(n)
+
+                def baz(n):
+                    bar(n)
+
+                baz(10000000)
+                """
+            script = make_script(script_dir, "perftest", code)
+            output = self.run_samply(script_dir, script)
+
+            self.assertIn(f"py::foo:{script}", output)
+            self.assertIn(f"py::bar:{script}", output)
+            self.assertIn(f"py::baz:{script}", output)
+
+    def test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated(self):
+        with temp_dir() as script_dir:
+            code = """if 1:
+                def foo(n):
+                    x = 0
+                    for i in range(n):
+                        x += i
+
+                def bar(n):
+                    foo(n)
+
+                def baz(n):
+                    bar(n)
+
+                baz(10000000)
+                """
+            script = make_script(script_dir, "perftest", code)
+            output = self.run_samply(
+                script_dir, script, activate_trampoline=False
+            )
+
+            self.assertNotIn(f"py::foo:{script}", output)
+            self.assertNotIn(f"py::bar:{script}", output)
+            self.assertNotIn(f"py::baz:{script}", output)
+
+
+@unittest.skipUnless(samply_command_works(), "samply command doesn't work")
+class TestSamplyProfiler(unittest.TestCase, TestSamplyProfilerMixin):
+    def run_samply(self, script_dir, script, activate_trampoline=True):
+        if activate_trampoline:
+            return run_samply(script_dir, sys.executable, "-Xperf", script)
+        return run_samply(script_dir, sys.executable, script)
+
+    def setUp(self):
+        super().setUp()
+        self.perf_files = set(pathlib.Path("/tmp/").glob("perf-*.map"))
+
+    def tearDown(self) -> None:
+        super().tearDown()
+        files_to_delete = (
+            set(pathlib.Path("/tmp/").glob("perf-*.map")) - self.perf_files
+        )
+        for file in files_to_delete:
+            file.unlink()
+
+    def test_pre_fork_compile(self):
+        code = """if 1:
+                import sys
+                import os
+                import sysconfig
+                from _testinternalcapi import (
+                    compile_perf_trampoline_entry,
+                    perf_trampoline_set_persist_after_fork,
+                )
+
+                def foo_fork():
+                    pass
+
+                def bar_fork():
+                    foo_fork()
+
+                def foo():
+                    import time; time.sleep(1)
+
+                def bar():
+                    foo()
+
+                def compile_trampolines_for_all_functions():
+                    perf_trampoline_set_persist_after_fork(1)
+                    for _, obj in globals().items():
+                        if callable(obj) and hasattr(obj, '__code__'):
+                            compile_perf_trampoline_entry(obj.__code__)
+
+                if __name__ == "__main__":
+                    compile_trampolines_for_all_functions()
+                    pid = os.fork()
+                    if pid == 0:
+                        print(os.getpid())
+                        bar_fork()
+                    else:
+                        bar()
+                """
+
+        with temp_dir() as script_dir:
+            script = make_script(script_dir, "perftest", code)
+            env = {**os.environ, "PYTHON_JIT": "0"}
+            with subprocess.Popen(
+                [sys.executable, "-Xperf", script],
+                universal_newlines=True,
+                stderr=subprocess.PIPE,
+                stdout=subprocess.PIPE,
+                env=env,
+            ) as process:
+                stdout, stderr = process.communicate()
+
+        self.assertEqual(process.returncode, 0)
+        self.assertNotIn("Error:", stderr)
+        child_pid = int(stdout.strip())
+        perf_file = pathlib.Path(f"/tmp/perf-{process.pid}.map")
+        perf_child_file = pathlib.Path(f"/tmp/perf-{child_pid}.map")
+        self.assertTrue(perf_file.exists())
+        self.assertTrue(perf_child_file.exists())
+
+        perf_file_contents = perf_file.read_text()
+        self.assertIn(f"py::foo:{script}", perf_file_contents)
+        self.assertIn(f"py::bar:{script}", perf_file_contents)
+        self.assertIn(f"py::foo_fork:{script}", perf_file_contents)
+        self.assertIn(f"py::bar_fork:{script}", perf_file_contents)
+
+        child_perf_file_contents = perf_child_file.read_text()
+        self.assertIn(f"py::foo_fork:{script}", child_perf_file_contents)
+        self.assertIn(f"py::bar_fork:{script}", child_perf_file_contents)
+
+        # Pre-compiled perf-map entries of a forked process must be
+        # identical in both the parent and child perf-map files.
+        perf_file_lines = perf_file_contents.split("\n")
+        for line in perf_file_lines:
+            if f"py::foo_fork:{script}" in line or f"py::bar_fork:{script}" in line:
+                self.assertIn(line, child_perf_file_contents)
+
+
+if __name__ == "__main__":
+    unittest.main()
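The final check in `test_pre_fork_compile` (pre-compiled entries must be identical in the parent and child map files) can be read as a small standalone comparison. The sketch below is an illustration only and is not part of this commit. `missing_in_child` and the sample map contents are hypothetical, and it compares whole lines, which is slightly stricter than the substring check used in the test.

```python
def missing_in_child(parent_map, child_map, symbol_markers):
    """Return parent perf-map lines for the given symbols that the child lacks."""
    child_lines = set(child_map.splitlines())
    return [
        line
        for line in parent_map.splitlines()
        if any(marker in line for marker in symbol_markers)
        and line not in child_lines
    ]


# Hypothetical map contents: addresses, sizes, and the path are made up.
parent = "1000 10 py::foo_fork:/tmp/s.py\n2000 10 py::bar:/tmp/s.py\n"
child = "1000 10 py::foo_fork:/tmp/s.py\n"
leftover = missing_in_child(parent, child, ["py::foo_fork:", "py::bar_fork:"])
print(leftover)
```

An empty result means that every pre-compiled entry the parent recorded for the forked functions also appears, unchanged, in the child's file.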

Misc/ACKS

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ Ray Allen
 Billy G. Allie
 Jamiel Almeida
 Kevin Altis
+Nazım Can Altınova
 Samy Lahfa
 Skyler Leigh Amador
 Joe Amenta
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Add support for perf trampoline on macOS, to allow profilers with JIT map
+support to read Python calls. While profiling, ``PYTHONPERFSUPPORT=1`` can
+be appended to enable the trampoline.
