gh-136459: Add perf trampoline support for macOS #136461

Merged
merged 11 commits into from
Jul 22, 2025
Update the perf profiling doc to include samply
canova committed Jul 17, 2025
commit 8a20a4b860a4f8abe8030569a58b07248f1c5ff3
51 changes: 36 additions & 15 deletions Doc/howto/perf_profiling.rst
@@ -2,34 +2,35 @@

 .. _perf_profiling:

-==============================================
-Python support for the Linux ``perf`` profiler
-==============================================
+========================================================
+Python support for the ``perf map`` compatible profilers
+========================================================

 :author: Pablo Galindo

-`The Linux perf profiler <https://perf.wiki.kernel.org>`_
-is a very powerful tool that allows you to profile and obtain
-information about the performance of your application.
-``perf`` also has a very vibrant ecosystem of tools
-that aid with the analysis of the data that it produces.
+`The Linux perf profiler <https://perf.wiki.kernel.org>`_ and
+`samply <https://github.com/mstange/samply>`_ are powerful tools that allow you to
+profile and obtain information about the performance of your application.
+Both tools have vibrant ecosystems that aid with the analysis of the data they produce.

-The main problem with using the ``perf`` profiler with Python applications is that
-``perf`` only gets information about native symbols, that is, the names of
+The main problem with using these profilers with Python applications is that
+they only get information about native symbols, that is, the names of
 functions and procedures written in C. This means that the names and file names
-of Python functions in your code will not appear in the output of ``perf``.
+of Python functions in your code will not appear in the profiler output.

 Since Python 3.12, the interpreter can run in a special mode that allows Python
-functions to appear in the output of the ``perf`` profiler. When this mode is
+functions to appear in the output of compatible profilers. When this mode is
 enabled, the interpreter will interpose a small piece of code compiled on the
-fly before the execution of every Python function and it will teach ``perf`` the
+fly before the execution of every Python function and it will teach the profiler the
 relationship between this piece of code and the associated Python function using
 :doc:`perf map files <../c-api/perfmaps>`.

 .. note::

-    Support for the ``perf`` profiler is currently only available for Linux on
-    select architectures. Check the output of the ``configure`` build step or
+    Support for profiling is available on Linux and macOS on select architectures.
+    ``perf`` is available on Linux, while ``samply`` can be used on both Linux and macOS.
+    ``samply`` support on macOS is available starting from Python 3.14.
+    Check the output of the ``configure`` build step or
     check the output of ``python -m sysconfig | grep HAVE_PERF_TRAMPOLINE``
     to see if your system is supported.
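
Editor's sketch (not part of the diff): the shell check in the note above can also be done from Python. The example below filters the build-time configuration for any variable whose name contains ``HAVE_PERF_TRAMPOLINE``, mirroring the ``grep``. The helper names ``perf_trampoline_config`` and ``perf_trampoline_supported`` are hypothetical, and treating any non-zero value as "supported" is an assumption rather than documented behaviour.

```python
import sysconfig


def perf_trampoline_config(config_vars=None):
    """Return config entries whose name mentions the perf trampoline.

    Hypothetical helper: it mimics `python -m sysconfig | grep HAVE_PERF_TRAMPOLINE`
    by matching on the variable name only.
    """
    if config_vars is None:
        config_vars = sysconfig.get_config_vars()
    return {
        name: value
        for name, value in config_vars.items()
        if "HAVE_PERF_TRAMPOLINE" in name
    }


def perf_trampoline_supported(config_vars=None):
    """Assumption: a matching variable set to a non-zero value means 'supported'."""
    return any(
        value not in (0, "0", "", None)
        for value in perf_trampoline_config(config_vars).values()
    )


# Inspect the interpreter that runs this snippet.
print(perf_trampoline_config())
print("supported:", perf_trampoline_supported())
```

An empty dictionary most likely means the interpreter was built without the trampoline, in which case ``-X perf`` and ``PYTHONPERFSUPPORT=1`` are not expected to help.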

@@ -148,6 +149,26 @@ Instead, if we run the same experiment with ``perf`` support enabled we get:



+Using ``samply`` profiler
Member

We are going to need a bit more here. For example, samply supports both perf modes, so we need to clarify when to use them and what the recommendations are, how to read the flamegraphs, etc.

Would it make sense to break these discussions out into a separate PR? It doesn't seem useful to delay landing trampoline support for this.

Member

@pablogsal pablogsal Jul 10, 2025

> It doesn't seem useful to delay landing trampoline support for this.

Is there any rush? This will go into 3.15 anyway and that's going to be released October 2026. We still need to figure out the buildbot situation which will take some time...

Member

I am happy to separate this into a different PR, though

Contributor Author

Oh, I was hoping that we could maybe enable it on 3.14. Considering that the code was there since 3.12, and it's mostly putting lots of ifdefs here and there (minus samply and documentation part). I suspect that updating the documentation will take longer. But I'm not familiar with the release process.

Member

@pablogsal pablogsal Jul 10, 2025

> Oh, I was hoping that we could maybe enable it on 3.14.

No way unfortunately as we are 3 betas past beta freeze. It's up to the release manager to decide (CC @hugovk) but we have a strict policy for this I am afraid and no new features can be added past beta freeze.

Member

@pablogsal pablogsal Jul 10, 2025

@hugovk Checking just in case although I assume the answer is "no" but would you consider adding this to 3.14 given that this is a new platform and the code is gated by ifdefs? This will allow people on macOS to profile their code using a native profiler, which would be very useful for investigating performance in Python+compiled code.

Member

Some context for this: this would allow people on macOS to profile free threaded Python using samply, so maybe there is a case to allow it in 3.14 but I am still unsure. Up to you @hugovk

Member

I think we're a bit too close to RC to make an exception for this.

Contributor Author

@canova canova Jul 15, 2025

Aww, I'm a bit sad to see that I have to wait for more than 1 year to be able to profile performance of python scripts on macOS. I know some people were excited to use it on macOS at Mozilla to improve the performance of our build system. But thanks for taking time and answering!

I could argue that this PR is extending support of an existing feature, and not exactly adding a new feature 😄 And this code is not executed as long as PYTHONPERFSUPPORT=1 or -X perf is not passed, so I think it's low risk. But I understand that it's close to RC too.

(thanks for the doc suggestions, I'll update the PR soon!)

+-------------------------
+
+``samply`` is a modern profiler that can be used as an alternative to ``perf``.
+It uses the same perf map files that Python generates, making it compatible
+with Python's profiling support. ``samply`` is particularly useful on macOS
+where ``perf`` is not available.
+
+To use ``samply`` with Python, first install it following the instructions at
+https://github.com/mstange/samply, then run::
+
+    $ samply record PYTHONPERFSUPPORT=1 python my_script.py
+
+This will open a web interface where you can analyze the profiling data
+interactively. The advantage of ``samply`` is that it provides a modern
+web-based interface for analyzing profiling data and works on both Linux
+and macOS.
+
+On macOS, ``samply`` support requires Python 3.14 or later.
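
Editor's sketch (not part of the diff): the added text says ``samply`` reads the same perf map files that Python generates. To illustrate what such a file carries, the example below parses the conventional perf map line layout (hexadecimal start address, hexadecimal size, symbol name) and resolves an address to a symbol. The sample entries, the ``py::name:file`` symbol strings, and the helper names ``parse_perf_map`` and ``symbol_for`` are illustrative assumptions, not output captured from a real run.

```python
from typing import NamedTuple


class PerfMapEntry(NamedTuple):
    start: int  # start address of the generated code
    size: int   # size of the code region in bytes
    name: str   # symbol name reported to the profiler


def parse_perf_map(text):
    """Parse perf map text: one 'START SIZE NAME' entry per line, hex numbers."""
    entries = []
    for line in text.splitlines():
        # The name may contain spaces, so split at most twice.
        parts = line.strip().split(maxsplit=2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, name = parts
        entries.append(PerfMapEntry(int(start, 16), int(size, 16), name))
    return entries


def symbol_for(entries, address):
    """Return the symbol whose [start, start + size) range contains address."""
    for entry in entries:
        if entry.start <= address < entry.start + entry.size:
            return entry.name
    return None


# Illustrative data only; real files are typically written as /tmp/perf-<pid>.map.
sample = (
    "7f0000001000 b py::foo:/tmp/script.py\n"
    "7f000000100b b py::bar:/tmp/script.py\n"
)
entries = parse_perf_map(sample)
print(symbol_for(entries, 0x7F0000001005))  # falls inside the first entry
```

A profiler performs essentially this lookup for every sampled instruction pointer that lands in trampoline code, which is how Python function names end up in the call tree.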

How to enable ``perf`` profiling support
----------------------------------------
