-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
gh-101525: Use only safe identical code folding with BOLT #134642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
"Identical code folding" (ICF) is the feature of an optimizer to find that two functions have the same code and that they can therefore be deduplicated in the binary. While this is usually safe, it can cause observable behavior differences if the program relies on the fact that the two functions have different addresses. CPython relies on this in (at least) Objects/typeobject.c, which defines two functions wrap_binaryfunc() and wrap_binaryfunc_l() with the same implementation, and stores their addresses in the slotdefs array. If these two functions have the same address, update_one_slot() in that file will fill in slots it shouldn't, causing, for instances, classes defined in Python that inherit from some built-in types to misbehave. As of LLVM 20 (llvm/llvm-project#116275), BOLT has a "safe ICF" mode, where it looks to see if there are any uses of a function symbol outside function calls (e.g., relocations in data sections) and skips ICF on such functions. The intent is that this avoids observable behavior differences but still saves storage as much as possible. This version is about two months old at the time of writing. To support older LLVM versions, we have to turn off ICF entirely. This problem was previously noticed for Windows/MSVC in python#53093 (and again in python#24098), where the default behavior of PGO is to enable ICF (which they expand to "identical COMDAT folding") and we had to turn it off.
[py_cv_bolt_icf_safe=no | ||
${LLVM_BOLT} -icf=safe -o conftest.bolt conftest$EXEEXT >&AS_MESSAGE_LOG_FD 2>&1 dnl | ||
&& py_cv_bolt_icf_safe=yes], | ||
[AC_MSG_FAILURE([could not compile empty test program])]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[AC_MSG_FAILURE([could not compile empty test program])]) | |
[AC_MSG_FAILURE([could not link empty test program with -icf=safe])]) |
It'd be good to be a bit more descriptive than "compilation failed".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The autoconf syntax here is a little confusing. This message isn't from the attempt to optimize with -icf=safe
, it's from actually compiling the test program. BOLT is a post-link optimizer, so we give it a compiled and linked executable as an argument. (There is also ICF support in many linkers, to make things more confusing.) I'm using autoconf's function for running test-compiles for the side effect of creating a binary to pass to BOLT, but we need to handle the case where the compile, for whatever reason, fails. I don't expect anyone to hit this error because if you couldn't compile programs you probably couldn't get this far in ./configure anyway, but it's good to have something for the "can't happen" case.
The pseudo-Python equivalent of this check is
py_bolt_icf_flag = "-icf=safe"
@autoconf.cache_check(
"py_cv_bolt_icf_safe",
f"whether {LLVM_BOLT} supports safe identical code folding",
)
def check_bolt_icf_safe():
saved_cflags, saved_ldflags = CFLAGS, LDFLAGS
CFLAGS, LDFLAGS = CFLAGS_NODIST, LDFLAGS_NODIST
if autoconf.test_link(autoconf.make_program("", "")):
py_cv_bolt_icf_safe = False
p = subprocess.run(
[LLVM_BOLT, "-icf=safe" "-o", "conftest.bolt", f"conftest{EXEEXT}"],
stdout=autoconf.message_log, stderr=autoconf.message_log,
)
if p.returncode == 0:
py_cv_bolt_icf_safe = True
else:
raise RuntimeError("could not compile empty test program")
CFLAGS, LDFLAGS = saved_cflags, saved_ldflags
return py_cv_bolt_icf_safe
if not check_bolt_icf_safe():
py_bolt_icf_flag = ""
It does occur to me, writing this out, that you could theoretically hit this failure case if your compiler and linker don't support the flags needed to produce a BOLT-able binary (Wl,--emit-relocs -fno-pie -no-pie
), because this is the first time we're asking autoconf to use those flags in a test-compile. I have no idea why you would ask for BOLT if you don't have a vaguely compatible compiler, though, and you'd previously get a random compile error at some later point in the process, so I don't think this is making anything worse... but we could change the message to "could not compile and link test program with flags needed for BOLT" or something.
I would also accept the argument that I need wider indentation or some comments or something to make the autoconf readable.
astral-sh/uv#13610 reported a misbehavior that is the result of a subclass of str incorrectly having its ->tp_as_number->nb_add slot filled in with the value of PyUnicode_Type->tp_as_sequence->sq_concat. There are some times when this is an appropriate thing to do iwhen subclassing, but this is not one of them. The logic to prevent it in this case relies on two helper functions in the file, wrap_binaryfunc and wrap_binaryfunc_l, having different addresses, even though they contain identical code. For some reason BOLT does not do this optimization in the shared library (even though those are static functions and not exported), so we only started seeing this in the static build. BOLT in LLVM 20+ supports "safe" code folding, which uses heuristics about relocations to determine whether a function's address is used in any way other than a call. This seems to be enough to fix the issue. Add a patch to switch to -icf=safe, submitted upstream as python/cpython#134642
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging this issue, Now BOLT build be more stable :)
LGTM.
I like this change and it's real issue: astral-sh/python-build-standalone#622
But I am not sure this is best autoconf that we add, I will wait @erlend-aasland 's review also.
Erlend, the change looks good. Can you review, focusing on the autoconf code itself?
saved_cflags="$CFLAGS" | ||
saved_ldflags="$LDFLAGS" | ||
CFLAGS="$CFLAGS_NODIST" | ||
LDFLAGS="$LDFLAGS_NODIST" | ||
AC_LINK_IFELSE( | ||
[AC_LANG_PROGRAM([[]], [[]])], | ||
[py_cv_bolt_icf_safe=no | ||
${LLVM_BOLT} -icf=safe -o conftest.bolt conftest$EXEEXT >&AS_MESSAGE_LOG_FD 2>&1 dnl | ||
&& py_cv_bolt_icf_safe=yes], | ||
[AC_MSG_FAILURE([could not compile empty test program])]) | ||
CFLAGS="$saved_cflags" | ||
LDFLAGS="$saved_ldflags" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using our WITH_SAVE_ENV
macro; it automatically saves and restores CFLAGS
, CPPFLAGS
, LDFLAGS
, and LIBS
:
saved_cflags="$CFLAGS" | |
saved_ldflags="$LDFLAGS" | |
CFLAGS="$CFLAGS_NODIST" | |
LDFLAGS="$LDFLAGS_NODIST" | |
AC_LINK_IFELSE( | |
[AC_LANG_PROGRAM([[]], [[]])], | |
[py_cv_bolt_icf_safe=no | |
${LLVM_BOLT} -icf=safe -o conftest.bolt conftest$EXEEXT >&AS_MESSAGE_LOG_FD 2>&1 dnl | |
&& py_cv_bolt_icf_safe=yes], | |
[AC_MSG_FAILURE([could not compile empty test program])]) | |
CFLAGS="$saved_cflags" | |
LDFLAGS="$saved_ldflags" | |
WITH_SAVE_ENV([ | |
CFLAGS="$CFLAGS_NODIST" | |
LDFLAGS="$LDFLAGS_NODIST" | |
AC_LINK_IFELSE( | |
[AC_LANG_PROGRAM([], [])], | |
[py_cv_bolt_icf_safe=no | |
${LLVM_BOLT} -icf=safe -o conftest.bolt conftest$EXEEXT >&AS_MESSAGE_LOG_FD 2>&1 dnl | |
&& py_cv_bolt_icf_safe=yes], | |
[AC_MSG_FAILURE([could not compile empty test program])]) | |
]) |
Moreover, the double brackets ([[]]
) are not needed; a single pair ([]
) is sufficient.
"Identical code folding" (ICF) is the feature of an optimizer to find that two functions have the same code and that they can therefore be deduplicated in the binary. While this is usually safe, it can cause observable behavior differences if the program relies on the fact that the two functions have different addresses.
CPython relies on this in (at least) Objects/typeobject.c, which defines two functions wrap_binaryfunc() and wrap_binaryfunc_l() with the same implementation, and stores their addresses in the slotdefs array. If these two functions have the same address, update_one_slot() in that file will fill in slots it shouldn't, causing, for instances, classes defined in Python that inherit from some built-in types to misbehave.
As of LLVM 20 (llvm/llvm-project#116275), BOLT has a "safe ICF" mode, where it looks to see if there are any uses of a function symbol outside function calls (e.g., relocations in data sections) and skips ICF on such functions. The intent is that this avoids observable behavior differences but still saves storage as much as possible.
This version is about two months old at the time of writing. To support older LLVM versions, we have to turn off ICF entirely.
This problem was previously noticed for Windows/MSVC in #53093 (and again in #24098), where the default behavior of PGO is to enable ICF (which they expand to "identical COMDAT folding") and we had to turn it off.