core: Add MICROPY_USE_GCC_MUL_OVERFLOW_INTRINSIC. #17754

Open: wants to merge 2 commits into base: master


jepler
Contributor

@jepler jepler commented Jul 23, 2025

And enable it on platforms where I am aware an efficient 32x32->64 bit multiply instruction exists.

Summary

In the discussion of #17734 I became aware there was some existing use of the builtin overflow intrinsics, particularly for the longlong build.

This PR tests using it in place of mp_small_int_mul_overflow.

Testing

I ran the testsuite locally (64-bit standard build). However, I don't know if the testsuite adequately checks multiplications "at the boundary" of the short integer range.

I also did some investigating and found preprocessor checks for riscv, x86/x86_64, and arm that seem to answer the question "is there a suitable multiply instruction?". A check for xtensa is missing but could be beneficial.

I think there might be a modest performance benefit (avoiding multiple divisions per multiplication) but I did not attempt to measure it.
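
To illustrate the difference between the two approaches, here is a minimal sketch, not the code from this PR or from smallint.c. The function names are hypothetical, and `int32_t` stands in for `mp_int_t`. The first function guards the multiplication with divisions; the second relies on `__builtin_mul_overflow`, which GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: portable overflow check using divisions as guards.
// Returns true if x * y would overflow int32_t. No multiplication is
// performed, so there is no undefined behaviour.
static bool mul_overflows_division(int32_t x, int32_t y) {
    if (x > 0) {
        if (y > 0) {
            return x > INT32_MAX / y;
        }
        return y < INT32_MIN / x;
    }
    if (y > 0) {
        return x < INT32_MIN / y;
    }
    // Both operands are <= 0 here.
    return x != 0 && y < INT32_MAX / x;
}

// Hypothetical helper: the same check using the compiler intrinsic.
// On targets with a "multiply high" instruction this typically compiles
// to a multiply plus a compare, with no division.
static bool mul_overflows_intrinsic(int32_t x, int32_t y, int32_t *res) {
    return __builtin_mul_overflow(x, y, res);
}
```

Both functions should agree on whether a given pair of operands overflows; only the intrinsic version also produces the product.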

Trade-offs and Alternatives

I am not happy with the structure of how this ended up implemented, particularly for the int*int multiply in mp_binary_op. It's more complicated than I would like because mp_small_int_mul_overflow also implicitly checks for SMALL_INT_FITS, while __builtin_mul_overflow just checks whether the C type (e.g., mp_int_t) overflows. However, if/when tests pass and code size comes in smaller, it may be worth looking for an acceptable way to structure the change that still gets the size benefit.
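
The gap between "the C type did not overflow" and "the result fits in a small int" can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `int32_t` stands in for `mp_int_t`, and `EXAMPLE_SMALL_MIN`, `EXAMPLE_SMALL_MAX`, `example_small_fits` and `example_small_mul` are hypothetical names that model a small int as having one bit fewer than the C type.

```c
#include <stdbool.h>
#include <stdint.h>

// Assumption for this sketch: a "small int" has one bit fewer than int32_t.
#define EXAMPLE_SMALL_MAX (INT32_MAX >> 1)
#define EXAMPLE_SMALL_MIN (-EXAMPLE_SMALL_MAX - 1)

// Hypothetical stand-in for a SMALL_INT_FITS-style range check.
static bool example_small_fits(int32_t v) {
    return v >= EXAMPLE_SMALL_MIN && v <= EXAMPLE_SMALL_MAX;
}

// Returns true and stores the product in *out only if the product is
// representable as a small int. Two separate checks are needed:
// the intrinsic catches overflow of the C type, and the range check
// catches results that fit the C type but not the small-int range.
static bool example_small_mul(int32_t x, int32_t y, int32_t *out) {
    int32_t r;
    if (__builtin_mul_overflow(x, y, &r)) {
        return false;
    }
    if (!example_small_fits(r)) {
        return false;
    }
    *out = r;
    return true;
}
```

For example, 32768 * 32768 does not overflow `int32_t`, but under the assumption above it is one past the largest small int, so only the second check rejects it.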

codecov bot commented Jul 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.38%. Comparing base (096ff8b) to head (19000d6).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #17754      +/-   ##
==========================================
- Coverage   98.38%   98.38%   -0.01%     
==========================================
  Files         171      171              
  Lines       22239    22224      -15     
==========================================
- Hits        21880    21865      -15     
  Misses        359      359              

github-actions bot commented Jul 23, 2025

Code size report:

   bare-arm:  -144 -0.254% 
minimal x86:  -281 -0.150% 
   unix x64:  -248 -0.029% standard
      stm32:  -132 -0.034% PYBV10
     mimxrt:  -136 -0.036% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:  -144 -0.054% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:  -102 -0.023% VIRT_RV32

@jepler
Contributor Author

jepler commented Jul 23, 2025

I'm surprised the code size on rv32 is unchanged. I must have the wrong preprocessor check. Locally I found that __riscv_m was defined for riscv CPUs that have the "m" (multiply) instruction set extension. However, my compiler is broken and can't actually link an executable, so I don't trust it very far. (riscv64-unknown-elf-gcc (12.2.0-14+11+b1) 12.2.0 from debian stable, different from both ubuntu 22.04 and 24.04)

rp2 (rp2040) is expected to not change; the new code path is not enabled on Cortex M0 CPUs.
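
A preprocessor check of the kind discussed here might look roughly like the sketch below. It is an assumption-laden illustration, not the condition used in this PR: `EXAMPLE_HAS_FAST_MUL_HIGH` and `example_mul_overflows` are hypothetical names, and the exact set of predefined macros varies by compiler and version (`__riscv_mul` is the older GCC macro for the "m" extension; `__riscv_m` is the one mentioned above).

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical feature macro: 1 if the target is believed to have an
// efficient 32x32->64 multiply, 0 otherwise.
#if defined(__i386__) || defined(__x86_64__)
#define EXAMPLE_HAS_FAST_MUL_HIGH (1)
#elif defined(__riscv) && (defined(__riscv_m) || defined(__riscv_mul))
#define EXAMPLE_HAS_FAST_MUL_HIGH (1)
#elif defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
// Excludes Thumb-1-only cores such as Cortex-M0.
#define EXAMPLE_HAS_FAST_MUL_HIGH (1)
#else
#define EXAMPLE_HAS_FAST_MUL_HIGH (0)
#endif

// Returns true if a * b overflows int32_t. On success (false), *res holds
// the product; on overflow the value of *res should not be relied on.
static bool example_mul_overflows(int32_t a, int32_t b, int32_t *res) {
#if EXAMPLE_HAS_FAST_MUL_HIGH
    return __builtin_mul_overflow(a, b, res);
#else
    // Fallback for this sketch only: widen to 64 bits and range-check.
    int64_t wide = (int64_t)a * (int64_t)b;
    if (wide < INT32_MIN || wide > INT32_MAX) {
        return true;
    }
    *res = (int32_t)wide;
    return false;
#endif
}
```

Either path gives the same answer; the macro only selects which implementation the compiler sees.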

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Jul 24, 2025
@dpgeorge
Member

That's a very nice code size decrease!

(I have a MicroPython project running on a very small MCU which has run out of space, even using LTO, and I'll definitely be applying this patch to it.)

And enable it on platforms where I am aware an efficient
32x32->64 bit multiply instruction exists.

Signed-off-by: Jeff Epler <jepler@gmail.com>
@jepler jepler force-pushed the gcc-intrinsic-mul-overflow branch from 7d6f557 to 1334ce7 Compare July 24, 2025 12:16
(note: this should probably end up squashed)

Most MCUs apart from Cortex-M0 with Thumb 1 have an instruction
for computing the "high part" of a multiplication (e.g., the upper
32 bits of a 32x32 multiply).

When they do, gcc uses this to implement a small and fast
overflow check using the __builtin_mul_overflow intrinsic, which
is preferable to the guard division method used in smallint.c.

However, in contrast to the previous mp_small_int_mul_overflow
routine, which checks that the result not only fits within mp_int_t
but also satisfies SMALL_INT_FITS(), __builtin_mul_overflow only
checks for overflow of the C type. As a result, a slight change in
the code flow is needed for MP_BINARY_OP_MULTIPLY.

Other sites using mp_small_int_mul_overflow already had the
result value flow through to a SMALL_INT_FITS check so they didn't
need any additional changes.

Signed-off-by: Jeff Epler <jepler@gmail.com>
@jepler jepler force-pushed the gcc-intrinsic-mul-overflow branch from 1334ce7 to 19000d6 Compare July 24, 2025 14:08
@jepler
Contributor Author

jepler commented Jul 24, 2025

Any suggestion how to structure this change better?

@jepler jepler marked this pull request as ready for review July 29, 2025 13:34