core: Add MICROPY_USE_GCC_MUL_OVERFLOW_INTRINSIC. #17754

jepler · 2025-07-23T21:36:55Z

And enable it on platforms where I am aware an efficient 32x32->64 bit multiply instruction exists.

Summary

In the discussion of #17734 I became aware there was some existing use of the builtin overflow intrinsics, particularly for the longlong build.

This PR tests using it in place of mp_small_int_mul_overflow.

Testing

I ran the testsuite locally (64-bit standard build). However, I don't know if the testsuite adequately checks multiplications "at the boundary" of the short integer range.
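One way to probe the boundary directly, independent of the project's testsuite, is a small table of operand pairs whose products land exactly on and just past the limits, compared against a widened 64-bit reference. The following is only a sketch: it assumes 31-bit small ints in a 32-bit word (the real width depends on the object representation), and every `example_`/`EXAMPLE_` name is hypothetical rather than taken from MicroPython.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Assumption for illustration: one tag bit, so small ints span 31 bits.
#define EXAMPLE_REF_MIN (-(INT64_C(1) << 30))
#define EXAMPLE_REF_MAX ((INT64_C(1) << 30) - 1)

// Signature of the check under test: true if a * b stays a small int.
typedef bool (*example_fits_fn)(int32_t a, int32_t b);

// Reference: in 64 bits the product of two 32-bit values cannot overflow.
bool example_reference_product_fits(int32_t a, int32_t b) {
    int64_t p = (int64_t)a * (int64_t)b;
    return p >= EXAMPLE_REF_MIN && p <= EXAMPLE_REF_MAX;
}

// Operand pairs chosen so the product sits on or next to a limit.
static const struct {
    int32_t a, b;
} example_boundary_cases[] = {
    {32767, 32769},     // 2^30 - 1: largest small int
    {32768, 32768},     // 2^30: one past the largest small int
    {-32768, 32768},    // -2^30: smallest small int
    {-5, 214748365},    // -2^30 - 1: one past the smallest small int
    {65536, 32768},     // 2^31: overflows the 32-bit C type
    {INT32_MIN, -1},    // overflows the 32-bit C type
    {INT32_MAX, 1},     // fits the C type but not the small int range
    {0, INT32_MIN},     // zero operand
};

// Count disagreements between the candidate and the reference,
// trying each pair in both operand orders.
size_t example_count_mismatches(example_fits_fn candidate) {
    size_t n = sizeof(example_boundary_cases) / sizeof(example_boundary_cases[0]);
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t a = example_boundary_cases[i].a;
        int32_t b = example_boundary_cases[i].b;
        if (candidate(a, b) != example_reference_product_fits(a, b)) {
            mismatches++;
        }
        if (candidate(b, a) != example_reference_product_fits(b, a)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Passing any candidate check with the same signature to `example_count_mismatches` should return 0 if it agrees with the reference on these cases; the table is illustrative and far from exhaustive.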

I also did some investigating and found preprocessor checks for riscv, x86/x86_64, and arm that seem to capture the condition "is there a suitable multiply instruction". A check for xtensa is missing but could be beneficial.
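For readers unfamiliar with this kind of detection, a compile-time check along these lines might look roughly like the sketch below. It is not the condition used in this PR: `EXAMPLE_HAS_WIDENING_MUL` and `example_has_widening_mul` are hypothetical names, and the set of predefined macros tested here is an assumption that may not hold for every toolchain version (for instance, some compilers may define `__riscv_mul` but not `__riscv_m`).

```c
// Hypothetical feature macro, for illustration only.
#if defined(__x86_64__) || defined(__i386__)
// x86 has a widening multiply in its base instruction set.
#define EXAMPLE_HAS_WIDENING_MUL (1)
#elif defined(__riscv) && (defined(__riscv_m) || defined(__riscv_mul))
// RISC-V with the "M" extension provides mul/mulh.
#define EXAMPLE_HAS_WIDENING_MUL (1)
#elif defined(__thumb2__) || (defined(__arm__) && !defined(__thumb__))
// Thumb-2 and ARM state provide a long multiply; Thumb-1-only cores
// such as Cortex-M0 do not and fall through to the default below.
#define EXAMPLE_HAS_WIDENING_MUL (1)
#else
// Unknown or unsupported target (xtensa would land here).
#define EXAMPLE_HAS_WIDENING_MUL (0)
#endif

// Expose the compile-time decision so it can be inspected at run time.
int example_has_widening_mul(void) {
    return EXAMPLE_HAS_WIDENING_MUL;
}
```

Defaulting to 0 on unrecognised targets keeps the existing code path in place wherever the check cannot be sure.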

I think there might be a modest performance benefit (avoiding multiple divisions per multiplication) but I did not attempt to measure it.
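To make the comparison concrete, here is a minimal sketch of the two approaches for a 32-bit type. The division-based function is a generic textbook guard, not a copy of the routine in smallint.c, and both function names are hypothetical. `__builtin_mul_overflow` is the GCC/Clang intrinsic and returns true when the mathematically exact product does not fit the result type.

```c
#include <stdbool.h>
#include <stdint.h>

// Division-based guard: decides overflow before multiplying, at the
// cost of a division on every non-zero multiplication.
bool example_mul_overflows_div(int32_t a, int32_t b) {
    if (a == 0 || b == 0) {
        return false;
    }
    if (a > 0) {
        if (b > 0) {
            return a > INT32_MAX / b;   // positive * positive
        }
        return b < INT32_MIN / a;       // positive * negative
    }
    if (b > 0) {
        return a < INT32_MIN / b;       // negative * positive
    }
    return b < INT32_MAX / a;           // negative * negative
}

// Intrinsic-based check: the compiler can use a widening multiply
// (or the overflow flag) where the target provides one.
bool example_mul_overflows_builtin(int32_t a, int32_t b) {
    int32_t result;
    return __builtin_mul_overflow(a, b, &result);
}
```

Both functions should agree for every pair of operands; the intrinsic version simply leaves the choice of instruction sequence to the compiler.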

Trade-offs and Alternatives

I am not happy with how this ended up being structured, particularly for the int*int multiply in mp_binary_op. It's more complicated than I would like because mp_small_int_mul_overflow also implicitly checks SMALL_INT_FITS, while __builtin_mul_overflow only checks whether the C type (e.g., mp_int_t) overflows. However, if/when tests pass and code size comes in smaller, it may be worth looking for an acceptable way to structure the change that still gets the size benefit.
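The extra step described here can be sketched as a two-stage check: first the intrinsic, then a separate range test. This is an illustration only, not the code in this PR. It assumes a 32-bit word with one tag bit, and `example_small_int_fits`, `example_small_int_mul`, and the `EXAMPLE_SMALL_INT_*` limits are hypothetical stand-ins for the real macros and helpers.

```c
#include <stdbool.h>
#include <stdint.h>

// Assumption for illustration: small ints occupy 31 bits of a 32-bit word.
#define EXAMPLE_SMALL_INT_MIN (-(INT32_C(1) << 30))
#define EXAMPLE_SMALL_INT_MAX ((INT32_C(1) << 30) - 1)

// Stand-in for a SMALL_INT_FITS-style range test.
bool example_small_int_fits(int32_t v) {
    return v >= EXAMPLE_SMALL_INT_MIN && v <= EXAMPLE_SMALL_INT_MAX;
}

// Returns true and stores the product if it can remain a small int.
// Returns false if the caller would need to fall back to a big int.
bool example_small_int_mul(int32_t a, int32_t b, int32_t *out) {
    int32_t product;
    if (__builtin_mul_overflow(a, b, &product)) {
        return false;   // overflowed the C type itself
    }
    if (!example_small_int_fits(product)) {
        return false;   // fits the C type but not the small int range
    }
    *out = product;
    return true;
}
```

With these limits, `32768 * 32768` passes the intrinsic check (2^30 fits in 32 bits) but fails the range test, which is exactly the case the intrinsic alone does not catch.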

codecov · 2025-07-23T21:47:25Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.38%. Comparing base (096ff8b) to head (19000d6).

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #17754      +/-   ##
==========================================
- Coverage   98.38%   98.38%   -0.01%     
==========================================
  Files         171      171              
  Lines       22239    22224      -15     
==========================================
- Hits        21880    21865      -15     
  Misses        359      359


github-actions · 2025-07-23T21:50:13Z

Code size report:

   bare-arm:  -144 -0.254% 
minimal x86:  -281 -0.150% 
   unix x64:  -248 -0.029% standard
      stm32:  -132 -0.034% PYBV10
     mimxrt:  -136 -0.036% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:  -144 -0.054% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:  -102 -0.023% VIRT_RV32

jepler · 2025-07-23T22:07:45Z

I'm surprised the code size on rv32 is unchanged. I must have the wrong preprocessor check. Locally I found that __riscv_m was defined for riscv CPUs that have the "m" (multiply) instruction set extension. However, my compiler is broken and can't actually link an executable, so I don't trust it very far. (riscv64-unknown-elf-gcc (12.2.0-14+11+b1) 12.2.0 from debian stable, different from both ubuntu 22.04 and 24.04)

rp2 (rp2040) is expected to not change; the new code path is not enabled on Cortex M0 CPUs.

dpgeorge · 2025-07-24T00:50:13Z

That's a very nice code size decrease!

(I have a MicroPython project running on a very small MCU which has run out of space, even using LTO, and I'll definitely be applying this patch to it.)

And enable it on platforms where I am aware an efficient 32x32->64 bit multiply instruction exists. Signed-off-by: Jeff Epler <jepler@gmail.com>

(note: this should probably end up squashed)

Most MCUs apart from Cortex-M0 with Thumb 1 have an instruction for computing the "high part" of a multiplication (e.g., the upper 32 bits of a 32x32 multiply). When they do, gcc uses this to implement a small and fast overflow check using the __builtin_mul_overflow intrinsic, which is preferable to the guard division method used in smallint.c.

However, in contrast to the previous mp_small_int_mul_overflow routine, which checks that the result fits not only within mp_int_t but is SMALL_INT_FITS(), __builtin_mul_overflow only checks for overflow of the C type. As a result, a slight change in the code flow is needed for MP_BINARY_OP_MULTIPLY.

Other sites using mp_small_int_mul_overflow already had the result value flow through to a SMALL_INT_FITS check so they didn't need any additional changes.

Signed-off-by: Jeff Epler <jepler@gmail.com>

jepler · 2025-07-24T14:09:04Z

Any suggestion how to structure this change better?

dpgeorge added the py-core label (Relates to py/ directory in source) Jul 24, 2025

core: Add MICROPY_USE_GCC_MUL_OVERFLOW_INTRINSIC.

bf5312b

And enable it on platforms where I am aware an efficient 32x32->64 bit multiply instruction exists. Signed-off-by: Jeff Epler <jepler@gmail.com>

jepler force-pushed the gcc-intrinsic-mul-overflow branch from 7d6f557 to 1334ce7 July 24, 2025 12:16

jepler force-pushed the gcc-intrinsic-mul-overflow branch from 1334ce7 to 19000d6 July 24, 2025 14:08

jepler marked this pull request as ready for review July 29, 2025 13:34
