py: Fixes and test coverage for 64-bit big integer representations. #16953

Merged

projectgus
Contributor

@projectgus projectgus commented Mar 18, 2025

Summary

As pointed out by @yoctopuce in this discussion and #16932, there is poor test coverage for the optional 64-bit bigint representation and this representation has some bugs.

This PR aims to improve this:

  1. Add test naming convention specifically for 64-bit big integers. These will run if there is any big integer support (either 64-bit or arbitrary precision).
  2. Duplicate the basic bigint tests to add int_64 equivalent.
  3. Fix bug where negative 64-bit integers were incorrectly parsed.
  4. Fix bug where 64-bit integer parsing produced invalid results if the buffer wasn't null terminated and the byte after the buffer was a valid digit. Fixes #16932 ("Incorrect parse of large integers in LONGLONG mode"). This incorporates the tests submitted in #16931 ("tests/extmod/json_loads: Add test cases for LONGINT parse."), which seem to be a reliable way to get a string buffer that fits this edge case. However, the new tests are moved to a separate file so that the json tests don't depend on bigint support.
  5. Add a longlong unix build variant that enables 64-bit long int mode. Mostly useful for CI testing.
  6. Fix saturating behaviour when 64-bit integer parsing overflows; it now fails instead. This needed further filtering of tests, as the ffi_int tests all depend on parsing UINT64_MAX or similar. This happened to work before this check was added, I believe because the values were cast back to uint64 when passed to the FFI interface.
  7. Raise OverflowError if an arithmetic operation overflows 64-bit signed integer. Uses built-ins on gcc & clang, hand-rolled checks on other compilers.
  8. Change mp_parse_num() to parse directly to long long in the 64-bit big integer configuration, saving code size.
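
As an illustration of item 7, here is a minimal sketch of an overflow-checked 64-bit addition that uses the compiler built-in on gcc and clang and a hand-rolled check elsewhere. The helper name `ll_add_overflow` is hypothetical and is not taken from the PR; the actual helpers in the PR may be structured differently.

```c
#include <limits.h>
#include <stdbool.h>

// Hypothetical helper (name is an assumption, not from the PR).
// Returns true if a + b does not fit in a long long.
// When it returns false, the sum has been stored in *res.
static bool ll_add_overflow(long long a, long long b, long long *res) {
#if defined(__GNUC__) || defined(__clang__)
    // gcc and clang provide a checked-add built-in.
    return __builtin_add_overflow(a, b, res);
#else
    // Hand-rolled check: test against the limits before adding,
    // so the signed addition itself never overflows.
    if ((b > 0 && a > LLONG_MAX - b) || (b < 0 && a < LLONG_MIN - b)) {
        return true;
    }
    *res = a + b;
    return false;
#endif
}
```

A caller would raise OverflowError when the helper returns true, and otherwise use the value written to `*res`.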

Thanks to @yoctopuce for suggesting and demonstrating a bunch of these ideas, and improvements on the original version of this PR.

Testing

  • Ran unit tests for unix port 'standard', 'longlong', and a special build of 'standard' where the compiler built-in overflow functions were disabled (gcc 15.1 seems to produce the same code in this case, the compiler must recognise that the polyfill versions are equivalents - but doing this confirmed it). (Note: I haven't re-run the special build on the very latest version of this PR, but none of that code has changed.)
  • Ran unit tests for stm32 PYBD_SF2, NUCLEO_H723ZG with NANBOX=1, and esp32 ESP32_GENERIC_S3 - both default tests and --via-mpy. All passing.

Follow-up work (for new PRs)

  • Extend longlong config to also test MICROPY_OBJ_REPR_C.
  • Fix bug when adding a float to a 64-bit big integer.

Trade-offs and Alternatives

  • We could deprecate 64-bit big integers instead of improving support for it, but it does seem useful in small systems.

@projectgus projectgus added the py-core Relates to py/ directory in source label Mar 18, 2025
@projectgus projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from ec883a8 to c6f3856 Compare March 18, 2025 06:59

codecov bot commented Mar 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.44%. Comparing base (c72a3e5) to head (17fbc5a).
Report is 8 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #16953   +/-   ##
=======================================
  Coverage   98.44%   98.44%           
=======================================
  Files         171      171           
  Lines       22208    22208           
=======================================
  Hits        21863    21863           
  Misses        345      345           


github-actions bot commented Mar 18, 2025

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:   +57 +0.030% 
   unix x64:   +32 +0.004% standard
      stm32:    -8 -0.002% PYBV10
     mimxrt:    -8 -0.002% TEENSY40
        rp2:   -16 -0.002% RPI_PICO_W
       samd:    -8 -0.003% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +8 +0.002% VIRT_RV32

@projectgus projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch 2 times, most recently from 0dd8885 to 5d8e0c1 Compare March 18, 2025 23:36
@projectgus projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch 2 times, most recently from f8acf08 to c862fb1 Compare March 19, 2025 00:22
@projectgus
Contributor Author

The code only fails when run through mpy-cross (test option --via-mpy). The cause is the very last test, which is causing an undetected overflow:

Ah of course, thanks!

@projectgus projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from c862fb1 to aa0917b Compare March 19, 2025 07:14
@projectgus projectgus added this to the release-1.26.0 milestone Mar 26, 2025
@projectgus projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch 5 times, most recently from 8555e92 to 3b737ad Compare May 8, 2025 00:57
@projectgus projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from fd40fb1 to 50ad726 Compare May 16, 2025 06:41
@projectgus projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from 50ad726 to 9ed365d Compare June 19, 2025 02:21
@yoctopuce
Contributor

The "adding float to 64-bit long int" case I haven't dealt with here as this PR is too big already, I'll push it in a follow-up (unless you're keen to submit the fix for that, @yoctopuce?)

Thank you for asking. Feel free to do as you prefer. Let's say that if I don't see the new PR coming once this one gets integrated into the master, I will submit it :-)

@projectgus projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from 9ed365d to ba9e3fc Compare June 20, 2025 06:12
@projectgus
Contributor Author

   bare-arm:    +8 +0.014%
minimal x86:   +29 +0.015%
   unix x64:    +0 +0.000% standard
      stm32:   +12 +0.003% PYBV10
     mimxrt:    +8 +0.002% TEENSY40
        rp2:   +72 +0.008% RPI_PICO_W
       samd:    +8 +0.003% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    -5 -0.001% VIRT_RV32

Drat, when I tested the code size improvement from unsigned mp_parse_num() with my compiler it was a benefit on all the ports I tested. But here it's mostly neutral, and a big size hit on rp2! 😞

@dpgeorge
Member

Drat, when I tested the code size improvement from unsigned mp_parse_num() with my compiler it was a benefit on all the ports I tested. But here it's mostly neutral, and a big size hit on rp2! 😞

The issue is because mp_small_int_mul_overflow() now takes a third argument which is the address of a local variable, and that means the variable must now be on the stack, etc.

I've pushed a commit to try and improve the situation there, by adjusting which local has its address taken. That improves things for most ports, but others (unix, esp8266) increase a little with my change. Still, I think that's a reasonable improvement.

That change to mp_small_int_mul_overflow() to add a third arg is actually not really needed, because that function is only used in py/runtime.c in the end. So it could be reverted. I tried that but it did not decrease code size compared to my first change mentioned above.
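
To make the trade-off concrete, here is a rough sketch of a multiply-overflow check with the three-argument shape of `__builtin_mul_overflow()`, plus a caller where only a dedicated temporary has its address taken. The names `ll_mul_overflow` and `mul_or_default` are hypothetical, the check operates on `long long` rather than the small-int type, and this is not the actual `mp_small_int_mul_overflow()` code.

```c
#include <limits.h>
#include <stdbool.h>

// Hypothetical hand-rolled check (not the MicroPython implementation).
// Returns true if x * y does not fit in a long long; otherwise stores
// the product in *res. Each branch divides a limit by a non-zero operand,
// so no intermediate step can overflow.
static bool ll_mul_overflow(long long x, long long y, long long *res) {
    if (x > 0) {
        if (y > 0) {
            if (x > LLONG_MAX / y) return true;
        } else {
            if (y < LLONG_MIN / x) return true;
        }
    } else {
        if (y > 0) {
            if (x < LLONG_MIN / y) return true;
        } else {
            if (x != 0 && y < LLONG_MAX / x) return true;
        }
    }
    *res = x * y;
    return false;
}

// Caller sketch: `tmp` is the only local whose address is taken, so the
// compiler remains free to keep x, y and dflt in registers.
static long long mul_or_default(long long x, long long y, long long dflt) {
    long long tmp;
    if (ll_mul_overflow(x, y, &tmp)) {
        return dflt;
    }
    return tmp;
}
```

Whether isolating the address-taken local actually saves code depends on the compiler and target, which matches the mixed per-port results reported above.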

@projectgus projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from 995e3dd to ed0974a Compare July 15, 2025 01:27
@projectgus
Contributor Author

Alright, a bit of finagling and we've got a very minor code size reduction on all the bare metal ports! 🎉

I am not sure which part is causing the minor increase in unix port sizes, but it's small enough to possibly be noise.

I've rebased and simplified the commit history (removed the parsing fixes and improvements in objint_longlong.c that were later removed entirely). Have re-run tests locally and updated the PR description with the details.

@projectgus projectgus requested a review from dpgeorge July 15, 2025 01:58
@projectgus projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from ed0974a to aa1477c Compare July 15, 2025 02:04
@dpgeorge
Member

@projectgus I'm happy to squash my optimisation commit down into your commit (the one just prior to mine). I don't need the attribution for that.

@projectgus projectgus force-pushed the bugfix/bigint_longlong_tests_parsing branch from aa1477c to c8f23ee Compare July 17, 2025 03:50
@projectgus
Contributor Author

@dpgeorge OK, have squashed! (And rebased again also.)

These will run on all ports which support them, but importantly
they'll also run on ports that don't support arbitrary precision
but do support 64-bit long ints.

Includes some test workarounds to account for things which will fail
once "long long" big integers raise on overflow (added in follow-up commit):

- uctypes_array_load_store test was failing already, now won't parse.
- all the ffi_int tests contain 64-bit unsigned values, that won't parse
  as long long.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
These tests cover the use of mp_obj_new_int_from_str_len when
mp_parse_num_integer overflows the SMALLINT limit, and also the case where
the value may not be null terminated.

Placed in a separate test file so that extmod/json test doesn't rely on
bigint support.

Signed-off-by: Yoctopuce dev <dev@yoctopuce.com>
Signed-off-by: Angus Gratton <angus@redyak.com.au>
Signed-off-by: Angus Gratton <angus@redyak.com.au>
Relies on arbitrary precision math, so won't run on a port which
has threads & limited bigint support.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
The other performance tests run and pass with only 64-bit big integer
support.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Long long big integer support now raises an exception on overflow rather
than returning an undefined result.

Also adds an error when shifting by a negative value.

The new arithmetic checks are added in the misc.h header.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
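
A possible shape for the shift checks described in this commit message, as a sketch only: reject a negative shift count, and report overflow when a left shift would not fit in a signed 64-bit value. The enum and the function name `ll_checked_lshift` are hypothetical and not taken from misc.h.

```c
#include <limits.h>

// Hypothetical status codes for this sketch (not from the PR).
typedef enum { SHIFT_OK, SHIFT_NEGATIVE_COUNT, SHIFT_OVERFLOW } shift_status_t;

// Computes value << count with explicit checks, assuming a 64-bit long long.
// The result is only written to *res when SHIFT_OK is returned.
static shift_status_t ll_checked_lshift(long long value, long long count, long long *res) {
    if (count < 0) {
        return SHIFT_NEGATIVE_COUNT; // the caller would raise an error here
    }
    if (value == 0) {
        *res = 0; // zero shifted by any amount is still zero
        return SHIFT_OK;
    }
    if (count > 63) {
        return SHIFT_OVERFLOW; // any non-zero value overflows
    }
    // The value must fit in (64 - count) signed bits for the shift to fit.
    long long limit = LLONG_MAX >> count;
    if (value > limit || value < -limit - 1) {
        return SHIFT_OVERFLOW;
    }
    if (count == 63) {
        *res = LLONG_MIN; // only -1 reaches this point: -1 << 63
    } else {
        // Multiply instead of shifting a negative value, which is undefined in C.
        *res = value * (1LL << count);
    }
    return SHIFT_OK;
}
```

In this sketch the caller maps SHIFT_NEGATIVE_COUNT and SHIFT_OVERFLOW to the appropriate Python exceptions.
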
Makes it compatible with the __builtin_mul_overflow() syntax, used in
follow-up commit.

Includes optimisation in runtime.c to minimise the code size impact from
additional param.

Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Angus Gratton <angus@redyak.com.au>
If big integer support is 'long long' then mp_parse_num_integer() can
parse to it directly instead of failing over from small int. This means
strtoll() is no longer pulled in, and fixes some bugs parsing long long
integers (i.e. can now parse negative values correctly, can now parse
values which aren't NULL terminated).

The (default) smallint parsing compiled code should stay the same here,
macros and a typedef are used to abstract some parts of it out.

When bigint is long long we parse to 'unsigned long long' first (to avoid
the code size hit of pulling in signed 64-bit math routines) and then
convert to signed at the end.

One tricky case this routine correctly overflows on is
int("9223372036854775808") which is one more than LLONG_MAX in decimal. No
unit test case added for this as it's too hard to detect 64-bit long
integer mode.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
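
The approach in this commit message can be sketched roughly as follows: accumulate into `unsigned long long`, bound the loop by an explicit length so no NUL terminator is needed, and convert to signed only at the end. This is a simplified base-10 illustration with a hypothetical name (`parse_ll`), not the actual `mp_parse_num_integer()` code.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: parses exactly `len` bytes of `str` as a base-10
// integer. Returns false on malformed input or if the value does not fit
// in a long long. Never reads str[len], so no NUL terminator is required.
static bool parse_ll(const char *str, size_t len, long long *out) {
    const char *end = str + len;
    bool neg = false;
    if (str < end && (*str == '+' || *str == '-')) {
        neg = (*str == '-');
        ++str;
    }
    if (str == end) {
        return false; // no digits at all
    }
    unsigned long long acc = 0;
    for (; str < end; ++str) {
        if (*str < '0' || *str > '9') {
            return false;
        }
        unsigned digit = (unsigned)(*str - '0');
        if (acc > (ULLONG_MAX - digit) / 10) {
            return false; // acc * 10 + digit would wrap the unsigned type
        }
        acc = acc * 10 + digit;
    }
    // Allowed magnitude: LLONG_MAX when positive, LLONG_MAX + 1 when negative.
    unsigned long long limit = (unsigned long long)LLONG_MAX + (neg ? 1u : 0u);
    if (acc > limit) {
        return false;
    }
    if (neg) {
        // Negating LLONG_MAX + 1 as a signed value would overflow, so special-case it.
        *out = (acc == limit) ? LLONG_MIN : -(long long)acc;
    } else {
        *out = (long long)acc;
    }
    return true;
}
```

With this sketch, "9223372036854775808" is rejected while "-9223372036854775808" parses to LLONG_MIN, and parsing the first 3 bytes of "12345" yields 123 regardless of the digits that follow the buffer.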
Member

@dpgeorge dpgeorge left a comment

Looking good!

@dpgeorge dpgeorge force-pushed the bugfix/bigint_longlong_tests_parsing branch from 636b9bb to 17fbc5a Compare July 17, 2025 14:13
@dpgeorge dpgeorge merged commit 17fbc5a into micropython:master Jul 17, 2025
71 checks passed
@dpgeorge
Member

Thanks @projectgus and @yoctopuce for your efforts here. Lots of small details have been addressed which is great!

@yoctopuce
Contributor

Thanks @projectgus, you deserve all the credit. This is a really nice improvement for bare metal ports with limited resources.

Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect parse of large integers in LONGLONG mode
3 participants