Handle truth values; speed up smallint checks #1879

dhalbert · 2019-05-12T04:21:24Z

Converts truth values to 0 and 1 appropriately. "Add overflow checks for int to bytes conversions" (#1860) introduced a regression that made them not valid.
Separates out checking small int ranges to speed up the common cases.
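For reference, the behaviour being restored can be checked against regular Python, where bool is a subclass of int, so True and False are stored as 1 and 0. This is a minimal sketch intended for CPython 3; it is not taken from this PR's test suite.

```python
import array
import struct

# bool is a subclass of int in Python, so truth values are valid integers
# and should be stored as 1 and 0.
buf = bytearray(2)
buf[0] = True
buf[1] = False

arr = array.array('b', [True, False])

packed = struct.pack('B', True)

print(buf, arr, packed)
```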

@godlygeek You may want to take a look.

godlygeek · 2019-05-12T05:51:49Z

Agh, sorry about the regression. A test case should probably be added to cover it.

Is the fast path for 0 justified? It doesn't seem to save many instructions.

The long int path handles buffer sizes that aren't powers of two, and the short int path doesn't. This test case succeeds in Python 3 and fails with this patch:

(0x10000).to_bytes(3, 'little')
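As a quick check of the expected behaviour on CPython 3: a 3-byte buffer holds any unsigned value below 2**24, so the call above should succeed there, and the first value that no longer fits is 0x1000000. A minimal sketch:

```python
# 0x10000 needs 17 bits, which fits in 3 bytes (24 bits).
ok = (0x10000).to_bytes(3, 'little')

# 0x1000000 needs 25 bits, so it must be rejected for a 3-byte buffer.
try:
    (0x1000000).to_bytes(3, 'little')
    overflowed = False
except OverflowError:
    overflowed = True

print(ok, overflowed)
```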

I don't fully understand the rationale of "using signed constants here... to avoid any unintended conversions", but note that 0xffffffff is an unsigned 32-bit integer constant on an LP64 machine. See the C99 standard, clause 6.4.4.1. Bizarrely, 4294967295 would be a signed long integer constant, despite having the same value: the standard apparently mandates that hex and decimal literals be handled differently. With a C11 compiler, you can verify that behavior with:

#include <stdio.h>

#define typename(x) _Generic((x), \
    int: "int", unsigned: "unsigned", long: "long", default: "other")

int main()
{
    printf("%s\n", typename(4294967295));
    printf("%s\n", typename(0xffffffff));
    printf("%s\n", typename(0xffffff));
}

[edited this by mistake instead of quoting it; restored it; github doesn't let you revert to a revision :( - @dhalbert]

godlygeek · 2019-05-12T05:59:11Z

In my first iteration of #1860 I had implemented this:

bool mp_binary_int_within_range(mp_int_t val, size_t nbytes, bool is_signed)
{
    if (!is_signed && val < 0) {
        // Negative numbers never fit in an unsigned value
        return false;
    }

    if (nbytes >= sizeof(val)) {
        // All non-negative N bit signed integers fit in an unsigned N bit integer.
        // This case prevents overflow below.
        return true;
    }

    if (is_signed) {
        mp_int_t edge = ((mp_int_t)1 << (nbytes * 8 - 1));
        return -edge <= val && val < edge;
    } else {
        mp_int_t edge = ((mp_int_t)1 << (nbytes * 8));
        return val < edge;
    }
}

I expect that's not as fast for the common case of power of two buffer sizes as what you've got here, but it does handle the edge case of ones that aren't.
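To see how the check above treats a 3-byte buffer, here is a Python model of the same logic. It is only an illustrative transcription: `MP_INT_BYTES` stands in for `sizeof(val)` and is assumed to be 4 (a 32-bit `mp_int_t`), and the function name is made up for this sketch.

```python
MP_INT_BYTES = 4  # assumption: mp_int_t is 4 bytes wide, as on a 32-bit port


def int_within_range(val, nbytes, is_signed):
    """Python model of the C range check quoted above (illustrative only)."""
    if not is_signed and val < 0:
        # Negative numbers never fit in an unsigned value.
        return False
    if nbytes >= MP_INT_BYTES:
        # The buffer is at least as wide as the integer type itself.
        return True
    if is_signed:
        edge = 1 << (nbytes * 8 - 1)
        return -edge <= val < edge
    edge = 1 << (nbytes * 8)
    return val < edge


# The problematic case from the review: 0x10000 in a 3-byte unsigned buffer.
print(int_within_range(0x10000, 3, False))
```

Because the edge is computed from `nbytes` directly, sizes such as 3 need no special handling.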

dhalbert · 2019-05-12T13:15:15Z

Agh, sorry about the regression. A test case should probably be added to cover it.

I realized that after I submitted the PR, and I'll write one.

Is the fast path for 0 justified? It doesn't seem to save many instructions.

I was thinking that the most common case was zeroing a buffer, which may contain signed or unsigned values. The procedure call is quite a bit more expensive than the checks, so this may not be a big deal.

The long int path handles buffer sizes that aren't powers of two, and the short int path doesn't. This test case succeeds in Python 3 and fails with this patch:
(0x10000).to_bytes(3, 'little')

Good point, I missed that. I should add a test for that too. .to_bytes() is rare; storing values in a bytearray or array.array is much more common, and I was trying to optimize the checking for that. The checking should perhaps be split into two use cases.

I don't fully understand the rationale of "using signed constants here... to avoid any unintended conversions", but note that 0xffffffff is an unsigned 32-bit integer constant on an LP64 machine. See the C99 standard, clause 6.4.4.1. Bizarrely, 4294967295 would be a signed long integer constant, despite having the same value - the standard apparently mandates that hex and decimal literals be handled differently.

My intention was to make sure the whole expression was signed: the MAX_UINT* constants are suffixed with 'U'. I didn't know that thing about hex vs decimal, agh.

I'll work on this and resubmit. I looked at the CPython implementations originally to see if there was clever code there, but it's structured somewhat differently, and doesn't share the low-level code in the same way.

dhalbert · 2019-05-12T13:23:47Z

In my first iteration of #1860 I had implemented this:

...

I expect that's not as fast for the common case of power of two buffer sizes as what you've got here, but it does handle the edge case of ones that aren't.

I may use that for the to_bytes() case. I was trying to use a right-shift instead of a left-shift, because then you can check for 0. But C leaves it implementation-defined whether a right shift of a negative signed value is arithmetic or logical, and there didn't seem to be an easy way to test at compile time for what a particular compiler chooses.
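The right-shift idea can be sketched in Python, where `>>` on a negative int always behaves like an arithmetic shift (floor division by a power of two). The helper name is hypothetical and this only models the idea; the equivalent C expression on a negative signed value depends on the compiler, which is the concern raised above.

```python
def fits_by_right_shift(val, nbytes, is_signed):
    """Hypothetical helper: range check using a right shift and a compare."""
    if is_signed:
        # After dropping the low (bits - 1) bits, an in-range value leaves
        # only sign bits: 0 for non-negative values, -1 for negative ones.
        return (val >> (nbytes * 8 - 1)) in (0, -1)
    # For unsigned, nothing may remain above the low bits of the buffer;
    # negative values shift towards -1 (never 0) and are rejected.
    return (val >> (nbytes * 8)) == 0


print(fits_by_right_shift(0x10000, 3, False))
```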

dhalbert · 2019-05-12T15:43:24Z

@godlygeek Could you take another look? Thanks. I've reused your shifting code above, since it's going to be fast for smallints. I could add a fast path for byte-sized values, but this is probably fine. My major reason for splitting the smallint and longint cases was to not use bytecodes to do the basic checking in the smallint case.

I split the tests up so the array overflow test wouldn't be skipped completely if longints were turned off, and I added some more tests.

I'm sorry I didn't get to discuss this with you more thoroughly during the sprints. There was a lot going on.

godlygeek · 2019-05-12T17:02:44Z

I didn't know that thing about hex vs decimal, agh.

Neither did I until just now, heh... I spotted that 0xffffffff didn't fit into a signed 32-bit int, and then went looking for what that would actually do, and was surprised by what I learned.

@godlygeek Could you take another look? Thanks

Looks good to me!

I could add a fast path for byte-sized values

Hm, I think that's a better idea than the fast path for zero, actually! We can skip the shifts for stuff that fits in one byte by adding a fast path inside the if (signed) ... else blocks. It's less fast than just a compare against 0, but in exchange for 2 compares against constants (plus the bool check) we get substantially more values that can be fast pathed - including every valid call to bytearray.
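A rough Python model of that byte-sized fast path, assuming `nbytes` is at least 1. The function name is hypothetical, and the `nbytes >= sizeof(val)` guard from the C version is omitted because Python ints do not overflow when shifted.

```python
def small_int_fits(val, nbytes, is_signed):
    """Hypothetical model of a range check with a one-byte fast path."""
    if is_signed:
        if -128 <= val <= 127:
            # Fits in one signed byte, so it fits in any buffer size.
            return True
        edge = 1 << (nbytes * 8 - 1)
        return -edge <= val < edge
    if 0 <= val <= 255:
        # Fits in one unsigned byte: covers every valid bytearray store.
        return True
    edge = 1 << (nbytes * 8)
    return 0 <= val < edge


print(small_int_fits(200, 1, False), small_int_fits(256, 1, False))
```

The two constant compares per branch are the cost mentioned above; values outside the byte range still fall through to the shift-based check.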

My major reason for splitting the smallint and longint cases

I waffled on this myself (as evidenced by the earlier iteration I was able to resurrect, heh). I wasn't sure how to balance performance vs text segment size. You're in a much better position than me to evaluate the merits of that tradeoff.

I split the tests up so the array overflow test wouldn't be skipped completely if longints were turned off

Ooh, good catch.

I'm sorry I didn't get to discuss this with you more thoroughly during the sprints. There was a lot going on.

Pros of sprints: I have access to knowledgeable people. Cons of sprints: so does everyone else. 😉

dhalbert · 2019-05-12T18:49:21Z

I looked at the machine code, and it's pretty minimal. The shifting is only a few instructions. So I think I'll leave well enough alone for now. Thanks for all your help on fixing this long-standing incompatibility with regular Python.

@tannewt, this is ready for review.

000123a8 <mp_small_int_buffer_overflow_check>:
   123a8:	b508      	push	{r3, lr}
   123aa:	b180      	cbz	r0, 123ce <mp_small_int_buffer_overflow_check+0x26>
   123ac:	b922      	cbnz	r2, 123b8 <mp_small_int_buffer_overflow_check+0x10>
   123ae:	2800      	cmp	r0, #0
   123b0:	da0e      	bge.n	123d0 <mp_small_int_buffer_overflow_check+0x28>
   123b2:	480a      	ldr	r0, [pc, #40]	; (123dc <mp_small_int_buffer_overflow_check+0x34>)
   123b4:	f008 fdce 	bl	1af54 <mp_raise_OverflowError_varg>
   123b8:	2903      	cmp	r1, #3
   123ba:	d808      	bhi.n	123ce <mp_small_int_buffer_overflow_check+0x26>
   123bc:	00cb      	lsls	r3, r1, #3
   123be:	1e5a      	subs	r2, r3, #1
   123c0:	2301      	movs	r3, #1
   123c2:	4093      	lsls	r3, r2
   123c4:	425a      	negs	r2, r3
   123c6:	4282      	cmp	r2, r0
   123c8:	dcf3      	bgt.n	123b2 <mp_small_int_buffer_overflow_check+0xa>
   123ca:	4298      	cmp	r0, r3
   123cc:	daf1      	bge.n	123b2 <mp_small_int_buffer_overflow_check+0xa>
   123ce:	bd08      	pop	{r3, pc}
   123d0:	2903      	cmp	r1, #3
   123d2:	d8fc      	bhi.n	123ce <mp_small_int_buffer_overflow_check+0x26>
   123d4:	00ca      	lsls	r2, r1, #3
   123d6:	2301      	movs	r3, #1
   123d8:	4093      	lsls	r3, r2
   123da:	e7f6      	b.n	123ca <mp_small_int_buffer_overflow_check+0x22>
   123dc:	1eee      	subs	r6, r5, #3
   123de:	0004      	movs	r4, r0

tannewt

Thank you for the fix @dhalbert and @godlygeek

Handle truth values; speed up smallint checks

d103ac1

dhalbert requested a review from tannewt May 12, 2019 04:21

use approx of original @godlygeek code for smallints; add tests

8664a65

tannewt approved these changes May 13, 2019

tannewt merged commit 589755e into adafruit:master May 13, 2019

godlygeek mentioned this pull request May 13, 2019

Format Strings for struct.pack_into() don't match signed/unsigned vars #1451

Closed

dhalbert deleted the bytearray-array-range-fixes branch May 28, 2019 13:25

projectgus mentioned this pull request Nov 27, 2024

py/objint,py/binary: Add int.to_bytes(signed) parameter, add common overflow checks. micropython/micropython#16311

Draft

2 tasks
