Handle truth values; speed up smallint checks #1879

Merged
merged 2 commits into from
May 13, 2019

Conversation

dhalbert
Collaborator

Fixes #1875.

@godlygeek You may want to take a look.

@dhalbert dhalbert requested a review from tannewt May 12, 2019 04:21
@godlygeek

godlygeek commented May 12, 2019

Agh, sorry about the regression. A test case should probably be added to cover it.

Is the fast path for 0 justified? It doesn't seem to save many instructions.

The long int path handles buffer sizes that aren't powers of two, and the short int path doesn't. This test case succeeds in Python 3 and fails with this patch:

(0x10000).to_bytes(3, 'little')

I don't fully understand the rationale of "using signed constants here... to avoid any unintended conversions", but note that 0xffffffff is an unsigned 32-bit integer constant on an LP64 machine. See the C99 standard, clause 6.4.4.1. Bizarrely, 4294967295 would be a signed long integer constant, despite having the same value: the standard apparently mandates that hex and decimal literals be handled differently. With a C11 compiler, you can verify that behavior with:

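#include <stdio.h>

// Map an expression's type to a printable name (C11 _Generic selection).
#define typename(x) _Generic((x), \
    int: "int", unsigned: "unsigned", long: "long", default: "other")

int main(void)
{
    printf("%s\n", typename(4294967295));  // decimal: "long" on LP64
    printf("%s\n", typename(0xffffffff));  // hex: "unsigned"
    printf("%s\n", typename(0xffffff));    // fits in int: "int"
    return 0;
}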
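
The practical consequence for a range check can be sketched as follows. This is an illustration only, assuming a platform where `int` is 32 bits; the helper names are made up and do not appear in the PR. Comparing a negative value against the hex literal converts the negative value to unsigned, while the decimal literal keeps the comparison signed.

```c
#include <stdbool.h>

// Hypothetical helpers for illustration; assumes a 32-bit int.

// 0xffffffff has type unsigned int, so v is converted to unsigned
// before the comparison. -1 becomes 0xffffffff and the test is false.
bool below_hex_limit(int v)
{
    return v < 0xffffffff;
}

// 4294967295 has a signed type (long or long long), so v is widened
// and the comparison stays signed. -1 compares as less than the limit.
bool below_decimal_limit(int v)
{
    return v < 4294967295;
}
```

With these definitions, `below_hex_limit(-1)` is false and `below_decimal_limit(-1)` is true, even though both literals have the same value.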

[edited this by mistake instead of quoting it; restored it; github doesn't let you revert to a revision :( - @dhalbert]

@godlygeek

In my first iteration of #1860 I had implemented this:

bool mp_binary_int_within_range(mp_int_t val, size_t nbytes, bool is_signed)
{
    if (!is_signed && val < 0) {
        // Negative numbers never fit in an unsigned value
        return false;
    }

    if (nbytes >= sizeof(val)) {
        // All non-negative N bit signed integers fit in an unsigned N bit integer.
        // This case prevents overflow below.
        return true;
    }

    if (is_signed) {
        mp_int_t edge = ((mp_int_t)1 << (nbytes * 8 - 1));
        return -edge <= val && val < edge;
    } else {
        mp_int_t edge = ((mp_int_t)1 << (nbytes * 8));
        return val < edge;
    }
}

I expect that's not as fast for the common case of power of two buffer sizes as what you've got here, but it does handle the edge case of ones that aren't.
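
A self-contained sketch of how that check treats the 3-byte case from the earlier test. The `mp_int_t` typedef here is a stand-in assumption so the block compiles outside MicroPython, the function name is shortened for the example, and `nbytes` is assumed to be at least 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Stand-in for MicroPython's mp_int_t; an assumption for this sketch.
typedef intptr_t mp_int_t;

// Same logic as the function quoted above, condensed. Assumes nbytes >= 1.
bool int_within_range(mp_int_t val, size_t nbytes, bool is_signed)
{
    if (!is_signed && val < 0) {
        return false;  // negative values never fit an unsigned buffer
    }
    if (nbytes >= sizeof(val)) {
        return true;   // buffer is at least as wide as the value type
    }
    // A signed buffer gives up one bit to the sign.
    mp_int_t edge = (mp_int_t)1 << (nbytes * 8 - (is_signed ? 1 : 0));
    return is_signed ? (-edge <= val && val < edge) : (val < edge);
}
```

Here `int_within_range(0x10000, 3, false)` is true, matching `(0x10000).to_bytes(3, 'little')` succeeding in Python 3, while `int_within_range(0x1000000, 3, false)` is false.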

@dhalbert
Collaborator Author

dhalbert commented May 12, 2019

> Agh, sorry about the regression. A test case should probably be added to cover it.

I realized that after I submitted the PR, and I'll write one.

> Is the fast path for 0 justified? It doesn't seem to save many instructions.

I was thinking that the most common case was zeroing a buffer, which may contain signed or unsigned values. The procedure call is quite a bit more expensive than the checks, so this may not be a big deal.

> The long int path handles buffer sizes that aren't powers of two, and the short int path doesn't. This test case succeeds in Python 3 and fails with this patch:
>
> (0x10000).to_bytes(3, 'little')

Good point, I missed that. I should add a test for that too. .to_bytes() is rare; storing values in a bytearray or array.array is much more common, and I was trying to optimize the checking for that. The checking should perhaps be split into two use cases.

> I don't fully understand the rationale of "using signed constants here... to avoid any unintended conversions", but note that 0xffffffff is an unsigned 32-bit integer constant on an LP64 machine. See the C99 standard, clause 6.4.4.1. Bizarrely, 4294967295 would be a signed long integer constant, despite having the same value - the standard apparently mandates that hex and decimal literals be handled differently.

My intention was to make sure the whole expression was signed: the MAX_UINT* constants are suffixed with 'U'. I didn't know that thing about hex vs decimal, agh.

I'll work on this and resubmit. I looked at the CPython implementations originally to see if there was clever code there, but it's structured somewhat differently, and doesn't share the low-level code in the same way.

@dhalbert
Collaborator Author

dhalbert commented May 12, 2019

> In my first iteration of #1860 I had implemented this:
>
> ...
>
> I expect that's not as fast for the common case of power of two buffer sizes as what you've got here, but it does handle the edge case of ones that aren't.

I may use that for the to_bytes() case. I was trying to use a right-shift instead of a left-shift, because then you can check for 0. But C leaves it implementation-defined whether right-shifting a negative signed value is arithmetic or logical, and there didn't seem to be an easy way to test at compile time for what a particular compiler chooses.
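
One way to keep a right-shift check portable, sketched here purely as an illustration and not what this PR does, is to shift only non-negative values, so the arithmetic-versus-logical question never arises. The function name and the `mp_int_t` typedef are assumptions for the example, and `nbytes` is assumed to be at least 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Stand-in for MicroPython's mp_int_t; an assumption for this sketch.
typedef intptr_t mp_int_t;

// Hypothetical signed range check that right-shifts only non-negative
// values, so the result does not depend on how the compiler shifts
// negative numbers. Assumes nbytes >= 1.
bool fits_signed_rshift(mp_int_t val, size_t nbytes)
{
    if (nbytes >= sizeof(val)) {
        return true;  // also keeps the shift count below the type width
    }
    // Map negative values onto non-negative ones: -1 -> 0, -128 -> 127.
    // val + 1 cannot overflow here because val is negative.
    mp_int_t magnitude = val < 0 ? -(val + 1) : val;
    // Every bit above the low (nbytes * 8 - 1) bits must be zero.
    return (magnitude >> (nbytes * 8 - 1)) == 0;
}
```

For a one-byte buffer this accepts -128 through 127 and rejects -129 and 128, the same range as the left-shift version.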

@dhalbert
Collaborator Author

@godlygeek Could you take another look? Thanks. I've reused your shifting code above, since it's going to be fast for smallints. I could add a fast path for byte-sized values, but this is probably fine. My major reason for splitting the smallint and longint cases was to not use bytecodes to do the basic checking in the smallint case.

I split the tests up so the array overflow test wouldn't be skipped completely if longints were turned off, and I added some more tests.

I'm sorry I didn't get to discuss this with you more thoroughly during the sprints. There was a lot going on.

@godlygeek

> I didn't know that thing about hex vs decimal, agh.

Neither did I until just now, heh... I spotted that 0xffffffff didn't fit into a signed 32-bit int, and then went looking for what that would actually do, and was surprised by what I learned.

> @godlygeek Could you take another look? Thanks

Looks good to me!

> I could add a fast path for byte-sized values

Hm, I think that's a better idea than the fast path for zero, actually! We can skip the shifts for stuff that fits in one byte by adding a fast path inside the if (signed) ... else blocks. It's less fast than just a compare against 0, but in exchange for 2 compares against constants (plus the bool check) we get substantially more values that can be fast pathed - including every valid call to bytearray.
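
A rough sketch of what such a byte-sized fast path might look like, placed inside each branch ahead of the shifts. The function name and the `mp_int_t` typedef are assumptions for illustration; this is not the code that was merged, and `nbytes` is assumed to be at least 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Stand-in for MicroPython's mp_int_t; an assumption for this sketch.
typedef intptr_t mp_int_t;

// Hypothetical range check with a one-byte fast path in each branch.
// Assumes nbytes >= 1, so anything that fits in one byte fits the buffer.
bool small_int_fits(mp_int_t val, size_t nbytes, bool is_signed)
{
    if (is_signed) {
        if (val >= -128 && val <= 127) {
            return true;  // fast path: two compares, no shift
        }
        if (nbytes >= sizeof(val)) {
            return true;
        }
        mp_int_t edge = (mp_int_t)1 << (nbytes * 8 - 1);
        return -edge <= val && val < edge;
    } else {
        if (val >= 0 && val <= 255) {
            return true;  // fast path: covers every valid bytearray store
        }
        if (val < 0) {
            return false;
        }
        if (nbytes >= sizeof(val)) {
            return true;
        }
        mp_int_t edge = (mp_int_t)1 << (nbytes * 8);
        return val < edge;
    }
}
```

Values outside the one-byte range fall through to the same shift-based check as before, so the result is unchanged; only the cost for small values differs.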

> My major reason for splitting the smallint and longint cases

I waffled on this myself (as evidenced by the earlier iteration I was able to resurrect, heh). I wasn't sure how to balance performance vs text segment size. You're in a much better position than me to evaluate the merits of that tradeoff.

> I split the tests up so the array overflow test wouldn't be skipped completely if longints were turned off

Ooh, good catch.

> I'm sorry I didn't get to discuss this with you more thoroughly during the sprints. There was a lot going on.

Pros of sprints: I have access to knowledgeable people. Cons of sprints: so does everyone else. 😉

@dhalbert
Collaborator Author

I looked at the machine code, and it's pretty minimal. The shifting is only a few instructions. So I think I'll leave well enough alone for now. Thanks for all your help on fixing this long-standing regular Python incompatibility.

@tannewt, this is ready for review.

000123a8 <mp_small_int_buffer_overflow_check>:
   123a8:	b508      	push	{r3, lr}
   123aa:	b180      	cbz	r0, 123ce <mp_small_int_buffer_overflow_check+0x26>
   123ac:	b922      	cbnz	r2, 123b8 <mp_small_int_buffer_overflow_check+0x10>
   123ae:	2800      	cmp	r0, #0
   123b0:	da0e      	bge.n	123d0 <mp_small_int_buffer_overflow_check+0x28>
   123b2:	480a      	ldr	r0, [pc, #40]	; (123dc <mp_small_int_buffer_overflow_check+0x34>)
   123b4:	f008 fdce 	bl	1af54 <mp_raise_OverflowError_varg>
   123b8:	2903      	cmp	r1, #3
   123ba:	d808      	bhi.n	123ce <mp_small_int_buffer_overflow_check+0x26>
   123bc:	00cb      	lsls	r3, r1, #3
   123be:	1e5a      	subs	r2, r3, #1
   123c0:	2301      	movs	r3, #1
   123c2:	4093      	lsls	r3, r2
   123c4:	425a      	negs	r2, r3
   123c6:	4282      	cmp	r2, r0
   123c8:	dcf3      	bgt.n	123b2 <mp_small_int_buffer_overflow_check+0xa>
   123ca:	4298      	cmp	r0, r3
   123cc:	daf1      	bge.n	123b2 <mp_small_int_buffer_overflow_check+0xa>
   123ce:	bd08      	pop	{r3, pc}
   123d0:	2903      	cmp	r1, #3
   123d2:	d8fc      	bhi.n	123ce <mp_small_int_buffer_overflow_check+0x26>
   123d4:	00ca      	lsls	r2, r1, #3
   123d6:	2301      	movs	r3, #1
   123d8:	4093      	lsls	r3, r2
   123da:	e7f6      	b.n	123ca <mp_small_int_buffer_overflow_check+0x22>
   123dc:	1eee      	subs	r6, r5, #3
   123de:	0004      	movs	r4, r0
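
For readers less used to Thumb assembly, the listing can be read back into C roughly as below. This is a reconstruction from the disassembly, not the actual source: the name `small_int_overflows` is made up, the overflow error is modeled as a `true` return, and a 32-bit value type is assumed to match the target. The last two lines of the listing are the literal-pool word loaded by the `ldr` at 123b2, not instructions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical C rendering of the listing. Returns true where the
// listing branches to mp_raise_OverflowError_varg. Assumes nbytes >= 1.
bool small_int_overflows(int32_t val, size_t nbytes, bool is_signed)
{
    if (val == 0) {
        return false;               // cbz r0: zero always fits
    }
    if (!is_signed && val < 0) {
        return true;                // cbnz r2, cmp r0 #0, bge: negative into unsigned
    }
    if (nbytes > 3) {
        return false;               // cmp r1 #3, bhi: buffer as wide as the value
    }
    if (is_signed) {
        int32_t edge = (int32_t)1 << (nbytes * 8 - 1);  // lsls, subs, movs, lsls
        return val < -edge || val >= edge;              // negs, cmp/bgt, cmp/bge
    }
    int32_t edge = (int32_t)1 << (nbytes * 8);          // lsls, movs, lsls
    return val >= edge;                                 // cmp r0 r3, bge
}
```

Read this way, the whole check is a handful of compares and at most one variable shift.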

Member

@tannewt tannewt left a comment

Thank you for the fix @dhalbert and @godlygeek

Successfully merging this pull request may close these issues.

4.0.0 RC2 - Side-effects of https://github.com/adafruit/circuitpython/pull/1860 don't allow assigning various cross-types