Handle truth values; speed up smallint checks #1879
Agh, sorry about the regression.

A test case should probably be added to cover it.

Is the fast path for 0 justified? It doesn't seem to save many instructions.

The long int path handles buffer sizes that aren't powers of two, and the short int path doesn't. This test case succeeds in Python 3 and fails with this patch:

    (0x10000).to_bytes(3, 'little')

I don't fully understand the rationale of "using signed constants here... to avoid any unintended conversions", but note that 0xffffffff is an unsigned 32-bit integer constant on an LP64 machine. See the C99 standard, clause 6.4.4.1. Bizarrely, 4294967295 would be a signed long integer constant, despite having the same value: the standard apparently mandates that hex and decimal literals be handled differently. With a C11 compiler, you can verify that behavior with:

    #include <stdio.h>
    #define typename(x) _Generic((x), \
        int: "int", unsigned: "unsigned", long: "long", default: "other")
    int main()
    {
        printf("%s\n", typename(4294967295));  // "long" on LP64
        printf("%s\n", typename(0xffffffff));  // "unsigned" on LP64
        printf("%s\n", typename(0xffffff));    // "int"
    }

[edited this by mistake instead of quoting it; restored it; github doesn't let you revert to a revision :( - @dhalbert]
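To make the consequence for a range check concrete, here is a minimal sketch, assuming a 32-bit `int` (true on common ILP32 and LP64 targets). The function names are made up for illustration and are not from the patch. It compares a value against 0xffffffff three ways:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: int is 32 bits wide. All names here are illustrative. */

/* val is int and 0xffffffff is unsigned int, so val is converted to
 * unsigned. -1 becomes 0xffffffff and the comparison is false. */
bool below_mask_as_int(int val) {
    return val < 0xffffffff;
}

/* A 64-bit signed type can hold every unsigned int value, so here the
 * constant is converted to signed and -1 compares as smaller. */
bool below_mask_as_int64(int64_t val) {
    return val < 0xffffffff;
}

/* The LL suffix makes the constant itself a signed long long, so val is
 * widened to long long and the comparison is signed. */
bool below_mask_signed_constant(int val) {
    return val < 0xffffffffLL;
}
```

With -1 as input, the first function returns false because -1 is converted to unsigned and equals the constant; the other two return true. A compiler may warn about the signed/unsigned comparisons, which is the point of the example.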
In my first iteration of #1860 I had implemented this:

    bool mp_binary_int_within_range(mp_int_t val, size_t nbytes, bool is_signed)
    {
        if (!is_signed && val < 0) {
            // Negative numbers never fit in an unsigned value
            return false;
        }
        if (nbytes >= sizeof(val)) {
            // All non-negative N bit signed integers fit in an unsigned N bit integer.
            // This case prevents overflow below.
            return true;
        }
        if (is_signed) {
            mp_int_t edge = ((mp_int_t)1 << (nbytes * 8 - 1));
            return -edge <= val && val < edge;
        } else {
            mp_int_t edge = ((mp_int_t)1 << (nbytes * 8));
            return val < edge;
        }
    }

I expect that's not as fast for the common case of power-of-two buffer sizes as what you've got here, but it does handle the edge case of ones that aren't.
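For anyone who wants to try the 3-byte case outside the firmware, here is a self-contained sketch of the same check. It assumes a 64-bit stand-in type (`sketch_int_t`) in place of `mp_int_t`, whose real width depends on the port, and it adds a guard for `nbytes == 0` that is not in the snippet above. Both names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for mp_int_t; an assumption made for this sketch only. */
typedef int64_t sketch_int_t;

bool sketch_int_within_range(sketch_int_t val, size_t nbytes, bool is_signed) {
    if (!is_signed && val < 0) {
        /* Negative numbers never fit in an unsigned value. */
        return false;
    }
    if (nbytes == 0) {
        /* Guard added for this sketch: avoids a shift by (0 * 8 - 1). */
        return val == 0;
    }
    if (nbytes >= sizeof(val)) {
        /* Everything representable in val fits; also avoids overflow below. */
        return true;
    }
    if (is_signed) {
        sketch_int_t edge = (sketch_int_t)1 << (nbytes * 8 - 1);
        return -edge <= val && val < edge;
    }
    sketch_int_t edge = (sketch_int_t)1 << (nbytes * 8);
    return val < edge;
}
```

Under these assumptions, `sketch_int_within_range(0x10000, 3, false)` is true, matching the `(0x10000).to_bytes(3, 'little')` case, while `0x1000000` is rejected for a 3-byte unsigned buffer.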
I realized that after I submitted the PR, and I'll write one.
I was thinking that the most common case was zeroing a buffer, which may contain signed or unsigned values. The procedure call is quite a bit more expensive than the checks, so this may not be a big deal.
Good point, I missed that. I should add a test for that too.
My intention was to make sure the whole expression was signed: the [...]

I'll work on this and resubmit.

I looked at the CPython implementations originally to see if there was clever code there, but it's structured somewhat differently, and doesn't share the low-level code in the same way.
I may use that for the
@godlygeek Could you take another look? Thanks. I've reused your shifting code above, since it's going to be fast for smallints. I could add a fast path for byte-sized values, but this is probably fine. My major reason for splitting the smallint and longint cases was to not use bytecodes to do the basic checking in the smallint case. I split the tests up so the array overflow test wouldn't be skipped completely if longints were turned off, and I added some more tests. I'm sorry I didn't get to discuss this with you more thoroughly during the sprints. There was a lot going on.
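For reference, one possible reading of "a fast path for byte-sized values" is sketched below. This is not code from the PR: the function name is hypothetical, `int64_t` stands in for `mp_int_t`, and the general path simply repeats the shifting check with an added guard for `nbytes == 0`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; int64_t is an assumed stand-in for mp_int_t. */
bool fits_with_byte_fast_path(int64_t val, size_t nbytes, bool is_signed) {
    /* Fast path: a value that fits in one byte fits in any buffer of at
     * least one byte, so no shift is needed. */
    if (nbytes >= 1) {
        if (is_signed ? (val >= INT8_MIN && val <= INT8_MAX)
                      : (val >= 0 && val <= UINT8_MAX)) {
            return true;
        }
    }
    if (!is_signed && val < 0) {
        return false;
    }
    if (nbytes == 0) {
        /* Guard added for this sketch to keep the shift count valid. */
        return val == 0;
    }
    if (nbytes >= sizeof(val)) {
        return true;
    }
    /* General path: one shift, then compare against both edges. */
    int64_t edge = (int64_t)1 << (nbytes * 8 - (is_signed ? 1 : 0));
    int64_t low = is_signed ? -edge : 0;
    return low <= val && val < edge;
}
```

Whether the extra branch pays for itself depends on the target; the shift it avoids is only a few instructions, so the added text segment size may not be worth it.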
Neither did I until just now, heh... I spotted that
Looks good to me!
Hm, I think that's a better idea than the fast path for zero, actually! We can skip the shifts for stuff that fits in one byte by adding a fast path inside the
I waffled on this myself (as evidenced by the earlier iteration I was able to resurrect, heh). I wasn't sure how to balance performance vs text segment size. You're in a much better position than me to evaluate the merits of that tradeoff.
Ooh, good catch.
Pros of sprints: I have access to knowledgeable people. Cons of sprints: so does everyone else. 😉
I looked at the machine code, and it's pretty minimal. The shifting is only a few instructions. So I think I'll leave well enough alone for now. Thanks for all your help on fixing this long-standing regular Python incompatibility. @tannewt, this is ready for review.
Thank you for the fix @dhalbert and @godlygeek
Fixes #1875.
@godlygeek You may want to take a look.