Description
The IR currently generated for comparison operations relies heavily on the optimizer to produce efficient code. For a custom back end, it would be better to generate more streamlined IR: integer comparisons are hot operations, and a custom back end probably won't be smart enough to optimize the current IR.
The current IR for `x <= y` (where `x` and `y` are integers) looks like this:
```
    r1 = x & 1
    r2 = r1 == 0
    r3 = y & 1
    r4 = r3 == 0
    r5 = r2 & r4
    if r5 goto L1 else goto L2 :: bool
L1:
    r6 = x <= y :: signed
    r0 = r6
    goto L3
L2:
    r7 = CPyTagged_IsLt_(y, x)
    r8 = !r7
    r0 = r8
L3:
    if r0 goto L4 else goto L5 :: bool
```
Instead, we could have something like this:
```
    r1 = x & 1
    r2 = r1 == 0
    if r2 goto L1 else goto L3
L1:
    r3 = y & 1
    r4 = r3 == 0
    if r4 goto L2 else goto L3
L2:
    r5 = x <= y :: signed
    if r5 goto L4 else goto L5
L3:
    r6 = CPyTagged_IsLt_(y, x)
    if r6 goto L5 else goto L4
```
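To check that the reordering does not change behavior, here is a small Python model of both listings. It assumes mypyc's tagged representation on a 64-bit word (a short int `v` is stored as `v << 1` with the low bit clear; anything else is a tagged pointer with the low bit set). The side table of boxed values and the helper `is_lt_slow`, which stands in for the C function `CPyTagged_IsLt_`, are inventions of this sketch.

```python
from __future__ import annotations

# Toy model of tagged ints, assuming a 64-bit word: a short int v is stored as
# v << 1 (low bit 0). Any other value gets an odd "handle" into a side table,
# standing in for a pointer to a boxed int with the low bit set.
SHORT_MIN, SHORT_MAX = -(1 << 62), (1 << 62) - 1
_boxed: dict[int, int] = {}


def tag(v: int) -> int:
    if SHORT_MIN <= v <= SHORT_MAX:
        return v << 1
    handle = (len(_boxed) << 1) | 1
    _boxed[handle] = v
    return handle


def untag(t: int) -> int:
    return _boxed[t] if t & 1 else t >> 1


def is_lt_slow(a: int, b: int) -> bool:
    """Hypothetical stand-in for CPyTagged_IsLt_: compares the full values."""
    return untag(a) < untag(b)


def le_current(x: int, y: int) -> bool:
    r2 = (x & 1) == 0              # r1, r2
    r4 = (y & 1) == 0              # r3, r4: computed even if x is boxed
    if r2 & r4:                    # r5, then branch
        r0 = x <= y                # r6: signed compare of the tagged words
    else:
        r0 = not is_lt_slow(y, x)  # r7, r8: explicit negation
    return r0                      # the caller then branches on r0 again


def le_proposed(x: int, y: int) -> bool:
    if (x & 1) == 0:               # r1, r2: branch immediately
        if (y & 1) == 0:           # r3, r4
            return x <= y          # r5 branches straight to L4/L5
    # L3: in the IR the negation is free (targets swapped); Python needs `not`.
    return not is_lt_slow(y, x)


values = [0, 1, -1, 7, -7, SHORT_MIN, SHORT_MAX, 1 << 70, -(1 << 70)]
for a in values:
    for b in values:
        ta, tb = tag(a), tag(b)
        assert le_current(ta, tb) == le_proposed(ta, tb) == (a <= b)
print("both versions agree")
```

Comparing the tagged words directly on the fast path is valid because shifting every short int left by one preserves their order.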
The fast path is 8 ops instead of 10, and the total number of ops drops from 13 to 10 (counting branches and gotos, but not labels). I would expect this to translate into faster and smaller code, at least with a simple-minded optimizer. It also looks like this would help gcc generate better code, though clang already optimizes the current IR well.
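The op counts can be checked mechanically. This sketch transcribes both listings as basic blocks and counts every instruction, including branches and `goto`s but not labels. The block name `entry` for the unlabeled first block is an arbitrary choice made here.

```python
from __future__ import annotations

# Both listings transcribed as basic blocks; every instruction counts as an op,
# labels do not. "entry" names the unlabeled first block.
CURRENT = {
    "entry": ["r1 = x & 1", "r2 = r1 == 0", "r3 = y & 1", "r4 = r3 == 0",
              "r5 = r2 & r4", "if r5 goto L1 else goto L2"],
    "L1": ["r6 = x <= y", "r0 = r6", "goto L3"],
    "L2": ["r7 = CPyTagged_IsLt_(y, x)", "r8 = !r7", "r0 = r8"],
    "L3": ["if r0 goto L4 else goto L5"],
}
PROPOSED = {
    "entry": ["r1 = x & 1", "r2 = r1 == 0", "if r2 goto L1 else goto L3"],
    "L1": ["r3 = y & 1", "r4 = r3 == 0", "if r4 goto L2 else goto L3"],
    "L2": ["r5 = x <= y", "if r5 goto L4 else goto L5"],
    "L3": ["r6 = CPyTagged_IsLt_(y, x)", "if r6 goto L5 else goto L4"],
}


def count_ops(blocks: dict[str, list[str]], path: list[str] | None = None) -> int:
    """Count ops in `path` (a list of block names), or in all blocks if None."""
    names = list(blocks) if path is None else path
    return sum(len(blocks[name]) for name in names)


# Fast path = both operands are short ints.
print(count_ops(CURRENT), count_ops(CURRENT, ["entry", "L1", "L3"]))     # 13 10
print(count_ops(PROPOSED), count_ops(PROPOSED, ["entry", "L1", "L2"]))  # 10 8
```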
The implementation would involve special-casing comparison ops in `IRBuilder.process_conditional`, similar to what we currently do for `and`, `or` and `not`.
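To make the idea more concrete, here is a toy model of how a `process_conditional`-style method could special-case `<=`. Everything in it (`ToyBuilder`, `Compare`, `BoolOp`, `Not`, `int_le_branch`, and the textual ops) is hypothetical and does not mirror mypyc's actual `IRBuilder` classes or signatures. It only illustrates the technique of passing true/false target blocks down and branching directly instead of materializing a bool, the same approach described above for `and`, `or` and `not`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


# Hypothetical toy AST nodes (not mypy's real Expression classes).
@dataclass
class Var:
    name: str


@dataclass
class Compare:
    op: str  # only "<=" on integers is modeled
    left: Var
    right: Var


@dataclass
class BoolOp:
    op: str  # "and" or "or"
    left: Expr
    right: Expr


@dataclass
class Not:
    operand: Expr


Expr = Union[Var, Compare, BoolOp, Not]


@dataclass
class Block:
    label: str
    ops: list[str] = field(default_factory=list)


class ToyBuilder:
    """Hypothetical stand-in for an IR builder that emits textual ops."""

    def __init__(self) -> None:
        self.blocks: list[Block] = []
        self.ntemps = 0
        self.cur = self.new_block()  # entry block, L0

    def new_block(self) -> Block:
        block = Block(f"L{len(self.blocks)}")
        self.blocks.append(block)
        return block

    def temp(self) -> str:
        self.ntemps += 1
        return f"r{self.ntemps}"

    def emit(self, *ops: str) -> None:
        self.cur.ops.extend(ops)

    def branch(self, cond: str, true: Block, false: Block) -> None:
        self.emit(f"if {cond} goto {true.label} else goto {false.label}")

    def process_conditional(self, e: Expr, true: Block, false: Block) -> None:
        """Emit code that jumps to `true` or `false` depending on `e`."""
        if isinstance(e, BoolOp):
            rhs = self.new_block()
            if e.op == "and":
                self.process_conditional(e.left, rhs, false)
            else:  # "or"
                self.process_conditional(e.left, true, rhs)
            self.cur = rhs
            self.process_conditional(e.right, true, false)
        elif isinstance(e, Not):
            # Negation costs nothing: just swap the targets.
            self.process_conditional(e.operand, false, true)
        elif isinstance(e, Compare) and e.op == "<=":
            # The proposed special case: branch directly, no bool temporary.
            self.int_le_branch(e.left.name, e.right.name, true, false)
        else:
            # Generic fallback: branch on an already-computed bool.
            assert isinstance(e, Var)
            self.branch(e.name, true, false)

    def int_le_branch(self, x: str, y: str, true: Block, false: Block) -> None:
        check_y, fast, slow = self.new_block(), self.new_block(), self.new_block()
        r1, r2, r3, r4, r5, r6 = (self.temp() for _ in range(6))
        # If x is boxed (tag bit set), go straight to the slow path.
        self.emit(f"{r1} = {x} & 1", f"{r2} = {r1} == 0")
        self.branch(r2, check_y, slow)
        # Only now check y; a boxed y also goes to the slow path.
        self.cur = check_y
        self.emit(f"{r3} = {y} & 1", f"{r4} = {r3} == 0")
        self.branch(r4, fast, slow)
        # Fast path: signed compare of the tagged words, then branch directly.
        self.cur = fast
        self.emit(f"{r5} = {x} <= {y} :: signed")
        self.branch(r5, true, false)
        # Slow path: x <= y is not (y < x), so swap targets instead of negating.
        self.cur = slow
        self.emit(f"{r6} = CPyTagged_IsLt_({y}, {x})")
        self.branch(r6, false, true)


builder = ToyBuilder()
then_block, else_block = builder.new_block(), builder.new_block()  # L1, L2
builder.process_conditional(Compare("<=", Var("x"), Var("y")), then_block, else_block)
for block in builder.blocks:
    print(f"{block.label}:", *block.ops, sep="\n    ")
```

Running it prints blocks with the same shape as the proposed listing (the label numbers differ), and wrapping the comparison in `Not(...)` only swaps the branch targets, so no extra op is emitted.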