Commit d9bb8ef

Optimise non-native 128-bit addition in int128.h.

On platforms without native 128-bit integer support, simplify the test for carry in int128_add_uint64() by noting that the low-part addition is unsigned integer arithmetic, which is just modular arithmetic. Therefore the test for carry can simply be written as "new value < old value" (i.e., a test for modular wrap-around). This can then be made branchless so that on modern compilers it produces the same machine instructions as native 128-bit addition, making it significantly simpler and faster.

Similarly, the test for carry in int128_add_int64() can be written in much the same way, but with an extra term to compensate for the sign of the value being added. Again, on modern compilers this leads to branchless code, often identical to the native 128-bit integer addition machine code.

Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com

1 parent 572c0f1 commit d9bb8ef

1 file changed: +16 -20 lines changed

src/include/common/int128.h

Lines changed: 16 additions & 20 deletions
@@ -68,17 +68,17 @@ int128_add_uint64(INT128 *i128, uint64 v)
 #else
     /*
      * First add the value to the .lo part, then check to see if a carry needs
-     * to be propagated into the .hi part. A carry is needed if both inputs
-     * have high bits set, or if just one input has high bit set while the new
-     * .lo part doesn't. Remember that .lo part is unsigned; we cast to
-     * signed here just as a cheap way to check the high bit.
+     * to be propagated into the .hi part. Since this is unsigned integer
+     * arithmetic, which is just modular arithmetic, a carry is needed if the
+     * new .lo part is less than the old .lo part (i.e., if modular
+     * wrap-around occurred). Writing this in the form below, rather than
+     * using an "if" statement causes modern compilers to produce branchless
+     * machine code identical to the native code.
      */
     uint64 oldlo = i128->lo;
 
     i128->lo += v;
-    if (((int64) v < 0 && (int64) oldlo < 0) ||
-        (((int64) v < 0 || (int64) oldlo < 0) && (int64) i128->lo >= 0))
-        i128->hi++;
+    i128->hi += (i128->lo < oldlo);
 #endif
 }
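
The equivalence claimed in this hunk can be checked in isolation. The following is a minimal sketch, not the PostgreSQL code: demo_int128, carry_old, carry_new, demo_add_uint64 and demo_self_check are hypothetical names, and the struct only approximates the non-native INT128 layout (unsigned .lo part, signed .hi part, as in the diff). It compares the removed high-bit test with the new "new value < old value" test on a handful of boundary values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the non-native INT128 struct (assumption). */
typedef struct
{
    uint64_t lo;    /* least significant 64 bits, unsigned */
    int64_t  hi;    /* most significant 64 bits, carries the sign */
} demo_int128;

/*
 * Removed test: infer the carry from the high bits of both inputs and of
 * the result.  The casts to int64_t assume the usual two's-complement
 * conversion, as the original code did.
 */
static int
carry_old(uint64_t oldlo, uint64_t v)
{
    uint64_t newlo = oldlo + v;

    return ((int64_t) v < 0 && (int64_t) oldlo < 0) ||
        (((int64_t) v < 0 || (int64_t) oldlo < 0) && (int64_t) newlo >= 0);
}

/* New test: unsigned addition wraps modulo 2^64, so a carry means newlo < oldlo. */
static int
carry_new(uint64_t oldlo, uint64_t v)
{
    return (uint64_t) (oldlo + v) < oldlo;
}

/* Branchless add in the style of the new hunk. */
static void
demo_add_uint64(demo_int128 *x, uint64_t v)
{
    uint64_t oldlo = x->lo;

    x->lo += v;
    x->hi += (x->lo < oldlo);   /* adds 1 only when the low part wrapped */
}

/* Compare both carry tests on every pair of boundary values. */
static void
demo_self_check(void)
{
    const uint64_t edge[] = {
        0, 1, INT64_MAX, UINT64_C(1) << 63, (UINT64_C(1) << 63) + 1,
        UINT64_MAX - 1, UINT64_MAX
    };
    const size_t n = sizeof(edge) / sizeof(edge[0]);

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(carry_old(edge[i], edge[j]) == carry_new(edge[i], edge[j]));
}
```

Calling demo_self_check() exercises all 49 pairs of boundary values; it is a spot check, not a proof. Whether a given compiler emits the same instructions as native 128-bit addition is best confirmed by inspecting its assembly output.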

@@ -93,23 +93,19 @@ int128_add_int64(INT128 *i128, int64 v)
 #else
     /*
      * This is much like the above except that the carry logic differs for
-     * negative v. Ordinarily we'd need to subtract 1 from the .hi part
-     * (corresponding to adding the sign-extended bits of v to it); but if
-     * there is a carry out of the .lo part, that cancels and we do nothing.
+     * negative v -- we need to subtract 1 from the .hi part if the new .lo
+     * value is greater than the old .lo value. That can be achieved without
+     * any branching by adding the sign bit from v (v >> 63 = 0 or -1) to the
+     * previous result (for negative v, if the new .lo value is less than the
+     * old .lo value, the two terms cancel and we leave the .hi part
+     * unchanged, otherwise we subtract 1 from the .hi part). With modern
+     * compilers this often produces machine code identical to the native
+     * code.
      */
     uint64 oldlo = i128->lo;
 
     i128->lo += v;
-    if (v >= 0)
-    {
-        if ((int64) oldlo < 0 && (int64) i128->lo >= 0)
-            i128->hi++;
-    }
-    else
-    {
-        if (!((int64) oldlo < 0 || (int64) i128->lo >= 0))
-            i128->hi--;
-    }
+    i128->hi += (i128->lo < oldlo) + (v >> 63);
 #endif
 }
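
The signed case can be illustrated the same way. This is a sketch under stated assumptions rather than the committed code: demo_int128 and demo_add_int64 are hypothetical names, and the expression v >> 63 relies on right shift of a negative int64_t being an arithmetic shift, which is implementation-defined in C but is what mainstream compilers do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the non-native INT128 struct (assumption). */
typedef struct
{
    uint64_t lo;    /* least significant 64 bits, unsigned */
    int64_t  hi;    /* most significant 64 bits, carries the sign */
} demo_int128;

/*
 * Add a signed 64-bit value without branching.
 *
 * (x->lo < oldlo) is 1 when the unsigned low-part addition wrapped.
 * (v >> 63) is 0 for v >= 0 and -1 for v < 0, assuming an arithmetic
 * right shift; it stands for the sign-extended upper half of v.
 * For negative v the two terms cancel when the low part wrapped,
 * otherwise the .hi part is decremented.
 */
static void
demo_add_int64(demo_int128 *x, int64_t v)
{
    uint64_t oldlo = x->lo;

    x->lo += (uint64_t) v;
    x->hi += (x->lo < oldlo) + (v >> 63);
}
```

For example, adding -1 to zero gives .lo = UINT64_MAX and .hi = -1 (the two's-complement form of -1), while adding -3 to 5 leaves .hi at 0 because the wrap-around carry and the sign term cancel.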