
Commit 9c79e64

Frob numeric.c loop so that clang will auto-vectorize it too.
Experimentation shows that clang will auto-vectorize the critical multiplication loop if the termination condition is written "i2 < limit" rather than "i2 <= limit". This seems unbelievably stupid, but I've reproduced it on both clang 9.0.1 (RHEL8) and 11.0.3 (macOS Catalina). gcc doesn't care, so tweak the code to do it that way.

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
1 parent 87e6ed7 commit 9c79e64
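To make the observation above concrete, here is a minimal sketch of the two loop shapes in isolation. The function names `accumulate_inclusive` and `accumulate_exclusive` are hypothetical; only the loop bodies are copied from `mul_var()`. The commit reports the vectorization difference for clang 9.0.1 and 11.0.3, and other releases may behave differently. Compiling a file like this with `clang -O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize` prints optimization remarks saying which loops were vectorized.

```c
/*
 * Hypothetical stand-ins for the inner loop of mul_var().  Both add
 * var1digit * var2digits[i2] into dig_i1_2[i2]; they differ only in how
 * the termination test is spelled.
 */

/* Old spelling: "last" is the final index to process (inclusive bound). */
void
accumulate_inclusive(int *dig_i1_2, const int *var2digits,
                     int var1digit, int last)
{
    int         i2;

    for (i2 = 0; i2 <= last; i2++)
        dig_i1_2[i2] += var1digit * var2digits[i2];
}

/* New spelling: "limit" is one past the final index (exclusive bound). */
void
accumulate_exclusive(int *dig_i1_2, const int *var2digits,
                     int var1digit, int limit)
{
    int         i2;

    for (i2 = 0; i2 < limit; i2++)
        dig_i1_2[i2] += var1digit * var2digits[i2];
}
```

Calling `accumulate_exclusive(dig, digits, d, last + 1)` touches exactly the same elements as `accumulate_inclusive(dig, digits, d, last)`, so the rewrite changes only the shape of the loop, not its result.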


1 file changed: +8 -6 lines


src/backend/utils/adt/numeric.c

Lines changed: 8 additions & 6 deletions
```diff
@@ -8191,7 +8191,6 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
     int         res_weight;
     int         maxdigits;
     int        *dig;
-    int        *dig_i1_2;
     int         carry;
     int         maxdig;
     int         newdig;
@@ -8327,7 +8326,7 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
          * Add the appropriate multiple of var2 into the accumulator.
          *
          * As above, digits of var2 can be ignored if they don't contribute,
-         * so we only include digits for which i1+i2+2 <= res_ndigits - 1.
+         * so we only include digits for which i1+i2+2 < res_ndigits.
          *
          * This inner loop is the performance bottleneck for multiplication,
          * so we want to keep it simple enough so that it can be
@@ -8336,10 +8335,13 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
          * Since we aren't propagating carries in this loop, the order does
          * not matter.
          */
-        i = Min(var2ndigits - 1, res_ndigits - i1 - 3);
-        dig_i1_2 = &dig[i1 + 2];
-        for (i2 = 0; i2 <= i; i2++)
-            dig_i1_2[i2] += var1digit * var2digits[i2];
+        {
+            int         i2limit = Min(var2ndigits, res_ndigits - i1 - 2);
+            int        *dig_i1_2 = &dig[i1 + 2];
+
+            for (i2 = 0; i2 < i2limit; i2++)
+                dig_i1_2[i2] += var1digit * var2digits[i2];
+        }
     }
 
     /*
```
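The new bound is exactly the old one plus one. Because Min(a - 1, b - 1) + 1 = Min(a, b), with a = `var2ndigits` and b = `res_ndigits - i1 - 2`, the new `i2limit` equals the old `i + 1`, so `i2 < i2limit` and `i2 <= i` visit the same `i2` values. The sketch below checks that identity exhaustively for small sizes. `old_last_index`, `new_limit`, and `bounds_agree` are hypothetical helpers, and `Min` is redefined locally to mirror PostgreSQL's macro from `c.h` so the file compiles on its own.

```c
#include <stdbool.h>

/* Local copy of PostgreSQL's Min() macro (from c.h). */
#define Min(x, y)   ((x) < (y) ? (x) : (y))

/* Old code: index of the last var2 digit to process (inclusive bound). */
int
old_last_index(int var2ndigits, int res_ndigits, int i1)
{
    return Min(var2ndigits - 1, res_ndigits - i1 - 3);
}

/* New code: number of var2 digits to process (exclusive bound). */
int
new_limit(int var2ndigits, int res_ndigits, int i1)
{
    return Min(var2ndigits, res_ndigits - i1 - 2);
}

/*
 * Return true if new_limit == old_last_index + 1 for every combination
 * up to max_n.  i1 is limited to res_ndigits - 3, matching the outer
 * loop's starting value Min(var1ndigits - 1, res_ndigits - 3).
 */
bool
bounds_agree(int max_n)
{
    for (int var2ndigits = 1; var2ndigits <= max_n; var2ndigits++)
        for (int res_ndigits = 3; res_ndigits <= max_n; res_ndigits++)
            for (int i1 = 0; i1 <= res_ndigits - 3; i1++)
                if (new_limit(var2ndigits, res_ndigits, i1) !=
                    old_last_index(var2ndigits, res_ndigits, i1) + 1)
                    return false;
    return true;
}
```

Since the identity is exact, the commit changes only how the bound is spelled; the set of products added into `dig` is unchanged.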
