Commit 8870917

Apply auto-vectorization to the inner loop of numeric multiplication.
Compile numeric.c with -ftree-vectorize where available, and adjust the innermost loop of mul_var() so that it is amenable to being auto-vectorized. (Mainly, that involves making it process the arrays left-to-right not right-to-left.)

Applying -ftree-vectorize actually makes numeric.o smaller, at least with my compiler (gcc 8.3.1 on x86_64), and it's a little faster too. Independently of that, fixing the inner loop to be vectorizable also makes things a bit faster. But doing both is a huge win for multiplications with lots of digits. For me, the numeric regression test is the same speed to within measurement noise, but numeric_big is a full 45% faster.

We also looked into applying -funroll-loops, but that makes numeric.o bloat quite a bit, and the additional speed improvement is very marginal.

Amit Khandekar, reviewed and edited a little by me

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com

1 parent 695de5d commit 8870917

2 files changed: 15 additions, 3 deletions

src/backend/utils/adt/Makefile

Lines changed: 3 additions & 0 deletions
@@ -125,6 +125,9 @@ clean distclean maintainer-clean:
 
 like.o: like.c like_match.c
 
+# Some code in numeric.c benefits from auto-vectorization
+numeric.o: CFLAGS += ${CFLAGS_VECTORIZE}
+
 varlena.o: varlena.c levenshtein.c
 
 include $(top_srcdir)/src/backend/common.mk

src/backend/utils/adt/numeric.c

Lines changed: 12 additions & 3 deletions
@@ -8191,6 +8191,7 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 	int			res_weight;
 	int			maxdigits;
 	int		   *dig;
+	int		   *dig_i1_2;
 	int			carry;
 	int			maxdig;
 	int			newdig;
@@ -8327,10 +8328,18 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 		 *
 		 * As above, digits of var2 can be ignored if they don't contribute,
 		 * so we only include digits for which i1+i2+2 <= res_ndigits - 1.
+		 *
+		 * This inner loop is the performance bottleneck for multiplication,
+		 * so we want to keep it simple enough so that it can be
+		 * auto-vectorized. Accordingly, process the digits left-to-right
+		 * even though schoolbook multiplication would suggest right-to-left.
+		 * Since we aren't propagating carries in this loop, the order does
+		 * not matter.
 		 */
-		for (i2 = Min(var2ndigits - 1, res_ndigits - i1 - 3), i = i1 + i2 + 2;
-			 i2 >= 0; i2--)
-			dig[i--] += var1digit * var2digits[i2];
+		i = Min(var2ndigits - 1, res_ndigits - i1 - 3);
+		dig_i1_2 = &dig[i1 + 2];
+		for (i2 = 0; i2 <= i; i2++)
+			dig_i1_2[i2] += var1digit * var2digits[i2];
 	}
 
 	/*
