moving x86-64 feature baseline to SSE4.2? #27851

rgommers · 2024-11-25T18:50:40Z

As of today, the SIMD "baseline" that we compile for goes up to SSE3; any higher features are opt-in and dispatched at runtime. SSE3 has been the maximum assumed feature for quite a while, and we haven't reviewed this choice recently. At some point in the past we settled on a rule of thumb: we can drop support for systems lacking a particular feature once those systems make up less than 0.5% of the total. That now seems to be the case for systems without SSE4.1 and SSE4.2.
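As a rough illustration of what "baseline plus runtime dispatch" means, here is a minimal Python sketch. It is not NumPy's actual dispatch machinery, which is C code generated by the build system. The function `pick_target` and the `TARGET_REQUIRES` table are hypothetical, and the sketch assumes a target can run only if the CPU reports every feature that target implies.

```python
# Hypothetical model of baseline + runtime dispatch (not NumPy's real implementation).
# Each target maps to the set of CPU features it requires; this is a simplified
# assumption, and the real implied-feature lists are longer.
TARGET_REQUIRES = {
    "AVX512_SKX": {"SSE3", "SSE41", "SSE42", "AVX", "AVX2", "FMA3", "AVX512F", "AVX512_SKX"},
    "AVX2": {"SSE3", "SSE41", "SSE42", "AVX", "AVX2"},
    "SSE42": {"SSE3", "SSE41", "SSE42"},
    "SSE41": {"SSE3", "SSE41"},
}

def pick_target(enabled_targets, cpu_features):
    """Return the first enabled target the CPU supports, else "baseline".

    `enabled_targets` is ordered from most to least preferred, mirroring the
    "Enabled targets" lines in the build log.
    """
    for target in enabled_targets:
        if target == "baseline":
            return "baseline"
        # A target is usable only if all of its required features are present.
        if TARGET_REQUIRES[target] <= cpu_features:
            return target
    return "baseline"

# An old CPU with SSE3 but no SSE4.x falls back to the baseline code path.
old_cpu = {"SSE", "SSE2", "SSE3"}
print(pick_target(["AVX512_SKX", "AVX2", "SSE42", "baseline"], old_cpu))  # baseline

# A Haswell-class CPU (AVX2, no AVX-512) picks the AVX2 variant.
haswell = {"SSE", "SSE2", "SSE3", "SSSE3", "SSE41", "SSE42", "AVX", "AVX2", "FMA3"}
print(pick_target(["AVX512_SKX", "AVX2", "SSE42", "baseline"], haswell))  # AVX2
```

Raising the baseline to SSE4.2 changes the last step: the `baseline` variant itself would then require SSE4.1 and SSE4.2, so a CPU like `old_cpu` would no longer be supported at all rather than falling back.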

Here is the full list of dispatchable targets and the features we currently build for each one, in the format "header: enabled targets". For example:

Generating multi-targets for "_umath_tests.dispatch.h" 
  Enabled targets: AVX2, SSE41, baseline

Full set of dispatchable targets:

Generating multi-targets for "_umath_tests.dispatch.h" 
  Enabled targets: AVX2, SSE41, baseline
Generating multi-targets for "argfunc.dispatch.h" 
  Enabled targets: AVX512_SKX, AVX2, SSE42, baseline
Generating multi-targets for "x86_simd_argsort.dispatch.h" 
  Enabled targets: AVX512_SKX, AVX2
Generating multi-targets for "x86_simd_qsort.dispatch.h" 
  Enabled targets: AVX512_SKX, AVX2
Generating multi-targets for "x86_simd_qsort_16bit.dispatch.h" 
  Enabled targets: AVX512_SPR, AVX512_ICL
Generating multi-targets for "highway_qsort.dispatch.h" 
  Enabled targets: 
Generating multi-targets for "highway_qsort_16bit.dispatch.h" 
  Enabled targets: 
Generating multi-targets for "loops_arithm_fp.dispatch.h" 
  Enabled targets: AVX2, baseline
Generating multi-targets for "loops_arithmetic.dispatch.h" 
  Enabled targets: AVX512_SKX, AVX512F, AVX2, SSE41, baseline
Generating multi-targets for "loops_comparison.dispatch.h" 
  Enabled targets: AVX512_SKX, AVX512F, AVX2, SSE42, baseline
Generating multi-targets for "loops_exponent_log.dispatch.h" 
  Enabled targets: AVX512_SKX, AVX512F, AVX2, baseline
Generating multi-targets for "loops_hyperbolic.dispatch.h" 
  Enabled targets: AVX512_SKX, AVX2, baseline
Generating multi-targets for "loops_logical.dispatch.h" 
  Enabled targets: AVX512_SKX, AVX2, baseline
Generating multi-targets for "loops_minmax.dispatch.h" 
  Enabled targets: AVX512_SKX, AVX2, baseline
Generating multi-targets for "loops_modulo.dispatch.h" 
  Enabled targets: baseline
Generating multi-targets for "loops_trigonometric.dispatch.h" 
  Enabled targets: AVX512_SKX, AVX2, baseline
Generating multi-targets for "loops_umath_fp.dispatch.h" 
  Enabled targets: AVX512_SKX, baseline
Generating multi-targets for "loops_unary.dispatch.h" 
  Enabled targets: AVX512_SKX, AVX2, baseline
Generating multi-targets for "loops_unary_fp.dispatch.h" 
  Enabled targets: SSE41, baseline
Generating multi-targets for "loops_unary_fp_le.dispatch.h" 
  Enabled targets: SSE41, baseline
Generating multi-targets for "loops_unary_complex.dispatch.h" 
  Enabled targets: AVX512F, AVX2, baseline
Generating multi-targets for "loops_autovec.dispatch.h" 
  Enabled targets: AVX2, baseline
Generating multi-targets for "_simd.dispatch.h" 
  Enabled targets: SSE42, AVX2, FMA3, AVX512F, AVX512_SKX, baseline
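To get a rough count of what raising the baseline would remove, the log above can be processed mechanically. The sketch below uses hypothetical helpers (`parse_targets` and `redundant_targets` are not part of NumPy's build system) and assumes that once the baseline includes SSE4.1 and SSE4.2, any dispatch target named SSE41 or SSE42 is no longer needed because the baseline build already covers it. The log excerpt is copied from the list above.

```python
import re

# A few entries copied from the build log above.
LOG = '''\
Generating multi-targets for "_umath_tests.dispatch.h"
  Enabled targets: AVX2, SSE41, baseline
Generating multi-targets for "argfunc.dispatch.h"
  Enabled targets: AVX512_SKX, AVX2, SSE42, baseline
Generating multi-targets for "loops_modulo.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_unary_fp.dispatch.h"
  Enabled targets: SSE41, baseline
'''

def parse_targets(log):
    """Map each dispatch header to its list of enabled targets."""
    result = {}
    header = None
    for line in log.splitlines():
        m = re.match(r'Generating multi-targets for "(.+)"', line.strip())
        if m:
            header = m.group(1)
            continue
        if header and line.strip().startswith("Enabled targets:"):
            rest = line.split(":", 1)[1]
            # An empty "Enabled targets:" line yields an empty list.
            result[header] = [t.strip() for t in rest.split(",") if t.strip()]
            header = None
    return result

def redundant_targets(targets, new_baseline=frozenset({"SSE41", "SSE42"})):
    """Per header, the targets a raised baseline would make unnecessary.

    Assumption: a target is redundant if its name is one of the new baseline features.
    """
    return {h: [t for t in ts if t in new_baseline]
            for h, ts in targets.items()
            if any(t in new_baseline for t in ts)}

print(redundant_targets(parse_targets(LOG)))
# {'_umath_tests.dispatch.h': ['SSE41'], 'argfunc.dispatch.h': ['SSE42'], 'loops_unary_fp.dispatch.h': ['SSE41']}
```

Applied by hand to the full list above, the same rule flags seven headers, each with one SSE41 or SSE42 target: `_umath_tests`, `argfunc`, `loops_arithmetic`, `loops_comparison`, `loops_unary_fp`, `loops_unary_fp_le` and `_simd`.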

The most widely used data source for determining what hardware is out there is, I believe, the Steam hardware survey (https://store.steampowered.com/hwsurvey/?platform=combined). It currently reports SSE3 at 100%, SSE4.1 at 99.78% and SSE4.2 at 99.70%. This means that if we bump the baseline up to SSE4.2, we'd only be dropping support for ~0.3% of systems, those with really old CPUs.
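The 0.5% rule of thumb can be written as a simple check against the survey numbers quoted above. This is only a sketch: `unsupported_share` and `drop_ok` are hypothetical names, and the interpretation assumed here is "drop support if fewer than 0.5% of surveyed systems lack the feature".

```python
# Share of surveyed systems (in %) supporting each feature, per the Steam
# hardware survey numbers quoted above (Nov 2024).
SUPPORT = {"SSE3": 100.0, "SSE41": 99.78, "SSE42": 99.70}
THRESHOLD = 0.5  # rule of thumb: OK to drop if fewer than 0.5% of systems lack the feature

def unsupported_share(feature):
    """Percentage of systems that would lose support if `feature` became baseline."""
    return round(100.0 - SUPPORT[feature], 2)

def drop_ok(feature, threshold=THRESHOLD):
    """True if the share of systems lacking `feature` is below the threshold."""
    return unsupported_share(feature) < threshold

for f in ("SSE41", "SSE42"):
    print(f, unsupported_share(f), drop_ok(f))
# SSE41 0.22 True
# SSE42 0.3 True
```

Because every CPU with SSE4.2 also has SSE4.1, the SSE4.2 figure (~0.3%) is the one that matters for an SSE4.2 baseline.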

For more context, SSE4.2 was introduced in 2008, and even Windows 11 (version 24H2) now requires it (xref https://en.wikipedia.org/wiki/SSE4#SSE4.2).

The other side of the coin is: what do we gain from this change? I haven't quantified each item, but the basic answer is:

- Reduces build time on x86-64: 40% of build targets (206/517) on my 6-year-old Intel CPU with AVX512 are SIMD targets. We can trim off a decent fraction of those.
- Reduces binary size: numpy/_core/_simd.so is currently 3.1 MB out of 39.9 MB on disk for a Linux release build. Judging from the multi-targets list above, we can trim that a fair bit.
- Reduces the number of variants that need to be tested in CI (linux_simd.yml). With the current config we can't actually drop a job, but test coverage does improve (there are currently zero test configs for baseline + SSE4.1/4.2).
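The percentages behind the first two items follow directly from the quoted numbers. A quick check, using only the figures from the list (nothing is re-measured here; `share` is a throwaway helper):

```python
def share(part, whole):
    """Fraction of `whole` accounted for by `part`."""
    return part / whole

# Numbers quoted in the list above.
build = share(206, 517)   # SIMD build targets / all build targets
size = share(3.1, 39.9)   # _simd.so size / total size on disk (MB)
print(f"SIMD build targets: {build:.1%}")  # SIMD build targets: 39.8%
print(f"_simd.so on disk: {size:.1%}")     # _simd.so on disk: 7.8%
```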

I'd suggest making the change in main during this release cycle, i.e. for NumPy 2.3.0, which will probably be released in June 2025.

Hat tip to @itamarst for bringing up this topic (xref scientific-python/faster-scientific-python-ideas#11).


charris · 2024-11-25T19:22:19Z

Just a note that my 11-year-old Intel Core i5 supports both SSE4.1 and SSE4.2, and I think of it as a very old CPU. Yes, I will be moving to an AMD Ryzen 5 7600X soon; no machine lasts forever and monitors keep getting bigger.
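To check a specific machine the same way, here is a minimal Linux-only sketch that reads the `flags` line from `/proc/cpuinfo`. The helper names are hypothetical, and it assumes an SSE4.2 baseline also implies SSE3, SSSE3 and SSE4.1 (as in the x86-64-v2 level); it does not check anything else that level includes.

```python
import os

def parse_cpuinfo_flags(text):
    """Return the set of CPU flags from /proc/cpuinfo-style text (Linux x86)."""
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())  # first CPU's flags are enough
    return set()

# Linux spells these features differently from NumPy's target names;
# SSE3 appears as "pni" (Prescott New Instructions).
LINUX_FLAG_NAMES = {"SSE3": "pni", "SSSE3": "ssse3", "SSE41": "sse4_1", "SSE42": "sse4_2"}

def missing_features(flags, wanted=("SSE3", "SSSE3", "SSE41", "SSE42")):
    """Features from `wanted` that the CPU does not report."""
    return [f for f in wanted if LINUX_FLAG_NAMES[f] not in flags]

if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        missing = missing_features(parse_cpuinfo_flags(fh.read()))
    print("OK for an SSE4.2 baseline" if not missing else f"missing: {missing}")
```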

seberg · 2024-11-29T11:42:09Z

Seems reasonable. This might even help clean up some code eventually because, IIRC, SSE3 lacked quiet comparisons.
(But only if the relevant compiler versions have stopped generating these incorrect instructions for scalar/auto-vectorized code.)

rgommers · 2025-05-17T18:52:09Z

An update half a year later: SSE4.2 support has improved from 99.70% to 99.78%.

rgommers added the "component: SIMD" label (Issues in SIMD (fast instruction sets) code or machinery) on Nov 25, 2024.

lucascolley mentioned this issue on Dec 3, 2024 in scipy/scipy#16984, "ENH: Compile pocketfft with newer vector instructions?" (Open).

mattip mentioned this issue on Dec 8, 2024 in #27929, "BUG: Building without SIMD instructions results in Illegal Instructions on old systems" (Closed).

seiko2plus linked a pull request on May 16, 2025 that will close this issue: #28896, "ENH: Modulate dispatched x86 CPU features" (Open).
