Commit c6eed9a

Merge pull request #28622 from seiko2plus/hwy_wrapper
ENH, SIMD: Initial implementation of Highway wrapper
2 parents a88d014 + 275e45c commit c6eed9a

File tree

6 files changed

+489
-3
lines changed

numpy/_core/src/common/simd/README.md

Lines changed: 266 additions & 0 deletions
@@ -0,0 +1,266 @@
1+
# NumPy SIMD Wrapper for Highway
2+
3+
This directory contains a lightweight C++ wrapper over Google's [Highway](https://github.com/google/highway) SIMD library, designed specifically for NumPy's needs.
4+
5+
> **Note**: This directory also contains the C interface of universal intrinsics (under `simd.h`), which is deprecated. Use the Highway wrapper described in this document for all new SIMD code.
6+
7+
## Overview
8+
9+
The wrapper simplifies Highway's SIMD interface by eliminating class tags and using lane types directly, which can be deduced from the arguments in most cases. This design makes SIMD code more intuitive and easier to maintain while still leveraging Highway's generic intrinsics.
10+
11+
## Architecture
12+
13+
The wrapper consists of two main headers:
14+
15+
1. `simd.hpp`: The main header that defines namespaces and includes configuration macros
16+
2. `simd.inc.hpp`: Implementation details included by `simd.hpp` multiple times for different namespaces
17+
18+
Additionally, this directory contains legacy C interface files for universal intrinsics (`simd.h` and related files) which are deprecated and should not be used for new code. All new SIMD code should use the Highway wrapper.
19+
20+
21+
## Usage
22+
23+
### Basic Usage
24+
25+
```cpp
26+
#include "simd/simd.hpp"
27+
28+
// Use np::simd for maximum width SIMD operations
29+
using namespace np::simd;
30+
float *data = /* ... */;
31+
Vec<float> v = LoadU(data);
32+
v = Add(v, v);
33+
StoreU(v, data);
34+
35+
// Use np::simd128 for fixed 128-bit SIMD operations (qualified to avoid ambiguity with np::simd above)
36+
namespace s128 = np::simd128;
37+
s128::Vec<float> v128 = s128::LoadU(data);
38+
v128 = s128::Add(v128, v128);
39+
s128::StoreU(v128, data);
40+
```
41+
42+
### Checking for SIMD Support
43+
44+
```cpp
45+
#include "simd/simd.hpp"
46+
47+
// Check if SIMD is enabled
48+
#if NPY_HWY
49+
// SIMD code
50+
#else
51+
// Scalar fallback code
52+
#endif
53+
54+
// Check for float64 support
55+
#if NPY_HWY_F64
56+
// Use float64 SIMD operations
57+
#endif
58+
59+
// Check for FMA support
60+
#if NPY_HWY_FMA
61+
// Use FMA operations
62+
#endif
63+
```
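
The feature macros above are typically paired with a scalar fallback. The following is a minimal sketch of that pattern, not code from this commit: `DoubleInPlace` is a hypothetical helper, the call form `Lanes<float>()` is an assumption about the wrapper's signature, and `NPY_HWY` is defaulted to `0` so the snippet compiles on its own without `simd/simd.hpp`.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: in a real NumPy source file NPY_HWY comes from "simd/simd.hpp".
// This standalone sketch defaults it to 0 so only the scalar path is compiled.
#ifndef NPY_HWY
#define NPY_HWY 0
#endif

// Hypothetical helper: doubles every element of `data` in place.
inline void DoubleInPlace(float *data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
#if NPY_HWY
    // SIMD path: process full vectors while enough elements remain.
    using namespace np::simd;
    const std::size_t step = Lanes<float>();  // assumed call form
    for (; i + step <= n; i += step) {
        Vec<float> v = LoadU(data + i);
        StoreU(Add(v, v), data + i);
    }
#endif
    // Scalar loop: handles the tail, or everything when SIMD is disabled.
    for (; i < n; ++i) {
        data[i] += data[i];
    }
}
```

Sharing the index `i` between both loops lets the scalar loop serve as the tail handler and as the complete fallback, so no separate code path is needed when `NPY_HWY` is `0`.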
64+
65+
## Type Support and Constraints
66+
67+
The wrapper provides type constraints to help with SFINAE (Substitution Failure Is Not An Error) and compile-time type checking:
68+
69+
- `kSupportLane<TLane>`: Determines whether the specified lane type is supported by the SIMD extension.
70+
```cpp
71+
// Base template - always defined, even when SIMD is not enabled (for SFINAE)
72+
template <typename TLane>
73+
constexpr bool kSupportLane = NPY_HWY != 0;
74+
template <>
75+
constexpr bool kSupportLane<double> = NPY_HWY_F64 != 0;
76+
```
77+
78+
- `kMaxLanes<TLane>`: Maximum number of lanes supported by the SIMD extension for the specified lane type.
79+
```cpp
80+
template <typename TLane>
81+
constexpr size_t kMaxLanes = HWY_MAX_LANES_D(_Tag<TLane>);
82+
```
83+
84+
```cpp
85+
#include "simd/simd.hpp"
86+
87+
// Check if float64 operations are supported
88+
if constexpr (np::simd::kSupportLane<double>) {
89+
// Use float64 operations
90+
}
91+
```
92+
93+
These constraints allow for compile-time checking of which lane types are supported, which can be used in SFINAE contexts to enable or disable functions based on type support.
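
As a rough illustration of the SFINAE use mentioned above, the sketch below selects between two overloads based on a `kSupportLane`-style constant. Everything in it is a stand-in: the `demo` namespace, the `DEMO_HWY` and `DEMO_HWY_F64` macros, and `UsesSimdKernel` are hypothetical names chosen so the snippet compiles without Highway or NumPy headers.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for NPY_HWY / NPY_HWY_F64: pretend SIMD is enabled
// but float64 lanes are not available on this target.
#define DEMO_HWY 1
#define DEMO_HWY_F64 0

namespace demo {

// Mirrors the shape of kSupportLane shown above.
template <typename TLane>
constexpr bool kSupportLane = DEMO_HWY != 0;
template <>
constexpr bool kSupportLane<double> = DEMO_HWY_F64 != 0;

// Participates in overload resolution only for supported lane types.
template <typename TLane, std::enable_if_t<kSupportLane<TLane>, int> = 0>
constexpr bool UsesSimdKernel() { return true; }

// Fallback overload for lane types without SIMD support.
template <typename TLane, std::enable_if_t<!kSupportLane<TLane>, int> = 0>
constexpr bool UsesSimdKernel() { return false; }

// Usage: float resolves to the SIMD overload, double to the fallback.
inline void Check()
{
    assert(UsesSimdKernel<float>());
    assert(!UsesSimdKernel<double>());
}

}  // namespace demo
```

Because the base template is always defined, code like this still compiles when SIMD is disabled; every lane type then resolves to the fallback overload.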
94+
95+
## Available Operations
96+
97+
The wrapper provides the following common operations that are used in NumPy:
98+
99+
- Vector creation operations:
100+
- `Zero`: Returns a vector with all lanes set to zero
101+
- `Set`: Returns a vector with all lanes set to the given value
102+
- `Undefined`: Returns an uninitialized vector
103+
104+
- Memory operations:
105+
- `LoadU`: Unaligned load of a vector from memory
106+
- `StoreU`: Unaligned store of a vector to memory
107+
108+
- Vector information:
109+
- `Lanes`: Returns the number of vector lanes based on the lane type
110+
111+
- Type conversion:
112+
- `BitCast`: Reinterprets a vector to a different type without modifying the underlying data
113+
- `VecFromMask`: Converts a mask to a vector
114+
115+
- Comparison operations:
116+
- `Eq`: Element-wise equality comparison
117+
- `Le`: Element-wise less than or equal comparison
118+
- `Lt`: Element-wise less than comparison
119+
- `Gt`: Element-wise greater than comparison
120+
- `Ge`: Element-wise greater than or equal comparison
121+
122+
- Arithmetic operations:
123+
- `Add`: Element-wise addition
124+
- `Sub`: Element-wise subtraction
125+
- `Mul`: Element-wise multiplication
126+
- `Div`: Element-wise division
127+
- `Min`: Element-wise minimum
128+
- `Max`: Element-wise maximum
129+
- `Abs`: Element-wise absolute value
130+
- `Sqrt`: Element-wise square root
131+
132+
- Logical operations:
133+
- `And`: Bitwise AND
134+
- `Or`: Bitwise OR
135+
- `Xor`: Bitwise XOR
136+
- `AndNot`: Bitwise AND NOT; following Highway's convention, `AndNot(a, b)` computes `~a & b`
137+
138+
Additional Highway operations can be accessed via the `hn` namespace alias inside the `simd` or `simd128` namespaces.
139+
140+
## Extending
141+
142+
To add more operations from Highway:
143+
144+
1. Import them in the `simd.inc.hpp` file with a `using` declaration if they don't require a tag:
145+
```cpp
146+
// For operations that don't require a tag
147+
using hn::FunctionName;
148+
```
149+
150+
2. Define wrapper functions for intrinsics that require a class tag:
151+
```cpp
152+
// For operations that require a tag
153+
template <typename TLane>
154+
HWY_API ReturnType FunctionName(Args... args) {
155+
return hn::FunctionName(_Tag<TLane>(), args...);
156+
}
157+
```
158+
159+
3. Add appropriate documentation and SFINAE constraints if needed
160+
161+
162+
## Build Configuration
163+
164+
The SIMD wrapper automatically disables SIMD operations when optimizations are disabled:
165+
166+
- When `NPY_DISABLE_OPTIMIZATION` is defined, SIMD operations are disabled
167+
- SIMD is enabled only when the Highway target is not scalar (`HWY_TARGET != HWY_SCALAR`)
168+
and not EMU128 (`HWY_TARGET != HWY_EMU128`)
169+
170+
## Design Notes
171+
172+
1. **Why avoid Highway scalar operations?**
173+
- NumPy already provides kernels for scalar operations
174+
- Compilers can better optimize standard library implementations
175+
- Not all Highway intrinsics are fully supported in scalar mode
176+
- For strict IEEE 754 floating-point compliance requirements, direct scalar
177+
implementations offer more predictable behavior than EMU128
178+
179+
2. **Legacy Universal Intrinsics**
180+
- The older universal intrinsics C interface (in `simd.h` and accessible via `NPY_SIMD` macros) is deprecated
181+
- All new SIMD code should use this Highway-based wrapper (accessible via `NPY_HWY` macros)
182+
- The legacy code is maintained for compatibility but will eventually be removed
183+
184+
3. **Feature Detection Constants vs. Highway Constants**
185+
- NumPy-specific constants (`NPY_HWY_F16`, `NPY_HWY_F64`, `NPY_HWY_FMA`) provide additional safety beyond raw Highway constants
186+
- Highway constants (e.g., `HWY_HAVE_FLOAT16`) only check platform capabilities but don't consider NumPy's build configuration
187+
- Our constants combine both checks:
188+
```cpp
189+
#define NPY_HWY_F16 (NPY_HWY && HWY_HAVE_FLOAT16)
190+
```
191+
- This ensures SIMD features won't be used when:
192+
- Platform supports it but NumPy optimization is disabled via meson option:
193+
```
194+
option('disable-optimization', type: 'boolean', value: false,
195+
description: 'Disable CPU optimized code (dispatch,simd,unroll...)')
196+
```
197+
- Highway target is scalar or emulated (`HWY_TARGET == HWY_SCALAR` or `HWY_TARGET == HWY_EMU128`)
198+
- Using these constants ensures consistent behavior across different compilation settings
199+
- Without this additional layer, code might incorrectly try to use SIMD paths in scalar mode
200+
201+
4. **Namespace Design**
202+
- `np::simd`: Maximum width SIMD operations (scalable)
203+
- `np::simd128`: Fixed 128-bit SIMD operations
204+
- `hn`: Highway namespace alias (available within the SIMD namespaces)
205+
206+
5. **Why Namespaces and Why Not Just Use Highway Directly?**
207+
- Highway's design uses class tag types as template parameters (e.g., `Vec<ScalableTag<float>>`) when defining vector types
208+
- Many Highway functions require explicitly passing a tag instance as the first parameter
209+
- This class tag-based approach increases verbosity and complexity in user code
210+
- Our wrapper eliminates this by managing tags internally through namespaces, letting users work with lane types directly, e.g. `Vec<float>`
211+
- Simple example with raw Highway:
212+
```cpp
213+
// Highway's approach
214+
float *data = /* ... */;
215+
216+
namespace hn = hwy::HWY_NAMESPACE;
217+
using namespace hn;
218+
219+
// Full-width operations
220+
ScalableTag<float> df; // Create a tag instance
221+
Vec<decltype(df)> v = LoadU(df, data); // LoadU requires a tag instance
222+
StoreU(v, df, data); // StoreU requires a tag instance
223+
224+
// 128-bit operations
225+
Full128<float> df128; // Create a 128-bit tag instance
226+
Vec<decltype(df128)> v128 = LoadU(df128, data); // LoadU requires a tag instance
227+
StoreU(v128, df128, data); // StoreU requires a tag instance
228+
```
229+
230+
- Simple example with our wrapper:
231+
```cpp
232+
// Our wrapper approach
233+
float *data = /* ... */;
234+
235+
// Full-width operations
236+
using namespace np::simd;
237+
Vec<float> v = LoadU(data); // Full-width vector load
238+
StoreU(v, data);
239+
240+
// 128-bit operations (qualified, since np::simd is already imported above)
241+
namespace s128 = np::simd128;
242+
s128::Vec<float> v128 = s128::LoadU(data); // 128-bit vector load
243+
s128::StoreU(v128, data);
244+
```
245+
246+
- The namespaced approach simplifies code, reduces errors, and provides a more intuitive interface
247+
- It preserves all the benefits of Highway's operations while reducing cognitive overhead
248+
249+
6. **Why Namespaces Are Essential for This Design?**
250+
- Namespaces allow us to define different internal tag types (`hn::ScalableTag<TLane>` in `np::simd` vs `hn::Full128<TLane>` in `np::simd128`)
251+
- This provides a consistent type-based interface (`Vec<float>`) without requiring users to manually create tags
252+
- Enables using the same function names (like `LoadU`) with different implementations based on SIMD width
253+
- Without namespaces, we'd have to either reintroduce tags (defeating the purpose of the wrapper) or create different function names for each variant (e.g., `LoadU` vs `LoadU128`)
254+
255+
7. **Template Type Parameters**
256+
- `TLane`: The scalar type for each vector lane (e.g., uint8_t, float, double)
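
The namespace-based tag elimination described in the notes above can be sketched without Highway. The toy below is an assumption-laden illustration, not the wrapper's actual code: `toy` is a made-up tag-based API, `WrapTag` plays the role of the per-namespace tag alias, and the `TOY_WRAPPER_BODY` macro stands in for including `simd.inc.hpp` once per namespace. The lane counts 8 and 4 are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy tag-based API standing in for Highway (hypothetical, not hwy itself).
namespace toy {
template <typename T, std::size_t N>
struct Vec { T lanes[N]; };

template <typename T, std::size_t N>
struct Tag {
    using V = Vec<T, N>;
    static constexpr std::size_t kLanes = N;
};

template <typename T, std::size_t N>
Vec<T, N> LoadU(Tag<T, N>, const T *p)
{
    Vec<T, N> v;
    std::memcpy(v.lanes, p, sizeof(v.lanes));
    return v;
}

template <typename T, std::size_t N>
void StoreU(const Vec<T, N> &v, Tag<T, N>, T *p)
{
    std::memcpy(p, v.lanes, sizeof(v.lanes));
}
}  // namespace toy

// Stands in for simd.inc.hpp: the same text is expanded in each namespace
// and binds to whichever WrapTag alias that namespace declared.
#define TOY_WRAPPER_BODY                                              \
    template <typename TLane>                                         \
    using Vec = typename WrapTag<TLane>::V;                           \
    template <typename TLane>                                         \
    constexpr std::size_t Lanes() { return WrapTag<TLane>::kLanes; }  \
    template <typename TLane>                                         \
    Vec<TLane> LoadU(const TLane *p)                                  \
    { return toy::LoadU(WrapTag<TLane>(), p); }                       \
    template <typename TLane>                                         \
    void StoreU(const Vec<TLane> &v, TLane *p)                        \
    { toy::StoreU(v, WrapTag<TLane>(), p); }

namespace wide {  // plays the role of np::simd
template <typename TLane>
using WrapTag = toy::Tag<TLane, 8>;
TOY_WRAPPER_BODY
}  // namespace wide

namespace narrow {  // plays the role of np::simd128
template <typename TLane>
using WrapTag = toy::Tag<TLane, 4>;
TOY_WRAPPER_BODY
}  // namespace narrow

// Usage: identical call sites, only the namespace differs.
inline void RoundTripDemo()
{
    const float in[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    float out_wide[8] = {};
    float out_narrow[8] = {};
    wide::StoreU(wide::LoadU(in), out_wide);        // copies 8 lanes
    narrow::StoreU(narrow::LoadU(in), out_narrow);  // copies 4 lanes
    assert(out_wide[7] == 7.0f);
    assert(out_narrow[3] == 3.0f && out_narrow[4] == 0.0f);
}
```

The lane type is deduced from the pointer argument, and the vector width comes from the enclosing namespace, which is why neither call site mentions a tag.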
257+
258+
259+
## Requirements
260+
261+
- C++17 or later
262+
- Google Highway library
263+
264+
## License
265+
266+
Same as NumPy's license

numpy/_core/src/common/simd/simd.hpp

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
1+
#ifndef NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_
2+
#define NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_
3+
4+
/**
5+
* This header provides a thin wrapper over Google's Highway SIMD library.
6+
*
7+
* The wrapper aims to simplify the SIMD interface of Google's Highway by
8+
* getting rid of its class tags and using lane types directly, which can be
9+
* deduced from the args in most cases.
10+
*/
11+
/**
12+
* Since `NPY_SIMD` is limited to NumPy's C universal intrinsics,
13+
* `NPY_HWY` is defined to indicate the SIMD availability for Google's Highway
14+
* C++ code.
15+
*
16+
* Highway SIMD is only available when optimization is enabled.
17+
* When NPY_DISABLE_OPTIMIZATION is defined, SIMD operations are disabled
18+
* and the code falls back to scalar implementations.
19+
*/
20+
#ifndef NPY_DISABLE_OPTIMIZATION
21+
#include <hwy/highway.h>
22+
23+
/**
24+
* We avoid using Highway scalar operations for the following reasons:
25+
*
26+
* 1. NumPy already provides optimized kernels for scalar operations. Using these
27+
* existing implementations is more consistent with NumPy's architecture and
28+
* allows for compiler optimizations specific to standard library calls.
29+
*
30+
* 2. Not all Highway intrinsics are fully supported in scalar mode, which could
31+
* lead to compilation errors or unexpected behavior for certain operations.
32+
*
33+
* 3. For NumPy's strict IEEE 754 floating-point compliance requirements, direct scalar
34+
* implementations offer more predictable behavior than EMU128.
35+
*
36+
* Therefore, we only enable Highway SIMD when targeting actual SIMD instruction sets.
37+
*/
38+
#define NPY_HWY ((HWY_TARGET != HWY_SCALAR) && (HWY_TARGET != HWY_EMU128))
39+
40+
// Indicates if the SIMD operations are available for float16.
41+
#define NPY_HWY_F16 (NPY_HWY && HWY_HAVE_FLOAT16)
42+
// Note: Highway requires SIMD extensions with native float32 support, so we don't need
43+
// to check for it.
44+
45+
// Indicates if the SIMD operations are available for float64.
46+
#define NPY_HWY_F64 (NPY_HWY && HWY_HAVE_FLOAT64)
47+
48+
// Indicates whether the SIMD floating-point operations natively support FMA.
49+
#define NPY_HWY_FMA (NPY_HWY && HWY_NATIVE_FMA)
50+
51+
#else
52+
#define NPY_HWY 0
53+
#define NPY_HWY_F16 0
54+
#define NPY_HWY_F64 0
55+
#define NPY_HWY_FMA 0
56+
#endif
57+
58+
namespace np {
59+
60+
/// Represents the max SIMD width supported by the platform.
61+
namespace simd {
62+
#if NPY_HWY
63+
/// The highway namespace alias.
64+
/// We can not import all the symbols from the HWY_NAMESPACE because it will
65+
/// conflict with the existing symbols in the numpy namespace.
66+
namespace hn = hwy::HWY_NAMESPACE;
67+
// internally used by the template header
68+
template <typename TLane>
69+
using _Tag = hn::ScalableTag<TLane>;
70+
#endif
71+
#include "simd.inc.hpp"
72+
} // namespace simd
73+
74+
/// Represents the 128-bit SIMD width.
75+
namespace simd128 {
76+
#if NPY_HWY
77+
namespace hn = hwy::HWY_NAMESPACE;
78+
template <typename TLane>
79+
using _Tag = hn::Full128<TLane>;
80+
#endif
81+
#include "simd.inc.hpp"
82+
} // namespace simd128
83+
84+
} // namespace np
85+
86+
#endif // NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_
