Description
Describe the issue:
When multiple threads each perform np.dot on 2-D arrays where the second argument is transposed, the result of the dot product can be incorrect. The incorrect values appear to be valid results from another thread's operation.
Each thread performs np.dot on its own independent arrays, so this should not cause any threading problems.
This only seems to occur if the arrays involved are large enough: for example, a (50,000, 3) x (3, 3) operation does not trigger the bug, but a (100,000, 3) x (3, 3) operation does.
Reproduce the code example:
import threading
import numpy as np

# If set lower than ~50,000, the bug is not triggered
N_POINTS_PER_THREAD = 100000
# If True, the matrix is transposed before multiplication, which can produce incorrect results
# If False, the matrix is used as is, which produces correct results
TRANSPOSE = True

unequal_condition = threading.Event()

def worker(thread_id):
    points = np.ones((N_POINTS_PER_THREAD, 3), dtype=np.float64) * thread_id
    matrix = np.identity(3, dtype=np.float64)
    result = np.dot(points, matrix.T if TRANSPOSE else matrix)
    # Every entry should equal thread_id, since matrix is the identity
    if not np.all(result == thread_id):
        unequal_condition.set()
        print(f"Thread {thread_id} produced wrong results")

def run():
    num_threads = 10
    # Run worker threads
    threads = []
    for i in range(num_threads):
        t = threading.Thread(target=worker, args=(i,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    if unequal_condition.is_set():
        print("Unequal condition was set, indicating at least one thread produced wrong results.")
        return True
    else:
        print("All threads produced correct results.")
        return False

if __name__ == "__main__":
    num_runs = 5000
    for i in range(1, num_runs + 1):
        print(f"Run {i} of {num_runs}")
        if run():
            break
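The only difference between the two TRANSPOSE settings is the memory layout of the second operand: matrix.T is a view of the same buffer with swapped strides, so it is Fortran-ordered rather than C-ordered. NumPy may hand such an operand to BLAS with a transpose flag instead of copying it, which would mean the two settings exercise different OpenBLAS code paths; that is an inference, not something confirmed here. A quick check of the layouts, plus an untested experiment of passing a C-ordered copy made with np.ascontiguousarray:

```python
import numpy as np

matrix = np.identity(3, dtype=np.float64)
transposed = matrix.T  # a view with swapped strides, not a copy

print(np.shares_memory(matrix, transposed))  # True
print(transposed.flags["C_CONTIGUOUS"], transposed.flags["F_CONTIGUOUS"])  # False True

# Possible experiment (untested assumption): give np.dot a C-ordered copy instead
copied = np.ascontiguousarray(transposed)
print(copied.flags["C_CONTIGUOUS"])  # True

points = np.ones((5, 3), dtype=np.float64) * 7
# Mathematically identical to np.dot(points, matrix.T)
print(np.array_equal(np.dot(points, copied), np.dot(points, transposed)))  # True
```

If the wrong results stop appearing when worker uses the copied operand, that would support the idea that the transposed-operand path is involved, but it would not by itself locate the bug.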
Error message:
Example output:
Run 1 of 5000
All threads produced correct results.
Run 2 of 5000
All threads produced correct results.
Run 3 of 5000
All threads produced correct results.
Run 4 of 5000
All threads produced correct results.
Run 5 of 5000
All threads produced correct results.
Run 6 of 5000
All threads produced correct results.
Run 7 of 5000
Thread 1 produced wrong results
Thread 3 produced wrong results
Unequal condition was set, indicating at least one thread produced wrong results.
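Because every thread's correct output is filled entirely with its own thread_id, any stray value in a wrong result can be attributed to a specific thread. A minimal sketch of a hypothetical helper, foreign_values, that the worker could call to report those values instead of only printing that a failure occurred:

```python
import numpy as np

def foreign_values(result, thread_id):
    """Return the distinct values in `result` that differ from `thread_id`.

    In the reproducer each thread's correct output is all `thread_id`, so any
    other value found here hints at which thread's data leaked in.
    """
    values = np.unique(result)  # sorted distinct values
    return [float(v) for v in values if v != thread_id]

# Example: a result for thread 1 contaminated with a row from thread 3
contaminated = np.ones((4, 3))
contaminated[2] = 3.0
print(foreign_values(contaminated, 1))  # [3.0]
```

In the worker, the failure message could then become `print(f"Thread {thread_id} produced wrong results: {foreign_values(result, thread_id)}")`.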
Python and NumPy Versions:
NumPy 2.3.1
Python 3.13.0 (main, Oct 16 2024, 03:23:02) [Clang 18.1.8 ]
Runtime Environment:
[{'numpy_version': '2.3.1',
'python': '3.13.0 (main, Oct 16 2024, 03:23:02) [Clang 18.1.8 ]',
'uname': uname_result(system='Linux', node='NAOU100302', release='5.15.146.1-microsoft-standard-WSL2', version='#1 SMP Thu Jan 11 04:09:03 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM', 'AVX512_SPR']}},
{'architecture': 'SkylakeX',
'filepath': '/home/ben/bug/.venv/lib/python3.13/site-packages/numpy.libs/libscipy_openblas64_-56d6093b.so',
'internal_api': 'openblas',
'num_threads': 16,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.29'}]
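The runtime output shows OpenBLAS 0.3.29 running its own pool of 16 pthreads. One way to narrow down whether that internal threading is involved is to rerun the reproducer with OpenBLAS limited to a single thread. A minimal sketch, assuming the bundled OpenBLAS honours the standard OPENBLAS_NUM_THREADS environment variable, which only takes effect if it is set before NumPy is first imported; check_once is a hypothetical condensed version of run() that returns the failing thread ids instead of printing them:

```python
import os

# Must be set before NumPy (and therefore OpenBLAS) is first imported;
# changing it afterwards has no effect on the already-loaded library.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import threading
import numpy as np

def check_once(num_threads=10, n_points=100_000):
    """Run one round of the reproducer and return ids of threads with wrong results."""
    bad = []
    bad_lock = threading.Lock()  # protects `bad` while workers append to it

    def worker(thread_id):
        points = np.full((n_points, 3), thread_id, dtype=np.float64)
        matrix = np.identity(3, dtype=np.float64)
        result = np.dot(points, matrix.T)  # same transposed case as the reproducer
        if not np.all(result == thread_id):
            with bad_lock:
                bad.append(thread_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bad
```

Calling check_once() repeatedly in a loop, as run() does, and comparing against a run without the environment variable would show whether the failures depend on OpenBLAS using more than one thread.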
Context for the issue:
We have worked around this issue by wrapping the operations where we observed the bug in a threading.Lock, but given the nature of the bug we are unsure whether other similar operations we perform are also affected.