BUG: np.dot produces incorrect results when run concurrently #29391

@BenLewis-Seequent

Describe the issue:

When multiple threads each perform an np.dot operation on 2D arrays and the second argument is transposed, the result of the dot product can be incorrect. The incorrect values appear to be valid results from another thread's operation.

Each thread performs np.dot on its own independent arrays, so this shouldn't cause any threading problems.

This only seems to occur if the arrays involved are big enough: for example, a (50,000, 3) x (3, 3) operation doesn't trigger the bug, but a (100,000, 3) x (3, 3) operation does.

Reproduce the code example:

import threading

import numpy as np

# If set lower than ~50,000 the bug is not triggered
N_POINTS_PER_THREAD = 100000

# If True, the matrix is transposed before multiplication, which produces incorrect results.
# If False, the matrix is used as is, which produces correct results.
TRANSPOSE = True

unequal_condition = threading.Event()

def worker(thread_id):
    points = np.ones((N_POINTS_PER_THREAD, 3), dtype=np.float64) * thread_id
    matrix = np.identity(3, dtype=np.float64)
    result = np.dot(points, matrix.T if TRANSPOSE else matrix)
    if not np.all(result == thread_id):
        unequal_condition.set()
        print(f"Thread {thread_id} produced wrong results")


def run():
    num_threads = 10
    
    # Run worker threads
    threads = []
    for i in range(num_threads):
        t = threading.Thread(target=worker, args=(i,))

        threads.append(t)
        t.start()
    
    for t in threads:
        t.join()

    if unequal_condition.is_set():
        print("Unequal condition was set, indicating at least one thread produced wrong results.")
        return True
    else:
        print("All threads produced correct results.")
        return False

if __name__ == "__main__":
    num_runs = 5000
    for i in range(1, num_runs + 1):
        print(f"Run {i} of {num_runs}")
        if run():
            break
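
The check in worker only reports that some element differs from thread_id. To examine the observation that the wrong values look like another thread's output, a small helper along the following lines could list the unexpected values. This is a sketch and not part of the original reproducer; the name foreign_values is hypothetical.

```python
import numpy as np


def foreign_values(result, thread_id):
    """Return the distinct values in `result` that differ from `thread_id`.

    Hypothetical diagnostic helper: in the reproducer every element of
    `result` is expected to equal `thread_id`, so any value returned here
    is unexpected.
    """
    values = np.unique(np.asarray(result))  # sorted distinct values, flattened
    return values[values != thread_id]
```

In worker, one could print foreign_values(result, thread_id) when the equality check fails. If the description above holds, the printed values would be ids of other threads rather than arbitrary numbers.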

Error message:

Example output: 
Run 1 of 5000
All threads produced correct results.
Run 2 of 5000
All threads produced correct results.
Run 3 of 5000
All threads produced correct results.
Run 4 of 5000
All threads produced correct results.
Run 5 of 5000
All threads produced correct results.
Run 6 of 5000
All threads produced correct results.
Run 7 of 5000
Thread 1 produced wrong results
Thread 3 produced wrong results
Unequal condition was set, indicating at least one thread produced wrong results.

Python and NumPy Versions:

2.3.1
3.13.0 (main, Oct 16 2024, 03:23:02) [Clang 18.1.8 ]

Runtime Environment:

[{'numpy_version': '2.3.1',
'python': '3.13.0 (main, Oct 16 2024, 03:23:02) [Clang 18.1.8 ]',
'uname': uname_result(system='Linux', node='NAOU100302', release='5.15.146.1-microsoft-standard-WSL2', version='#1 SMP Thu Jan 11 04:09:03 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM', 'AVX512_SPR']}},
{'architecture': 'SkylakeX',
'filepath': '/home/ben/bug/.venv/lib/python3.13/site-packages/numpy.libs/libscipy_openblas64_-56d6093b.so',
'internal_api': 'openblas',
'num_threads': 16,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.29'}]

Context for the issue:

We have worked around this issue by wrapping the operations on which we observed the bug in a threading.Lock, but given the nature of the bug we are unsure whether other, similar operations we perform are also affected.
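
For readers who want to try the same kind of workaround, here is a minimal sketch of serializing calls with a threading.Lock. It is an illustration under stated assumptions, not the code we actually use: the decorator name serialized is hypothetical, and it simply ensures that only one thread at a time runs the wrapped function.

```python
import functools
import threading


def serialized(func):
    """Wrap `func` so that only one thread at a time can execute it.

    Hypothetical helper: each wrapped function gets its own lock, so calls
    to the same wrapper never overlap, at the cost of losing concurrency
    for that operation.
    """
    lock = threading.Lock()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with lock:  # released automatically, even if func raises
            return func(*args, **kwargs)

    return wrapper


# Assumed usage with the reproducer above (requires `import numpy as np`):
#     safe_dot = serialized(np.dot)
#     result = safe_dot(points, matrix.T)
```

Note that this only protects calls made through the same wrapper; other operations that might be affected would need to share the lock or be wrapped separately.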
