Closed
Description
Describe the issue:
Hi,
I think there is an issue with the slicing of numpy arrays of type StringDType.
It is inconsistent with regular numpy slicing, and also with e.g. "<U30" fixed length string arrays.
Below you can find the self-contained example and corresponding expected and obtained outputs.
Note that this only occurs with "multi-index" (list) indexing, e.g. [[5]], and does not occur with plain integer indexing, e.g. [5].
Reproduce the code example:
# /// script
# requires-python = "==3.11.11"
# dependencies = [
# "numpy==2.2.2",
# ]
# ///
import numpy as np


def main():
    STRINGDTYPE_Array = np.array(
        [
            ["AAAAAAAAAAAAAAAAA"],
            ["BBBBBBBBBBBBBBBBBBBBBBBBBBBBB"],
            ["CCCCCCCCCCCCCCCCC"],
            ["DDDDDDDDDDDDDDDDD"],
        ],
        dtype=np.dtypes.StringDType,
    )
    U30_Array = np.array(
        [
            ["AAAAAAAAAAAAAAAAA"],
            ["BBBBBBBBBBBBBBBBBBBBBBBBBBBBB"],
            ["CCCCCCCCCCCCCCCCC"],
            ["DDDDDDDDDDDDDDDDD"],
        ],
        dtype="U30",
    )

    # Index each row with a one-element list (advanced indexing).
    expected = []
    for i in range(U30_Array.shape[0]):
        expected.append(U30_Array[[i]])

    obtained = []
    for i in range(STRINGDTYPE_Array.shape[0]):
        obtained.append(STRINGDTYPE_Array[[i]])

    print(f"{expected=}")
    print(f"{obtained=}")


if __name__ == "__main__":
    main()
Error message:
expected=[array([['AAAAAAAAAAAAAAAAA']], dtype='<U30'), array([['BBBBBBBBBBBBBBBBBBBBBBBBBBBBB']], dtype='<U30'), array([['CCCCCCCCCCCCCCCCC']], dtype='<U30'), array([['DDDDDDDDDDDDDDDDD']], dtype='<U30')]
obtained=[array([['AAAAAAAAAAAAAAAAA']], dtype=StringDType()), array([['AAAAAAAAAAAAAAAAA\x1dBBBBBBBBBBB']], dtype=StringDType()), array([['AAAAAAAAAAAAAAAAA']], dtype=StringDType()), array([['AAAAAAAAAAAAAAAAA']], dtype=StringDType())]
Python and NumPy Versions:
import sys, numpy; print(numpy.__version__); print(sys.version)
2.2.2
3.11.11 | packaged by conda-forge | (main, Dec 5 2024, 14:17:24) [GCC 13.3.0]
Runtime Environment:
[{'numpy_version': '2.2.2',
'python': '3.11.11 | packaged by conda-forge | (main, Dec 5 2024, 14:17:24) '
'[GCC 13.3.0]',
'uname': uname_result(system='Linux', node='963b179a5b20', release='5.15.0-135-generic', version='#146-Ubuntu SMP Sat Feb 15 17:06:22 UTC 2025', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_KNM',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL',
'AVX512_SPR']}},
{'architecture': 'Haswell',
'filepath': '/opt/conda/lib/libopenblasp-r0.3.28.so',
'internal_api': 'openblas',
'num_threads': 20,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.28'}]
Context for the issue:
I suspect that this inconsistent indexing may cause issues in downstream libraries, e.g. zarr-developers/zarr-python#3174