BUG: np.str_ treats '\x00' as the empty string. #28964

Closed
lzlarryli opened this issue May 14, 2025 · 1 comment
Labels
00 - Bug; 57 - Close? (Issues which may be closable unless discussion continued)

Comments

@lzlarryli
lzlarryli commented May 14, 2025

Describe the issue:

'\x00' is a non-empty string, but np.str_('\x00') evaluates to the empty string. This leads to surprising behavior when reading values back from string arrays. For example:

In [1]: import numpy as np

In [2]: np.array(['\x00'])[0].encode('UTF-8'), ['\x00'][0].encode('UTF-8')
Out[2]: (b'', b'\x00')

In [3]: np.array(['\x01'])[0].encode('UTF-8'), ['\x01'][0].encode('UTF-8')
Out[3]: (b'\x01', b'\x01')

In [4]: np.array([''])[0].encode('UTF-8'), [''][0].encode('UTF-8')
Out[4]: (b'', b'')

Reproduce the code example:

import numpy as np
np.str_('\x00')

Error message:

np.str_('')  # This is wrong.

Python and NumPy Versions:

NumPy: 2.0.2
Python: 3.11.12 (main, Apr 9 2025, 08:55:54) [GCC 11.4.0]

Runtime Environment:

No response

Context for the issue:

No response

@ngoldbaum
Member

NumPy fixed-width string DTypes are null-padded, so there is no way to tell the difference between '\x00' and an empty string. This is a fundamental shortcoming of padding with trailing nulls like this.
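
To make the padding concrete, here is a minimal sketch that inspects the raw buffer of a fixed-width array. The explicit `dtype='U2'` and the variable names are choices made for this illustration, not something from the report.

```python
import numpy as np

# Explicit fixed-width dtype: each element occupies two UCS4 code points (8 bytes).
arr = np.array(['\x00', '', 'a\x00', '\x00a'], dtype='U2')

# '\x00' and '' are stored as the same all-zero bytes, so nothing in the
# buffer records which one was originally assigned.
raw_null = arr[0:1].tobytes()
raw_empty = arr[1:2].tobytes()
same_storage = raw_null == raw_empty

# Reading elements back strips trailing nulls only; a leading null survives.
lengths = [len(s) for s in arr.tolist()]

print(same_storage)
print(lengths)
```

Only trailing nulls are affected: a null followed by other characters is kept, because it is not part of the padding.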

If you use StringDType you can avoid this problem:

>>> np.array(['\x00'], dtype='T')[0].encode('UTF-8'), ['\x00'][0].encode('UTF-8')
(b'\x00', b'\x00')

Unfortunately we probably can't change the default string DType to be StringDType until NumPy 3.0.

@ngoldbaum added the "57 - Close?" label on May 14, 2025
@seberg closed this as not planned on May 14, 2025
3 participants