BUG FIX: Using Series.str.fullmatch() and Series.str.match() with a compiled regex fails with arrow strings #61964

Open
wants to merge 7 commits into
base: main

Conversation

khemkaran10
Contributor

@khemkaran10 khemkaran10 commented Jul 26, 2025

Fixes: #61952
After Fix:

import re
import pandas as pd

DATA = ["applep", "bananap", "Cherryp", "DATEp", "eGGpLANTp", "123p", "23.45p"]
s = pd.Series(DATA)
s.str.fullmatch(re.compile(r"applep"))

Output:
0     True
1    False
2    False
3    False
4    False
5    False
6    False
dtype: bool

DATA = ["applep", "bananap", "Cherryp", "DATEp", "eGGpLANTp", "123p", "23.45p"]
sa = pd.Series(DATA, dtype="string[pyarrow]")
sa.str.match(re.compile(r"applep"))

Output:
0     True
1    False
2    False
3    False
4    False
5    False
6    False
dtype: boolean
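
For reference, the expected values in both outputs above can be reproduced with the standard-library re module alone. This is only a sketch of the intended semantics, not the pandas implementation; the helper names expected_match and expected_fullmatch are hypothetical.

```python
import re

DATA = ["applep", "bananap", "Cherryp", "DATEp", "eGGpLANTp", "123p", "23.45p"]
pat = re.compile(r"applep")


def expected_match(values, pattern):
    # Hypothetical helper: the pattern must match at the start of each string.
    return [pattern.match(v) is not None for v in values]


def expected_fullmatch(values, pattern):
    # Hypothetical helper: the pattern must match the entire string.
    return [pattern.fullmatch(v) is not None for v in values]


print(expected_match(DATA, pat))
print(expected_fullmatch(DATA, pat))
```

For this data both calls agree, since only "applep" matches. They would differ for a value such as "applepie", which matches at the start but not in full.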

@jorisvandenbossche jorisvandenbossche added this to the 2.3.2 milestone Jul 26, 2025
@jorisvandenbossche jorisvandenbossche added the Strings (String extension data type and string data) and Arrow (pyarrow functionality) labels Jul 26, 2025
Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks!

It seems we don't actually document that we support a compiled regular expression, although it works in practice for the non-arrow version because we pass pat to re.compile() there, which accepts an already compiled pattern.
So it would be good to update the documentation and typing to reflect that a compiled pattern is also supported.
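
The re.compile() behavior mentioned here can be checked with the standard library alone. The snippet below is a small illustration and does not touch pandas; it also shows that combining a compiled pattern with a non-zero flags argument is rejected by re.compile().

```python
import re

compiled = re.compile(r"applep")

# Passing an already compiled pattern to re.compile() returns that same object.
same = re.compile(compiled)
print(same is compiled)

# With a non-zero flags argument, re.compile() refuses a compiled pattern.
try:
    re.compile(compiled, flags=re.IGNORECASE)
    flags_rejected = False
except ValueError:
    flags_rejected = True
print(flags_rejected)
```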

@khemkaran10
Contributor Author

@jorisvandenbossche I moved the tests to pandas/tests/strings/test_find_replace.py and made a minor change to the docstring. I’m not sure what changes need to be made in the docs. Could you please provide more details?

@jorisvandenbossche
Member

I’m not sure what changes need to be made in the docs. Could you please provide more details?

The suggestions from @yuanx749 point in the right direction.

Comment on lines 312 to 315
if isinstance(pat, re.Pattern):
    # GH#61952
    pat = pat.pattern
if not pat.startswith("^"):
Member

This causes some typing issues (see https://github.com/pandas-dev/pandas/actions/runs/16615325282/job/47006632529?pr=61964).

The type checker does not understand that pat is now a string after the pat = pat.pattern assignment.
I suppose adding something like cast(str, pat) should fix it.

See https://pandas.pydata.org/docs/dev/development/contributing_codebase.html#validating-type-hints
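
A minimal sketch of the cast(str, pat) idea, assuming a standalone helper rather than the actual pandas method. The function name anchor_pattern and its signature are hypothetical, and whether a given type checker needs the cast is not verified here.

```python
from __future__ import annotations

import re
from typing import cast


def anchor_pattern(pat: str | re.Pattern[str]) -> str:
    """Hypothetical helper, not pandas code: return pat as a string anchored with "^"."""
    if isinstance(pat, re.Pattern):
        # Fall back to the source string of the compiled pattern.
        pat = pat.pattern
    # cast() has no runtime effect; it only tells the type checker pat is a str.
    pat = cast(str, pat)
    if not pat.startswith("^"):
        pat = f"^{pat}"
    return pat


print(anchor_pattern(re.compile(r"applep")))
```

Note that cast() performs no runtime check, so this relies on the isinstance branch having already converted any compiled pattern.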

Contributor Author

@khemkaran10 khemkaran10 Jul 30, 2025

@jorisvandenbossche is it OK if I add a str check to the if condition instead? I think that would also fix the issue:
if isinstance(pat, str) and not pat.startswith("^"):
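
A sketch of how the isinstance(pat, str) variant might look in isolation. The helper name anchor_pattern_checked is hypothetical, this is not the code in the PR, and how a particular type checker treats it is not verified here.

```python
from __future__ import annotations

import re


def anchor_pattern_checked(pat: str | re.Pattern[str]):
    """Hypothetical helper, not pandas code: anchor pat with "^" if it is a str."""
    if isinstance(pat, re.Pattern):
        # Fall back to the source string of the compiled pattern.
        pat = pat.pattern
    # The extra isinstance check narrows pat to str before startswith() is called.
    if isinstance(pat, str) and not pat.startswith("^"):
        pat = f"^{pat}"
    return pat


print(anchor_pattern_checked(re.compile(r"applep")))
```

Unlike a cast, the isinstance check is evaluated at runtime, so a value that is neither a str nor a compiled pattern would pass through unchanged instead of failing on startswith().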

Labels
Arrow (pyarrow functionality), Strings (String extension data type and string data)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Using Series.str.fullmatch() and Series.str.match() with a compiled regex fails with arrow strings
3 participants