BUG FIX: Using Series.str.fullmatch() and Series.str.match() with a compiled regex fails with arrow strings #61964
base: main
Thanks!
It seems we don't actually document that we support a compiled regular expression. It works in practice because the non-arrow version passes pat to re.compile(), which also accepts an already-compiled pattern.
So it would be good to update the documentation and type hints to reflect that a compiled pattern is also supported.
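The non-arrow path works because re.compile() accepts a compiled pattern: with the default flags=0 it returns that pattern unchanged, and it rejects extra flags combined with a compiled pattern. The sketch below checks that standard-library behaviour only and does not touch pandas itself.

```python
import re

compiled = re.compile(r"ab+c")

# With the default flags=0, re.compile() returns an already-compiled
# pattern unchanged instead of recompiling it.
same = re.compile(compiled)
print(same is compiled)  # True

# Passing extra flags together with a compiled pattern is rejected.
raised_on_flags = False
try:
    re.compile(compiled, re.IGNORECASE)
except ValueError as exc:
    raised_on_flags = True
    print("ValueError:", exc)
```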
@jorisvandenbossche Moved tests to
The suggestions of @yuanx749 are in the right direction.
    if isinstance(pat, re.Pattern):
        # GH#61952
        pat = pat.pattern
    if not pat.startswith("^"):
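The patch unwraps a compiled pattern to its source string before the arrow path prepends "^". Below is a rough, stdlib-only emulation of that step. It assumes the Arrow regex kernel has search ("contains") semantics, so Python's re.search stands in for it. to_match_pattern and emulated_match are hypothetical helpers, not pandas functions. Note that .pattern keeps only the source text, so any flags compiled into the pattern (for example re.IGNORECASE) are not carried over.

```python
from __future__ import annotations

import re


def to_match_pattern(pat: str | re.Pattern[str]) -> str:
    """Hypothetical helper mirroring the patch: unwrap, then anchor at the start."""
    if isinstance(pat, re.Pattern):
        # GH#61952: only the source text is kept; pat.flags is not applied.
        pat = pat.pattern
    if not pat.startswith("^"):
        pat = "^" + pat
    return pat


def emulated_match(values: list[str], pat: str | re.Pattern[str]) -> list[bool]:
    # re.search stands in for a "contains"-style regex kernel. For simple
    # patterns the leading "^" gives start-of-string matching; a top-level
    # alternation such as "a|b" would need grouping to behave like re.match.
    anchored = to_match_pattern(pat)
    return [re.search(anchored, v) is not None for v in values]


compiled = re.compile(r"ab")
values = ["abc", "xab", "ab"]
print(emulated_match(values, compiled))  # [True, False, True]
print([re.match(compiled, v) is not None for v in values])  # [True, False, True]
```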
This causes some typing issues (see https://github.com/pandas-dev/pandas/actions/runs/16615325282/job/47006632529?pr=61964).
The type checker does not understand that pat is a string after doing pat = pat.pattern ...
I suppose adding something like cast(str, pat) should fix it.
See https://pandas.pydata.org/docs/dev/development/contributing_codebase.html#validating-type-hints
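A minimal sketch of the cast suggestion, written as a hypothetical standalone function rather than the actual pandas method. typing.cast does nothing at runtime; it only tells the type checker to treat pat as a str from that point on.

```python
from __future__ import annotations

import re
from typing import cast


def anchor_start(pat: str | re.Pattern[str]) -> str:
    """Hypothetical helper: prepend "^" unless the pattern is already anchored."""
    if isinstance(pat, re.Pattern):
        # GH#61952: use the source text of a compiled pattern.
        pat = pat.pattern
    # cast() is a no-op at runtime; it only informs the type checker.
    pat = cast(str, pat)
    if not pat.startswith("^"):
        pat = f"^{pat}"
    return pat


print(anchor_start(re.compile("ab")))  # ^ab
print(anchor_start("^ab"))  # ^ab
```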
@jorisvandenbossche is it OK if I add a str check to the if condition instead? I think this would also fix the issue:
if isinstance(pat, str) and not pat.startswith("^"):
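This alternative relies on isinstance narrowing instead of cast: inside the if, type checkers treat pat as a str. Below is a sketch with a hypothetical helper. At runtime the extra check is always true at that point, because compiled patterns have already been unwrapped. Its purpose is narrowing for the type checker. If a non-string ever reached this line, the anchoring would be skipped silently rather than failing.

```python
from __future__ import annotations

import re


def anchor_start_narrowed(pat: str | re.Pattern[str]) -> str:
    """Hypothetical helper: same behaviour as the patch, narrowed with isinstance."""
    if isinstance(pat, re.Pattern):
        # GH#61952: use the source text of a compiled pattern.
        pat = pat.pattern
    # isinstance() narrows pat to str inside this branch for the type checker.
    if isinstance(pat, str) and not pat.startswith("^"):
        pat = f"^{pat}"
    return pat


print(anchor_start_narrowed(re.compile("ab")))  # ^ab
print(anchor_start_narrowed("^ab"))  # ^ab
```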
Fixes: #61952
After Fix:
Series.str.fullmatch() and Series.str.match() with a compiled regex fails with arrow strings #61952