gh-133546: Make re.Match a well-rounded Sequence type #133549
base: main
@@ -1378,6 +1378,27 @@ when there is no match, you can test whether there was a match with a simple
   if match:
       process(match)

Match objects are proper :class:`~collections.abc.Sequence` types. You can access
This is not true with this PR: Sequence has a number of other requirements (e.g. index and count methods).
Nice catch, thanks! I added index and count by adapting the implementation of tuple.index and tuple.count. I also updated the unit test to cover all Sequence mixin methods.
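For reference, collections.abc.Sequence only requires __getitem__ and __len__; index, count, __contains__, __iter__ and __reversed__ come as mixin methods built on those two. The sketch below is a hypothetical pure-Python wrapper (MatchSequence is not part of re) around an existing re.Match, assuming the sequence is group 0 followed by each capturing group, as the tests in this PR suggest. It only illustrates the protocol; the PR itself implements these methods in C.

```python
import operator
import re
from collections.abc import Sequence


class MatchSequence(Sequence):
    """Hypothetical read-only view of a match as (group 0, group 1, ..., group n)."""

    def __init__(self, match: re.Match):
        self._match = match

    def __len__(self) -> int:
        # Pattern.groups counts capturing groups only; add 1 for group 0.
        return self._match.re.groups + 1

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Resolve the slice against len(self) and return a tuple of groups.
            return tuple(self[i] for i in range(*index.indices(len(self))))
        index = operator.index(index)
        if index < 0:
            index += len(self)  # support negative indices like tuple does
        if not 0 <= index < len(self):
            raise IndexError("no such group")
        return self._match[index]


seq = MatchSequence(re.match(r"(a)(b)(a)", "aba"))
print(list(seq))            # ['aba', 'a', 'b', 'a']
print(seq.index("a", 2))    # 3 (mixin Sequence.index with a start)
print(seq.count("a"))       # 2 (mixin Sequence.count)
print("b" in seq)           # True (mixin __contains__)
print(list(reversed(seq)))  # ['a', 'b', 'a', 'aba']
```

Everything except __len__ and __getitem__ is inherited, which is why a type that only supports indexing is not yet a complete Sequence.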
Test coverage is sometimes insufficient, and some tests are redundant with existing ones.
Some additional feedback. For the test failures, let's add some assertWarns. I don't think we should add extra checks in C for the type of the operands (the problem arises when you compare the input, which is a bytes object, with the matched items, which are strings, hence a BytesWarning). So, instead, let's just expect the warning.
By the way, could you avoid using @picnixz in your commits? I'm getting pinged. TIA.
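A BytesWarning for a bytes/str comparison is only emitted when Python runs with -b (and is turned into an error with -bb), so a plain local run will not show it. The sketch below reproduces the comparison the reviewer describes, using tuple.count on m.groups() as a stand-in for the PR's Match.count; count_with_warnings is a hypothetical helper, not part of the test suite.

```python
import re
import sys
import warnings


def count_with_warnings(values, target):
    """Hypothetical helper: count `target` in `values` and record any warnings."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" records every occurrence and takes precedence over the
        # error::BytesWarning filter that -bb installs.
        warnings.simplefilter("always")
        n = values.count(target)
    return n, [w.category for w in caught]


# m.groups() stands in for the PR's Match.count on a bytes match.
groups = re.match(rb"(a)(b)(a)", b"aba").groups()  # (b'a', b'b', b'a')

n, categories = count_with_warnings(groups, "a")   # str value vs. bytes groups
print(n)                                     # 0: bytes never compare equal to str
print(sys.flags.bytes_warning, categories)   # warnings appear only with -b/-bb
```

Run normally, categories stays empty; run under python -b, each bytes/str comparison records a BytesWarning.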
Oh, sorry for the ping. Is it okay if I just leave out the cases with bytes as input? I can't seem to trigger the warning locally, even with
Doc/library/re.rst
.. method:: Match.index(value, start=0, stop=sys.maxsize, /)

   Return the index of the first occurrence of the value among the matched groups.

   Raises :exc:`ValueError` if the value is not present.

   .. versionadded:: next

.. method:: Match.count(value, /)

   Return the number of occurrences of the value among the matched groups.

   .. versionadded:: next
Suggested change:

.. method:: Match.index(value, start=0, stop=sys.maxsize, /)

   Return the index of the first occurrence of the value among the matched groups.

   Raises :exc:`ValueError` if the value is not present.

   .. versionadded:: next

.. method:: Match.count(value, /)

   Return the number of occurrences of the value among the matched groups.

   .. versionadded:: next
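Because the documented index and count consider every group, groups that did not take part in the match are included too: indexing returns None for them, just like m.group(i). Since Match.index and Match.count are what this PR adds, the sketch below approximates them with a tuple assembled from existing APIs, with group 0 first as in the PR's tests.

```python
import re

m = re.match(r"(a)|(b)", "b")
groups = (m[0], *m.groups())  # stand-in for the sequence view: ('b', None, 'b')

print(groups.count(None))     # 1: group 1 did not participate in the match
print(groups.index("b"))      # 0: group 0 (the whole match) comes first
print(groups.index("b", 1))   # 2: the first capturing group holding "b"
```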
Modules/_sre/sre.c
    if (index < 0 || index >= self->groups) {
        /* raise IndexError if we were given a bad group number */
        PyErr_SetString(PyExc_IndexError, "no such group");
        return NULL;
    }

    return match_getslice_by_index(self, index, Py_None);
Suggested change:
     if (index < 0 || index >= self->groups) {
-        /* raise IndexError if we were given a bad group number */
         PyErr_SetString(PyExc_IndexError, "no such group");
         return NULL;
     }
     return match_getslice_by_index(self, index, Py_None);
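The bounds check rejects any index outside [0, self->groups). Judging by the existing behaviour of m[i], which raises IndexError("no such group") for numbers larger than the pattern's group count, that range covers group 0 plus every capturing group. The sketch below mirrors the check in Python; checked_item is a hypothetical helper, and the size is computed from Pattern.groups.

```python
import re


def checked_item(m: re.Match, index: int):
    """Hypothetical Python mirror of the C bounds check for integer indices."""
    size = m.re.groups + 1  # capturing groups plus group 0
    if index < 0 or index >= size:
        raise IndexError("no such group")
    return m[index]


m = re.match(r"(a)(b)(c)", "abc")
print(checked_item(m, 3))  # c
try:
    checked_item(m, 4)
except IndexError as exc:
    print(exc)             # no such group
```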
static PyObject*
match_getitem(PyObject *op, PyObject* name)
match_item(PyObject *op, Py_ssize_t index)
Let's name this one match_sq_item, maybe. After coming back to this PR, I got confused with match_getitem. Or rename match_getitem to match_subscript. I previously said not to change the name of match_getitem, but I think it's better if we do it anyway.
Yeah, I agree. I went back to my initial change of renaming match_getitem to match_subscript, to make things clearer.
Modules/_sre/sre.c
    PyObject* index = PyDict_GetItemWithError(self->pattern->groupindex, item);
    if (index && PyLong_Check(index)) {
        Py_ssize_t i = PyLong_AsSsize_t(index);
        if (!PyErr_Occurred()) {
Suggested change:
-        if (!PyErr_Occurred()) {
+        if (i != -1 || !PyErr_Occurred()) {
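The C code looks the key up in the pattern's groupindex dict and then converts the stored value with PyLong_AsSsize_t, which returns -1 when it fails; that is why the suggestion only consults PyErr_Occurred() when the result is -1. At the Python level the same name-to-number mapping is exposed as Pattern.groupindex. The sketch below shows the lookup a string key goes through; resolve_group is a hypothetical helper, not part of re.

```python
import re


def resolve_group(m: re.Match, key):
    """Hypothetical helper: map a group name to its number, like the groupindex lookup in sre.c."""
    if isinstance(key, str):
        try:
            # Pattern.groupindex maps group names to group numbers.
            return m.re.groupindex[key]
        except KeyError:
            # re.Match reports unknown names as IndexError("no such group").
            raise IndexError("no such group") from None
    return key  # integers are already group numbers


m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2025-05")
print(dict(m.re.groupindex))         # {'year': 1, 'month': 2}
print(m[resolve_group(m, "month")])  # 05
print(m["month"])                    # 05 (existing name-based subscript)
```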
Modules/_sre/sre.c
Py_ssize_t start, Py_ssize_t stop)
/*[clinic end generated code: output=846597f6f96f829c input=7f41b5a99e0ad88e]*/
{
    PySlice_AdjustIndices(self->groups, &start, &stop, 1);
Suggested change:
-    PySlice_AdjustIndices(self->groups, &start, &stop, 1);
+    (void)PySlice_AdjustIndices(self->groups, &start, &stop, 1);
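PySlice_AdjustIndices clamps start and stop the way slicing does: negative values are offset by the length and the results are limited to [0, length]. Its return value (the slice length) is not needed here, hence the suggested (void) cast. In Python, slice(start, stop).indices(length) performs the equivalent adjustment; the sketch below uses it in a hypothetical sequence_index helper modelled on tuple.index, which this PR says it adapted.

```python
import sys


def sequence_index(items, value, start=0, stop=sys.maxsize):
    """Hypothetical helper modelled on tuple.index(value, start, stop)."""
    # Same clamping as PySlice_AdjustIndices with a step of 1.
    start, stop, _ = slice(start, stop).indices(len(items))
    for i in range(start, stop):
        # Identity first, then equality, as PyObject_RichCompareBool does.
        if items[i] is value or items[i] == value:
            return i
    raise ValueError(f"{value!r} is not in sequence")


items = ("abab", "a", "b", "a", "b")
print(slice(-2, sys.maxsize).indices(len(items)))  # (3, 5, 1)
print(sequence_index(items, "a", 2))               # 3
print(sequence_index(items, "b", -2))              # 4
```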
Lib/test/test_re.py
        with self.assertRaises(StopIteration):
            next(it)
Suggested change:
-        with self.assertRaises(StopIteration):
-            next(it)
+        self.assertRaises(StopIteration, next, it)
Lib/test/test_re.py
        m = re.match(r"(a)(b)(c)", "abc")
        it = iter(m)
Suggested change:
-        m = re.match(r"(a)(b)(c)", "abc")
-        it = iter(m)
+        it = iter(re.match(r"(a)(b)(c)", "abc"))
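Assuming the match becomes iterable through the sequence protocol (the match_item function in this PR looks like an sq_item slot) rather than a dedicated __iter__, iter() calls the item accessor with 0, 1, 2, ... and turns the first IndexError into StopIteration, which is what the StopIteration assertion checks. The sketch below reproduces that with a hypothetical class that defines only __getitem__ on top of an existing re.Match.

```python
import re


class GroupsByIndex:
    """Hypothetical class exposing only __getitem__, like a bare sq_item slot."""

    def __init__(self, match):
        self._match = match

    def __getitem__(self, index):
        # re.Match raises IndexError("no such group") past the last group,
        # which is what ends the legacy iteration protocol.
        return self._match[index]


it = iter(GroupsByIndex(re.match(r"(a)(b)(c)", "abc")))
print(next(it), next(it), next(it), next(it))  # abc a b c
try:
    next(it)
except StopIteration:
    print("exhausted")
```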
Lib/test/test_re.py
        self.assertRaises(ValueError, m.index, "abc", 1)
        self.assertEqual(m.index("a", 1), 1)
        self.assertEqual(m.index("b", 1), 2)
        self.assertEqual(m.index("c", 1), 3)
        self.assertRaises(ValueError, m.index, "123", 1)
Suggested change:
-        self.assertRaises(ValueError, m.index, "abc", 1)
         self.assertEqual(m.index("a", 1), 1)
         self.assertEqual(m.index("b", 1), 2)
         self.assertEqual(m.index("c", 1), 3)
+        self.assertRaises(ValueError, m.index, "abc", 1)
         self.assertRaises(ValueError, m.index, "123", 1)
I moved the other assertRaises together as well.
Lib/test/test_re.py
            next(it)

    def test_match_index(self):
        m = re.match(r"(a)(b)(c)", "abc")
Let's have a pattern where m.index(x, start) != start and m.index(s) == 1 (duplicated matches).
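A pattern with repeated group values satisfies both conditions: with r"(a)(b)(a)(b)" matching "abab", the first "a" is at index 1, and searching from start=2 lands on 3 rather than 2. Since Match.index is what this PR adds, the sketch below checks the expected values against a tuple of the same items (group 0 first), which tuple.index and tuple.count already handle; the pattern is only an illustration of the reviewer's request.

```python
import re

m = re.match(r"(a)(b)(a)(b)", "abab")
items = (m[0], *m.groups())       # ('abab', 'a', 'b', 'a', 'b')

assert items.index("a") == 1      # m.index(s) == 1: first of two duplicates
assert items.index("a", 2) == 3   # m.index(x, start) != start
assert items.index("b", 3) == 4
assert items.count("a") == 2
```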
                self.assertEqual(day, "07")
            case _:
                self.fail()
I'm not sure consensus exists for this on either the issue or the linked Discourse thread... A
It's a bit quiet, but I guess there's not much to talk about. Serhiy Storchaka brought up a valuable point about the fact that back in the day there was some debate over the possible semantics of …

This PR is nothing more than painting by numbers. The API was just waiting for someone to spend a couple of hours to fill in the blanks. So it's not particularly exciting, and I doubt the topic will attract a lot of attention on Discourse, but personally I think this is the kind of polish and attention to detail that makes Python feel like home. And it looks like some other people and a core dev are on board with it too. That's enough for me to get started. Plus, working code can help ground discussions.

That said, I'm still new to CPython's contributing process, so let me know if I should've done something differently.
📚 Documentation preview 📚: https://cpython-previews--133549.org.readthedocs.build/