gh-133546: Make re.Match a well-rounded Sequence type #133549

Open: vberlier wants to merge 13 commits into base: main

@vberlier (Contributor) commented May 7, 2025

@@ -1378,6 +1378,27 @@ when there is no match, you can test whether there was a match with a simple
if match:
process(match)

Match objects are proper :class:`~collections.abc.Sequence` types. You can access
Member

This is not true with this PR: Sequence has a number of other requirements (e.g. index and count methods).

Contributor Author

Nice catch, thanks! I added index and count by adapting the implementation of tuple.index and tuple.count. I also updated the unit test to cover all Sequence mixin methods.
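
For context, here is a minimal Python sketch of the requirement discussed above. A class that derives from collections.abc.Sequence and defines only __getitem__ and __len__ inherits index, count, __contains__, __iter__ and __reversed__ as mixin methods. The MatchGroups wrapper is hypothetical and only emulates the behaviour this PR proposes for re.Match itself; the PR implements the methods in C instead.

```python
import re
from collections.abc import Sequence


class MatchGroups(Sequence):
    """Hypothetical wrapper exposing a match's groups as a Sequence.

    Index 0 is the whole match, as with match[0] today.
    """

    def __init__(self, match):
        self._match = match

    def __len__(self):
        # Group 0 (the whole match) plus every capturing group.
        return self._match.re.groups + 1

    def __getitem__(self, index):
        if isinstance(index, slice):
            return tuple(self[i] for i in range(*index.indices(len(self))))
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("no such group")
        return self._match.group(index)


groups = MatchGroups(re.match(r"(a)(b)(c)", "abc"))
print(list(groups))        # ['abc', 'a', 'b', 'c']
print(groups.index("b"))   # 2
print(groups.count("a"))   # 1
print("c" in groups)       # True
```

The index and count calls come from the Sequence mixins, which is why a type that lacks them is not a full Sequence even if indexing and len() work.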

Member

@picnixz left a comment

Test coverage is insufficient in some places and redundant with existing tests in others.

Member

@picnixz left a comment

Some additional feedback. For the test failures, let's add some assertWarns calls. I don't think we should add extra checks in C for the operand types: the issue arises when the input (a bytes object) is compared with the matched items (which are strings), hence the BytesWarning. So let's just expect the warning instead.

By the way, could you avoid using @picnixz in your commits, since I'm getting pinged? TIA.

@vberlier
Contributor Author

Oh, sorry for the ping. Is it okay if I just leave out the cases with bytes as input? I can't seem to trigger the warning locally, even with ./python -Werror -m test.
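
One possible explanation, offered as an assumption rather than something confirmed in this thread: CPython only emits BytesWarning when the interpreter is started with -b (warn) or -bb (raise), so -Werror on its own may not surface it. The sketch below runs a bytes/str comparison in a child interpreter to show the effect of -bb; the helper name comparison_fails is made up for illustration.

```python
import subprocess
import sys


def comparison_fails(*flags):
    """Run a bytes/str comparison in a child interpreter started with flags."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", "b'a' == 'a'"],
        capture_output=True,
        text=True,
    )
    return result.returncode != 0


print(sys.flags.bytes_warning)   # 0 unless this interpreter was started with -b
print(comparison_fails("-bb"))   # True: BytesWarning is raised as an error
```

If that is the cause here, running the test suite with -b or -bb in addition to -Werror should reproduce the failures locally.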

@vberlier vberlier requested a review from picnixz May 20, 2025 17:29
Comment on lines 1593 to 1605
.. method:: Match.index(value, start=0, stop=sys.maxsize, /)

   Return the index of the first occurrence of the value among the matched groups.

   Raises :exc:`ValueError` if the value is not present.

   .. versionadded:: next

.. method:: Match.count(value, /)

   Return the number of occurrences of the value among the matched groups.

   .. versionadded:: next
Member

Suggested change
.. method:: Match.index(value, start=0, stop=sys.maxsize, /)

   Return the index of the first occurrence of the value among the matched groups.

   Raises :exc:`ValueError` if the value is not present.

   .. versionadded:: next

.. method:: Match.count(value, /)

   Return the number of occurrences of the value among the matched groups.

   .. versionadded:: next
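
To illustrate the documented semantics without relying on the unmerged methods, the sketch below uses a plain tuple as a stand-in for the proposed sequence view (group 0 followed by groups 1..n). This matches the author's note that the implementation was adapted from tuple.index and tuple.count, but the tuple itself is only an emulation.

```python
import re

m = re.match(r"(a)(b)(c)", "abc")

# Stand-in for the proposed sequence view: group 0 followed by groups 1..n.
groups = (m.group(0), *m.groups())

print(groups.index("b"))      # 2
print(groups.index("c", 1))   # 3
print(groups.count("a"))      # 1

try:
    groups.index("abc", 1)    # "abc" only occurs at index 0
except ValueError as exc:
    print("ValueError:", exc)
```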

Comment on lines 2445 to 2452

    if (index < 0 || index >= self->groups) {
        /* raise IndexError if we were given a bad group number */
        PyErr_SetString(PyExc_IndexError, "no such group");
        return NULL;
    }

    return match_getslice_by_index(self, index, Py_None);
Member

Suggested change
-    if (index < 0 || index >= self->groups) {
-        /* raise IndexError if we were given a bad group number */
-        PyErr_SetString(PyExc_IndexError, "no such group");
-        return NULL;
-    }
-    return match_getslice_by_index(self, index, Py_None);
+    if (index < 0 || index >= self->groups) {
+        PyErr_SetString(PyExc_IndexError, "no such group");
+        return NULL;
+    }
+    return match_getslice_by_index(self, index, Py_None);

Comment on lines 2441 to +2442
 static PyObject*
-match_getitem(PyObject *op, PyObject* name)
+match_item(PyObject *op, Py_ssize_t index)
Member

Maybe let's name this one match_sq_item: after coming back to this PR, I confused it with match_getitem. Alternatively, rename match_getitem to match_subscript. I previously said not to rename match_getitem, but I think it's better if we do it anyway.

Contributor Author

Yeah, I agree. I went back to my initial change, renaming match_getitem to match_subscript, to make things clearer.
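
The two names map onto two C-level slots: a sequence item function receives a Py_ssize_t index, while a mapping subscript function receives an arbitrary object as the key. From Python, both kinds of access use the same subscription syntax. The sketch below uses only behaviour that re.Match already has today (integer and named-group subscription); how the PR wires each slot is visible here only through the diff fragments.

```python
import re

m = re.match(r"(?P<year>\d+)-(?P<month>\d+)", "2025-05")

print(m[0])        # 2025-05  (whole match)
print(m[1])        # 2025     (integer index)
print(m["year"])   # 2025     (group name, an arbitrary-object key)

try:
    m[3]           # there are only two capturing groups
except IndexError as exc:
    print("IndexError:", exc)
```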

PyObject* index = PyDict_GetItemWithError(self->pattern->groupindex, item);
if (index && PyLong_Check(index)) {
    Py_ssize_t i = PyLong_AsSsize_t(index);
    if (!PyErr_Occurred()) {
Member

Suggested change
-    if (!PyErr_Occurred()) {
+    if (i != -1 || !PyErr_Occurred()) {

Py_ssize_t start, Py_ssize_t stop)
/*[clinic end generated code: output=846597f6f96f829c input=7f41b5a99e0ad88e]*/
{
    PySlice_AdjustIndices(self->groups, &start, &stop, 1);
Member

Suggested change
-    PySlice_AdjustIndices(self->groups, &start, &stop, 1);
+    (void)PySlice_AdjustIndices(self->groups, &start, &stop, 1);
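
PySlice_AdjustIndices clamps start and stop to the bounds of a sequence of the given length and returns the resulting slice length, which is why the suggestion discards the return value explicitly. As a rough Python-level analogue (an illustration, not the code path used by the PR), slice.indices() applies the same kind of clamping. The value n_groups = 4 is an assumed example: the whole match plus three capturing groups.

```python
import sys

n_groups = 4  # assumed example: the whole match plus three capturing groups

# Default arguments of Match.index: start=0, stop=sys.maxsize.
print(slice(0, sys.maxsize).indices(n_groups))    # (0, 4, 1)

# A negative start counts from the end.
print(slice(-2, sys.maxsize).indices(n_groups))   # (2, 4, 1)

# An out-of-range negative start is clamped to 0.
print(slice(-10, 2).indices(n_groups))            # (0, 2, 1)
```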

Comment on lines 673 to 674
        with self.assertRaises(StopIteration):
            next(it)
Member

Suggested change
-        with self.assertRaises(StopIteration):
-            next(it)
+        self.assertRaises(StopIteration, next, it)
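
Both forms of assertRaises check the same thing; the callable form is simply more compact for a single call. A small self-contained sketch, using a tuple iterator as a stand-in for iterating a match (the test class name is made up for illustration):

```python
import re
import unittest


class IterExhaustionTest(unittest.TestCase):
    def setUp(self):
        # Stand-in iterator over group 0 and groups 1..n of a match.
        m = re.match(r"(a)(b)(c)", "abc")
        self.it = iter((m.group(0), *m.groups()))
        for _ in range(4):
            next(self.it)  # consume all four items

    def test_context_manager_form(self):
        with self.assertRaises(StopIteration):
            next(self.it)

    def test_callable_form(self):
        self.assertRaises(StopIteration, next, self.it)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(IterExhaustionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())   # True
```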

Comment on lines 667 to 668
m = re.match(r"(a)(b)(c)", "abc")
it = iter(m)
Member

Suggested change
-        m = re.match(r"(a)(b)(c)", "abc")
-        it = iter(m)
+        it = iter(re.match(r"(a)(b)(c)", "abc"))

Comment on lines 685 to 689
self.assertRaises(ValueError, m.index, "abc", 1)
self.assertEqual(m.index("a", 1), 1)
self.assertEqual(m.index("b", 1), 2)
self.assertEqual(m.index("c", 1), 3)
self.assertRaises(ValueError, m.index, "123", 1)
Member

Suggested change
-        self.assertRaises(ValueError, m.index, "abc", 1)
-        self.assertEqual(m.index("a", 1), 1)
-        self.assertEqual(m.index("b", 1), 2)
-        self.assertEqual(m.index("c", 1), 3)
-        self.assertRaises(ValueError, m.index, "123", 1)
+        self.assertEqual(m.index("a", 1), 1)
+        self.assertEqual(m.index("b", 1), 2)
+        self.assertEqual(m.index("c", 1), 3)
+        self.assertRaises(ValueError, m.index, "abc", 1)
+        self.assertRaises(ValueError, m.index, "123", 1)

Contributor Author

I grouped the other assertRaises calls together as well.

            next(it)

    def test_match_index(self):
        m = re.match(r"(a)(b)(c)", "abc")
Member

Let's also have a pattern with duplicated matches, where m.index(x, start) != start and m.index(x) == 1.
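
One pattern that would satisfy this, sketched with a tuple stand-in for the proposed sequence view (the pattern and input are an illustrative choice, not the test that ended up in the PR):

```python
import re

m = re.match(r"(a)(b)(a)", "aba")

# Stand-in for the proposed sequence view: ('aba', 'a', 'b', 'a').
groups = (m.group(0), *m.groups())

print(groups.index("a"))      # 1  (first occurrence)
print(groups.index("a", 2))   # 3  (differs from start=2)
print(groups.count("a"))      # 2
```

With the "abc" pattern every value is unique, so index(x, start) always equals start whenever it succeeds; the duplicated "a" is what makes the start argument observable.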

                self.assertEqual(day, "07")
            case _:
                self.fail()

Member

Suggested change

@AA-Turner
Member

I'm not sure consensus exists for this on either the issue or the linked Discourse thread...

A

@vberlier
Contributor Author

vberlier commented Jun 3, 2025

It's a bit quiet, but I guess there's not much to talk about. Serhiy Storchaka raised a valuable point: back in the day, there was some debate over the possible semantics of len() and unpacking. But with the __getitem__ implementation that was introduced 10 years ago, there's actually not much wiggle room for creative interpretations of what it would mean for re.Match to be a proper Sequence type.

This PR is nothing more than painting by numbers. The API was just waiting for someone to spend a couple of hours to fill in the blanks. So it's not particularly exciting, and I doubt the topic will attract a lot of attention on Discourse, but personally I think this is the kind of polish and attention to detail that makes Python feel like home. And it looks like some other people and a core dev are on board with it too. That's enough for me to get started. Plus, working code can help ground discussions.

That said, I'm still new to CPython's contributing process, so let me know if I should've done something differently.
