gh-133546: Make `re.Match` a well-rounded `Sequence` type #133549

vberlier · 2025-05-07T02:55:53Z

Issue: Make re.Match a well-rounded Sequence type #133546

📚 Documentation preview 📚: https://cpython-previews--133549.org.readthedocs.build/

JelleZijlstra · 2025-05-07T03:59:33Z

Doc/library/re.rst

@@ -1378,6 +1378,27 @@ when there is no match, you can test whether there was a match with a simple
   if match:
       process(match)

+Match objects are proper :class:`~collections.abc.Sequence` types. You can access


This is not true with this PR: Sequence has a number of other requirements (e.g. index and count methods).

Nice catch, thanks! I added index and count by adapting the implementation of tuple.index and tuple.count. I also updated the unit test to cover all Sequence mixin methods.
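
For reference, `collections.abc.Sequence` derives `__contains__`, `__iter__`, `__reversed__`, `index` and `count` from `__getitem__` and `__len__`. Below is a minimal pure-Python sketch of those semantics. The wrapper class `MatchGroups` is hypothetical and is not part of this PR, which implements the methods in C; it only assumes that index 0 is the whole match and index i is group i.

```python
import re
from collections.abc import Sequence


class MatchGroups(Sequence):
    """Hypothetical read-only sequence view over the groups of a match.

    Index 0 is the whole match and index i is group i, mirroring m[i].
    """

    def __init__(self, match):
        self._match = match

    def __len__(self):
        # Group 0 (the whole match) plus every capturing group.
        return self._match.re.groups + 1

    def __getitem__(self, index):
        if isinstance(index, slice):
            return tuple(self[i] for i in range(*index.indices(len(self))))
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("no such group")
        return self._match.group(index)


groups = MatchGroups(re.match(r"(a)(b)(c)", "abc"))
print(list(groups))            # iteration comes from the Sequence mixin
print(groups.index("b"))       # so does index()
print(groups.count("c"))       # and count()
print("abc" in groups)         # and __contains__
print(list(reversed(groups)))  # and __reversed__
```

Only `__len__` and `__getitem__` are written by hand here; everything else is inherited, which is the set of methods the updated unit test is meant to cover.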

Modules/_sre/sre.c

picnixz

Test coverage is sometimes insufficient and sometimes redundant with existing tests.

Doc/library/re.rst

Lib/test/test_re.py

picnixz

Some additional feedback. For the test failures, let's add some assertWarns. I don't think we should add extra checks in C for the types of the operands. The issue arises when the input (a bytes object) is compared with the matched items (which are strings), hence the BytesWarning. So, instead, let's just expect the warning.

By the way, could you avoid using @picnixz in your commits as I'm getting pinged? TiA.

Lib/test/test_re.py

Modules/_sre/sre.c

vberlier · 2025-05-16T00:19:30Z

Oh, sorry for the ping. Is it okay if I just leave out the cases with bytes as input? I can't seem to trigger the warning locally, even with ./python -Werror -m test.
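
One possible explanation, offered as an assumption rather than a diagnosis of the failing job: `BytesWarning` is only emitted when the interpreter is started with `-b` (or `-bb`, which turns it into an error), so `-W error` on its own does not enable it. A small sketch that compares the two invocations in a subprocess:

```python
import subprocess
import sys

SNIPPET = "print(b'a' == 'a')"


def run(*flags):
    """Run SNIPPET in a fresh interpreter with the given flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


plain = run("-W", "error")  # no -b: the comparison is silently False
strict = run("-bb")         # -bb: the comparison raises BytesWarning

print(plain.returncode, plain.stdout.strip())
print(strict.returncode, "BytesWarning" in strict.stderr)
```

If this is the cause, running the test suite as `./python -bb -m test test_re` should reproduce the warning locally.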

picnixz · 2025-06-02T08:49:15Z

Doc/library/re.rst

+.. method:: Match.index(value, start=0, stop=sys.maxsize, /)
+
+   Return the index of the first occurrence of the value among the matched groups.
+
+   Raises :exc:`ValueError` if the value is not present.
+
+   .. versionadded:: next
+
+.. method:: Match.count(value, /)
+
+   Return the number of occurrences of the value among the matched groups.
+
+   .. versionadded:: next


Suggested change

.. method:: Match.index(value, start=0, stop=sys.maxsize, /)

   Return the index of the first occurrence of the value among the matched groups.

   Raises :exc:`ValueError` if the value is not present.

   .. versionadded:: next

.. method:: Match.count(value, /)

   Return the number of occurrences of the value among the matched groups.

   .. versionadded:: next
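
Since the implementation was adapted from `tuple.index` and `tuple.count`, the documented behaviour can be previewed on released Python versions with a plain tuple. This is a sketch only: the tuple `groups` is a stand-in that assumes the whole match sits at index 0 and group i at index i, and the date pattern is made up for illustration.

```python
import re

m = re.match(r"(\d+)-(\d+)-(\d+)", "2025-05-07")

# Stand-in for the proposed sequence behaviour: whole match first, then groups.
groups = (m.group(0), *m.groups())

print(groups.index("05"))     # first position whose value equals "05"
print(groups.index("07", 1))  # same search, starting at position 1
print(groups.count("2025"))   # the whole match "2025-05-07" does not count

try:
    groups.index("1999")
except ValueError as exc:
    print("ValueError:", exc)
```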

picnixz · 2025-06-02T08:54:22Z

Modules/_sre/sre.c

+
+    if (index < 0 || index >= self->groups) {
+        /* raise IndexError if we were given a bad group number */
+        PyErr_SetString(PyExc_IndexError, "no such group");
+        return NULL;
+    }
+
+    return match_getslice_by_index(self, index, Py_None);


Suggested change

     if (index < 0 || index >= self->groups) {
-        /* raise IndexError if we were given a bad group number */
         PyErr_SetString(PyExc_IndexError, "no such group");
         return NULL;
     }

     return match_getslice_by_index(self, index, Py_None);

picnixz · 2025-06-02T08:54:51Z

Modules/_sre/sre.c

 static PyObject*
-match_getitem(PyObject *op, PyObject* name)
+match_item(PyObject *op, Py_ssize_t index)


Let's name this one match_sq_item maybe. After coming back to this PR, I got confused with match_getitem. Or rename match_getitem as match_subscript. I previously said not to change the name for match_getitem, but I think it's better if we do it anyway.

Yeah, I agree. I renamed match_getitem back to match_subscript, as in my initial change, to make things clearer.
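
For context, both C functions serve the same Python-level `m[...]` syntax: `match_subscript` receives an arbitrary key object (a group number or a group name), while `match_item` receives a C integer index. The subscript forms below already work on released Python versions; the pattern and group names are made up for illustration.

```python
import re

m = re.match(r"(?P<year>\d+)-(?P<month>\d+)", "2025-05")

print(m[0])        # whole match
print(m[1])        # group by number
print(m["month"])  # group by name
```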

picnixz · 2025-06-02T09:08:33Z

Modules/_sre/sre.c

+            PyObject* index = PyDict_GetItemWithError(self->pattern->groupindex, item);
+            if (index && PyLong_Check(index)) {
+                Py_ssize_t i = PyLong_AsSsize_t(index);
+                if (!PyErr_Occurred()) {


Suggested change

-                if (!PyErr_Occurred()) {
+                if (i != -1 || !PyErr_Occurred()) {

picnixz · 2025-06-02T09:11:43Z

Modules/_sre/sre.c

+                          Py_ssize_t start, Py_ssize_t stop)
+/*[clinic end generated code: output=846597f6f96f829c input=7f41b5a99e0ad88e]*/
+{
+    PySlice_AdjustIndices(self->groups, &start, &stop, 1);


Suggested change

-    PySlice_AdjustIndices(self->groups, &start, &stop, 1);
+    (void)PySlice_AdjustIndices(self->groups, &start, &stop, 1);
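
`PySlice_AdjustIndices` clamps `start` and `stop` to the sequence length the same way slicing does and returns the resulting slice length, which goes unused here (hence the `(void)` cast). In Python, `slice.indices` exposes the same clamping for a step of 1. The helper name `adjust` below is made up for illustration.

```python
import sys


def adjust(length, start=0, stop=sys.maxsize):
    """Clamp start/stop to [0, length], resolving negative offsets."""
    start, stop, _step = slice(start, stop, 1).indices(length)
    return start, stop


print(adjust(4))           # defaults cover the whole sequence
print(adjust(4, 1))        # skip the first item
print(adjust(4, -2))       # negative start counts from the end
print(adjust(4, -10, 99))  # out-of-range values are clamped
```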

picnixz · 2025-06-02T09:19:16Z

Lib/test/test_re.py

+        with self.assertRaises(StopIteration):
+            next(it)


Suggested change

-        with self.assertRaises(StopIteration):
-            next(it)
+        self.assertRaises(StopIteration, next, it)
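
The two spellings are equivalent: in the callable form, `assertRaises` calls `next(it)` itself. Here is a self-contained sketch that runs on released Python versions. It iterates over `m.groups()` because `re.Match` is not iterable before this PR, and the test class name is made up.

```python
import re
import unittest


class IteratorExhaustionTest(unittest.TestCase):
    def test_context_manager_form(self):
        it = iter(re.match(r"(a)(b)", "ab").groups())
        self.assertEqual(list(it), ["a", "b"])
        with self.assertRaises(StopIteration):
            next(it)

    def test_callable_form(self):
        it = iter(re.match(r"(a)(b)", "ab").groups())
        self.assertEqual(list(it), ["a", "b"])
        # Same check: assertRaises calls next(it) on our behalf.
        self.assertRaises(StopIteration, next, it)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(IteratorExhaustionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```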

picnixz · 2025-06-02T09:19:30Z

Lib/test/test_re.py

+        m = re.match(r"(a)(b)(c)", "abc")
+        it = iter(m)


Suggested change

-        m = re.match(r"(a)(b)(c)", "abc")
-        it = iter(m)
+        it = iter(re.match(r"(a)(b)(c)", "abc"))

picnixz · 2025-06-02T09:19:52Z

Lib/test/test_re.py

+        self.assertRaises(ValueError, m.index, "abc", 1)
+        self.assertEqual(m.index("a", 1), 1)
+        self.assertEqual(m.index("b", 1), 2)
+        self.assertEqual(m.index("c", 1), 3)
+        self.assertRaises(ValueError, m.index, "123", 1)


Suggested change

-        self.assertRaises(ValueError, m.index, "abc", 1)
-        self.assertEqual(m.index("a", 1), 1)
-        self.assertEqual(m.index("b", 1), 2)
-        self.assertEqual(m.index("c", 1), 3)
-        self.assertRaises(ValueError, m.index, "123", 1)
+        self.assertEqual(m.index("a", 1), 1)
+        self.assertEqual(m.index("b", 1), 2)
+        self.assertEqual(m.index("c", 1), 3)
+        self.assertRaises(ValueError, m.index, "abc", 1)
+        self.assertRaises(ValueError, m.index, "123", 1)

I moved the other assertRaises together as well

picnixz · 2025-06-02T09:20:37Z

Lib/test/test_re.py

+            next(it)
+
+    def test_match_index(self):
+        m = re.match(r"(a)(b)(c)", "abc")


Let's have a pattern where m.index(x, start) != start and m.index(s) == 1 (duplicated matches)
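
One pattern that would satisfy this, sketched with a tuple stand-in because `Match.index` does not exist on released Python versions. The pattern `(a)(b)(a)` is only a suggestion; the tuple assumes the whole match at index 0 and group i at index i.

```python
import re

m = re.match(r"(a)(b)(a)", "aba")
groups = (m.group(0), *m.groups())  # ("aba", "a", "b", "a")

print(groups.index("a"))     # the first "a" is group 1
print(groups.index("a", 2))  # starting at 2, the next "a" is group 3
print(groups.count("a"))     # the value occurs twice
```

With duplicated group values, a search starting at 2 returns 3, so the result no longer coincides with `start`.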

picnixz · 2025-06-02T09:21:07Z

Lib/test/test_re.py

+                self.assertEqual(day, "07")
+            case _:
+                self.fail()
+


Suggested change

AA-Turner · 2025-06-02T22:43:36Z

I'm not sure consensus exists for this on either the issue or the linked Discourse thread...

A

vberlier · 2025-06-03T00:50:50Z

It's a bit quiet, but I guess there's not much to talk about. Serhiy Storchaka brought up a valuable point about the fact that back in the day there was some debate over the possible semantics of len() and unpacking. But with the __getitem__ implementation that was introduced 10 years ago, there's actually not much wiggle room for creative interpretations of what it would mean for re.Match to be a proper Sequence type.

This PR is nothing more than painting by numbers. The API was just waiting for someone to spend a couple of hours to fill in the blanks. So it's not particularly exciting, and I doubt the topic will attract a lot of attention on Discourse, but personally I think this is the kind of polish and attention to detail that makes Python feel like home. And it looks like some other people and a core dev are on board with it too. That's enough for me to get started. Plus, working code can help ground discussions.

That said, I'm still new to CPython's contributing process, so let me know if I should've done something differently.

bedevere-app bot mentioned this pull request May 7, 2025

Make re.Match a well-rounded Sequence type #133546

Open

bedevere-app bot added the awaiting review label May 7, 2025

Make re.Match a well-rounded Sequence type

74480a7

vberlier force-pushed the gh-133546 branch from ac95ba2 to 74480a7 Compare May 7, 2025 03:02

JelleZijlstra reviewed May 7, 2025

vberlier force-pushed the gh-133546 branch from 01288ce to 497c42a Compare May 7, 2025 05:23

Implement missing index and count methods

a3de846

vberlier force-pushed the gh-133546 branch from 497c42a to a3de846 Compare May 8, 2025 11:08

ZeroIntensity reviewed May 11, 2025

vberlier added 6 commits May 11, 2025 20:15

Don't check for PyErr_Occurred

603b1d1

Add missing braces

70b73e4

Fix link to ValueError

f51ef45

Rewrite match_subscript to support negative indexing and slicing

f218828

Update ACKS

5272141

Add news entry

d0aa6fa

picnixz reviewed May 15, 2025

Address feedback from @picnixz

5f67be0

vberlier force-pushed the gh-133546 branch from ccdbbbb to 5f67be0 Compare May 15, 2025 23:57

vberlier added 3 commits May 16, 2025 02:21

Move slicing test cases into its own test function

17feaa6

Fix error checking

fe709f8

Use else if

4095b52

vberlier requested a review from picnixz May 20, 2025 17:29

picnixz reviewed Jun 2, 2025

Address next round of comments

51d918d
