
re: Add support for start- and endpos. #14179


Open
wants to merge 1 commit into
base: master

greezybacon
Contributor

This adds support to match the CPython re module which supports searching and matching a substring using a start- and end-pos specified in the corresponding Pattern method.

Pattern objects have two additional parameters for the ::search and ::match methods to define the starting and ending position of the subject within the string to be searched.

This allows searching a substring without creating a slice of the string, which is better for performance and also paves the way for something like the finditer method.

However, one caveat of using the start-pos rather than a slice is that the start anchor (^) remains anchored to the beginning of the text.
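
To illustrate the caveat, here is a minimal sketch using CPython's re module, whose pos/endpos semantics this PR aims to follow. The sample text and patterns are made up for illustration, and the MicroPython behaviour added by this change may differ in details.

```python
import re

text = "hello world"
find_o = re.compile(r"o")

# Searching from pos=5 reports offsets relative to the full string.
with_pos = find_o.search(text, 5)

# Searching a slice copies the data and reports offsets relative to the slice.
with_slice = find_o.search(text[5:])

# endpos limits the search as if the string ended at that index.
limited = find_o.search(text, 0, 4)

# '^' stays tied to the real start of the string, so it does not match at pos.
starts_with_w = re.compile(r"^w")
anchored_pos = starts_with_w.search(text, 6)
anchored_slice = starts_with_w.search(text[6:])

print(with_pos.start(), with_slice.start(), limited, anchored_pos, anchored_slice.group())
```

In CPython the pos-based search and the slice-based search find the same character but report different offsets, and only the slice lets `^` match at the new starting point.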


github-actions bot commented Mar 26, 2024

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +112 +0.013% standard
      stm32:   +80 +0.020% PYBV10
     mimxrt:   +80 +0.021% TEENSY40
        rp2:   +72 +0.008% RPI_PICO_W
       samd:   +72 +0.027% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:   +80 +0.018% VIRT_RV32


codecov bot commented Mar 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.44%. Comparing base (c72a3e5) to head (d2813a1).
Report is 8 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #14179   +/-   ##
=======================================
  Coverage   98.44%   98.44%           
=======================================
  Files         171      171           
  Lines       22208    22220   +12     
=======================================
+ Hits        21863    21875   +12     
  Misses        345      345           


@stinos
Contributor

stinos commented Mar 26, 2024

This adds support to match the CPython re module which supports searching and matching a substring using a start- and end-pos specified in the corresponding Pattern method

Just to be clear: 'match' doesn't mean it's CPython-compatible (and hence the use of a .exp file for the tests)? In that case this feature feels somewhat confusing and hard to discover, and will likely remain underused?

@greezybacon
Contributor Author

greezybacon commented Mar 26, 2024 via email

@greezybacon
Contributor Author

I was a bit excited and went ahead and added a finditer implementation as well. It's also aimed at compatibility with CPython's re module. If it should remain, I'd like a little direction on the #define to include support for it. Do we need it? Did I get it in the right place?
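
For context, a finditer-style loop can be layered on top of a position-aware search. The following is only a sketch written against CPython's re module, not the implementation proposed in this PR; the helper name `finditer_sketch` is hypothetical, and the handling of empty matches is simplified and does not reproduce every CPython edge case.

```python
import re


def finditer_sketch(pattern, text):
    """Yield successive matches by re-running search() from the last end position."""
    pos = 0
    while pos <= len(text):
        m = pattern.search(text, pos)
        if m is None:
            return
        yield m
        # Step past an empty match so the loop always makes progress.
        pos = m.end() if m.end() > m.start() else m.end() + 1


numbers = re.compile(r"\d+")
found = [(m.group(), m.span()) for m in finditer_sketch(numbers, "a1b22c333")]
print(found)
```

Because each search starts from `pos` rather than from a new slice, the reported spans stay relative to the original string.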

@greezybacon greezybacon force-pushed the feature/re-start-end-pos branch from b191a01 to 8c28d63 Compare March 27, 2024 03:08
}
else if (endpos < startpos) {
    endpos = startpos;
}
Member


Please add a test case that tests this line.

Contributor Author


Thanks, @dpgeorge, I did.
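
A test for the `endpos < startpos` branch discussed in this thread could look roughly like the sketch below. It is written against CPython's re module, where the documented behaviour is that no match is found when endpos is less than pos; the pattern and text are illustrative and are not taken from the PR's test files.

```python
import re

digits = re.compile(r"\d")
text = "a1b2"

# endpos (1) is smaller than pos (3), so the searchable range is empty.
reversed_range = digits.search(text, 3, 1)

# The same pos with a valid endpos finds the digit at index 3.
valid_range = digits.search(text, 3, 4)

print(reversed_range, valid_range.group())
```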

@dpgeorge dpgeorge added the extmod Relates to extmod/ directory in source label Apr 19, 2024
@greezybacon greezybacon force-pushed the feature/re-start-end-pos branch from 5ad16c3 to cf778ac Compare July 18, 2025 14:28
@dpgeorge
Member

Sorry, this PR got a bit lost...

I was a bit excited and went ahead and added a finditer implementation as well.

OK, so that increases the scope somewhat of this PR 😅

It would be good to know what the code size impact is for just the start/endpos change. To that end, probably best to keep this PR just for start/endpos (including tests for that), and have a separate PR for finditer. That makes it easier to review.

@greezybacon greezybacon force-pushed the feature/re-start-end-pos branch 2 times, most recently from e602699 to dbe3f83 Compare July 19, 2025 02:07
Pattern objects have two additional parameters for the ::search and
::match methods to define the starting and ending position of the
subject within the string to be searched.

This allows for searching a sub-string without creating a slice.
However, one caveat of using the start-pos rather than a slice is that
the start anchor (`^`) remains anchored to the beginning of the text.

Signed-off-by: Jared Hancock <jared@greezybacon.me>
@greezybacon greezybacon force-pushed the feature/re-start-end-pos branch from dbe3f83 to d2813a1 Compare July 19, 2025 02:11
@greezybacon
Contributor Author

Thanks, @dpgeorge. I really appreciate the review and feedback. I have moved the finditer method to its own branch and will propose it as a separate PR if this one gets merged.
