
[JsonPath] Handle special whitespaces in expressions #60699



Open: wants to merge 1 commit into base: 7.3


@alexandre-daubois (Member) commented Jun 5, 2025

| Q             | A   |
| ------------- | --- |
| Branch?       | 7.3 |
| Bug fix?      | yes |
| New feature?  | no  |
| Deprecations? | no  |
| Issues        | -   |
| License       | MIT |

Allows characters such as `\n` or `\t` to be used in expressions and treats them as regular spaces, as defined in https://datatracker.ietf.org/doc/rfc9535/ (RFC 9535), section 2.1.1. This makes it possible to write filters over multiple lines:

```
$
.store
.book[?
    length(@.author) > 12
    ||
    length(@.author) < 5
]
```
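For reference, RFC 9535 (section 2.1.1) limits blank space to four characters: space, horizontal tab, line feed and carriage return. The diffs in this pull request match blank space with PCRE's `\s`, which is broader (it also matches form feed, for instance). The sketch below shows an RFC-strict alternative; `skipBlankSpace()` and `RFC9535_BLANK` are hypothetical names, not part of the Symfony component.

```php
<?php

declare(strict_types=1);

// RFC 9535, section 2.1.1: B = %x20 / %x09 / %x0A / %x0D
const RFC9535_BLANK = " \t\n\r";

/**
 * Hypothetical helper: returns the offset of the first character at or
 * after $offset that is not RFC 9535 blank space.
 */
function skipBlankSpace(string $expr, int $offset = 0): int
{
    // strspn() counts how many bytes starting at $offset belong to the mask
    return $offset + strspn($expr, RFC9535_BLANK, $offset);
}

$filter = "?\n    length(@.author) > 12";
$start = skipBlankSpace($filter, 1); // skip the blank space after '?'
echo substr($filter, $start), "\n"; // length(@.author) > 12

// PCRE's \s also matches form feed, which RFC 9535 does not treat as blank space
var_dump(preg_match('/^\s$/', "\f"));        // int(1)
var_dump(strspn("\f", RFC9535_BLANK) === 1); // bool(false)
```

Whether the component should be this strict is a design choice; accepting the wider `\s` set is more lenient than the RFC requires.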

@carsonbot carsonbot added this to the 7.3 milestone Jun 5, 2025
@alexandre-daubois alexandre-daubois changed the title [JsonPath] Handle special whitespaces in filters [JsonPath] Handle special whitespaces in expressions Jun 5, 2025
```diff
@@ -163,13 +185,14 @@ private function evaluateBracket(string $expr, mixed $value): array
 return $result;
 }

 // start, end and step
-if (preg_match('/^(-?\d*):(-?\d*)(?::(-?\d+))?$/', $expr, $matches)) {
+if (preg_match('/^(-?\d*)\s*:\s*(-?\d*)(?:\s*:\s*(-?\d+))?$/', $expr, $matches)) {
```
Member

I suggest changing all those `*` to `*+`.
Due to the structure of the regex, possessive quantifiers will match exactly the same strings, but they avoid useless backtracking on non-matching strings (a very smart regex compiler might make them possessive automatically as an optimization, but I'm not sure PCRE does it).
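The sketch below illustrates this suggestion on the slice regex from the diff. `SLICE_POSSESSIVE` is a hypothetical variant in which every `*` becomes `*+` and the final `\d+` becomes `\d++`; `parseSlice()` and the inputs are made up for illustration. Handing characters back never helps in this pattern: digits can only be followed by blank space, `:` or the end of the string, and a run of blank space only needs to be consumed once, whichever `\s*` takes it. So both patterns should accept exactly the same strings. PCRE2 does document an "auto-possessification" optimization (disabled by `PCRE2_NO_AUTO_POSSESS`) for some simple cases, but writing `*+` explicitly does not depend on it.

```php
<?php

declare(strict_types=1);

// Slice regex as written in the diff (greedy quantifiers)
const SLICE_GREEDY = '/^(-?\d*)\s*:\s*(-?\d*)(?:\s*:\s*(-?\d+))?$/';

// Hypothetical variant with possessive quantifiers, as suggested in the review
const SLICE_POSSESSIVE = '/^(-?\d*+)\s*+:\s*+(-?\d*+)(?:\s*+:\s*+(-?\d++))?$/';

/** Returns [start, end, step] as captured strings, or null if $expr is not a slice. */
function parseSlice(string $pattern, string $expr): ?array
{
    if (1 !== preg_match($pattern, $expr, $matches)) {
        return null;
    }

    // preg_match() omits trailing unmatched groups, so default the step to ''
    return [$matches[1], $matches[2], $matches[3] ?? ''];
}

foreach (['1:5', '-3 : ', '0 : 10 : 2', '1:5:', 'a:b'] as $input) {
    $greedy = parseSlice(SLICE_GREEDY, $input);
    $possessive = parseSlice(SLICE_POSSESSIVE, $input);

    // Both patterns are expected to agree on every input
    printf(
        "%-14s %-18s %s\n",
        var_export($input, true),
        json_encode($possessive),
        $greedy === $possessive ? 'same' : 'DIFFERENT'
    );
}

// Counter-example: possessive quantifiers are only safe when the next token
// cannot match what was consumed. Here it can, so the result changes.
var_dump(preg_match('/^\s*\s$/', '  '));  // int(1)
var_dump(preg_match('/^\s*+\s$/', '  ')); // int(0)
```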

```diff
@@ -211,8 +234,8 @@ private function evaluateBracket(string $expr, mixed $value): array
 }

 // filter expressions
-if (preg_match('/^\?(.*)$/', $expr, $matches)) {
-$filterExpr = $matches[1];
+if (preg_match('/^\?\s*(.*)$/', $expr, $matches)) {
```
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change:

```diff
-if (preg_match('/^\?\s*(.*)$/', $expr, $matches)) {
+if (preg_match('/^\?\s*+(.*)$/', $expr, $matches)) {
```

Here we want the possessive quantifier to ensure that all spaces are always consumed by the `\s*` part: we have two consecutive quantifiers that could both consume them, which is the typical ReDoS pattern.
Alternatively, let the spaces be consumed by `(.*)` (as before) and strip them with the trimming on the next line.
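Both options from this comment can be compared on a made-up filter string. `extractFilterPossessive()` and `extractFilterTrimmed()` are hypothetical helpers, not part of the component, and the trim mask is assumed to be the RFC 9535 blank-space set. One detail worth keeping in mind: without the `s` modifier, `.` does not match a line feed, so the trimmed variant rejects any line feed, while the possessive variant accepts one only in the blank space right after `?`.

```php
<?php

declare(strict_types=1);

// Option 1 (review suggestion): the possessive \s*+ takes all leading blank
// space, so (.*) never competes with it for the same characters.
function extractFilterPossessive(string $expr): ?string
{
    return 1 === preg_match('/^\?\s*+(.*)$/', $expr, $m) ? $m[1] : null;
}

// Option 2: let (.*) capture everything after '?' and trim afterwards.
// The mask is the RFC 9535 blank-space set (an assumption: trim()'s
// default mask would also strip "\v" and "\0").
function extractFilterTrimmed(string $expr): ?string
{
    return 1 === preg_match('/^\?(.*)$/', $expr, $m) ? trim($m[1], " \t\n\r") : null;
}

$expr = "?  \t@.price < 10 ";
var_dump(extractFilterPossessive($expr)); // string(13) "@.price < 10 "
var_dump(extractFilterTrimmed($expr));    // string(12) "@.price < 10"

// Without the /s modifier, '.' does not match "\n"
var_dump(extractFilterPossessive("?\n@.price < 10")); // string(12) "@.price < 10"
var_dump(extractFilterTrimmed("?\n@.price < 10"));    // NULL
```

With option 1, trailing blank space is still captured, which is why a trim on the captured expression remains useful either way.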

```diff
@@ -346,7 +373,7 @@ private function evaluateScalar(string $expr, array $context): mixed
 }

 // function calls
-if (preg_match('/^(\w+)\((.*)\)$/', $expr, $matches)) {
+if (preg_match('/^(\w+)\s*\(\s*(.*)\s*\)$/', $expr, $matches)) {
```
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`\s*(.*)\s*\)` will be a ReDoS pattern for non-matching strings (and it does not even ensure that the last `\s*` consumes the trailing spaces).

It would be simpler to let `(.*)` match all the content between the parentheses and to trim `$matches[2]` before using it.
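A minimal sketch of this simpler approach, using a hypothetical `parseFunctionCall()` helper: `(.*)` captures everything up to the final `)`, and the blank space around the arguments is trimmed afterwards. As in the diff, the regex has no `s` modifier, so line breaks are assumed to have been normalized beforehand.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: splits "name( args )" into [name, trimmed args],
 * or returns null when $expr is not a function call.
 */
function parseFunctionCall(string $expr): ?array
{
    // (.*) is only followed by the literal ')' and the end anchor, so no
    // other quantifier competes with it for the same characters.
    if (1 !== preg_match('/^(\w+)\s*+\((.*)\)$/', $expr, $matches)) {
        return null;
    }

    // Blank space around the arguments is removed here (RFC 9535 set)
    // instead of being matched by extra \s* groups inside the regex.
    return [$matches[1], trim($matches[2], " \t\n\r")];
}

var_dump(parseFunctionCall('length( @.author )'));
var_dump(parseFunctionCall("count (\t@.books[*] )"));
var_dump(parseFunctionCall('length(@.author')); // not a complete call
```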

@alexandre-daubois (Member, Author)

Thank you @stof, I didn't know that regular expressions could actually be a denial-of-service vector. I updated the code accordingly.

@stof (Member) commented Jun 5, 2025

@alexandre-daubois this can become a problem when the string matched by the regex comes from user input (which could totally happen in this component). For an affected regex, backtracking engines (like PCRE) can take time that grows super-linearly, and for the worst patterns exponentially, with the length of the input when attempting a match, especially when the match fails, since that is the worst case for backtracking.

This is commonly reported in the npm ecosystem, partly because JavaScript's RegExp syntax does not support possessive quantifiers, so the easy fix is unavailable in many cases, which makes the issue more common.

@alexandre-daubois (Member, Author)

I applied your suggestions; they make the code a bit simpler. Thanks!

```php
"\r" => ' ',
]);

return trim(preg_replace('/\s+/', ' ', $normalized));
```
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether this might collapse consecutive spaces inside quoted string literals, changing their meaning. Do we need to replace those consecutive spaces? Trimming will already remove leading and trailing spaces where needed.

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. I updated the method and added `testMatchFunctionWithMultipleSpacesTrimmed` to ensure we don't trim meaningful blank spaces.
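One way to address the concern about quoted literals is to normalize blank space only outside string literals. The function below is a hypothetical sketch, not the updated method from this pull request: it scans the expression once, copies single- or double-quoted literals verbatim (including backslash escapes), and collapses each run of RFC 9535 blank space outside them into a single space.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: collapses runs of RFC 9535 blank space (" \t\n\r")
 * into one space, except inside '...' or "..." string literals.
 */
function normalizeBlankSpace(string $expr): string
{
    $result = '';
    $quote = null; // current quote character, or null outside a literal
    $length = strlen($expr);

    for ($i = 0; $i < $length; ++$i) {
        $char = $expr[$i];

        if (null !== $quote) {
            // Inside a literal: copy verbatim, including an escaped character
            $result .= $char;
            if ('\\' === $char && $i + 1 < $length) {
                $result .= $expr[++$i];
            } elseif ($char === $quote) {
                $quote = null; // closing quote
            }
            continue;
        }

        if ('"' === $char || "'" === $char) {
            $quote = $char; // opening quote
            $result .= $char;
            continue;
        }

        if (str_contains(" \t\n\r", $char)) {
            // Skip the whole blank-space run and emit a single space
            $i += strspn($expr, " \t\n\r", $i) - 1;
            $result .= ' ';
            continue;
        }

        $result .= $char;
    }

    return trim($result, ' ');
}

echo normalizeBlankSpace("\$.a[?\n  @.name == 'John   Doe'\t]"), "\n";
// $.a[? @.name == 'John   Doe' ]
```

A single pass like this stays linear in the input length and, unlike a regex applied to the whole string, knows whether it is currently inside a literal.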

3 participants