[JsonPath] Handle special whitespaces in expressions #60699
base: 7.3
ba52060 to 9ceeb5f
@@ -163,13 +185,14 @@ private function evaluateBracket(string $expr, mixed $value): array
     return $result;
 }

 // start, end and step
-if (preg_match('/^(-?\d*):(-?\d*)(?::(-?\d+))?$/', $expr, $matches)) {
+if (preg_match('/^(-?\d*)\s*:\s*(-?\d*)(?:\s*:\s*(-?\d+))?$/', $expr, $matches)) {
I suggest changing all those * to *+.
Due to the structure of the regex, possessive quantifiers will match exactly the same strings, but they avoid useless backtracking on non-matching strings. (A very smart regex compiler might be able to make them possessive automatically as an optimization, but I'm not sure PCRE does it.)
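To illustrate the suggestion, here is a minimal sketch of the slice pattern with every * turned into *+. The helper name parseSlice and its return shape are hypothetical and only serve the example; they are not part of the JsonPath component.

```php
<?php
// Hypothetical helper, for illustration only (not the component's code).
// Every "*" from the reviewed pattern is written as the possessive "*+".
function parseSlice(string $expr): ?array
{
    $pattern = '/^(-?\d*+)\s*+:\s*+(-?\d*+)(?:\s*+:\s*+(-?\d+))?$/';
    if (!preg_match($pattern, $expr, $m)) {
        return null; // not a slice expression
    }

    return [
        // An empty capture means the bound was omitted (e.g. ":5").
        'start' => '' === $m[1] ? null : (int) $m[1],
        'end' => '' === $m[2] ? null : (int) $m[2],
        // Group 3 is absent from $m when no step was given.
        'step' => isset($m[3]) ? (int) $m[3] : null,
    ];
}
```

Because digits, whitespace and the colon never overlap, the possessive version should accept the same inputs as the greedy one; it only gives up earlier on input that cannot match.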
@@ -211,8 +234,8 @@ private function evaluateBracket(string $expr, mixed $value): array
 }

 // filter expressions
-if (preg_match('/^\?(.*)$/', $expr, $matches)) {
-    $filterExpr = $matches[1];
+if (preg_match('/^\?\s*(.*)$/', $expr, $matches)) {
-if (preg_match('/^\?\s*(.*)$/', $expr, $matches)) {
+if (preg_match('/^\?\s*+(.*)$/', $expr, $matches)) {
In this case, we want the possessive quantifier to ensure that all spaces are always consumed by the \s* part, as we have 2 consecutive quantifiers that could consume them, which is the typical ReDoS pattern.
Alternatively, let spaces be consumed by the (.*) (as done before) and be cleaned by the trimming on the next line.
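The two options can be compared side by side in a small sketch. Both function names are hypothetical, and the s modifier is an assumption added here so that a line break inside the filter body is still captured by the dot.

```php
<?php
// Option 1 (hypothetical helper): the possessive \s*+ owns the leading
// whitespace, so (.*) never competes with it for the same characters.
function filterBodyPossessive(string $expr): ?string
{
    return preg_match('/^\?\s*+(.*)$/s', $expr, $m) ? rtrim($m[1]) : null;
}

// Option 2 (hypothetical helper): keep the original single quantifier
// and clean up both ends with trim() afterwards.
function filterBodyTrimmed(string $expr): ?string
{
    return preg_match('/^\?(.*)$/s', $expr, $m) ? trim($m[1]) : null;
}
```

Either way, only one quantifier is ever responsible for a given run of spaces, which is what removes the ambiguity the review points at.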
@@ -346,7 +373,7 @@ private function evaluateScalar(string $expr, array $context): mixed
 }

 // function calls
-if (preg_match('/^(\w+)\((.*)\)$/', $expr, $matches)) {
+if (preg_match('/^(\w+)\s*\(\s*(.*)\s*\)$/', $expr, $matches)) {
\s*(.*)\s*\ will be a ReDoS pattern for non-matching strings (and it does not even ensure that the last \s* consumes the trailing spaces).
It would be simpler to let the (.*) match all the content between the parentheses and to trim $matches[2] before using it.
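A minimal sketch of that simpler approach follows. The helper parseFunctionCall and its return shape are hypothetical, and keeping a possessive \s*+ between the name and the opening parenthesis is an assumption of this example, not something the review prescribes.

```php
<?php
// Hypothetical helper, for illustration only: a single (.*) captures
// everything between the parentheses and trim() removes the padding.
function parseFunctionCall(string $expr): ?array
{
    if (!preg_match('/^(\w+)\s*+\((.*)\)$/', $expr, $m)) {
        return null; // not a function call
    }

    return ['name' => $m[1], 'args' => trim($m[2])];
}
```

With only one quantifier adjacent to the argument text, the trailing spaces always end up in the capture and are reliably stripped by trim().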
9ceeb5f to 9e73c21
Thank you @stof, I didn't know that denial of service was actually a thing with regex. I updated accordingly.
@alexandre-daubois this can be a thing when you allow user input for the string being matched by the regex (which could totally happen in this component). Backtracking engines (like PCRE) can have exponential complexity in the length of the input when attempting to match an affected regex (and failing to match it, as this is the worst case of backtracking). This is commonly reported in the npm ecosystem (also because JS does not support possessive quantifiers in its RegExp syntax, and so cannot apply the easy fix to prevent them in many cases, making the issue more common).
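The effect can be observed with a deliberately pathological pattern. The nested quantifier below is a textbook worst case chosen only for the demonstration; it is not taken from the component, and the helper tryMatch is hypothetical. Depending on the PCRE configuration (JIT, pcre.backtrack_limit), the greedy variant either reports no match after a lot of work or fails with an error, so the sketch handles both.

```php
<?php
// Hypothetical helper: runs a pattern and reports what happened.
function tryMatch(string $pattern, string $subject): string
{
    $result = @preg_match($pattern, $subject);
    if (false === $result) {
        // For example, the backtrack limit was exhausted.
        return 'error: '.preg_last_error_msg();
    }

    return 1 === $result ? 'match' : 'no match';
}

// The trailing "b" guarantees failure, the worst case for backtracking.
$subject = str_repeat('a', 20).'b';

// Greedy: the engine may try every way of splitting the run of "a".
$greedy = tryMatch('/^(?:a+)+$/', $subject);

// Possessive: a++ never gives characters back, so failure is immediate.
$possessive = tryMatch('/^(?:a++)+$/', $subject);

echo "greedy: $greedy\npossessive: $possessive\n";
```

The subject is kept short on purpose; each extra "a" roughly doubles the work of the greedy variant, while the possessive one stays linear.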
9e73c21 to 2c412e1
2c412e1 to 5beab82
Applied your suggestions, it makes the code a bit simpler. Thanks!
    "\r" => ' ',
]);

return trim(preg_replace('/\s+/', ' ', $normalized));
I'm wondering whether this might replace consecutive spaces inside quoted string literals, changing their meaning. Do we need to replace those consecutive spaces? Trimming will remove multiple spaces if needed.
Ah right, I updated the method and added testMatchFunctionWithMultipleSpacesTrimmed to ensure we don't trim meaningful blank spaces.
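One way to address the concern about string literals is sketched below. This is not the method from the PR; normalizeWhitespace is a hypothetical function, and the choice to leave everything inside quotes untouched (including tabs and repeated spaces) is an assumption of the example.

```php
<?php
// Hypothetical sketch: turn \t, \n and \r into plain spaces, but only
// outside quoted string literals, and never collapse consecutive spaces.
function normalizeWhitespace(string $expr): string
{
    $out = '';
    $quote = null; // quote character of the literal we are inside, if any
    $len = strlen($expr);

    for ($i = 0; $i < $len; ++$i) {
        $c = $expr[$i];

        if (null !== $quote) {
            $out .= $c;
            if ('\\' === $c && $i + 1 < $len) {
                $out .= $expr[++$i]; // copy the escaped character as-is
            } elseif ($c === $quote) {
                $quote = null; // closing quote reached
            }
            continue;
        }

        if ('"' === $c || "'" === $c) {
            $quote = $c;
            $out .= $c;
        } elseif ("\t" === $c || "\n" === $c || "\r" === $c) {
            $out .= ' ';
        } else {
            $out .= $c;
        }
    }

    // Only the outer padding is removed.
    return trim($out);
}
```

For instance, a filter written across several lines keeps the double space inside its 'a  b' literal, while the line breaks around it become ordinary spaces.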
5beab82 to ab0853a
Allows using characters like \n or \t in expressions and treats them as normal spaces, as defined in https://datatracker.ietf.org/doc/rfc9535/, section 2.1.1. It is possible to write filters on multiple lines: