[OptionsResolver] Optimize splitOutsideParenthesis() - 5.9x faster #61239
Performance improvements are refactors, which means this should target 7.4. 7.3 being already released, the branch receives bug fixes only.
Also, would you mind sharing the test set you used to run benchmarks? In addition to the methodology used as explained in the description, having the opportunity to see the benchmark code would be a plus!
Hey @bendavies, thanks for this optimization proposal! For performance refactoring PRs, we usually target the latest dev branch, in this case 7.4.
# Test individual optimizations
hyperfine --warmup 1 --runs 10 \
--reference 'php test_original.php' \
'php test_opt1_fast_path_simple.php' \
'php test_opt2_fast_path_union.php' \
'php test_opt3_no_string_concat.php' \
'php test_opt4_switch_statement.php'
# Test combined optimization
hyperfine --warmup 1 --runs 10 \
--reference 'php test_original.php' \
'php test_optimized.php'

test_original.php - Original implementation
<?php
function splitOutsideParenthesis(string $type): array
{
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;
    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];
        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }
        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }
    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }
    return $parts;
}
$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];
$iterations = 100000;
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}
$end = microtime(true);
$duration = ($end - $start) * 1000;
$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;
echo "Original implementation: " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

test_opt1_fast_path_simple.php - Optimization 1: Fast path for simple types
<?php
function splitOutsideParenthesis(string $type): array
{
    if (!\str_contains($type, '|')) {
        return [$type];
    }
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;
    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];
        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }
        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }
    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }
    return $parts;
}
$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];
$iterations = 100000;
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}
$end = microtime(true);
$duration = ($end - $start) * 1000;
$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;
echo "Optimization 1 (fast path simple): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

test_opt2_fast_path_union.php - Optimization 2: Fast path for union types
<?php
function splitOutsideParenthesis(string $type): array
{
    if (!\str_contains($type, '(') && !\str_contains($type, ')')) {
        return \explode('|', $type);
    }
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;
    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];
        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }
        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }
    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }
    return $parts;
}
$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];
$iterations = 100000;
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}
$end = microtime(true);
$duration = ($end - $start) * 1000;
$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;
echo "Optimization 2 (fast path union): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

test_opt3_no_string_concat.php - Optimization 3: Eliminate string concatenation
<?php
function splitOutsideParenthesis(string $type): array
{
    $parts = [];
    $start = 0;
    $parenthesisLevel = 0;
    $length = \strlen($type);
    for ($i = 0; $i < $length; ++$i) {
        $char = $type[$i];
        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        } elseif ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = \substr($type, $start, $i - $start);
            $start = $i + 1;
        }
    }
    if ($start < $length) {
        $parts[] = \substr($type, $start);
    }
    return $parts;
}
$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];
$iterations = 100000;
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}
$end = microtime(true);
$duration = ($end - $start) * 1000;
$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;
echo "Optimization 3 (no string concat): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

test_opt4_switch_statement.php - Optimization 4: Switch statement
<?php
function splitOutsideParenthesis(string $type): array
{
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;
    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];
        switch ($char) {
            case '(':
                ++$parenthesisLevel;
                $currentPart .= $char;
                break;
            case ')':
                --$parenthesisLevel;
                $currentPart .= $char;
                break;
            case '|':
                if (0 === $parenthesisLevel) {
                    $parts[] = $currentPart;
                    $currentPart = '';
                } else {
                    $currentPart .= $char;
                }
                break;
            default:
                $currentPart .= $char;
                break;
        }
    }
    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }
    return $parts;
}
$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];
$iterations = 100000;
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}
$end = microtime(true);
$duration = ($end - $start) * 1000;
$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;
echo "Optimization 4 (switch statement): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

test_optimized.php - Final optimized implementation (all optimizations combined)
<?php
function splitOutsideParenthesis(string $type): array
{
    if (!\str_contains($type, '|')) {
        return [$type];
    }
    if (!\str_contains($type, '(') && !\str_contains($type, ')')) {
        return \explode('|', $type);
    }
    $parts = [];
    $start = 0;
    $parenthesisLevel = 0;
    $length = \strlen($type);
    for ($i = 0; $i < $length; ++$i) {
        $char = $type[$i];
        switch ($char) {
            case '(':
                ++$parenthesisLevel;
                break;
            case ')':
                --$parenthesisLevel;
                break;
            case '|':
                if (0 === $parenthesisLevel) {
                    $parts[] = \substr($type, $start, $i - $start);
                    $start = $i + 1;
                }
                break;
        }
    }
    if ($start < $length) {
        $parts[] = \substr($type, $start);
    }
    return $parts;
}
$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];
$iterations = 100000;
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}
$end = microtime(true);
$duration = ($end - $start) * 1000;
$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;
echo "Optimized implementation: " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";
I investigated @dunglas's suggestion to use a regex.
@dunglas, I've added a pure regex solution. Reproducing the commit message here for completeness:
| # OR
\| # Match the pipe delimiter. This will only be matched if it was not inside a skipped group.
/x
Can you try using a recursive regexp? This would remove the current limitation of parsing only the first 2 levels of nesting:
/
# Match a recursively balanced parenthetical group, then skip it
\( # Match an opening parenthesis
(?: # Start a non-capturing group for the contents
[^()] # Match any character that is not a parenthesis
| # OR
(?R) # Recurse the entire pattern to match a nested group
)* # Repeat the group for all contents
\) # Match the final closing parenthesis
(*SKIP)(*FAIL) # Discard the match and find the next one
| # OR
\| # Match the pipe delimiter (only if not inside a skipped group)
/x
This didn't work, I think because (?R) recurses into the entire pattern, including the (*SKIP)(*FAIL) part.
I fixed it by defining a recursive subroutine.
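A minimal sketch of what a recursive subroutine can look like, assuming PHP 8.0+ and PCRE's (?(DEFINE)...) and (?&name) syntax. The constant name, the group name `balanced` and the function name are hypothetical, chosen for illustration; this is not necessarily the pattern that was committed. The idea is that only the balanced-parentheses group recurses, so the (*SKIP)(*FAIL) part stays outside the recursion.

```php
<?php
// Sketch only: TOP_LEVEL_PIPE_PATTERN and splitWithRecursiveRegex() are hypothetical names.
const TOP_LEVEL_PIPE_PATTERN = <<<'REGEX'
/
    (?(DEFINE)                          # declare a subroutine, match nothing here
        (?<balanced>
            \(                          # opening parenthesis
            (?: [^()] | (?&balanced) )* # plain characters or a nested balanced group
            \)                          # closing parenthesis
        )
    )
    (?&balanced) (*SKIP) (*FAIL)        # consume a balanced group, then reject it as a delimiter
    |
    \|                                  # a pipe that was not consumed above
/x
REGEX;

function splitWithRecursiveRegex(string $type): array
{
    $parts = preg_split(TOP_LEVEL_PIPE_PATTERN, $type);
    if (false === $parts) {
        // preg_split() returns false on failure, e.g. when a PCRE limit is hit.
        throw new RuntimeException('preg_split() failed: '.preg_last_error_msg());
    }

    return $parts;
}

print_r(splitWithRecursiveRegex('string|(int|(bool|(float|null)))'));
```

Because the recursion targets the named group rather than the whole pattern, the nesting depth is no longer fixed by the pattern itself.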
The PR title and body would then need an update for the fifth optimization.
Got a bit scared after @nicolas-grekas pointed out that my regex only worked for 2 levels, so I added a bunch of regression tests as a first commit, then added a new recursive regex that supports unlimited depth while maintaining the performance improvement.
Thank you @bendavies.
@nicolas-grekas sheesh that was fast 😆. I didn't have time to update the PR body.
You can still do it after the merge ;)
This PR optimises the splitOutsideParenthesis method in OptionsResolver.php, achieving a 5.9x performance improvement.
I discovered this method as a performance hotspot while benchmarking a large Symfony form with many fields. Profiling revealed that splitOutsideParenthesis was consuming a significant portion of the form processing time.
The splitOutsideParenthesis method (introduced in PR #59354) is called frequently during options resolution and has several performance bottlenecks:
Test Methodology
Here's how all performance measurements were conducted:
string, int, bool, array
string|int, string|int|bool, string|int|bool|array
string|(int|bool), (string|int)|bool
string|(int|(bool|float)), (string|int)|(bool|float)
string[], int[]
MyClass, \\Namespace\\Class
string|int|bool|array|object|resource|callable
Each optimisation was tested in isolation to measure its individual impact, then all optimisations were combined for the final benchmark.
Optimisations
1. Fast Path for Simple Types (No Pipes)
Most type declarations are simple types like string, int, MyClass, etc. without any union types.
Implementation:
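A minimal sketch of this fast path, assuming PHP 8.0+ for str_contains(). The function name splitWithSimpleFastPath() is hypothetical; the guard mirrors test_opt1_fast_path_simple.php and the fallback is the original character scan.

```php
<?php
// Sketch only: splitWithSimpleFastPath() is a hypothetical name, not the Symfony method.
function splitWithSimpleFastPath(string $type): array
{
    // Fast path: without a pipe there is nothing to split.
    if (!str_contains($type, '|')) {
        return [$type];
    }

    // Slow path: the original character-by-character scan.
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;
    $typeLength = strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];
        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }
        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }
    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }

    return $parts;
}

print_r(splitWithSimpleFastPath('MyClass'));
```

One behavioural difference worth keeping in mind: for an empty string the guard returns [''], whereas the plain scan returns an empty array.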
2. Fast Path for Union Types (No Parentheses)
Common union types like string|int|bool don't need complex parsing - PHP's explode() is much faster.
Implementation:
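A minimal sketch of the guard, assuming PHP 8.0+ for str_contains(). tryExplodeFastPath() is a hypothetical helper that returns null when the fast path does not apply, so that a caller can fall back to the depth-aware scan; the benchmark file inlines the same check instead.

```php
<?php
// Sketch only: tryExplodeFastPath() is a hypothetical helper name.
function tryExplodeFastPath(string $type): ?array
{
    // Any parenthesis means a pipe might be nested, so explode() is not safe.
    if (str_contains($type, '(') || str_contains($type, ')')) {
        return null; // the caller falls back to the depth-aware scan
    }

    return explode('|', $type);
}

var_dump(tryExplodeFastPath('string|int|bool'));
var_dump(tryExplodeFastPath('string|(int|bool)'));
```

Note that explode() keeps empty segments: a trailing pipe, as in 'string|', yields ['string', ''], while the character scan drops the empty last part.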
3. Eliminate String Concatenation
Building each part with one string append per character creates memory overhead. Tracking the start offset and calling substr() once per part avoids creating those intermediate strings.
Implementation:
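A minimal sketch of the offset-based approach, following test_opt3_no_string_concat.php. The function name splitByOffsets() is hypothetical.

```php
<?php
// Sketch only: splitByOffsets() is a hypothetical name.
function splitByOffsets(string $type): array
{
    $parts = [];
    $start = 0; // offset where the current part begins
    $parenthesisLevel = 0;
    $length = strlen($type);
    for ($i = 0; $i < $length; ++$i) {
        $char = $type[$i];
        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        } elseif ('|' === $char && 0 === $parenthesisLevel) {
            // Cut the part in one call instead of appending character by character.
            $parts[] = substr($type, $start, $i - $start);
            $start = $i + 1;
        }
    }
    // Whatever follows the last top-level pipe is the final part.
    if ($start < $length) {
        $parts[] = substr($type, $start);
    }

    return $parts;
}

print_r(splitByOffsets('(string|int)|bool'));
```

For '(string|int)|bool' the only top-level pipe is at offset 12, so the two substr() calls cover offsets 0 to 11 and 13 to the end.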
4. Switch Statement Optimisation
Eliminates multiple conditional checks per character.
Implementation:
Benchmark Results
Individual Optimisation Impact
Testing each optimisation in isolation:
Combined Optimisation Impact
Combining all optimisations:
Further Optimization: Regex Solution
After achieving the 2.91x improvement with manual optimizations, I explored regex-based solutions for even better performance as suggested by @dunglas.
The challenge was maintaining correctness for deeply nested union types.
Final Solution: Recursive Regex
The solution uses PCRE's (?(DEFINE)...) syntax to create a recursive pattern supporting unlimited nesting:
Final Benchmark Results
Updated test data includes a complex deep nesting case:
complex|(nested|(types|(with|(deep|nesting))))|arrays[]|(more|(complex|types))