[OptionsResolver] Optimize splitOutsideParenthesis() - 5.9x faster #61239

Contributor

@bendavies bendavies commented Jul 25, 2025

Q              A
Branch?        7.4
Bug fix?       no
New feature?   no
Deprecations?  no
Issues         Fix #59354
License        MIT

This PR optimises the splitOutsideParenthesis method in OptionsResolver.php, achieving a 5.9x performance improvement.

I discovered this method as a performance hotspot while benchmarking a large Symfony form with many fields. Profiling revealed that splitOutsideParenthesis was consuming a significant portion of the form processing time.

The splitOutsideParenthesis method (introduced in PR #59354) is called frequently during options resolution and has several performance bottlenecks:

  1. Character-by-character string concatenation creates new string objects on each iteration (particularly inefficient in PHP due to copy-on-write behavior)
  2. All input strings are processed the same way, regardless of complexity - no fast path for simple types
  3. Multiple conditional checks per character

Test Methodology

Here's how all performance measurements were conducted:

  • Benchmark tool: hyperfine (10 runs with 1 warmup run)
  • Test iterations: 100,000 iterations per test case
  • Test data: 16 different type patterns:
    • Simple types: string, int, bool, array
    • Union types: string|int, string|int|bool, string|int|bool|array
    • Parentheses types: string|(int|bool), (string|int)|bool
    • Nested types: string|(int|(bool|float)), (string|int)|(bool|float)
    • Array types: string[], int[]
    • Class types: MyClass, \\Namespace\\Class
    • Complex union: string|int|bool|array|object|resource|callable

Each optimisation was tested in isolation to measure its individual impact, then all optimisations were combined for the final benchmark.

Optimisations

1. Fast Path for Simple Types (No Pipes)

Most type declarations are simple types like string, int, MyClass, etc. without any union types.

Implementation:

if (!\str_contains($type, '|')) {
    return [$type];
}

2. Fast Path for Union Types (No Parentheses)

Common union types like string|int|bool don't need complex parsing - PHP's explode() is much faster.

Implementation:

if (!\str_contains($type, '(') && !\str_contains($type, ')')) {
    return \explode('|', $type);
}
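
One thing worth checking before relying on the two fast paths is whether they return exactly what the character loop returns for degenerate inputs. The sketch below is not part of the PR: splitWithLoop() copies the loop from test_original.php, and splitWithFastPaths() is a hypothetical wrapper that adds only the two early returns. Whether an empty string or a trailing pipe ever reaches the real method is an open assumption here.

```php
<?php

// Reference behaviour: the character loop from test_original.php.
function splitWithLoop(string $type): array
{
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;

    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];

        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }

        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }

    // An empty final part is dropped here; explode() would keep it.
    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }

    return $parts;
}

// Hypothetical wrapper: only the two early returns, then the loop.
function splitWithFastPaths(string $type): array
{
    if (!\str_contains($type, '|')) {
        return [$type];
    }

    if (!\str_contains($type, '(') && !\str_contains($type, ')')) {
        return \explode('|', $type);
    }

    return splitWithLoop($type);
}

// Print both results side by side for a few ordinary and degenerate inputs.
foreach (['string|int', 'string|(int|bool)', 'string|', ''] as $type) {
    printf(
        "%-22s loop=%s fast=%s\n",
        var_export($type, true),
        json_encode(splitWithLoop($type)),
        json_encode(splitWithFastPaths($type))
    );
}
```

For ordinary inputs both functions agree. They differ for the empty string and for a trailing pipe, because the loop drops an empty final part while the early returns keep it.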

3. Eliminate String Concatenation

String concatenation in loops creates memory overhead. Using substr() avoids creating intermediate strings.

Implementation:

// Instead of: $currentPart .= $char;
// Use: $parts[] = \substr($type, $start, $i - $start);

4. Switch Statement Optimisation

A switch statement replaces the chain of conditional checks that ran for every character.

Implementation:

switch ($char) {
    case '(':
        ++$parenthesisLevel;
        break;
    case ')':
        --$parenthesisLevel;
        break;
    case '|':
        // ...
}

Benchmark Results

Individual Optimisation Impact

Testing each optimisation in isolation:

hyperfine --warmup 1 --runs 10 \
  --sort=command \
  --reference 'php test_original.php' \
  'php test_opt1_fast_path_simple.php' \
  'php test_opt2_fast_path_union.php' \
  'php test_opt3_no_string_concat.php'  \
  'php test_opt4_switch_statement.php'
Relative speed comparison
        1.00          php test_original.php
        1.23 ±  0.02  php test_opt1_fast_path_simple.php
        1.95 ±  0.04  php test_opt2_fast_path_union.php
        1.13 ±  0.03  php test_opt3_no_string_concat.php
        1.35 ±  0.03  php test_opt4_switch_statement.php

Combined Optimisation Impact

Combining all optimisations:

hyperfine --warmup 1 --runs 10 \
  --sort=command \
  --reference 'php test_original.php' \
  'php test_optimised.php'
Relative speed comparison
        1.00          php test_original.php
        2.91 ±  0.03  php test_optimised.php

Further Optimization: Regex Solution

After achieving the 2.91x improvement with manual optimizations, I explored regex-based solutions for even better performance as suggested by @dunglas.
The challenge was maintaining correctness for deeply nested union types.

Final Solution: Recursive Regex

The solution uses PCRE's (?(DEFINE)...) syntax to define a recursive subroutine, which gives a pattern that supports unlimited nesting.
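
The pattern itself is not reproduced in this description, so the following is only a sketch of the technique and not necessarily the merged code. The subroutine name balanced and the wrapper splitTopLevelPipes() are hypothetical names chosen for illustration. The idea is to define a subroutine that matches one balanced parenthesised group, consume such groups with (*SKIP)(*FAIL), and split on whatever pipes remain.

```php
<?php

// Hypothetical wrapper; the pattern is an illustrative sketch, not necessarily the merged one.
function splitTopLevelPipes(string $type): array
{
    $pattern = <<<'REGEX'
/
    (?(DEFINE)
        (?<balanced>                    # named subroutine, only used through calls
            \(
            (?: [^()]++ | (?&balanced) )*
            \)
        )
    )
    (?&balanced) (*SKIP)(*FAIL)         # consume a balanced group, then discard the match
  |
    \|                                  # a pipe reached here is outside every balanced group
/x
REGEX;

    $parts = \preg_split($pattern, $type);
    if (false === $parts) {
        // preg_split() returns false on a PCRE error such as a backtrack limit.
        throw new \RuntimeException(\preg_last_error_msg());
    }

    return $parts;
}

print_r(splitTopLevelPipes('complex|(nested|(types|(with|(deep|nesting))))|arrays[]|(more|(complex|types))'));
// four parts: 'complex', '(nested|(types|(with|(deep|nesting))))', 'arrays[]', '(more|(complex|types))'
```

Because the backtracking verbs sit outside the subroutine, the recursive call only has to match parentheses; it never runs into (*SKIP)(*FAIL) itself.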

Final Benchmark Results

Updated test data includes a complex deep nesting case: complex|(nested|(types|(with|(deep|nesting))))|arrays[]|(more|(complex|types))

hyperfine --warmup 1 --runs 10 --reference 'php test_original.php' \
  'php test_optimized.php' 'php test_recursive_final.php'
Summary
  php test_original.php ran
    5.85 ± 0.11 times slower than php test_recursive_final.php
    2.57 ± 0.12 times slower than php test_optimized.php

@bendavies bendavies force-pushed the optionresolver-splitOutsideParenthesis-performance branch from 1312319 to e01b177 Compare July 25, 2025 16:10
@bendavies bendavies changed the title [OptionsResolver] Optimize splitOutsideParenthesis() - 2.90x faster [OptionsResolver] Optimize splitOutsideParenthesis() - 2.91x faster Jul 25, 2025
@bendavies bendavies changed the base branch from 7.4 to 7.3 July 25, 2025 16:11
Member

@alexandre-daubois alexandre-daubois left a comment

Performance improvements are refactors, which means this should target 7.4. 7.3 being already released, the branch receives bug fixes only.

Also, would you mind sharing the test set you used to run benchmarks? In addition to the methodology used as explained in the description, having the opportunity to see the benchmark code would be a plus!

@bendavies bendavies force-pushed the optionresolver-splitOutsideParenthesis-performance branch from e01b177 to 1591e54 Compare July 25, 2025 16:20
@yceruto
Member

yceruto commented Jul 25, 2025

Hey @bendavies thanks for this optimization proposal! For performance refactoring PRs, we usually target the latest dev branch, in this case, 7.4

@bendavies bendavies force-pushed the optionresolver-splitOutsideParenthesis-performance branch from 1591e54 to c135a89 Compare July 25, 2025 16:26
@bendavies bendavies changed the base branch from 7.3 to 7.4 July 25, 2025 16:26
@bendavies
Contributor Author

bendavies commented Jul 25, 2025

Performance improvements are refactors, which means this should target 7.4. 7.3 being already released, the branch receives bug fixes only.

Also, would you mind sharing the test set you used to run benchmarks? In addition to the methodology used as explained in the description, having the opportunity to see the benchmark code would be a plus!

# Test individual optimizations
hyperfine --warmup 1 --runs 10 \
  --reference 'php test_original.php' \
  'php test_opt1_fast_path_simple.php' \
  'php test_opt2_fast_path_union.php' \
  'php test_opt3_no_string_concat.php' \
  'php test_opt4_switch_statement.php'

# Test combined optimization
hyperfine --warmup 1 --runs 10 \
  --reference 'php test_original.php' \
  'php test_optimized.php'
test_original.php - Original implementation
<?php

function splitOutsideParenthesis(string $type): array
{
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;

    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];

        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }

        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }

    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }

    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Original implementation: " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";
test_opt1_fast_path_simple.php - Optimization 1: Fast path for simple types
<?php

function splitOutsideParenthesis(string $type): array
{
    if (!\str_contains($type, '|')) {
        return [$type];
    }
    
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;

    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];

        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }

        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }

    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }

    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Optimization 1 (fast path simple): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";
test_opt2_fast_path_union.php - Optimization 2: Fast path for union types
<?php

function splitOutsideParenthesis(string $type): array
{
    if (!\str_contains($type, '(') && !\str_contains($type, ')')) {
        return \explode('|', $type);
    }

    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;

    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];

        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }

        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }

    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }

    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Optimization 2 (fast path union): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";
test_opt3_no_string_concat.php - Optimization 3: Eliminate string concatenation
<?php

function splitOutsideParenthesis(string $type): array
{
    $parts = [];
    $start = 0;
    $parenthesisLevel = 0;
    $length = \strlen($type);
    
    for ($i = 0; $i < $length; ++$i) {
        $char = $type[$i];

        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        } elseif ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = \substr($type, $start, $i - $start);
            $start = $i + 1;
        }
    }

    if ($start < $length) {
        $parts[] = \substr($type, $start);
    }
    
    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Optimization 3 (no string concat): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";
test_opt4_switch_statement.php - Optimization 4: Switch statement
<?php

function splitOutsideParenthesis(string $type): array
{
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;

    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];

        switch ($char) {
            case '(':
                ++$parenthesisLevel;
                $currentPart .= $char;
                break;
            case ')':
                --$parenthesisLevel;
                $currentPart .= $char;
                break;
            case '|':
                if (0 === $parenthesisLevel) {
                    $parts[] = $currentPart;
                    $currentPart = '';
                } else {
                    $currentPart .= $char;
                }
                break;
            default:
                $currentPart .= $char;
                break;
        }
    }

    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }

    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Optimization 4 (switch statement): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";
test_optimized.php - Final optimized implementation (all optimizations combined)
<?php

function splitOutsideParenthesis(string $type): array
{
    if (!\str_contains($type, '|')) {
        return [$type];
    }
    
    if (!\str_contains($type, '(') && !\str_contains($type, ')')) {
        return \explode('|', $type);
    }
    
    $parts = [];
    $start = 0;
    $parenthesisLevel = 0;
    $length = \strlen($type);
    
    for ($i = 0; $i < $length; ++$i) {
        $char = $type[$i];
        
        switch ($char) {
            case '(':
                ++$parenthesisLevel;
                break;
            case ')':
                --$parenthesisLevel;
                break;
            case '|':
                if (0 === $parenthesisLevel) {
                    $parts[] = \substr($type, $start, $i - $start);
                    $start = $i + 1;
                }
                break;
        }
    }
    
    if ($start < $length) {
        $parts[] = \substr($type, $start);
    }
    
    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Optimized implementation: " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

@bendavies
Contributor Author

bendavies commented Jul 26, 2025

I investigated @dunglas's suggestion to use a regex.
A pure regex solution appears to be almost twice as fast as my current solution.
I'll update this PR on Monday.

Summary
  php test_original.php ran
    5.53 ± 0.15 times slower than php test_optimized_v2.php <-- pure regex, no loops
    3.12 ± 0.08 times slower than php test_optimized.php

@bendavies
Contributor Author

bendavies commented Jul 29, 2025

@dunglas pure regex solution added. reproducing commit message here for completeness:

[OptionsResolver] optimize splitOutsideParenthesis with regex (5.88x faster)

Replace manual string parsing with regex-based implementation using PCRE's
(*SKIP)(*FAIL) backtracking control verbs. This approach:

Achieves 5.88x performance improvement over the original implementation

The regex pattern '\([^()]*(?:\([^()]*\)[^()]*)*\)(*SKIP)(*FAIL)|\|' works by:
 1. Matching parentheses and their contents (including nested parentheses)
 2. Using (*SKIP)(*FAIL) to skip these matches entirely
 3. Then splitting on the remaining pipe characters outside parentheses
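
To see where this pattern stops working, it helps to run it on inputs of increasing depth. The sketch below copies the pattern from the commit message and wraps it in "/" delimiters; the constant and function names are hypothetical. As written, the pattern describes a parenthesised group that contains at most one further level of parentheses, so a third level is not recognised as a single group.

```php
<?php

// Pattern copied from the commit message, wrapped in "/" delimiters.
const TWO_LEVEL_PATTERN = '/\([^()]*(?:\([^()]*\)[^()]*)*\)(*SKIP)(*FAIL)|\|/';

// Hypothetical helper name, used only for this illustration.
function splitWithTwoLevelPattern(string $type): array
{
    return \preg_split(TWO_LEVEL_PATTERN, $type);
}

// Two levels of nesting: the whole group is skipped, only the outer pipe splits.
print_r(splitWithTwoLevelPattern('string|(int|(bool|float))'));
// two parts: 'string' and '(int|(bool|float))'

// Three levels: the outermost group no longer matches, so the pipe after "b" splits too.
print_r(splitWithTwoLevelPattern('a|(b|(c|(d|e)))'));
// three parts: 'a', '(b' and '(c|(d|e)))'
```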

| # OR
\| # Match the pipe delimiter. This will only be matched if it was not inside a skipped group.
/x

Member

can you try using a recursive regexp? this will remove the current limitation of parsing only the first 2 levels of nesting:

/
  # Match a recursively balanced parenthetical group, then skip it
  \(              # Match an opening parenthesis
  (?:             # Start a non-capturing group for the contents
      [^()]       # Match any character that is not a parenthesis
    |             # OR
      (?R)        # Recurse the entire pattern to match a nested group
  )*              # Repeat the group for all contents
  \)              # Match the final closing parenthesis
  (*SKIP)(*FAIL)  # Discard the match and find the next one
| # OR
  \|              # Match the pipe delimiter (only if not inside a skipped group)
/x


Contributor Author

This didn't work, I think because (?R) recurses into the entire pattern, including the (*SKIP)(*FAIL) part.

Fixed it by defining a recursive subroutine.

@OskarStark
Contributor

The PR title and body would then need an update for the 5th optimization.

@bendavies bendavies force-pushed the optionresolver-splitOutsideParenthesis-performance branch from 2048814 to 3aa0c9d Compare July 29, 2025 20:03
@bendavies
Contributor Author

bendavies commented Jul 29, 2025

Got a bit scared after @nicolas-grekas pointed out that my regex only worked for 2 levels, so I added a bunch of regression tests as a first commit.

then added a new recursive regex which supports unlimited depth while maintaining the performance improvement.

@bendavies bendavies changed the title [OptionsResolver] Optimize splitOutsideParenthesis() - 2.91x faster [OptionsResolver] Optimize splitOutsideParenthesis() - 5.9x faster Jul 29, 2025
@nicolas-grekas nicolas-grekas force-pushed the optionresolver-splitOutsideParenthesis-performance branch from 3aa0c9d to b568fef Compare July 29, 2025 20:10
@nicolas-grekas
Member

Thank you @bendavies.

@nicolas-grekas nicolas-grekas merged commit 9977966 into symfony:7.4 Jul 29, 2025
8 of 9 checks passed
@bendavies
Contributor Author

bendavies commented Jul 29, 2025

@nicolas-grekas sheesh that was fast 😆. I didn't have time to update the PR body

@nicolas-grekas
Member

You can still do it after the merge ;)
