Description
Bug report
I have some code that downloads a large quantity of data and then analyses it. The data consists of compressed csv and json files stored into a deeply nested directory structure (~8k files, ~8k symlinks, ~20k directories). Within that structure, related data items have symbolic links between them, both ways.
The data is excluded from source control through use of .gitignore. The data is also excluded from phpstan completely via a .phpstan/config.neon file that has:
parameters:
    excludePaths:
        analyseAndScan:
            - ../data-dir (?)
            - ../analysis-dir (?)
The trailing (?) is to indicate that the directories might not exist at first.
When there is a large quantity of data in those directories, phpstan runs significantly slower than it does when they are empty or absent:
- With data present, we have to wait anywhere between 1 min 40 s and 3 min for analysis to complete, every single time (~170 MB consumed).
- Without data present, analysis takes 8 s (~170 MB consumed), faster with caching.
Also, phpstan has occasionally got itself into a state where it keeps crashing while analysing. From the phpstan stack trace (which I don't have a record of now, sorry!), it appeared that it was crashing in phpstan RecursiveDirectoryIterator code, apparently deeply nested following symlinks in the data directory, jumping back and forth between related data items until it hits some recursion limit, even though it has no good reason to be probing through that directory at all. Clearing phpstan's cache via phpstan clear-result-cache seemed to help with that.
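To illustrate the failure mode suspected here, the following is a minimal Python sketch, not PHPStan's code. It shows a traversal that follows directory symlinks but remembers each directory by its device and inode pair, so two directories that link to each other are each scanned once instead of being re-entered until a limit is hit. The function names `walk_following_symlinks` and `build_cyclic_fixture`, and the `item-a` / `item-b` layout, are hypothetical, and the sketch assumes a POSIX-style filesystem where symlinks and inode numbers are available.

```python
import os


def walk_following_symlinks(root):
    """Yield file paths under root, following directory symlinks.

    Each directory is identified by (st_dev, st_ino). A directory that
    has already been scanned is skipped, which breaks symlink cycles.
    """
    seen = set()
    stack = [root]  # explicit stack, so depth is not bound by a recursion limit
    while stack:
        current = stack.pop()
        info = os.stat(current)  # os.stat resolves symlinks to their target
        key = (info.st_dev, info.st_ino)
        if key in seen:
            continue  # already scanned via another path
        seen.add(key)
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=True):
                    yield entry.path


def build_cyclic_fixture(base):
    """Create two item directories that symlink to each other (hypothetical layout)."""
    a = os.path.join(base, "item-a")
    b = os.path.join(base, "item-b")
    os.makedirs(a)
    os.makedirs(b)
    open(os.path.join(a, "a.csv"), "w").close()
    open(os.path.join(b, "b.json"), "w").close()
    os.symlink(b, os.path.join(a, "related"))  # item-a/related -> item-b
    os.symlink(a, os.path.join(b, "related"))  # item-b/related -> item-a
    return base
```

Without the `seen` check, a traversal over this fixture would keep descending through `related/related/related/...` indefinitely.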
With data present, I also observe when running with -vvv that it is taking ~30 secs just to restore the result cache. Not sure why that is. The resultCache.php file is 266K in size and does not appear to reference the data directories except in relation to them being excluded via the project config.
When running with --debug -vvv I see that 26 files are being analysed, one of which takes ~3 secs to process, one of which takes ~2 secs to process, while all others take a fraction of a second. The time is being taken up by something else.
I think that phpstan may be unnecessarily iterating over all file system objects in the excluded directory for some reason, perhaps sometimes also following symlinks in that directory (and possibly failing to notice that it has returned back to where it started, so permitting infinite looping), and that this is slowing phpstan down significantly. It is slow with or without utilising the cache, but since restoring the cache is reported as a slow process, there may be a problem there too.
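The behaviour being asked for can be sketched as follows. This is an illustration in Python under stated assumptions, not a description of how PHPStan discovers files. The hypothetical function `collect_files` walks a tree, drops excluded directories before descending into them, and leaves symlinked directories unfollowed, so the cost of the walk does not depend on what the excluded directories contain. The `entered` counter is included only to make that visible.

```python
import os


def collect_files(root, excluded):
    """Return (files, directories_entered) for a walk that prunes `excluded`.

    excluded: set of directory paths (built from the same root) to skip entirely.
    """
    files = []
    entered = 0
    # followlinks=False (the default) means symlinked directories are
    # listed but never descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        entered += 1
        # Editing dirnames in place stops os.walk from descending
        # into the removed directories (requires the default topdown=True).
        dirnames[:] = [
            d for d in dirnames
            if os.path.join(dirpath, d) not in excluded
        ]
        files.extend(os.path.join(dirpath, f) for f in filenames)
    return files, entered
```

For a project with one `src` directory and an excluded `data-dir`, this walk enters two directories (the root and `src`) however many directories `data-dir` holds. A walk that lists everything first and filters afterwards would pay for every entry under `data-dir`.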
There is an opportunity here to improve performance for everyone by ensuring that the structure and contents of excluded directories do not affect phpstan's execution time.
Code snippet that reproduces the problem
No response
Expected output
Same, but with the time taken to analyse the code being independent of and not being slowed down by the content of excluded directories.
Did PHPStan help you today? Did it make you happy in any way?
PHPStan has helped us update and modernise (autoloading, strict typing throughout, PHP 7 to 8 migration) a PHP library and two internal applications recently.