
fix: avoid call stack overflow while processing globs #19035

Merged
mdjermanovic merged 3 commits into eslint:main from fix-preemfile-stackoverflow
Oct 23, 2024

Conversation

LiviaMedeiros
Contributor

Prerequisites checklist

What is the purpose of this pull request? (put an "X" next to an item)

[ ] Documentation update
[x] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[ ] Add autofix to a rule
[ ] Add a CLI option
[ ] Add something to the core
[ ] Other, please explain:

What changes did you make? (Give an overview)

If a directory contains a considerable number of files (I encountered this at 150_000), pushing them all at once with the spread operator exceeds the call stack limit.
Using Array.prototype.push.apply(filePaths, result.value); instead of the spread operator would fail as well.

Is there anything you'd like reviewers to focus on?

Performance.
I've tried different CHUNK_SIZEs between 512 and 10000, and the time curve wasn't even weakly monotonic.
A good alternative would be pre-allocating the array, something like this:

let index = filePaths.length;
filePaths.length += result.value.length;
for (let i = 0; i < result.value.length; i++) {
    filePaths[index++] = result.value[i];
}

But the performance benefit is negligible and IMHO it looks uglier.
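
For illustration, the chunked approach described above (and reflected in this PR's original title) could look roughly like the following. This is a minimal sketch, not the code from the PR: the helper name pushInChunks and the chunk size of 1024 are assumptions chosen for the example.

```javascript
// Hypothetical helper: append `source` to `target` without spreading the
// whole array into a single function call.
const CHUNK_SIZE = 1024; // arbitrary value; no clear winner was reported above

function pushInChunks(target, source, chunkSize = CHUNK_SIZE) {
    for (let start = 0; start < source.length; start += chunkSize) {
        // Each push() call receives at most `chunkSize` arguments.
        target.push(...source.slice(start, start + chunkSize));
    }
    return target;
}

// Usage: 300_000 entries, above the size reported to fail with a single spread.
const filePaths = ["existing.js"];
const found = Array.from({ length: 300_000 }, (_, i) => `file-${i}.js`);

pushInChunks(filePaths, found);
console.log(filePaths.length); // 300001
```

The chunk size only bounds the number of arguments per call; any value comfortably below the engine's limit should behave the same way functionally.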

@LiviaMedeiros LiviaMedeiros requested a review from a team as a code owner October 21, 2024 04:18

linux-foundation-easycla bot commented Oct 21, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@eslint-github-bot eslint-github-bot bot added the bug ESLint is working incorrectly label Oct 21, 2024
@github-actions github-actions bot added cli Relates to ESLint's command-line interface core Relates to ESLint's core APIs and features labels Oct 21, 2024

netlify bot commented Oct 21, 2024

Deploy Preview for docs-eslint canceled.

Name Link
🔨 Latest commit aca961e
🔍 Latest deploy log https://app.netlify.com/sites/docs-eslint/deploys/6717d0423309980008ce7911

@LiviaMedeiros LiviaMedeiros force-pushed the fix-preemfile-stackoverflow branch from 488136e to 72bf938 Compare October 21, 2024 04:28
@mdjermanovic
Member

Would it fix the problem, and would the performance be better if we just do return results.flatMap(result => result.value); instead of pushing to filePaths?

@LiviaMedeiros
Contributor Author

Do you mean checking non-fulfilled results in a separate loop? I don't think we can await throwErrorForUnmatchedPatterns inside flatMap.
Without awaiting, it would fix the problem, and the performance is within the margin of error (for a single directory with 65535 files I'm getting 4150~4270ms in both versions; with 262143 files, flatMap was slower by about 100ms out of 16 seconds).

@mdjermanovic
Member

throwErrorForUnmatchedPatterns always throws so we don't have to worry about it? I believe the loop checks if the promise is rejected and in that case throws an error, otherwise pushes the values to filePaths.

I was thinking about something like this:

- return filePaths;
+ return results.flatMap(result => result.value);

And then we could remove the filePaths intermediary variable?

@LiviaMedeiros
Contributor Author

Like this? It looks cleaner and wouldn't be significantly slower, but we effectively iterate over results twice.

@mdjermanovic
Member

> but we effectively iterate over results twice.

I think that isn't a concern as results is typically a very small array. Its length is less than or equal to the number of globs and directories passed on the command line.
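
The two-pass shape discussed above can be sketched as follows. This is a simplified illustration, not the merged code: collectFilePaths is a hypothetical name, the results are assumed to have the shape produced by Promise.allSettled(), and the real function awaits throwErrorForUnmatchedPatterns for rejected results rather than rethrowing the reason directly.

```javascript
// Sketch only: `results` is assumed to look like Promise.allSettled() output,
// i.e. { status: "fulfilled", value } or { status: "rejected", reason }.
function collectFilePaths(results) {
    // First pass: surface the first rejection, if there is one.
    for (const result of results) {
        if (result.status === "rejected") {
            throw result.reason;
        }
    }

    // Second pass: flatten the per-search arrays. No array is spread into a
    // function call here, so the argument-count limit does not apply.
    return results.flatMap(result => result.value);
}

// Usage with hand-built results; one search returns a very large list.
const settled = [
    { status: "fulfilled", value: ["a.js", "b.js"] },
    { status: "fulfilled", value: [] },
    { status: "fulfilled", value: Array.from({ length: 200_000 }, (_, i) => `f${i}.js`) }
];

console.log(collectFilePaths(settled).length); // 200002
```

Iterating twice costs little here because the outer array has one entry per glob or directory, while the large arrays are only touched once, inside flatMap.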

@mdjermanovic
Member

mdjermanovic commented Oct 21, 2024

I can reproduce the error locally: I changed DEFAULT_FILE_COUNT in the emfile check to 150_000, ran npm run test:emfile and got the following.

$ npm run test:emfile

> eslint@9.13.0 test:emfile
> node tools/check-emfile-handling.js

Generating 150000 files in tmp/emfile-check...
Running ESLint...
(node:10236) ESLintIgnoreWarning: The ".eslintignore" file is no longer supported. Switch to using the "ignores" property in "eslint.config.js": https://eslint.org/docs/latest/use/configure/migration-guide#ignoring-files
(Use `node --trace-warnings ...` to show where the warning was created)

Oops! Something went wrong! :(

ESLint: 9.13.0

RangeError: Maximum call stack size exceeded
    at globMultiSearch (C:\projects\eslint\lib\eslint\eslint-helpers.js:435:27)
    at async findFiles (C:\projects\eslint\lib\eslint\eslint-helpers.js:591:27)
    at async ESLint.lintFiles (C:\projects\eslint\lib\eslint\eslint.js:740:27)
    at async Object.execute (C:\projects\eslint\lib\cli.js:498:23)
    at async main (C:\projects\eslint\bin\eslint.js:158:22)
node:child_process:965
    throw err;
    ^

Error: Command failed: node bin/eslint.js tmp/emfile-check -c tests/fixtures/emfile/eslint.config.js
    at genericNodeError (node:internal/errors:984:15)
    at wrappedFn (node:internal/errors:538:14)
    at checkExecSyncError (node:child_process:890:11)
    at execSync (node:child_process:962:15)
    at Object.<anonymous> (C:\projects\eslint\tools\check-emfile-handling.js:92:1)
    at Module._compile (node:internal/modules/cjs/loader:1369:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1427:10)
    at Module.load (node:internal/modules/cjs/loader:1206:32)
    at Module._load (node:internal/modules/cjs/loader:1022:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:135:12) {
  status: 2,
  signal: null,
  output: [ null, null, null ],
  pid: 3348,
  stdout: null,
  stderr: null
}

eslint-helpers.js:435:27 indeed points to the filePaths.push(...result.value); line.

So, marking as accepted.

I can confirm that the changes from this PR fix the "Maximum call stack size exceeded" problem in my test setup. The test still eventually fails with an EMFILE error, but we'll take another look at the EMFILE problem in #18977.

@mdjermanovic mdjermanovic added accepted There is consensus among the team that this change meets the criteria for inclusion contributor pool labels Oct 21, 2024
@LiviaMedeiros
Contributor Author

Speaking of tools/check-emfile-handling.js, I suspect that

// if we're on a Mac, make sure the limit isn't high enough to cause a call stack error
if (os.platform() === "darwin") {
    FILE_COUNT = Math.min(FILE_COUNT, 100000);
}

is directly related to this issue. If removing it won't break CI (it totally can; I can also confirm that EMFILE still occurs, at least after exceeding the hard limit), perhaps it should be done in the scope of this PR.

@LiviaMedeiros
Contributor Author

Ah okay, it fails with FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory rather than call stack error. Reverting.

@LiviaMedeiros LiviaMedeiros force-pushed the fix-preemfile-stackoverflow branch from 38a50fb to df082d6 Compare October 21, 2024 10:45
@mdjermanovic
Member

> Speaking of tools/check-emfile-handling.js, I suspect that
>
> // if we're on a Mac, make sure the limit isn't high enough to cause a call stack error
> if (os.platform() === "darwin") {
>     FILE_COUNT = Math.min(FILE_COUNT, 100000);
> }
>
> is directly related to this issue. If removing it won't break CI (it totally can; I can also confirm that EMFILE still occurs, at least after exceeding the hard limit), perhaps it should be done in the scope of this PR.

Yeah, it's quite possible that the comment refers to this particular call stack error.

I'm not sure if it's possible to determine the max number of elements that can be spread into a function call, to ensure we won't use spread arguments with the list of files anywhere, and then test with that number. And it seems that the test would eventually run into either an EMFILE error or a heap out of memory error. So, I'm inclined to merge this without any particular tests and consider increasing limits in the EMFILE check in a PR that will fix #18977. @nzakas what do you think?

@mdjermanovic mdjermanovic changed the title fix: avoid call stack overflow by processing file paths in chunks fix: avoid call stack overflow while processing globs Oct 21, 2024
mdjermanovic previously approved these changes Oct 21, 2024
Member

@mdjermanovic mdjermanovic left a comment

LGTM, thanks! Leaving it open for a second review, and because there's an open question about whether we want to add some tests in this PR.

@mdjermanovic mdjermanovic added the repro:yes Issues with a reproducible example label Oct 21, 2024
Member

@nzakas nzakas left a comment

I don't know that we can determine the max number of spread arguments a function may have. I'm guessing it's close to the max number of array elements (2^32 - 1), so I hope we won't have to worry about it.

I think it's okay to merge without a test because I'm not sure we'd be able to make a non-flaky test to validate this change. We should put a comment in the code, though, explaining what's happening so we don't accidentally change this back in the future.

@@ -421,20 +421,12 @@ async function globMultiSearch({ searches, configLoader, errorOnUnmatchedPattern
)
);

const filePaths = [];

for (let i = 0; i < results.length; i++) {
Member

Can we add a comment here explaining why we're looping over results twice? Especially without a test to validate this change, having some comments here would be very helpful for long-term maintenance.

@LiviaMedeiros
Contributor Author

> I'm not sure if it's possible to determine the max number of elements that can be spread into a function call, to ensure we'll not use spread arguments with the list of files anywhere, and then test with that number.

> I don't know that we can determine the max number of spread arguments a function may have. I'm guessing it's close to the max number of array elements (2^32 - 1), so I hope we won't have to worry about it.

Determining the real stack limit here is quite simple: we can just run something like [].push(...Array.from({ length })) and catch the RangeError. I'm getting something around 122825 in ideal conditions on x86_64. This value is never constant, though, so I'd expect any test that involves exceeding the stack to be flaky.

I think tools/check-emfile-handling.js, once the EMFILE error is fixed, would be sufficient as a test for this.
Right now the test passes even though the error is still not fixed, so perhaps it should test with a larger number of files, maybe something like Math.max(100_000, ulimit * 2) or more.
Determining the limit might also be more complicated... On my system, ulimit -n outputs 1024 and ulimit -Hn outputs 4096, but ${child_process.execSync('ulimit -n')} in Node.js outputs 4096. Meanwhile, /proc/sys/fs/file-max says that the maximum number of FDs is 8192. And the worst part is that running a test that actually pushes the limits renders the OS unusable, not allowing even something simple like ls or ps to run.

If we want to test specifically for the spread operator issue and similar cases (something.apply(something, beeeegArray)), I'd recommend lowering the stack size in the test(s), e.g. by adding // Flags: --stack-size=256.
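
The probing idea mentioned above ([].push(...Array.from({ length })) plus catching RangeError) could be wrapped in a small search like the one below. This is only a sketch: canSpread, probeSpreadLimit, and the upper bound of 1 << 21 are hypothetical choices for the example, and the binary search assumes that success is monotonic in the length, which may not hold exactly given that the limit is reported to fluctuate.

```javascript
// Returns true if an array of `length` elements can be spread into push().
function canSpread(length) {
    try {
        [].push(...Array.from({ length }));
        return true;
    } catch (error) {
        if (error instanceof RangeError) {
            return false;
        }
        throw error; // anything else is unexpected
    }
}

// Binary search for the largest length in [0, upper] that can be spread.
// If even `upper` works, `upper` itself is returned.
function probeSpreadLimit(upper = 1 << 21) {
    if (canSpread(upper)) {
        return upper;
    }

    let low = 0;      // known to succeed (push() with no arguments)
    let high = upper; // known to fail

    while (high - low > 1) {
        const mid = low + Math.floor((high - low) / 2);

        if (canSpread(mid)) {
            low = mid;
        } else {
            high = mid;
        }
    }

    return low;
}

// The printed value depends on the engine, platform, and current stack depth.
const spreadLimit = probeSpreadLimit();
console.log(spreadLimit);
```

Because the result varies between runs and call depths, a value obtained this way is better suited to exploration than to a fixed threshold in a test.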

Member

@nzakas nzakas left a comment

LGTM. Thanks!

@nzakas
Member

nzakas commented Oct 23, 2024

I'm okay merging without a test, but would like @mdjermanovic to confirm.

Member

@mdjermanovic mdjermanovic left a comment

LGTM, thanks! We can evaluate adding tests separately.

@mdjermanovic mdjermanovic merged commit d474443 into eslint:main Oct 23, 2024
25 checks passed
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion bug ESLint is working incorrectly cli Relates to ESLint's command-line interface contributor pool core Relates to ESLint's core APIs and features repro:yes Issues with a reproducible example
Projects
Status: Complete
3 participants