
tests/extmod/select_poll_eintr: Skip unreliable test in Github CI. #17745


Closed


AJMansfield
Contributor

@AJMansfield AJMansfield commented Jul 22, 2025

Summary

extmod/select_poll_eintr.py is a constant source of spurious failures in Github CI.

This PR adds it to the list of tests skipped when running on GitHub CI, to reduce the overall false positive rate and improve the predictive value of a failing CI result.
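As a rough illustration of what skipping a test only in GitHub CI can look like, here is a minimal sketch. It is not the diff from this PR: `CI_SKIP_TESTS`, `tests_to_skip` and `select_tests` are hypothetical names. The only external fact it relies on is that GitHub Actions sets the `GITHUB_ACTIONS` environment variable to `true` on its runners.

```python
import os

# Hypothetical names for illustration; not taken from the PR diff.
CI_SKIP_TESTS = {"extmod/select_poll_eintr.py"}


def tests_to_skip(environ=None):
    """Return the set of tests to skip for the given environment mapping."""
    env = os.environ if environ is None else environ
    # GitHub Actions sets GITHUB_ACTIONS=true on its runners.
    if env.get("GITHUB_ACTIONS") == "true":
        return set(CI_SKIP_TESTS)
    return set()


def select_tests(all_tests, environ=None):
    """Filter a list of test paths, dropping any that should be skipped."""
    skip = tests_to_skip(environ)
    return [t for t in all_tests if t not in skip]


if __name__ == "__main__":
    tests = ["basics/int1.py", "extmod/select_poll_eintr.py"]
    print(select_tests(tests, {"GITHUB_ACTIONS": "true"}))
    print(select_tests(tests, {}))
```

Keying the skip on the CI environment rather than removing the test keeps it available for local runs, where it can still catch real regressions.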

Testing

I examined a sample of the last 25 failed GitHub Actions runs, tabulated their causes, and calculated the relevant confusion matrix statistics over the results. They provide adequate statistical evidence to support my original anecdotal experience that extmod/select_poll_eintr.py is problematic.

| Action Run | Failed Job(s) | Cause |
|---|---|---|
| 16447411965 | stackless_clang | thread/stress_aes.py |
| 16446157516 | qemu_riscv64 | thread/stress_aes.py |
| 16445640721 | qemu_riscv64 | thread/stress_aes.py |
| 16445092499 | standard_v2 | extmod/select_poll_eintr.py |
| 16442539782 | settrace_stackless | extmod/select_poll_eintr.py |
| 16439460414 | standard_v2 | extmod/select_poll_eintr.py |
| | stackless_clang | extmod/select_poll_eintr.py |
| 16439339413 | settrace_stackless | extmod/select_poll_eintr.py |
| 16438892781 | standard_v2 | extmod/select_poll_eintr.py |
| 16438838082 | standard | extmod/select_poll_eintr.py |
| 16438686105 | standard_v2 | extmod/select_poll_eintr.py |
| | settrace_stackless | extmod/select_poll_eintr.py |
| 16437062166 | float_clang | extmod/select_poll_eintr.py |
| | settrace_stackless | extmod/select_poll_eintr.py |
| 16435694536 | settrace_stackless | extmod/select_poll_eintr.py |
| 16435294140 | standard_v2 | extmod/select_poll_eintr.py |
| 16435084663 | settrace_stackless | extmod/select_poll_eintr.py |
| 16434901639 | float | extmod/select_poll_eintr.py |
| 16433931194 | standard | extmod/select_poll_eintr.py |
| 16433726206 | standard_v2 | extmod/select_poll_eintr.py |
| | stackless_clang | extmod/select_poll_eintr.py |
| | macos | extmod/select_poll_eintr.py, basics/slice_optimse.py |
| | 10 other jobs | basics/slice_optimse.py |
| 16433010322 | standard | extmod/select_poll_eintr.py |
| | standard_v2 | extmod/select_poll_eintr.py |
| | longlong | (many failures) |
| 16432556955 | 14 jobs | build failure |
| 16432475831 | settrace_stackless | extmod/select_poll_eintr.py |
| 16432121694 | longlong | extmod/vfs_rom.py, import/import_broken.py |
| 16421543831 | standard | extmod/select_poll_eintr.py |
| 16420969407 | standard | extmod/select_poll_eintr.py |
| | standard_v2 | extmod/select_poll_eintr.py |
| 16420440397 | standard_v2 | extmod/select_poll_eintr.py |
| 16418722881 | standard | extmod/select_poll_eintr.py |
| | standard_v2 | extmod/select_poll_eintr.py |
| | settrace_stackless | extmod/select_poll_eintr.py |

(Note that reproducible was excluded from the tabulation, as it doesn't run extmod/select_poll_eintr.py.)

20 of these 25 examined runs include extmod/select_poll_eintr.py as a failure, compared to only 6 runs that include any other kind of failure.
As far as I can tell, none of these failures have anything to do with changes made to the select module in the triggering branch, making all but the one run that also included another failure false positives.
Over the same sample period, there were a total of 9 passing unix runs. Under the assumption that all 6 non-extmod/select_poll_eintr.py failed runs are true positives and that all 9 of these passing runs are true negatives, the test suite with extmod/select_poll_eintr.py included has a false positive rate of 67.9%, a positive predictive value of only 24%, and an F1 score of 0.387. These values support the conclusion that the rate of spurious failures is excessive, and that the usefulness of the CI failure indicator is diluted as a result.
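For readers who want to check these three figures by hand, the arithmetic can be written out as follows. It uses the per-run counts from the statistics code in this PR (TP = 6, FP = 19, TN = 9, FN = 0) and the standard definitions of each metric; nothing here is new data.

```latex
\begin{aligned}
\mathrm{FPR} &= \frac{FP}{FP + TN} = \frac{19}{19 + 9} = \frac{19}{28} \approx 67.86\% \\
\mathrm{PPV} &= \frac{TP}{TP + FP} = \frac{6}{6 + 19} = \frac{6}{25} = 24\% \\
F_1 &= \frac{2\,TP}{2\,TP + FP + FN} = \frac{12}{12 + 19 + 0} = \frac{12}{31} \approx 0.387
\end{aligned}
```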

Considering extmod/select_poll_eintr.py individually, this test has a per-job false positive rate of 5.5% and a per-run false positive rate of 60.6%. This supports the conclusion that the weak predictive value of the test suite is largely attributable to this test.

Overall, the sample I examined supports the conclusion that extmod/select_poll_eintr.py is problematic and should be excluded from GitHub CI runs going forward.
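To make the expected effect of the exclusion concrete, here is a small sketch that compares the per-run figures before and after. The "before" counts are the ones tabulated in this PR. The "after" counts are an assumption, not a measurement: they suppose that every run that failed only on extmod/select_poll_eintr.py would have passed with the test skipped, so they are an idealized bound rather than a prediction. The helper name `rates` is made up for this example.

```python
def rates(tp, fp, tn):
    """Return (false positive rate, positive predictive value)."""
    fpr = fp / (fp + tn)
    ppv = tp / (tp + fp)
    return fpr, ppv


# Per-run counts as tabulated in this PR: 6 runs with some other failure,
# 19 runs that failed only on select_poll_eintr, 9 passing runs.
before = rates(tp=6, fp=19, tn=9)

# Assumption (not measured): with the test skipped, those 19 runs pass instead.
after = rates(tp=6, fp=0, tn=9 + 19)

print(f"before: FPR={before[0]:.1%} PPV={before[1]:.1%}")
print(f"after:  FPR={after[0]:.1%} PPV={after[1]:.1%}")
```

The comparison still treats all 6 remaining failed runs as true positives, as the analysis above does.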

Statistics Code, for anyone who cares to check my math:
```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts of true/false positives/negatives plus derived statistics.

    A "positive" here is a failed CI run (or job); a "true positive" is a
    failure that reflects a real defect rather than a spurious one.
    """
    tp: int
    tn: int
    fp: int
    fn: int

    @property
    def p(self):
        # Actual positives
        return self.tp + self.fn

    @property
    def n(self):
        # Actual negatives
        return self.fp + self.tn

    @property
    def pp(self):
        # Predicted positives
        return self.tp + self.fp

    @property
    def pn(self):
        # Predicted negatives
        return self.fn + self.tn

    @property
    def pop(self):
        # Total population
        return self.tp + self.fp + self.tn + self.fn

    @property
    def tpr(self):
        # True positive rate (undefined when there are no actual positives)
        return self.tp / self.p

    @property
    def fnr(self):
        # False negative rate (undefined when there are no actual positives)
        return self.fn / self.p

    @property
    def fpr(self):
        # False positive rate
        return self.fp / self.n

    @property
    def tnr(self):
        # True negative rate
        return self.tn / self.n

    @property
    def ppv(self):
        # Positive predictive value
        return self.tp / self.pp

    @property
    def npv(self):
        # Negative predictive value
        return self.tn / self.pn

    @property
    def fdr(self):
        # False discovery rate
        return self.fp / self.pp

    @property
    def fOr(self):
        # False omission rate ("for" is a reserved word)
        return self.fn / self.pn

    @property
    def f1(self):
        # F1 score
        return 2*self.tp / (2*self.tp + self.fp + self.fn)

    def report(self, title):
        return f"""\
{title}
  Population: {self.pop}
  Confusion Matrix:
              PPos PNeg
    Positive: {self.tp: 4} {self.fn: 4}
    Negative: {self.fp: 4} {self.tn: 4}
  Positive Predictive Value: {self.ppv:%}
  False Positive Rate: {self.fpr:%}
  F1 Score: {self.f1}
"""

# 19 fail runs with only eintr
# 5 fail runs with only other failures (1 of them precluded eintr)
# 1 fail run with both
# 9 pass runs
print(ConfusionMatrix(
    tp = 5 + 1,
    fp = 19,
    tn = 9,
    fn = 0,
).report("Overall, by runs:"))

print(ConfusionMatrix(
    tp = 0,
    fp = 19 + 1,
    tn = 9 + 4,
    fn = 0,
).report("eintr, by runs:"))

# 28 fail jobs with only eintr
# 28 fail jobs with only other failures
# 1 fail job with both
# 327 pass jobs from fail runs
# 144 pass jobs from pass runs
print(ConfusionMatrix(
    tp = 28 + 1,
    fp = 28,
    tn = 327 + 144,
    fn = 0,
).report("Overall, by jobs:"))

print(ConfusionMatrix(
    tp = 0,
    fp = 28 + 1,
    tn = 327 + 144 + 28,
    fn = 0,
).report("eintr, by jobs:"))
```

Output:

```
Overall, by runs:
  Population: 34
  Confusion Matrix:
              PPos PNeg
    Positive:    6    0
    Negative:   19    9
  Positive Predictive Value: 24.000000%
  False Positive Rate: 67.857143%
  F1 Score: 0.3870967741935484

eintr, by runs:
  Population: 33
  Confusion Matrix:
              PPos PNeg
    Positive:    0    0
    Negative:   20   13
  Positive Predictive Value: 0.000000%
  False Positive Rate: 60.606061%
  F1 Score: 0.0

Overall, by jobs:
  Population: 528
  Confusion Matrix:
              PPos PNeg
    Positive:   29    0
    Negative:   28  471
  Positive Predictive Value: 50.877193%
  False Positive Rate: 5.611222%
  F1 Score: 0.6744186046511628

eintr, by jobs:
  Population: 528
  Confusion Matrix:
              PPos PNeg
    Positive:    0    0
    Negative:   29  499
  Positive Predictive Value: 0.000000%
  False Positive Rate: 5.492424%
  F1 Score: 0.0
```


codecov bot commented Jul 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.41%. Comparing base (e993f53) to head (26d9bf2).
Report is 8 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #17745   +/-   ##
=======================================
  Coverage   98.41%   98.41%           
=======================================
  Files         171      171           
  Lines       22210    22210           
=======================================
  Hits        21857    21857           
  Misses        353      353           


@AJMansfield AJMansfield changed the title tests/extmod/select_poll_eintr: Skip unreliable test on ci/cd. tests/extmod/select_poll_eintr: Skip unreliable test in Github CI. Jul 22, 2025
extmod/select_poll_eintr.py is a constant source of spurious failures in
Github CI. This PR adds it to the list of tests skipped in that
environment in order to improve the test suite's false positive rate and
positive predictive value in detecting defects.

Signed-off-by: Anson Mansfield <amansfield@mantaro.com>
@AJMansfield AJMansfield force-pushed the cicd-ignore-broken-eintr branch from 28ea4a6 to 26d9bf2 Compare July 22, 2025 18:54
@dpgeorge
Member

Thanks for the very detailed analysis!

I should have been clearer that this is intended to be fixed (with a workaround for the true bug) by #17655.

@AJMansfield
Contributor Author

> I should have been clearer that this is intended to be fixed (with a workaround for the true bug) by #17655.

Oh, I think that did actually come up in the search I did, guess I should've read further.

@AJMansfield AJMansfield deleted the cicd-ignore-broken-eintr branch July 23, 2025 01:46
@dpgeorge
Member

And that PR has just been merged, so CI should be a lot happier now.

@AJMansfield
Contributor Author

AJMansfield commented Jul 23, 2025

> And that PR has just been merged, so CI should be a lot happier now.

Ty! This test is the main reason I didn't notice the other rv32 test failures when I was originally reviewing #17716. I was getting conditioned to expect one or two failures in every CI run.
