Skip to content

ports/esp32: Fix crash and improve timer interrupt allocation #17265

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 5, 2025

Conversation

DvdGiessen
Copy link
Contributor

@DvdGiessen DvdGiessen commented May 7, 2025

Summary

esp_intr_free is not safe to call from the timer ISR because it requires the current task (the one the ISR interrupted) to be pinned to the same core as the interrupt was allocated on. The ISR is of course guaranteed to interrupt a task on the same core, but it's not guaranteed that that task is also pinned to that core.

This was causing a lockup followed by ISR watchdog timeout in the machine_uart RXIDLE timer (which disables the timer from its own callback) when the ISR happened to interrupt a task that was not pinned to a specific core (for example for me if often hit the lwIP TCP/IP thread task).

The first commit in this PR fixes that by merely disabling the interrupt, which is safe to do from the ISR since that only requires that we're currently running on the same core (which the ISR always is), regardless of the current task.

Additionally, the second commit makes repeated disabling and enabling of the interrupt (such as the UART RXIDLE does) a bit more efficient by re-enabling instead of reallocating it. That also allowed to removing some code duplication and simplify how machine_uart uses the timer.

Testing

I've been using these fixes in combination with #17138 (which changed the PPP implementation to use that RXIDLE IRQ). Like that PR, I've been running this on three custom boards (1x original ESP32, 2x ESP32S3) for a few weeks at the moment of writing.

@DvdGiessen DvdGiessen changed the title ports/esp32: Improve timer interrupt aloc ports/esp32: Fix crash and improve timer interrupt allocation May 7, 2025
@DvdGiessen DvdGiessen force-pushed the esp32_timer_interrupt branch from 2e0fb59 to 2015bda Compare May 7, 2025 15:22
@projectgus projectgus self-requested a review May 9, 2025 06:19
Copy link
Contributor

@projectgus projectgus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me, @DvdGiessen, and even without the use-after-free bug I agree on principle that it's better to be disabling/re-enabling the interrupt rather than freeing and re-allocating it where we can.

My only concern is that it'd be nice to have either a regression test or test coverage for this. I'm guessing that's non-trivial, yeah? i.e. It doesn't sound like the crash would be easy to reproduce in a unit test, and there's no existing unit test for machine.Timer on esp32 so "extending" test coverage would really mean "implementing" test coverage which is a much bigger task. What do you think?

@DvdGiessen
Copy link
Contributor Author

Thanks for reviewing! Note I don't think it's a use-after-free, it's due to the use of esp_ipc_call_blocking that it's more of a deadlock-because-the-ISR-waits-on-IPC-which-cant-run-until-the-ISR-is-finished. While triggering the bug is nondeterministic (due to it depending on which task is currently scheduled) once triggered the behaviour is deterministic.

I agree that the ESP32 could use some machine.Timer tests, but I don't think it'll be easy to test this specific bug. Due to it being dependant on scheduling (and having a non-pinned task in the first place) it won't be easily reproduced with a unit test; best we could do is either engineer the circumstances with a modified build (which seems a bit pointless) or run a reproduction in a loop with the standard build and assert it doesn't crash within some timeout.

@projectgus
Copy link
Contributor

Note I don't think it's a use-after-free, it's due to the use of esp_ipc_call_blocking that it's more of a deadlock-because-the-ISR-waits-on-IPC-which-cant-run-until-the-ISR-is-finished

Oh, sorry, yes you explained the issue very clearly in your description! Which I read, then waited a week, then reviewed the code so I'd forgotten what you said about it. 🤦

I think it's probably OK to merge this without worrying about automated testing, if Damien is OK with that.

@projectgus projectgus requested a review from dpgeorge May 30, 2025 06:19
@dpgeorge dpgeorge added this to the release-1.26.0 milestone Jun 4, 2025
esp_intr_free is not safe to call from the timer ISR because it requires
the current task (the one the ISR interrupted) to be pinned to the same
core as the interrupt was allocated on. Merely disabling the ISR however is
safe since that only requires that we're currently running on the same core
(which the ISR always is), regardless of the current task.

This was causing deadlocks in machine_uart when the ISR happened to
interrupt a task that was not pinned to a specific core.

Signed-off-by: Daniël van de Giessen <daniel@dvdgiessen.nl>
If the interrupt is not freed but merely disabled, instead of reallocating
it every time the timer is enabled again we can instead just re-enable it.
That means we're no longer setting the handler every time, and we need to
ensure it does not change. Doing so by adding an additional wrapper
function does not only solve that problem, it also allows us to remove
some code duplication and simplify how machine_uart uses the timer.

Signed-off-by: Daniël van de Giessen <daniel@dvdgiessen.nl>
Copy link
Member

@dpgeorge dpgeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good.

Regarding tests: I'm OK not adding any new tests. We already have UART IRQ tests which should test part of the changes here. And eventually we'll have better machine.Timer tests that cover all ports.

@dpgeorge dpgeorge force-pushed the esp32_timer_interrupt branch from 2015bda to 2c2f0b2 Compare June 5, 2025 06:46
@dpgeorge dpgeorge merged commit 2c2f0b2 into micropython:master Jun 5, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy