esp32/network_ppp: Bugfixes for deadlocks and crashes when disconnecting. #17656

Open: wants to merge 3 commits into base: master

@DvdGiessen DvdGiessen commented Jul 10, 2025

Summary

A follow-up to #17138 in which I reworked the ESP32 PPP implementation to be closer to the extmod implementation.

In that PR it was mentioned that:

PPP is quite hard to get right and modems can be very finicky. So it's great now that the code running on esp32 boards is also (indirectly) testing the extmod code, and vice versa

Well, I found three issues on the ESP32. Two of them are specific to the ESP32 and its use of the thread-safe API, and I think the third (the PCB cleanup) is probably also wrong in the extmod version.

The three fixes (each a separate commit):

Use thread-safe API for PPPoS input

The ESP32 port uses the thread-safe API, but in the previous PR the PPP input function was accidentally changed to use the non-thread-safe API. It happens to work fine, but the correct way is to use the thread-safe API, as we do elsewhere in the implementation (and did before this change was accidentally introduced).

(extmod doesn't use the thread-safe API so isn't affected.)

Use non-thread-safe API inside status callback

The status callback runs on the lwIP tcpip_thread, and thus on the ESP32 we must use the non-thread-safe API because the thread-safe API would cause a deadlock (because it would wait on that same tcpip_thread to first finish executing the status callback).

(extmod doesn't use the thread-safe API so isn't affected, other than a change that doesn't alter any functionality but keeps the two files as similar as possible when diffing them.)
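To illustrate why the blocking API cannot be used from inside the status callback, here is a toy model in plain Python, not the actual port code. `TcpipThread`, `call_threadsafe` and `close_link` are hypothetical names invented for this sketch. It assumes the thread-safe API works by posting a request to the single worker thread and blocking until that thread has run it. A timeout stands in for the real deadlock so the example terminates.

```python
import queue
import threading


class TcpipThread:
    """Toy stand-in for lwIP's tcpip_thread: one worker owns all stack state."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Process one posted request at a time, in order.
        while True:
            func, done, result = self._mailbox.get()
            result.append(func())
            done.set()

    def on_worker(self):
        return threading.current_thread() is self._worker

    def call_threadsafe(self, func, timeout):
        """Model of a thread-safe call: post func to the worker and block.

        The timeout exists only so this sketch terminates; the assumption
        is that the real blocking call would simply wait forever.
        """
        done, result = threading.Event(), []
        self._mailbox.put((func, done, result))
        if not done.wait(timeout):
            return "timeout"
        return result[0]


tcpip = TcpipThread()


def close_link():
    # Model of a non-thread-safe call: only valid on the worker thread.
    return "closed on worker" if tcpip.on_worker() else "closed on WRONG thread"


def bad_status_callback():
    # Already running on the worker, so waiting for the worker to service
    # another request can never succeed.
    return tcpip.call_threadsafe(close_link, timeout=0.2)


def good_status_callback():
    # Already on the worker, so the direct (non-thread-safe) call is correct.
    return close_link()


# From the application thread the blocking wrapper is the right choice.
from_app = tcpip.call_threadsafe(close_link, timeout=2.0)
# Invoking each callback on the worker, as the stack would.
bad = tcpip.call_threadsafe(bad_status_callback, timeout=2.0)
good = tcpip.call_threadsafe(good_status_callback, timeout=2.0)

print(from_app, bad, good, sep=" | ")
```

In this model the call from the application thread and the direct call inside the callback both complete on the worker, while the blocking call made from inside the callback only returns because of the artificial timeout.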

Correctly clean up PPP PCB after close

If PPP is still connected, freeing the PCB will fail (see lwIP code here) and thus instead we should trigger a disconnect and wait for the lwIP callback to actually free the PCB.

When PPP is not connected we should check if the freeing failed, warn the user if so, and only mark the connection as inactive if not.

When all this happens during garbage collection the best case is that the PPP connection is already dead, which means the callback will be called immediately and cleanup will happen correctly. The worst case is that the connection is still alive, thus we are unable to free the PCB (lwIP won't let us) and it remains referenced in the netif_list, meaning a use-after-free happens later when lwIP traverses that linked list.

While this change does not fully fix the garbage collection case, on the ESP32 port specifically it does improve how the PPP.active(False) method behaves: It no longer immediately tries to free (and fails), but instead triggers a disconnect and lets the cleanup happen correctly through the status callback. (extmod doesn't have the .active() method.)
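The deferred cleanup described above can be sketched as a small state machine. This is a simplified model in plain Python, not the port's C code. `PppModel` and its methods are hypothetical names, and the only behaviour taken from lwIP is the assumption stated earlier: freeing the PCB fails while the link is still connected.

```python
class PppModel:
    """Toy model of the PCB life cycle; all names are illustrative."""

    def __init__(self, connected):
        self.connected = connected
        self.pcb = object()            # stands in for the lwIP PPP PCB
        self.active = True
        self.cleanup_requested = False
        self.warnings = []

    def _ppp_free(self):
        # Assumed behaviour: freeing is refused while the link is up.
        if self.connected:
            return False
        self.pcb = None
        return True

    def _try_cleanup(self):
        # Only mark the interface inactive if the free actually succeeded.
        if self._ppp_free():
            self.active = False
        else:
            self.warnings.append("could not free PCB")

    def deactivate(self):
        """Model of PPP.active(False)."""
        if self.connected:
            # Do not try to free yet: request a disconnect and let the
            # status callback finish the job once the link is down.
            self.cleanup_requested = True
            return
        self._try_cleanup()

    def status_callback_link_down(self):
        """Model of the stack reporting that the link has gone down."""
        self.connected = False
        if self.cleanup_requested:
            self.cleanup_requested = False
            self._try_cleanup()


# Connected case: the PCB survives until the callback reports link down.
live = PppModel(connected=True)
live.deactivate()
pcb_kept_while_connected = live.pcb is not None and live.active
live.status_callback_link_down()
pcb_freed_after_callback = live.pcb is None and not live.active

# Already-disconnected case: cleanup happens immediately.
idle = PppModel(connected=False)
idle.deactivate()
idle_freed_immediately = idle.pcb is None and not idle.active

print(pcb_kept_while_connected, pcb_freed_after_callback, idle_freed_immediately)
```

The point of the design is that the PCB is never dropped while the stack may still reference it: the free is attempted only once the link is known to be down, and a failed free leaves the interface marked active with a warning instead of silently leaking a dangling entry.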

Testing

I've so far only tested this on the ESP32 port, by repeatedly connecting/disconnecting/deleting and checking via GDB that the netif_list no longer gets corrupted. I did not test other ports, but am fairly confident the change makes sense: the linked code from the exact lwIP submodule used by extmod shows that the ppp_free function can indeed fail, so handling that in the callback seems correct.

A small follow-up to 3b1e22c, in which
the entire PPP implementation was reworked to more closely resemble the
extmod version. One of the differences between extmod and the ESP32 port
is that the ESP32 port uses the thread-safe API, but in that changeset
the PPP input function was accidentally changed to use the non-safe API.

Signed-off-by: Daniël van de Giessen <daniel@dvdgiessen.nl>

The status callback runs on the lwIP tcpip_thread, and thus must use the
non-thread-safe API because the thread-safe API would cause a deadlock.

Signed-off-by: Daniël van de Giessen <daniel@dvdgiessen.nl>

If PPP is still connected, freeing the PCB will fail and thus instead we
should trigger a disconnect and wait for the lwIP callback to actually
free the PCB.

When PPP is not connected we should check if the freeing failed, warn
the user if so, and only mark the connection as inactive if not.

When all this happens during garbage collection the best case is that
the PPP connection is already dead, which means the callback will be
called immediately and cleanup will happen correctly. The worst case is
that the connection is still alive, thus we are unable to free the PCB
(lwIP won't let us) and it remains referenced in the netif_list, meaning
a use-after-free happens later when lwIP traverses that linked list.

This change does not fully prevent that, but it *does* improve how the
PPP.active(False) method on the ESP32 port behaves: It no longer
immediately tries to free (and fails), but instead triggers a disconnect
and lets the cleanup happen correctly through the status callback.

Signed-off-by: Daniël van de Giessen <daniel@dvdgiessen.nl>
codecov bot commented Jul 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.44%. Comparing base (df05cae) to head (c0717d5).
Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #17656   +/-   ##
=======================================
  Coverage   98.44%   98.44%           
=======================================
  Files         171      171           
  Lines       22192    22192           
=======================================
  Hits        21847    21847           
  Misses        345      345           


Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

@DvdGiessen DvdGiessen changed the title from "Esp32 ppp improvements" to "esp32/network_ppp: Bugfixes for deadlocks and crashes when disconnecting." Jul 10, 2025
@dpgeorge dpgeorge added this to the release-1.26.0 milestone Jul 15, 2025
2 participants