
gh-109934: notify cancelled futures on thread pool shutdown #134618

Open · wants to merge 8 commits into base: main
Conversation

@duaneg (Contributor) commented May 24, 2025

When `ThreadPoolExecutor` shuts down it cancels any pending futures; however, at present it doesn't notify waiters. Thus their state stays `CANCELLED` instead of `CANCELLED_AND_NOTIFIED`, and any waiters are not awakened.

Call `set_running_or_notify_cancel` on the cancelled futures to fix this.
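
A minimal sketch of the symptom, assuming a pool with a single deliberately blocked worker so the later submissions stay pending (the names and structure here are illustrative, not taken from the patch): because `concurrent.futures.wait` only treats futures in `CANCELLED_AND_NOTIFIED` or `FINISHED` as done, the futures cancelled by `shutdown(cancel_futures=True)` can linger in `not_done` even though `cancelled()` reports `True`.

```python
import concurrent.futures
import threading

def demo():
    gate = threading.Event()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        # Occupy the only worker so the remaining submissions stay pending.
        pool.submit(gate.wait)
        pending = [pool.submit(pow, 2, n) for n in range(3)]

        # Cancels the pending futures; without the fix they end up in the
        # CANCELLED state but their waiters are never notified.
        pool.shutdown(wait=False, cancel_futures=True)
        gate.set()  # let the blocked worker finish

    # Without the fix the cancelled futures may remain in `not_done`;
    # with it they are notified and reported as done immediately.
    done, not_done = concurrent.futures.wait(pending, timeout=1)
    print(len(done), len(not_done), [f.cancelled() for f in pending])

if __name__ == "__main__":
    demo()
```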
duaneg added 2 commits July 16, 2025 10:40:
… blocking future has started before checking its status.
@duaneg (Contributor, Author) commented Jul 16, 2025

What on earth possessed me to say "I think I've managed to come up with [a unit test] that works works reliably", I have no idea. Utter foolishness. Oh well, we'll get there.

@chrisvanrun commented Jul 17, 2025

> What on earth possessed me to say "I think I've managed to come up with [a unit test] that works works reliably", I have no idea. Utter foolishness. Oh well, we'll get there.

Been there, done that!

In my local tests I just added a time-based approach that stalls the process for 10 seconds or fails directly, with a generic maximum runtime of 4 s for the test. That is fine for the current project, which also has a 'kill all children' step directly following the shutdown.

I think a good approach would perhaps be to add a generic Lock and have each process stall on it except for one. That one gets picked up in the first 'batch' and then immediately errors out. Then call executor.shutdown() and subsequently release the lock to free up the stalled processes.

You could then assert that the final n tasks are correctly cancelled. I suspect a source of runtime variance here is that the failed task might or might not free up a worker for the next Lock-blocked task; but perhaps the executor blocks while a future's completion is being handled?

@duaneg (Contributor, Author) commented Jul 17, 2025

> I think a good approach would perhaps be to add a generic Lock and have each process stall on it except for one. That one gets picked up in the first 'batch' and then immediately errors out. Then call executor.shutdown() and subsequently release the lock to free up the stalled processes.

Yeah, that is basically what the test does: it submits a bunch of tasks, the first `max_workers` of which immediately block waiting on a barrier, so we know all workers are engaged (and blocked, so they will remain so) and that the remaining tasks are pending and hence will be cancelled. Then it issues the shutdown, then releases the barrier and unblocks the workers.

> You could then assert that the final n tasks are correctly cancelled. I suspect a source of runtime variance here is that the failed task might or might not free up a worker for the next Lock-blocked task; but perhaps the executor blocks while a future's completion is being handled?

This should be reliable, as the shutdown is initiated synchronously while all workers are blocked. By the time any of the tasks complete, the executor must already be shut down and no additional pending tasks will be started.

However, there are lots of tricky details in ensuring this all works robustly. E.g. once the executor is shut down, use of any synchronisation primitives may fail, depending on timing and details of the implementation, so we have to handle `BrokenBarrierError` (both directly in the test and indirectly if the workers hit it).

Also, the internal multiprocessing machinery uses a work queue and considers tasks running (and no longer pending) once they are enqueued, even if they haven't been distributed to workers. It eagerly enqueues some extra tasks over and above the maximum number of workers to keep the pipeline filled. Those tasks will not be cancelled, but may or may not actually run, depending on timing.
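
An illustrative peek at that internal sizing, assuming CPython's current private implementation (`EXTRA_QUEUED_CALLS` is an implementation detail and may change between versions):

```python
from concurrent.futures import process

max_workers = 4
# The process pool's call queue holds a few more items than there are workers,
# and anything already moved into it counts as running and cannot be cancelled
# by shutdown(cancel_futures=True).
call_queue_size = max_workers + process.EXTRA_QUEUED_CALLS
print(call_queue_size)  # 5 with EXTRA_QUEUED_CALLS == 1 at the time of writing
```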

Anyway, hopefully with all of that taken into account the test is now robust and reliable!
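
For reference, a simplified sketch of the thread-pool variant of the structure described above (the counts, the `n_waiting` polling, and all names are illustrative assumptions; the actual test in the PR may be structured differently and also has to cope with the process-pool details discussed above):

```python
import concurrent.futures
import threading
import time

NUM_WORKERS = 4
NUM_PENDING = 4
TIMEOUT = 10

def check_shutdown_notifies_cancelled_futures():
    # The workers plus the main thread all rendezvous at this barrier.
    barrier = threading.Barrier(NUM_WORKERS + 1)

    def blocker():
        try:
            barrier.wait(timeout=TIMEOUT)
        except threading.BrokenBarrierError:
            pass  # timeouts or teardown can break the barrier; just bail out

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=NUM_WORKERS)
    blocked = [pool.submit(blocker) for _ in range(NUM_WORKERS)]
    pending = [pool.submit(blocker) for _ in range(NUM_PENDING)]

    # Wait until every worker is parked on the barrier, so the remaining
    # submissions are guaranteed to still be pending when we shut down.
    while barrier.n_waiting < NUM_WORKERS:
        time.sleep(0.01)

    pool.shutdown(wait=False, cancel_futures=True)
    try:
        barrier.wait(timeout=TIMEOUT)  # release the blocked workers
    except threading.BrokenBarrierError:
        pass

    done, not_done = concurrent.futures.wait(blocked + pending, timeout=TIMEOUT)
    assert all(f.cancelled() for f in pending)
    assert not not_done, "cancelled futures were never notified"

    pool.shutdown(wait=True)

if __name__ == "__main__":
    check_shutdown_notifies_cancelled_futures()
    print("ok")
```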
