asyncio: Properly cancel the main task on exception #9870
Conversation
Thanks for the contribution. Can you please provide a working example that shows why this patch is needed? The patch itself will impact the overall performance of `uasyncio`.
Consider this trivial example:

```python
import uasyncio as asyncio


async def main_loop():
    try:
        while True:
            await asyncio.sleep(1)
            print('.')
    finally:
        print('cleaning up')


asyncio.run(main_loop())
```

Run it and interrupt it with CTRL+C (i.e. inject an unexpected exception). You'll notice the `finally` block never runs, so "cleaning up" is never printed.
This issue is explained here in the tutorial.
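Roughly, the pattern usually suggested for this is to handle the interrupt at the top level and reset the loop afterwards. A sketch of that pattern from memory (assuming the standard `asyncio.new_event_loop()` reset; see the tutorial for the authoritative version):

```python
import uasyncio as asyncio


async def main_loop():
    try:
        while True:
            await asyncio.sleep(1)
    finally:
        print('cleaning up')  # note: still skipped on CTRL+C without this PR


try:
    asyncio.run(main_loop())
except KeyboardInterrupt:
    print('interrupted')
finally:
    asyncio.new_event_loop()  # clear retained state so a later run() starts clean
```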
Sorry if my example was too trivial. Here's the actual issue I faced. Consider this code:

```python
import uasyncio as asyncio


async def handle_client(reader, writer):
    while True:
        msg = await reader.readline()
        if not msg:
            break
        writer.write(msg)


async def main():
    try:
        server = await asyncio.start_server(handle_client, '127.0.0.1', 8888)
        async with server:
            await server.task
    finally:
        print('cleaning up')


asyncio.run(main())
```

Again, if interrupted, the `finally` block is never executed.
This problem only arises with …

If fixing this costs performance, my vote (if I had one) would be for documenting it...
I think assuming that only one exception may ever be raised while waiting for IO is unwise. I can see, for one, that extmod/uselect.c line 88 in poll_map_poll (called from poll_poll_internal, called from poll_ipoll, called from IOQueue.wait_io_event) may raise OSError. Is the performance hit from a try/except block really so expensive that we should consider not using it? How many times a second is a typical program generally going to run through this code?
If performance truly is an issue, I also looked at writing it this way:

```python
def run_until_complete(main_task=None):
    try:
        return _run_until_complete(main_task)
    except BaseException as e:
        if main_task:
            main_task.coro.throw(e)
        raise
```

(and rename the current `run_until_complete` to `_run_until_complete`).
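As a self-contained illustration of why throwing into the coroutine helps, here is a toy sketch using a plain generator (independent of uasyncio internals): throwing an exception into a suspended coroutine runs its `finally` blocks before the exception propagates.

```python
# Toy demonstration of coro.throw(): injecting an exception into a suspended
# coroutine/generator runs its exception handlers and finally blocks.
def main_loop():
    try:
        while True:
            yield  # stand-in for "await asyncio.sleep(1)"
    finally:
        print("cleaning up")


coro = main_loop()
next(coro)  # advance to the first yield
try:
    coro.throw(KeyboardInterrupt)
except KeyboardInterrupt:
    print("interrupt propagated out of the coroutine")
```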
@peterhinch @dpgeorge sorry if I came across unprofessionally. I really would like to help resolve this confusing issue. I've revised it to use a trampoline entrance to `run_until_complete`.
You didn't, it's OK.
We need to consider if this is an issue that is worth fixing, or if it's an edge case that can be documented. I'm not sure at this stage. This is one of those issues that require some analysis and thinking.

To make progress we need to write a test that shows the problem, and write some performance benchmarks to measure how much impact a solution has.

For the test showing the problem, I found another way to trigger it, by making a polling object that fails during poll:

```python
try:
    import io
    import uasyncio as asyncio
except ImportError:
    print("SKIP")
    raise SystemExit


class Fail(io.IOBase):
    def ioctl(self, req, param):
        fail

    async def wait(self):
        yield asyncio.core._io_queue.queue_read(self)


async def main():
    try:
        await Fail().wait()
    finally:
        print("cleaning up")


asyncio.run(main())
```

(That won't run under CPython, only MicroPython.)

I also wrote a quick benchmark:

```python
try:
    import uasyncio as asyncio
except ImportError:
    try:
        import asyncio
    except ImportError:
        print("SKIP")
        raise SystemExit


async def test(r):
    for _ in r:
        await asyncio.sleep_ms(0)


###########################################################################
# Benchmark interface

bm_params = {
    (32, 10): (1000,),
    (1000, 10): (10000,),
    (5000, 10): (100000,),
}


def bm_setup(params):
    (nloop,) = params
    return lambda: asyncio.run(test(range(nloop))), lambda: (nloop // 100, None)
```

(Also won't run under CPython, due to `sleep_ms`.)

I found that wrapping the whole …

Would be good to write another performance test which schedules many tasks at once, to measure the impact on that scenario as well.

@greezybacon I don't expect you to do all the above, but that's what will be needed before a fix can be properly evaluated.
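As a rough starting point for the "schedule many tasks at once" benchmark mentioned above, something like this sketch might do (the task counts and structure are my own illustration, not from the thread):

```python
try:
    import uasyncio as asyncio
except ImportError:
    import asyncio


async def worker(n):
    # Each task yields control n times so the scheduler round-robins many tasks.
    for _ in range(n):
        await asyncio.sleep(0)


async def bench(num_tasks=100, iters=100):
    tasks = [asyncio.create_task(worker(iters)) for _ in range(num_tasks)]
    # Wait for all workers to finish.
    await asyncio.gather(*tasks)


asyncio.run(bench())
```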
I don't think I understand the performance concern, considering that the trampoline method proposed would only use the try/except block once for an entire `asyncio.run()` invocation.

If we're sweating microseconds in the asyncio loop, I think there is still room for improvement anyways. This change makes about a 10% improvement (in the …):

```diff
diff --git a/extmod/uasyncio/core.py b/extmod/uasyncio/core.py
index 10a310809..242c8bbac 100644
--- a/extmod/uasyncio/core.py
+++ b/extmod/uasyncio/core.py
@@ -151,12 +151,15 @@ def run_until_complete(main_task=None):
     global cur_task
     excs_all = (CancelledError, Exception)  # To prevent heap allocation in loop
     excs_stop = (CancelledError, StopIteration)  # To prevent heap allocation in loop
+    queue_peek = _task_queue.peek
+    queue_pop = _task_queue.pop
+    wait_io_event = _io_queue.wait_io_event
     while True:
         # Wait until the head of _task_queue is ready to run
         dt = 1
         while dt > 0:
             dt = -1
-            t = _task_queue.peek()
+            t = queue_peek()
             if t:
                 # A task waiting on _task_queue; "ph_key" is time to schedule task at
                 dt = max(0, ticks_diff(t.ph_key, ticks()))
@@ -164,10 +167,10 @@ def run_until_complete(main_task=None):
                 # No tasks can be woken so finished running
                 return
             # print('(poll {})'.format(dt), len(_io_queue.map))
-            _io_queue.wait_io_event(dt)
+            wait_io_event(dt)
         # Get next task to run and continue it
-        t = _task_queue.pop()
+        t = queue_pop()
         cur_task = t
         try:
             # Continue running the coroutine, it's responsible for rescheduling itself
```

(since creating bound methods requires dynamic memory and lookup opcodes)
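To illustrate that point outside of uasyncio, here is a generic toy sketch (not part of the patch) of the caching pattern the diff uses: hoisting the attribute lookup out of the loop means the bound-method object is created once instead of on every iteration.

```python
# Toy illustration of caching a bound method in a local variable.
data = []

# Per-iteration lookup: "data.append" is resolved (and a bound method created)
# each time through the loop.
for i in range(1000):
    data.append(i)

# Cached local: resolved once, then the loop only performs a fast local call.
data.clear()
append = data.append
for i in range(1000):
    append(i)

print(len(data))  # 1000
```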
From my perspective, reliable exception handling and context managers are a major design feature of Python. For the interpreter to bypass them is to me like finding this in a manual: …

This statement may be extreme or dramatic, but I otherwise don't understand a design choice that prevents the main async task from participating in a fatal exception arising from the internal selector. And I do realize that forcing a reboot if the …
Yes you're right! It might be a good idea to tune that inner loop some more. But first write some good benchmarks. (Of course, that's a separate activity not for this PR.)
In MicroPython, doing …
OK, so I think I'm convinced that this needs to be fixed, and performance won't be affected much if at all (and can be regained in other ways if needed). With the current approach of having a …

What was your original fix exactly, with the try/except inside the main dispatch loop?
@greezybacon it looks like you made a post then deleted it? From reading the thing you deleted (the patch you made to polling for the next event), I suspect the fairness tests failed.
@dpgeorge interestingly enough, it passed all the tests yet still didn't work right. It didn't handle the case where all tasks are waiting on IO with no timeout. Here's one that addresses it all better and is still a bit faster than the base case:

```diff
diff --git a/extmod/uasyncio/core.py b/extmod/uasyncio/core.py
index 10a310809..4e3ab1f9e 100644
--- a/extmod/uasyncio/core.py
+++ b/extmod/uasyncio/core.py
@@ -151,23 +151,25 @@ def run_until_complete(main_task=None):
     global cur_task
     excs_all = (CancelledError, Exception)  # To prevent heap allocation in loop
     excs_stop = (CancelledError, StopIteration)  # To prevent heap allocation in loop
+    queue_peek = _task_queue.peek
+    queue_pop = _task_queue.pop
+    wait_io_event = _io_queue.wait_io_event
     while True:
         # Wait until the head of _task_queue is ready to run
-        dt = 1
-        while dt > 0:
-            dt = -1
-            t = _task_queue.peek()
-            if t:
-                # A task waiting on _task_queue; "ph_key" is time to schedule task at
-                dt = max(0, ticks_diff(t.ph_key, ticks()))
-            elif not _io_queue.map:
-                # No tasks can be woken so finished running
-                return
-            # print('(poll {})'.format(dt), len(_io_queue.map))
-            _io_queue.wait_io_event(dt)
+        t = queue_peek()
+        if t:
+            # A task waiting on _task_queue; "ph_key" is time to schedule task at
+            dt = ticks_diff(t.ph_key, ticks())
+            if dt > 0:
+                wait_io_event(dt)
+        elif not _io_queue.map:
+            # No tasks can be woken so finished running
+            return
+        else:
+            wait_io_event(-1)
         # Get next task to run and continue it
-        t = _task_queue.pop()
+        t = queue_pop()
         cur_task = t
         try:
             # Continue running the coroutine, it's responsible for rescheduling itself
```

In my simple test, it seems to reduce the loop time from about 28us to about 23us on an NXP RT1062 @ 600MHz.
Okay, I refactored it again to eliminate the trampoline and wrapped the entire main loop with the try/except block. On the performance front, I used this simple test:

```python
import uasyncio, time


async def count_switches(count=10000):
    start = time.ticks_ms()
    try:
        for i in range(count):
            await uasyncio.sleep_ms(0)
    finally:
        now = time.ticks_ms()
        dur = now - start
        if i:
            msps = dur / i
            print(f"There were {i} switches in {dur}ms, or {msps:.3f}ms per switch")


uasyncio.run(count_switches())
```

On a RP2040 running the main branch, the code block reports 0.319ms per switch. With the code in this PR, it reports 0.231ms.

As an additional test, try interrupting the test with CTRL+C. With this code, it additionally runs the `finally` block and prints the report.

It turns out that the example failed the …
One more bit: I removed the call to …
Here's an echo socket server/client pair test:

```python
import sys

try:
    from time import ticks_ms
    import uasyncio as asyncio
except ImportError:
    import asyncio
    import time

    ticks_ms = lambda: int(time.monotonic() * 1000)

transferred = 0


async def client():
    async def readbehind(reader):
        global transferred
        while True:
            block = await reader.read(128)
            if not block:
                break
            transferred += len(block)

    reader, writer = await asyncio.open_connection('localhost', 8888)
    asyncio.create_task(readbehind(reader))
    block = b'\x00' * 128
    while True:
        writer.write(block)
        await writer.drain()


async def server(ready):
    async def echo_server(reader, writer):
        while True:
            block = await reader.read(128)
            if not block:
                break
            writer.write(block)
            await writer.drain()

    await asyncio.start_server(echo_server, 'localhost', 8888)
    ready.set()


async def monitor():
    global transferred
    then = ticks_ms()
    while True:
        await asyncio.sleep(1)
        dur = ticks_ms() - then
        kbps = transferred / dur
        sys.stderr.write(f'\rTransferred {transferred} bytes in {dur}ms or {kbps:.1f}kB/s')


async def main(num_clients=1):
    ready = asyncio.Event()
    asyncio.create_task(server(ready))
    await ready.wait()
    for _ in range(num_clients):
        asyncio.create_task(client())
    await monitor()


asyncio.run(main(10))
```

Running with 10 clients (about 30 async tasks in total), I get …
extmod/uasyncio/core.py (Outdated)

```diff
-dt = max(0, ticks_diff(t.ph_key, ticks()))
+dt = ticks_diff(t.ph_key, ticks())
+if dt > 0:
+    _io_queue.wait_io_event(dt)
```
I'm not sure this is correct, because it may be possible to starve tasks waiting on an IO event. Eg if there are two tasks that keep rescheduling themselves immediately then we always get `dt == 0` here and `wait_io_event()` is never called. So tasks that are on the IO queue will remain there forever, even if they become readable/writable.
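To make the scenario concrete, a minimal sketch (my own illustration; a proper regression test appears further down the thread): two tasks that always reschedule themselves immediately keep `dt` at 0, so a third task parked on the IO queue would never be polled.

```python
import uasyncio as asyncio
import usocket as socket


async def busy():
    while True:
        await asyncio.sleep(0)  # always ready again immediately, so dt stays 0


async def io_task():
    s = asyncio.StreamReader(socket.socket())
    await s.read(1)  # parks this task on the IO queue until wait_io_event() wakes it


async def main():
    asyncio.create_task(busy())
    asyncio.create_task(busy())
    asyncio.create_task(io_task())
    await asyncio.sleep(1)


asyncio.run(main())
```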
If what I say above is true then we should write a test for this case, see it fail, then add that test to the existing test suite to catch such cases in the future.
Good catch, @dpgeorge. I changed it to always call `wait_io_event` but still avoid the call to `max()`.
Okay, I changed it to always call `wait_io_event`. …

Less impressive, but also not slower.
Here's a test that would fail if tasks waiting on IO could be starved:

```python
# Test fairness of scheduler, with tasks waiting on IO.

try:
    import uasyncio as asyncio
    import usocket as socket
except ImportError:
    try:
        import asyncio, socket
    except ImportError:
        print("SKIP")
        raise SystemExit


async def task_sleep(id):
    print("task start", id)
    await asyncio.sleep(0)
    print("task tick", id)
    await asyncio.sleep(0)
    print("task done", id)


async def task_socket(id):
    print("task start", id)
    s = asyncio.StreamReader(socket.socket())
    try:
        await s.read(1)
    except OSError as er:
        print("OSError")
    print("task done", id)


async def main():
    print("main start")
    t1 = asyncio.create_task(task_sleep(1))
    t2 = asyncio.create_task(task_sleep(2))
    t3 = asyncio.create_task(task_socket(3))
    await asyncio.sleep(0)
    print("main tick")
    await asyncio.sleep(0)
    print("main tick")
    await asyncio.sleep(0)
    print("main done")


asyncio.run(main())
```
extmod/uasyncio/core.py (Outdated)

```python
# Keep scheduling tasks until there are none left to schedule
def run_until_complete(main_task=None):
    global cur_task
    excs_all = (CancelledError, Exception)  # To prevent heap allocation in loop
    excs_stop = (CancelledError, StopIteration)  # To prevent heap allocation in loop
    _io_queue.wait_io_event(0)
```
Is this still needed, now that `wait_io_event()` is always called below?
You're right. I removed it. I thought I needed it for one of the tests, but they all pass without it.
extmod/uasyncio/core.py (Outdated)

```python
# Wait until the head of _task_queue is ready to run
dt = 1
while dt > 0:
    dt = -1
```
I'm pretty sure this `while dt > 0` loop is there for a reason... what if `wait_io_event()` returns early yet does not queue anything (is that possible?)? Then the next available task on the run queue will be scheduled early.
I think that would be a bug in the `poller::ipoll` method, but I certainly can't fault your careful approach. Thank you for bringing it up.
It took me a while to understand how the code works the way you describe:

- (assuming there is one task waiting with a ph_key delta of 1000ms)
- dt is set to 1
- LOOP (dt == 1)
  - Check if a task is waiting
  - Calculate dt (will be set to 1000)
  - Run `wait_io_event` with dt := 1000
- LOOP (dt == 1000)
  - Check if a task is waiting
  - Calculate dt (will ideally be set to <= 0, but could be a positive number)
  - Run `wait_io_event` with dt := 0
- EXIT LOOP (dt <= 0)

With the logic I'm proposing:

- (assuming there is one task waiting with a ph_key delta of 1000ms)
- LOOP
  - Check if a task is waiting
  - Calculate dt (will be set to 1000)
  - Run `wait_io_event` with dt := 1000
- LOOP
  - Calculate dt (will ideally be set to <= 0, but could be a positive number)
  - Run `wait_io_event` with dt := 0
- BREAK if dt <= 0

I also took the liberty of making slight optimizations to the `wait_io_event` method to reduce some opcode overhead, by avoiding lookups and calculations on the POLLIN and POLLOUT constants, as well as unpacking the map entry instead of using item lookups to fetch [0] and [1] repeatedly.
Here's a simple benchmark script I used with results:
Here's a simple benchmark script I used, with results:

```python
import asyncio

counter = 0


async def count():
    global counter
    while True:
        await asyncio.sleep_ms(0)
        counter += 1


try:
    asyncio.run(asyncio.wait_for(count(), timeout=2))
finally:
    print(counter, 2e6 / counter)
```

(unix) Apple M2

WITH PATCH:

| schedules | us/schedule |
| --- | --- |
| 236425 | 8.459 |

ON v1.23.0-preview.203.gd712feb68:

| schedules | us/schedule |
| --- | --- |
| 219436 | 9.114 |

(rp2) RP2040 (@125MHz)

WITH PATCH:

| schedules | us/schedule |
| --- | --- |
| 8659 | 230.9736 |
| 8390 | 238.379 |
| 8919 | 224.2404 |

ON v1.22.2:

| schedules | us/schedule |
| --- | --- |
| 6442 | 310.4626 |
And a more cooperative test requiring IO waiting:
```python
flag = asyncio.ThreadSafeFlag()


async def sender():
    while True:
        flag.set()
        await asyncio.sleep_ms(0)


async def recv():
    global counter
    while True:
        await flag.wait()
        counter += 1


counter = 0
try:
    asyncio.create_task(sender())
    asyncio.run(asyncio.wait_for(recv(), timeout=2))
finally:
    print(counter, 2e6 / counter)
```
(rp2) RP2040 (@125MHz)

WITH PATCH:

| schedules | us/schedule |
| --- | --- |
| 1675 | 1194.03 |
| 1650 | 1212.121 |

ON v1.22.2:

| schedules | us/schedule |
| --- | --- |
| 1509 | 1325.381 |
| 1506 | 1328.021 |
Not sure if these are the best tests to demonstrate overall performance of asyncio, but I wanted to show that the patch isn't slower than the baseline.
Cheers,
If the main task is interrupted by e.g. a KeyboardInterrupt, then the main task needs to have the exception injected into it so it will run the exception handlers and context manager `__aexit__` methods. Additionally, if an error is encountered in the poll loop, it will be injected into the main task.

Signed-off-by: Jared Hancock <jared@greezybacon.me>
@dpgeorge in the spirit of making this mergeable, I rebased it with a less aggressive approach, leaving the core of the event loop unchanged. The changes proposed are strictly performance motivated and do not change the logic of the scheduling loop.

Performance-wise, it's definitely faster: 20-25% for the tight loop time and 3-20% for the two tasks synchronized with a ThreadSafeFlag: …

Not sure why Unix performs poorly on the ThreadSafeFlag test. Times were calculated from the total run time (2 seconds) divided by the number of loop iterations.

FWIW, I smoked a $25 stepper motor a couple months ago. It was running a calibration test while rotating slowly at moderate current. I interrupted the test and walked away for a couple hours. What I didn't realize is that the motion controller (TMC5160) was not instructed to stop, because of this issue. When I returned later to continue my coding, the motor was hot enough to burn me and had melted the plastic it was set on. It was my mistake; I realize now I should have reset the microcontroller and cut the power to the motor.

I believe this would add to the professionalism of this project overall while also improving the performance of `uasyncio`.
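(As a worked example of that calculation, using the numbers from the first table above: 236425 schedules in a 2-second run gives 2,000,000 µs / 236425 ≈ 8.46 µs per schedule.)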
Actually, on a less dramatic note, I believe the … I believe we need external access to all the running tasks in a loop so they could all be canceled upon exception/interruption, like this (in spirit):

```python
try:
    asyncio.run_until_complete()
except:
    for t in asyncio.all_tasks():
        if not t.done():
            t.cancel()
    asyncio.run_until_complete()

# Should never get here - trigger reset
machine.soft_reset()
```
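For reference, that is roughly what CPython's `asyncio.run()` does at shutdown. A simplified sketch of that behavior (based on the documented CPython runner, not MicroPython code):

```python
import asyncio


def _cancel_all_tasks(loop):
    # Simplified version of the shutdown step performed by asyncio.run() in CPython:
    # cancel everything still pending, then run the loop until the cancellations
    # have been delivered so finally blocks and __aexit__ methods get to run.
    to_cancel = asyncio.all_tasks(loop)
    if not to_cancel:
        return
    for task in to_cancel:
        task.cancel()
    loop.run_until_complete(asyncio.gather(*to_cancel, return_exceptions=True))
```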
@greezybacon sorry I dropped the ball on this one, and didn't reply to a lot of your comments/questions above.
Is that improvement still valid? That link no longer works, so I can't really review it at this stage. Aside from performance improvements, the original issue this PR was addressing is still not fixed. Did you want to try a different way to fix it, based on your comment #9870 (comment)?
@dpgeorge I implemented … I think this approach should capture the spirit of what I set out to do in a performant manner that is also consistent with the CPython behavior.
I opened #17699 to continue the discussion on the performance improvement.
If the main task is interrupted by e.g. a KeyboardInterrupt, then the main task needs to have the exception injected into it so it will run the exception handlers and context manager `__aexit__` methods.