asyncio: slight optimizations for run_until_complete and sleep_ms #17699

Open. Wants to merge 1 commit into base: master.

@greezybacon (Contributor) commented Jul 17, 2025

Summary

This PR aims to improve the timing of the asyncio core loop. It makes a few small optimizations to the core and yields roughly a 20% improvement in loop timing in the tests below.

  • In the IO poll method, POLLIN and POLLOUT are looked up as names in the local module rather than as attributes of the select module on each use.
  • In sleep_ms, max is no longer called on every invocation. Instead, a conditional expression handles the case where t is negative.
  • In run_until_complete, a call to max is avoided.
  • In run_until_complete, the methods of the task and IO queues are looked up only once.
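
The last bullet relies on a general Python technique: bind a method to a local name once, so the loop body loads a local variable instead of repeating an attribute lookup on every pass. A minimal sketch of that idea, using a plain list as a stand-in queue. The function names are hypothetical and this is not the PR's code:

```python
def drain_with_lookup(queue):
    """Pop every item, resolving `queue.pop` on each iteration."""
    total = 0
    while queue:
        total += queue.pop()
    return total


def drain_hoisted(queue):
    """Same loop, but the bound method is looked up once up front."""
    pop = queue.pop
    total = 0
    while queue:
        total += pop()
    return total


items = list(range(1000))
# Both variants consume the queue and produce the same result.
assert drain_with_lookup(list(items)) == drain_hoisted(list(items)) == sum(items)
```

Whether the hoisted form is measurably faster depends on the interpreter and on how much other work the loop body does, so it is worth timing on the target platform.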

Testing

I ran two tests on three platforms. Source code is given below. The tight-loop test runs a single task as quickly as possible. The io-poll test uses a ThreadSafeFlag to run two tasks as quickly as possible, which requires IO polling between the tasks.

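| test | platform | base (v1.25.0) | PR | Change |
|---|---|---|---|---|
| tight-loop | unix (ubuntu 22.04 on Mac M2) | 1.45us | 1.05us | -28% |
| tight-loop | mimxrt (Teensy 4.1) | 49us | 32us | -34% |
| tight-loop | rp2 (W5100S EVB PICO @ 125MHz) | 621us | 476us | -23.3% |
| io-poll | unix | 2724us | 2724us | (none) |
| io-poll | mimxrt | 252us | 199us | -21% |
| io-poll | rp2 | 2107us | 1713us | -18.7% |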
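
For readers who want to re-derive the Change column, the sketch below recomputes the relative change from the base and PR timings in the table above. The names `RESULTS` and `percent_change` are hypothetical, and the computed values may differ from the table in the last digit because of rounding:

```python
# (test, platform, base_us, pr_us) copied from the table above
RESULTS = [
    ("tight-loop", "unix", 1.45, 1.05),
    ("tight-loop", "mimxrt", 49, 32),
    ("tight-loop", "rp2", 621, 476),
    ("io-poll", "unix", 2724, 2724),
    ("io-poll", "mimxrt", 252, 199),
    ("io-poll", "rp2", 2107, 1713),
]


def percent_change(base, new):
    """Relative change of `new` versus `base`, in percent (negative = faster)."""
    return (new - base) / base * 100.0


for test, platform, base, new in RESULTS:
    print("{:10} {:7} {:+.1f}%".format(test, platform, percent_change(base, new)))
```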

tight-loop test

import asyncio

async def count():
    global counter
    while True:
        await asyncio.sleep_ms(0)
        counter += 1

try:
    counter = 0
    asyncio.run(asyncio.wait_for(count(), timeout=2))
finally:
    print(counter, 2e6/counter)
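
The final `print` reports the raw iteration count and `2e6/counter`. Because `wait_for` stops the task after 2 seconds (2e6 microseconds), that quotient is the average time per iteration in microseconds. A small helper makes the conversion explicit; its name and parameter are hypothetical:

```python
def us_per_iteration(iterations, run_seconds=2):
    """Average microseconds per loop iteration over a fixed-length run."""
    return run_seconds * 1e6 / iterations


# Example: 4000 iterations in a 2-second run is 500 us per iteration.
assert us_per_iteration(4000) == 500.0
```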

io-poll test

import asyncio

flag = asyncio.ThreadSafeFlag()
async def sender():
    while True:
        flag.set()
        await asyncio.sleep_ms(0)

async def recv():
    global counter
    while True:
        await flag.wait()
        counter += 1

counter = 0
try:
    asyncio.create_task(sender())
    asyncio.run(asyncio.wait_for(recv(), timeout=2))
finally:
    if counter:
        print(counter, 2e6/counter)

@greezybacon changed the title from "asyncio: Make slight optimizations for IOQueue.wait_io_event" to "asyncio: slight optimizations for run_until_complete and sleep_ms" on Jul 17, 2025
github-actions bot commented Jul 17, 2025

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +112 +0.013% standard
      stm32:   +28 +0.007% PYBV10
     mimxrt:   +32 +0.009% TEENSY40
        rp2:   +32 +0.003% RPI_PICO_W
       samd:   +40 +0.015% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

codecov bot commented Jul 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.38%. Comparing base (f498a16) to head (f7c769c).
⚠️ Report is 452 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #17699      +/-   ##
==========================================
- Coverage   98.54%   98.38%   -0.16%     
==========================================
  Files         169      171       +2     
  Lines       21890    22239     +349     
==========================================
+ Hits        21571    21880     +309     
- Misses        319      359      +40     

@@ -54,7 +55,8 @@ def __next__(self):
 # Use a SingletonGenerator to do it without allocating on the heap
 def sleep_ms(t, sgen=SingletonGenerator()):
     assert sgen.state is None
-    sgen.state = ticks_add(ticks(), max(0, t))
+    now = ticks()
+    sgen.state = ticks_add(now, t) if t > 0 else now
Member

Does this give a measurable speed improvement? Is it worth it for the cost in code size?

I measure this change here as +5 bytes to the bytecode. The most taken path will be when ticks_add() needs to be called, which goes from 12 opcodes previously to now 16 opcodes. It's usually the opcode overhead that's slow, rather than the actual call (eg out to max, which should be quick with two small int args). So I would guess that this change actually makes things a little slower.

Contributor Author

@dpgeorge thank you for asking. I spent too long working on this. It turns out that it does make a performance improvement on all platforms (ie. both Unix and MCUs), but it isn't substantial. I'm happy to remove it and search for a more substantial impact.

Member

Thanks for investigating. Unless it's a significant improvement, I'd prefer to leave it as-is (ie prefer shorter bytecode over performance).

Calculate ~POLLIN and ~POLLOUT as constants to remove the runtime cost
of continuously calculating them. And unpack the queue entry rather than
using repeated item lookups.

Additionally, avoid the call to max() in sleep_ms. The wait time
specified will generally not be negative, so the call to `max` is
usually unnecessary. Instead, the code calls `ticks_add` if `t` is
positive and otherwise uses the current ticks time.
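
The two ways of computing the `sleep_ms` deadline should give the same result for any `t`. The sketch below checks that on a desktop Python, using a stand-in for `time.ticks_add` because CPython has no such function. `TICKS_PERIOD` and both helper names are assumptions made for illustration, and the real wrap-around period is port-specific:

```python
TICKS_PERIOD = 1 << 30  # assumed wrap-around period; the real value is port-specific


def ticks_add(ticks, delta):
    """Stand-in for MicroPython's time.ticks_add: add a delta with wrap-around."""
    return (ticks + delta) % TICKS_PERIOD


def deadline_with_max(now, t):
    # Original form: clamp negative delays to zero via max()
    return ticks_add(now, max(0, t))


def deadline_with_conditional(now, t):
    # PR form: skip ticks_add entirely when the delay is not positive
    return ticks_add(now, t) if t > 0 else now


for now in (0, 12345, TICKS_PERIOD - 1):
    for t in (-50, -1, 0, 1, 250, 10000):
        assert deadline_with_max(now, t) == deadline_with_conditional(now, t)
```

This only shows the two forms are equivalent. Whether the conditional form is faster is a separate question that has to be measured on the target, as discussed in the review.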
@greezybacon greezybacon force-pushed the fix/improve-asyncio-core-loop-perf branch from b4a3017 to f7c769c Compare July 28, 2025 02:05
@dpgeorge dpgeorge added the extmod Relates to extmod/ directory in source label Jul 28, 2025
@dpgeorge dpgeorge added this to the release-1.27.0 milestone Jul 28, 2025
@jepler (Contributor) left a comment

A few suggestions, but please feel free to ignore.

@@ -3,6 +3,7 @@

 from time import ticks_ms as ticks, ticks_diff, ticks_add
 import sys, select
+from select import POLLIN, POLLOUT
Contributor

alas I don't think it works to write POLLIN = const(select.POLLIN). I wonder if POLLIN = const(1); assert POLLIN == select.POLLIN benefits performance enough that it would be worth doing. (is micropython asyncio supposed to be cpython compatible? I guess there's no guarantee of the value of POLLIN/POLLOUT constants there. But there's no time.ticks_ms so probably this is a non-goal)
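
The hard-coded-constant idea from this comment could look roughly like the sketch below. It assumes the values 1 and 4 for POLLIN and POLLOUT, which hold on Linux but are not guaranteed everywhere, and it defines a no-op `const` fallback (an assumption for illustration) so the same file also runs on CPython, where the `micropython` module does not exist:

```python
import select

try:
    from micropython import const  # MicroPython: lets the compiler inline the value
except ImportError:
    def const(value):
        """CPython stand-in so the sketch runs on a desktop interpreter."""
        return value

# Assumed numeric values; the asserts below catch a platform where they differ.
POLLIN = const(1)
POLLOUT = const(4)

assert POLLIN == select.POLLIN
assert POLLOUT == select.POLLOUT
```

The asserts cost a little at import time but fail loudly if a port ever defines different values.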

         else:
             sm = self.map[id(s)]
             assert sm[idx] is None
             assert sm[1 - idx] is not None
             sm[idx] = cur_task
-            self.poller.modify(s, select.POLLIN | select.POLLOUT)
+            self.poller.modify(s, POLLIN | POLLOUT)
Contributor

any measurable benefit to having POLLANY = POLLIN | POLLOUT to avoid the calculation here?
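
A sketch of what the suggestion amounts to: compute the combined mask once at import time and reuse it, instead of OR-ing two globals on every call. `POLLANY` is the name proposed in the comment, and the two helper functions are hypothetical:

```python
import select

POLLIN = select.POLLIN
POLLOUT = select.POLLOUT
POLLANY = POLLIN | POLLOUT  # computed once, at import time


def mask_computed_each_call():
    # Two global loads plus a bitwise OR on every call
    return POLLIN | POLLOUT


def mask_precomputed():
    # A single global load
    return POLLANY


assert mask_computed_each_call() == mask_precomputed()
```

Any gain is likely small, since it removes one global load and one OR per call, so it would need measuring on the target to answer the question.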
