asyncio: slight optimizations for run_until_complete and sleep_ms #17699

Open · wants to merge 1 commit into master

Conversation

greezybacon (Contributor) commented Jul 17, 2025

Summary

This is aimed at improving the loop timing of the asyncio core loop. It makes a few small optimizations to the core and yields roughly a 20% improvement in overall performance.

  • In the IO poll method, the POLLIN and POLLOUT constants are looked up in the local module scope rather than in the select module each time they are used.
  • In sleep_ms, max is no longer called on every invocation; a conditional expression instead handles the case where t is zero or negative.
  • In run_until_complete, a call to max is avoided in the same way.
  • In run_until_complete, the methods of the task and IO queues are looked up only once (see the sketch after this list).
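
These are standard MicroPython micro-optimization patterns. Below is a minimal, hypothetical sketch of the sleep_ms and method-lookup changes (it is not the PR's diff; `schedule_at`, `drain`, and the queue arguments are made up for illustration, though `peek` and `wait_io_event` echo names used by the asyncio internals):

```python
from time import ticks_ms, ticks_add

# Avoid max() on the hot path with a conditional expression (illustrative).
def schedule_at(t):
    now = ticks_ms()
    # Equivalent to ticks_add(now, max(0, t)) without the call to max().
    return ticks_add(now, t) if t > 0 else now

# Hoist bound-method lookups out of the loop so each iteration avoids
# repeated attribute lookups on the queue objects (illustrative).
def drain(task_queue, io_queue, dt=0):
    peek = task_queue.peek          # looked up once here...
    wait = io_queue.wait_io_event   # ...instead of once per iteration
    while peek() is not None:
        wait(dt)
```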

Testing

I ran two tests on three platforms; source code is given below. The tight-loop test just runs a single task as quickly as possible. The io-poll test uses a ThreadSafeFlag to run two tasks as quickly as possible, which requires IO polling between the tasks.

| test | platform | base (v1.25.0) | PR | change |
| --- | --- | --- | --- | --- |
| tight-loop | unix (Ubuntu 22.04 on Mac M2) | 1.45us | 1.05us | -28% |
| tight-loop | mimxrt (Teensy 4.1) | 49us | 32us | -34% |
| tight-loop | rp2 (W5100S EVB PICO @ 125MHz) | 621us | 476us | -23.3% |
| io-poll | unix | 2724us | 2724us | (none) |
| io-poll | mimxrt | 252us | 199us | -21% |
| io-poll | rp2 | 2107us | 1713us | -18.7% |

tight-loop test

import asyncio

async def count():
    global counter
    while True:
        await asyncio.sleep_ms(0)
        counter += 1

try:
    counter = 0
    # Spin the loop for 2 seconds; wait_for's TimeoutError lands in finally.
    asyncio.run(asyncio.wait_for(count(), timeout=2))
finally:
    # Report iterations and average microseconds per loop (2e6 us / count).
    print(counter, 2e6 / counter)

io-poll test

import asyncio

flag = asyncio.ThreadSafeFlag()

async def sender():
    while True:
        # Setting the flag wakes recv() via the IO queue, forcing a poll pass.
        flag.set()
        await asyncio.sleep_ms(0)

async def recv():
    global counter
    while True:
        await flag.wait()
        counter += 1

counter = 0
try:
    asyncio.create_task(sender())
    asyncio.run(asyncio.wait_for(recv(), timeout=2))
finally:
    if counter:
        # Report iterations and average microseconds per round trip.
        print(counter, 2e6 / counter)

Commit message:

Calculate ~POLLIN and ~POLLOUT as constants to remove the runtime cost of continuously recalculating them, and unpack the queue entry rather than using repeated item lookups.

Additionally, avoid the call to max() in sleep_ms. The wait time specified will generally not be negative, so the call to `max` is usually unnecessary. Instead, the code either calls `ticks_add` if `t` is positive or else uses the current ticks time.
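
As a rough sketch of the first part of that message (hedged; `_CLEAR_IN`, `_CLEAR_OUT`, and `_dispatch` are made-up names, and the entry layout is assumed), the complemented masks become module-level constants and the queue entry is unpacked once instead of being indexed repeatedly:

```python
import select

# Computed once at import time, rather than evaluating ~select.POLLIN /
# ~select.POLLOUT every time an event is handled.
_CLEAR_IN = ~select.POLLIN
_CLEAR_OUT = ~select.POLLOUT

def _dispatch(entry, ev):
    # Unpack the entry once instead of repeated entry[0] / entry[1] lookups.
    reader, writer = entry
    if ev & select.POLLIN and reader is not None:
        ev &= _CLEAR_IN   # clear the read-interest bit with the constant mask
    if ev & select.POLLOUT and writer is not None:
        ev &= _CLEAR_OUT  # clear the write-interest bit
    return ev
```
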
greezybacon changed the title from "asyncio: Make slight optimizations for IOQueue.wait_io_event" to "asyncio: slight optimizations for run_until_complete and sleep_ms" on Jul 17, 2025.

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:   +80 +0.009% standard
      stm32:   +36 +0.009% PYBV10
     mimxrt:   +32 +0.009% TEENSY40
        rp2:   +32 +0.003% RPI_PICO_W
       samd:   +40 +0.015% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32


codecov bot commented Jul 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.44%. Comparing base (f498a16) to head (b4a3017).
Report is 387 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #17699      +/-   ##
==========================================
- Coverage   98.54%   98.44%   -0.10%     
==========================================
  Files         169      171       +2     
  Lines       21890    22208     +318     
==========================================
+ Hits        21571    21863     +292     
- Misses        319      345      +26     


@@ -54,7 +55,8 @@ def __next__(self):
 # Use a SingletonGenerator to do it without allocating on the heap
 def sleep_ms(t, sgen=SingletonGenerator()):
     assert sgen.state is None
-    sgen.state = ticks_add(ticks(), max(0, t))
+    now = ticks()
+    sgen.state = ticks_add(now, t) if t > 0 else now
Member commented:
Does this give a measurable speed improvement? Is it worth it for the cost in code size?

I measure this change here as +5 bytes to the bytecode. The most-taken path will be when ticks_add() needs to be called, which goes from 12 opcodes previously to 16 opcodes now. It's usually the opcode overhead that's slow, rather than the actual call (e.g. out to max, which should be quick with two small int args). So I would guess that this change actually makes things a little slower.
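
One way to sanity-check that guess is a small timing loop on the target (a sketch, assuming MicroPython's time.ticks_us is available; the iteration count and the two function bodies are only illustrative stand-ins for the old and new forms):

```python
from time import ticks_us, ticks_diff, ticks_add, ticks_ms

N = 100000

def bench(label, fn):
    t0 = ticks_us()
    for _ in range(N):
        fn(5)
    print(label, ticks_diff(ticks_us(), t0) / N, "us/call")

def with_max(t):
    # Current form: always pays the call to max().
    return ticks_add(ticks_ms(), max(0, t))

def with_conditional(t):
    # Proposed form: more opcodes, but no call out to max().
    now = ticks_ms()
    return ticks_add(now, t) if t > 0 else now

bench("max", with_max)
bench("conditional", with_conditional)
```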

-dt = max(0, ticks_diff(t.ph_key, ticks()))
+dt = ticks_diff(t.ph_key, ticks())
+if dt < 0:
+    dt = 0
Member commented:
As above, does this change here actually make things faster?
