asyncio: slight optimizations for run_until_complete and sleep_ms #17699
base: master
Conversation
Code size report:
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master   #17699      +/-   ##
==========================================
- Coverage   98.54%   98.38%   -0.16%
==========================================
  Files         169      171       +2
  Lines       21890    22239     +349
==========================================
+ Hits        21571    21880     +309
- Misses        319      359      +40

View full report in Codecov by Sentry.
extmod/asyncio/core.py
Outdated
@@ -54,7 +55,8 @@ def __next__(self):
 # Use a SingletonGenerator to do it without allocating on the heap
 def sleep_ms(t, sgen=SingletonGenerator()):
     assert sgen.state is None
-    sgen.state = ticks_add(ticks(), max(0, t))
+    now = ticks()
+    sgen.state = ticks_add(now, t) if t > 0 else now
Does this give a measurable speed improvement? Is it worth it for the cost in code size?

I measure this change here as +5 bytes of bytecode. The most taken path will be when ticks_add() needs to be called, which goes from 12 opcodes previously to 16 opcodes now. It's usually the opcode overhead that's slow, rather than the actual call (e.g. out to max, which should be quick with two small int args). So I would guess that this change actually makes things a little slower.
@dpgeorge thank you for asking. I spent too long working on this. It turns out that it does make a performance improvement on all platforms (i.e. both Unix and MCUs), but it isn't substantial. I'm happy to remove it and search for a more substantial impact.
Thanks for investigating. Unless it's a significant improvement, I'd prefer to leave it as-is (i.e. prefer shorter bytecode over performance).
Calculate ~POLLIN and ~POLLOUT as constants to remove the runtime cost of continuously recalculating them, and unpack the queue entry rather than using repeated item lookups. Additionally, avoid the call to max() in sleep_ms: the wait time specified will generally not be negative, so the call to `max` is usually unnecessary. Instead, the code either calls `ticks_add` if `t` is positive or else uses the current ticks time.
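A minimal illustration of the "unpack once" pattern mentioned in the commit message (the entry shape and names here are made up for the example; this is not the actual core.py queue code):

```python
# Illustrative only -- a made-up three-element queue entry.
entry = ("task", 42, None)

# Repeated item lookups: each subscript compiles to a separate index operation.
kind = entry[0]
key = entry[1]
payload = entry[2]

# Unpacking the entry once avoids the repeated lookups.
kind, key, payload = entry
```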
Force-pushed from b4a3017 to f7c769c
A few suggestions, but please feel free to ignore.
@@ -3,6 +3,7 @@
 from time import ticks_ms as ticks, ticks_diff, ticks_add
 import sys, select
+from select import POLLIN, POLLOUT
Alas, I don't think it works to write POLLIN = const(select.POLLIN). I wonder if POLLIN = const(1); assert POLLIN == select.POLLIN benefits performance enough that it would be worth doing. (Is MicroPython asyncio supposed to be CPython compatible? I guess there's no guarantee of the value of the POLLIN/POLLOUT constants there. But there's no time.ticks_ms in CPython, so probably this is a non-goal.)
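A sketch of that suggestion. MicroPython's const() needs a compile-time constant expression, so it can't wrap select.POLLIN; the literal values below are an assumption (they match the usual poll flag values), and the assert is the proposed guard against them being wrong on some port:

```python
from micropython import const
import select

# POLLIN = const(select.POLLIN)  # doesn't work: const() needs a literal

POLLIN = const(1)   # assumed value, checked below
POLLOUT = const(4)  # assumed value, checked below
assert POLLIN == select.POLLIN and POLLOUT == select.POLLOUT
```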
 else:
     sm = self.map[id(s)]
     assert sm[idx] is None
     assert sm[1 - idx] is not None
     sm[idx] = cur_task
-    self.poller.modify(s, select.POLLIN | select.POLLOUT)
+    self.poller.modify(s, POLLIN | POLLOUT)
Any measurable benefit to having POLLANY = POLLIN | POLLOUT, to avoid the calculation here?
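A sketch of that idea (POLLANY is a name invented in the review comment, not an existing select constant):

```python
from select import POLLIN, POLLOUT

# Computed once at import time instead of on every poller.modify() call.
POLLANY = POLLIN | POLLOUT

# ...then in the hot path it would read:
# self.poller.modify(s, POLLANY)
```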
Summary

This is aimed at improving the loop timing of the asyncio core loop. It makes a few small optimizations to the core and realizes about a 20% impact in overall performance.

- POLLIN and POLLOUT are only looked up once from the select module when used.
- In sleep_ms, max is not used for each call. Instead, an if expression handles the case when t is negative.
- In run_until_complete, a call to max is avoided.
- In run_until_complete, the methods for the task and IO queues are only looked up once (a generic sketch of this pattern follows the list).
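A minimal sketch of the bound-method caching in the last bullet (a generic example, not the real queue types from core.py):

```python
class TaskQueue:
    def __init__(self):
        self.items = []

    def push(self, task):
        self.items.append(task)

q = TaskQueue()

# Attribute lookup happens on every iteration:
for i in range(1000):
    q.push(i)

# Caching the bound method hoists the lookup out of the loop,
# which is the pattern run_until_complete now uses for its queues.
push = q.push
for i in range(1000):
    push(i)
```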
Testing

I ran two tests on three platforms; source code is given below. The tight-loop test just runs a single task as quickly as possible. The second test uses a ThreadSafeFlag to run two tasks as quickly as possible, but requires IO polling between the tasks.
tight-loop test
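(The test source was collapsed in the captured page. Below is a hypothetical sketch of a tight-loop benchmark of the shape described above, with a made-up iteration count; it is not the PR author's actual code.)

```python
# Hypothetical reconstruction of the tight-loop test.
import asyncio
import time

N = 100_000  # assumed iteration count

async def main():
    t0 = time.ticks_ms()
    for _ in range(N):
        await asyncio.sleep_ms(0)  # yield to the scheduler without waiting
    dt = time.ticks_diff(time.ticks_ms(), t0)
    print("tight loop:", N, "iterations in", dt, "ms")

asyncio.run(main())
```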
io-poll test
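(Also collapsed in the captured page. A hypothetical sketch of a ThreadSafeFlag ping-pong that forces IO polling between two tasks, again with a made-up count; not the PR author's actual code.)

```python
# Hypothetical reconstruction of the io-poll test.
import asyncio
import time

N = 10_000  # assumed iteration count
flag = asyncio.ThreadSafeFlag()

async def setter():
    # Keep setting the flag; waiting on a ThreadSafeFlag goes through
    # the IO queue, so every round trip exercises the poll path.
    while True:
        flag.set()
        await asyncio.sleep_ms(0)

async def main():
    t0 = time.ticks_ms()
    s = asyncio.create_task(setter())
    for _ in range(N):
        await flag.wait()
    dt = time.ticks_diff(time.ticks_ms(), t0)
    print("io poll:", N, "waits in", dt, "ms")
    s.cancel()

asyncio.run(main())
```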