Commit cb39410

Initial internal asyncio docs (#135469)
Currently focused on `_asynciomodule.c` but could also receive updates about internals of the Python package.
1 parent 269e19e commit cb39410

2 files changed

+216
-0
lines changed

InternalDocs/README.md

Lines changed: 6 additions & 0 deletions
@@ -41,3 +41,9 @@ Program Execution
 - [Garbage Collector Design](garbage_collector.md)
 
 - [Exception Handling](exception_handling.md)
+
+
+Modules
+---
+
+- [asyncio](asyncio.md)

InternalDocs/asyncio.md

Lines changed: 210 additions & 0 deletions
@@ -0,0 +1,210 @@
asyncio
=======


This document describes the workings and implementation details of the
C implementation of the
[`asyncio`](https://docs.python.org/3/library/asyncio.html) module.


## Pre-Python 3.14 implementation

Before Python 3.14, the C implementation of `asyncio` used a
[`WeakSet`](https://docs.python.org/3/library/weakref.html#weakref.WeakSet)
to store all the tasks created by the event loop. `WeakSet` was used
so that the event loop doesn't hold strong references to the tasks,
allowing them to be garbage collected when they are no longer needed.
The current task of the event loop was stored in a dict mapping the
event loop to the current task.

```c
/* Dictionary containing tasks that are currently active in
   all running event loops.  {EventLoop: Task} */
PyObject *current_tasks;

/* WeakSet containing all tasks scheduled to run on event loops. */
PyObject *scheduled_tasks;
```

This implementation had a few drawbacks:

1. **Performance**: Using a `WeakSet` for storing tasks is
inefficient, as it requires maintaining a full set of weak references
to tasks along with a corresponding weakref callback to clean up the
tasks when they are garbage collected. This increases the work done
by the garbage collector, and in applications with a large number of
tasks, this becomes a bottleneck, with increased memory usage and
lower performance. Looking up the current task was also slow, as it
required a dictionary lookup on the `current_tasks` dict.

2. **Thread safety**: Before Python 3.14, concurrent iteration over a
`WeakSet` was not thread safe[^1]. This meant calling APIs like
`asyncio.all_tasks()` could lead to inconsistent results or even a
`RuntimeError` if used in multiple threads[^2].

3. **Poor scaling in free-threading**: Using a global `WeakSet` for
storing all tasks across all threads led to contention when adding
and removing tasks from the set, which is a frequent operation. As such,
it performed poorly in free-threading and did not scale well with the
number of threads. Similarly, accessing the current task in multiple
threads did not scale due to contention on the global `current_tasks`
dictionary.

## Python 3.14 implementation

To address these issues, Python 3.14 implements several changes to
improve the performance and thread safety of task management.

- **Per-thread doubly linked list for tasks**: Python 3.14 introduces
  a per-thread circular doubly linked list implementation for
  storing tasks. This allows each thread to maintain its own list of
  tasks and allows for lock-free addition and removal of tasks. It
  is designed to be efficient and thread-safe, and it scales well with
  the number of threads in free-threading. This also allows external
  introspection tools such as `python -m asyncio pstree` to inspect
  tasks running in all threads and was implemented as part of [Audit
  asyncio thread
  safety](https://github.com/python/cpython/issues/128002).

- **Per-thread current task**: Python 3.14 stores the current task on
  the current thread state instead of a global dictionary. This
  allows for faster access to the current task without the need for
  a dictionary lookup. Each thread maintains its own current task,
  which is stored in the `PyThreadState` structure. This was
  implemented in https://github.com/python/cpython/issues/129898.

Storing the current task and the list of all tasks per-thread instead
of per-loop was chosen primarily to support external
introspection tools such as `python -m asyncio pstree`, as looking up
arbitrary attributes on the loop object is not possible
externally. Storing data per-thread also makes it easy to support
third-party event loop implementations such as `uvloop`, and is more
efficient for the single-threaded asyncio use case, as it avoids the
overhead of attribute lookups on the loop object and several other
calls on the performance-critical path of adding and removing tasks
from the per-loop task list.

## Per-thread doubly linked list for tasks

This implementation uses a circular doubly linked list to store tasks
on the thread states. This is used for all tasks which are instances
of `asyncio.Task` or subclasses of it; for third-party tasks, a
fallback `WeakSet` implementation is used. The linked list is
implemented using an embedded `llist_node` structure within each
`TaskObj`. By embedding the list node directly into the task object,
the implementation avoids additional memory allocations for linked
list nodes.

The `PyThreadState` structure gained a new field `asyncio_tasks_head`,
which serves as the head of the circular linked list of tasks. This
allows for lock-free addition and removal of tasks from the list.

It is possible that when a thread state is deallocated, there are
lingering tasks in its list; this can happen if another thread has
references to the tasks of this thread. Therefore, the
`PyInterpreterState` structure also gains a new `asyncio_tasks_head`
field to store any lingering tasks. When a thread state is
deallocated, any remaining lingering tasks are moved to the
interpreter state tasks list, and the thread state tasks list is
cleared. The `asyncio_tasks_lock` is used to protect the interpreter's
tasks list from concurrent modifications.

```c
typedef struct TaskObj {
    ...
    struct llist_node asyncio_node;
} TaskObj;

typedef struct PyThreadState {
    ...
    struct llist_node asyncio_tasks_head;
} PyThreadState;

typedef struct PyInterpreterState {
    ...
    struct llist_node asyncio_tasks_head;
    PyMutex asyncio_tasks_lock;
} PyInterpreterState;
```

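To make the embedded-node idea concrete, here is a minimal, self-contained sketch of a circular doubly linked list whose link lives inside the task structure. All `ll_*` names are hypothetical stand-ins chosen for illustration; they are not the definitions used by `_asynciomodule.c`, and the sketch ignores threading entirely.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not CPython's real definitions. */
struct ll_node {
    struct ll_node *prev;
    struct ll_node *next;
};

struct ll_task {
    int id;                 /* placeholder payload */
    struct ll_node node;    /* embedded link, playing the role of `asyncio_node` */
};

/* An empty circular list is a head that points at itself. */
static void ll_init(struct ll_node *head)
{
    head->prev = head;
    head->next = head;
}

static int ll_empty(const struct ll_node *head)
{
    return head->next == head;
}

/* Link `node` just before `head`, i.e. at the tail of the list. */
static void ll_insert_tail(struct ll_node *head, struct ll_node *node)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink `node` in O(1); no search is needed because it knows its neighbours. */
static void ll_remove(struct ll_node *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node;
    node->next = node;
}

/* Recover the enclosing task from a pointer to its embedded node. */
static struct ll_task *ll_task_from_node(struct ll_node *node)
{
    return (struct ll_task *)((char *)node - offsetof(struct ll_task, node));
}

/* Count entries by walking from the head until we come back to it. */
static size_t ll_length(const struct ll_node *head)
{
    size_t n = 0;
    for (const struct ll_node *cur = head->next; cur != head; cur = cur->next) {
        n++;
    }
    return n;
}
```

Because the node is part of the task, inserting or removing a task touches only a handful of pointers and never allocates, which is the property the text above relies on.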
When a task is created, it is added to the current thread's list of
tasks by the `register_task` function. When the task is done, it is
removed from the list by the `unregister_task` function. In
free-threading, the thread id of the thread which created the task is
stored in the `task_tid` field of the `TaskObj`. This is used to check
whether the task is being removed from the correct thread's task list.
If the current thread is the same as the thread which created it, then
no locking is required; otherwise, in free-threading, the
`stop-the-world` pause is used to pause all other threads and then
safely remove the task from the tasks list.

```mermaid

flowchart TD
    subgraph one["Executing Thread"]
        A["task = asyncio.create_task(coro())"] -->B("register_task(task)")
        B --> C{"task->task_state?"}
        C -->|pending| D["task_step(task)"]
        C -->|done| F["unregister_task(task)"]
        C -->|cancelled| F["unregister_task(task)"]
        D --> C
        F --> G{"free-threading?"}
        G --> |false| H["unregister_task_safe(task)"]
        G --> |true| J{"correct thread? <br>task->task_tid == _Py_ThreadId()"}
        J --> |true| H
        J --> |false| I["stop the world <br> pause all threads"]
        I --> H["unregister_task_safe(task)"]
    end
    subgraph two["Thread deallocating"]
        A1{"thread's task list empty? <br> llist_empty(tstate->asyncio_tasks_head)"}
        A1 --> |true| B1["deallocate thread<br>free_threadstate(tstate)"]
        A1 --> |false| C1["add tasks to interpreter's task list<br> llist_concat(&tstate->interp->asyncio_tasks_head,tstate->asyncio_tasks_head)"]
        C1 --> B1
    end

    one --> two
```

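The branch taken when unregistering a task can be modelled in a few lines. The sketch below is a simplified, single-threaded model of that decision only: the `unreg_*` names are hypothetical, the stop-the-world pause is represented by a counter, and the list removal by a flag, so it should be read as an illustration of the control flow rather than the real code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model for illustration; names do not match CPython's source. */
enum unreg_path {
    UNREG_DIRECT,        /* default build, or owner thread: plain unlink */
    UNREG_UNDER_PAUSE    /* foreign thread in free-threading: pause, then unlink */
};

struct unreg_task {
    uintptr_t owner_tid; /* id of the creating thread, playing the role of `task_tid` */
    bool linked;         /* whether the task is still in a task list */
};

/* Decide how a task may be unlinked from its thread's list. */
static enum unreg_path
unreg_choose_path(const struct unreg_task *task, uintptr_t current_tid,
                  bool free_threading)
{
    if (!free_threading) {
        return UNREG_DIRECT;
    }
    if (task->owner_tid == current_tid) {
        return UNREG_DIRECT;
    }
    return UNREG_UNDER_PAUSE;
}

/* Unlink the task and report how many pauses were needed (0 or 1). */
static int
unreg_task(struct unreg_task *task, uintptr_t current_tid, bool free_threading)
{
    int pauses = 0;
    if (unreg_choose_path(task, current_tid, free_threading) == UNREG_UNDER_PAUSE) {
        pauses++;        /* stand-in for pausing and resuming all threads */
    }
    task->linked = false; /* stand-in for the actual list removal */
    return pauses;
}
```

In this model the expensive path is taken only when a task finishes on a thread other than the one that created it, which matches the flow described above.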
`asyncio.all_tasks` now iterates over the per-thread task lists of all
threads and the interpreter's task list to get all the tasks. In
free-threading, this is done by pausing all the threads using the
`stop-the-world` pause to ensure that no tasks are being added or
removed while iterating over the lists. This allows for a consistent
view of all task lists across all threads and is thread safe.

This design allows for lock-free execution and scales well in
free-threading with multiple event loops running in different threads.

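As a rough illustration of how lingering tasks and `all_tasks` fit together, the following sketch splices a dying thread's list onto an interpreter-level list and then counts tasks across every remaining list. The `sk_*` types and functions are hypothetical, the model is single-threaded, and the lock and the stop-the-world pause are only marked by comments.

```c
#include <stddef.h>

/* Hypothetical model for illustration; not CPython's real structures. */
struct sk_node { struct sk_node *prev, *next; };

struct sk_interp { struct sk_node tasks_head; };  /* holds lingering tasks */
struct sk_thread {
    struct sk_interp *interp;
    struct sk_node tasks_head;                    /* this thread's tasks */
};

static void sk_init(struct sk_node *head) { head->prev = head->next = head; }
static int sk_empty(const struct sk_node *head) { return head->next == head; }

static void sk_insert_tail(struct sk_node *head, struct sk_node *node)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Splice every node of `src` onto the tail of `dst`, leaving `src` empty. */
static void sk_concat(struct sk_node *dst, struct sk_node *src)
{
    if (sk_empty(src)) {
        return;
    }
    src->next->prev = dst->prev;
    dst->prev->next = src->next;
    src->prev->next = dst;
    dst->prev = src->prev;
    sk_init(src);
}

/* On thread teardown, hand any lingering tasks to the interpreter's list. */
static void sk_thread_teardown(struct sk_thread *ts)
{
    /* A real implementation would hold a lock here; omitted in this sketch. */
    sk_concat(&ts->interp->tasks_head, &ts->tasks_head);
}

static size_t sk_count(const struct sk_node *head)
{
    size_t n = 0;
    for (const struct sk_node *cur = head->next; cur != head; cur = cur->next) {
        n++;
    }
    return n;
}

/* Model of `all_tasks`: the interpreter's list plus every live thread's list. */
static size_t sk_count_all(struct sk_thread *const *threads, size_t nthreads,
                           const struct sk_interp *interp)
{
    /* In free-threading all threads would be paused here; omitted in this sketch. */
    size_t total = sk_count(&interp->tasks_head);
    for (size_t i = 0; i < nthreads; i++) {
        total += sk_count(&threads[i]->tasks_head);
    }
    return total;
}
```

The splice is O(1) regardless of how many tasks linger, and a task stays visible to the combined walk whether its creating thread is still alive or not.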
## Per-thread current task

This implementation stores the current task in the `PyThreadState`
structure, which allows for faster access to the current task without
the need for a dictionary lookup.

```c
typedef struct PyThreadState {
    ...
    PyObject *asyncio_current_loop;
    PyObject *asyncio_current_task;
} PyThreadState;
```

When a task is entered or left, the current task is updated in the
thread state using the `enter_task` and `leave_task` functions. When
`current_task(loop)` is called and `loop` is the currently running
event loop of the current thread, no locking is required: the
current task is stored in the thread state and is returned directly
(the general case). Otherwise, if `loop` is not the currently running
event loop, the `stop-the-world` pause is used to pause all threads in
free-threading, and the thread states are iterated over to find the
one whose `tstate->asyncio_current_loop` matches `loop`; that thread's
current task is returned. If no matching thread state is
found, `None` is returned.

In free-threading, this avoids contention on a global dictionary, as
threads can access the current task of their running loop without any
locking.

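The two lookup paths can be sketched as follows. This is a simplified model under stated assumptions: `cur_thread` and `cur_lookup` are hypothetical names, loops and tasks are opaque pointers, `NULL` stands in for `None`, and the stop-the-world pause on the slow path is only noted in a comment.

```c
#include <stddef.h>

/* Hypothetical model for illustration; not CPython's real structures. */
struct cur_thread {
    const void *current_loop;   /* plays the role of `asyncio_current_loop` */
    const void *current_task;   /* plays the role of `asyncio_current_task` */
};

/* Return the current task of `loop`, or NULL (standing in for `None`). */
static const void *
cur_lookup(const struct cur_thread *self,
           const struct cur_thread *all_threads, size_t nthreads,
           const void *loop)
{
    /* Fast path: the loop is running on the calling thread. */
    if (self->current_loop == loop) {
        return self->current_task;
    }
    /* Slow path: in free-threading all threads would be paused before
       this scan; that step is omitted in this sketch. */
    for (size_t i = 0; i < nthreads; i++) {
        if (all_threads[i].current_loop == loop) {
            return all_threads[i].current_task;
        }
    }
    return NULL;
}
```

The fast path reads two fields of the caller's own state, so the common case needs no synchronization in this model.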

[^1]: https://github.com/python/cpython/issues/123089
[^2]: https://github.com/python/cpython/issues/80788
