-
-
Notifications
You must be signed in to change notification settings - Fork 32.5k
GH-136410: Faster side exits #136411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
GH-136410: Faster side exits #136411
Conversation
// from being immediately detected as cold and invalidated. | ||
cold->vm_data.warm = true; | ||
if (_PyJIT_Compile(cold, cold->trace, 1)) { | ||
Py_DECREF(cold); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's immortal. Decrefing won't do anything. I think you mean to call the dealloc function on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot. I think I need to move making it immortal to after it is fully created.
@pablogsal any ideas what this failure means? https://github.com/python/cpython/actions/runs/16140914160/job/45547999926?pr=136411 |
Is a race condition in the tests that will be fixed by #136347 |
Performance is in the noise:
However, I wouldn't expect much of a speedup by reducing the size of exits, so this seems reasonable. |
I think the benchmarking public mirror is down. Can you please share the link when it comes up again? |
Note to self: The cold exit executor should be freed when freeing the interpreter to avoid leaking memory. |
executor->exits[i].temperature = initial_temperature_backoff_counter(); | ||
executor->exits[i].executor = cold; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We put cold_executor
here without INCREF because cold_executor
is immortal. But in executor_clear
we DECREF each executor from exits
because there can be other executors as well. But this is imbalance of INCREF/DECREF. Maybe it is worth to add a comment here that we omit INCREF because of immortality of cold executor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of notes. I like the general idea, not a huge fan of the jit_exit
side-channel (but I get why we have it).
Py_FatalError("Cannot allocate core JIT code"); | ||
} | ||
#endif | ||
_Py_SetImmortal((PyObject *)cold); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it need to be immortal? This means that we'll leak one of these per interpreter, along with about a page of JIT code. I think the interpreter can just hold a normal reference that we free at shutdown, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is cleared when the interpreter is freed https://github.com/python/cpython/pull/136411/files#diff-7ac11e526f79b42d6ea9d3592cb99da46775640c69fa5510f4a6de87cced7141R818
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then why is it immortal? I'm worried it could get shared at some point, which is generally safe to do with immortal objects. An immortal object that gets cleared at interpreter shutdown seems like it could lead to hard-to-debug problems down the road.
Maybe just not make it immortal, and refcount it normally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is immortal: it outlives all mortal objects in the same interpreter. Reference counting it is just extra overhead.
} | ||
exit->temperature = initial_temperature_backoff_counter(); | ||
} | ||
assert(tstate->jit_exit == exit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe set it to NULL
now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do, but I'd rather leave it pointing to the last exit.
Cold exit relies on jit_exit
being the last exit, and I can imagine us adding "compile me now" stubs later that would also rely on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but I'd lean towards clearing it here. A possibly-dangling pointer seems dangerous, and we could assert it's NULL
when setting it to make sure we didn't mess up somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is only meaningful when starting an executor, and it will always have been set to a live executor at that point.
I'd like to keep it this way as it will simplify changing it to be passed in a register with TOS caching.
When you're done making the requested changes, leave the comment: |
Once we have TOS caching we can pass the exit in one of the cache registers. On occasion we will need to spill a value on the stack, but it should be mostly free. |
With the changes to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple suggestions for possible improvements (and in the discussions above), but nothing blocking. This is a good change, thanks!
[_START_EXECUTOR] = HAS_DEOPT_FLAG | HAS_ESCAPES_FLAG, | ||
[_MAKE_WARM] = 0, | ||
[_FATAL_ERROR] = 0, | ||
[_DEOPT] = 0, | ||
[_ERROR_POP_N] = HAS_ARG_FLAG, | ||
[_TIER2_RESUME_CHECK] = HAS_DEOPT_FLAG, | ||
[_COLD_EXIT] = HAS_ESCAPES_FLAG, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you mark _PyExecutor_ClearExit
and _PyExecutor_FromExit
as non-escaping?
...also, am I the only one around here who reviews generated code? ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah wait, _PyExecutor_ClearExit
can escape.
That's annoying... the traces are going to have a _CHECK_VALIDITY
op after the _START_EXECUTOR
anyways, because the deopt path in _START_EXECUTOR
can escape. But it will never actually reach the next instruction. Seems like something that should be addressed, actually.
Maybe just go back to checking in _EXIT_TRACE
like we were before? It's going to get checked anyways, so might as well just do it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...also, am I the only one around here who reviews generated code? ;)
Maybe 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking validity in _START_EXECUTOR
is much more efficient than in _EXIT_TRACE
as it is only a single predictable branch.
_PyExecutorObject *e = executor->exits[i].executor; | ||
executor->exits[i].executor = cold; | ||
Py_DECREF(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use _PyExecutor_ClearExit
here to avoid repeating the same logic twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that couples the two functions, and they do have distinct uses.
It can't really escape, because it is on the deopt path. But that's a separate issue: #137276 |
This PR reinstates the
_COLD_EXIT
uop, but this time expects the exit to be passed, not the executor.This way we only need one jitted stub, not hundreds.
Using stubs hugely simplifies
_EXIT_TRACE
as all it needs to do is jump to the exit's executor.The x86-64 stencil for
_EXIT_TRACE
shrinks from 384 bytes to 36 bytes, although it does require one extra_CHECK_VALIDITY
(19 bytes) to be added to each trace.Since traces often contain multiple
_EXIT_TRACE
s this is a substantial space saving.