gh-136681: make argsbuf static to speedup processing positional arguments with AC #136732

Draft
wants to merge 4 commits into
base: main
from

Conversation

skirpichev
Contributor

@skirpichev skirpichev commented Jul 17, 2025

@skirpichev
Contributor Author

benchmark: #136681 (comment)

@ZeroIntensity
Member

Won't this break when called concurrently?

@skirpichev
Contributor Author

> Won't this break when called concurrently?

Yes, apparently it does :-( Probably, this can't be fixed easily.

An alternative approach should work: #136681 (comment). _PyArg_UnpackKeywords() is implemented as both a macro (which encodes the fast path) and a function. So the if condition should keep the negation of the fast-path condition from the macro (which can then be removed).
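
To make the fast-path idea above concrete, here is a minimal Python analogy. It is only a sketch: `unpack_keywords_slow` and `unpack` are hypothetical names, the condition is a simplified stand-in for whatever the real macro checks, and none of this is CPython's actual C code. The point it illustrates is that the caller tests the negation of the fast-path condition and only then pays for the general unpacking routine and its buffer.

```python
# Hypothetical analogy: these helpers do not exist in CPython.

def unpack_keywords_slow(args, kwargs, names):
    """General path: copy positionals into a buffer, then place keywords by name."""
    if len(args) > len(names):
        raise TypeError("too many positional arguments")
    buf = [None] * len(names)
    buf[:len(args)] = args
    for key, value in kwargs.items():
        if key not in names:
            raise TypeError(f"unexpected keyword argument {key!r}")
        index = names.index(key)
        if index < len(args):
            raise TypeError(f"argument {key!r} given twice")
        buf[index] = value
    return buf


def unpack(args, kwargs, names, minpos, maxpos):
    """Caller-side check: leave the fast path only when its condition fails."""
    fast = not kwargs and minpos <= len(args) <= maxpos
    if not fast:
        # Slow path: a buffer is allocated and filled.
        return unpack_keywords_slow(args, kwargs, names)
    # Fast path: reuse the caller's sequence, no buffer needed.
    return args


positional = (1.0, 2.0)
print(unpack(positional, {}, ["x", "y"], 2, 2) is positional)
print(unpack((1.0,), {"y": 2.0}, ["x", "y"], 2, 2))
```

In this sketch a purely positional call never touches the buffer, which is the case the benchmarks in this thread focus on.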

@skirpichev skirpichev closed this Jul 19, 2025
@skirpichev skirpichev deleted the ac-argsbuf/136681 branch July 19, 2025 08:22
@skirpichev skirpichev restored the ac-argsbuf/136681 branch July 29, 2025 14:22
@skirpichev
Contributor Author

Hmm, @ZeroIntensity, the test seems to be fixed by combining static with _Thread_local. Could this work, or is this too naive an approach?
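
The difference between a single shared buffer and a per-thread buffer can be sketched in Python. This is an analogy under stated assumptions, not the C code from this PR: `SHARED_BUF` stands in for a plain `static` array, `threading.local()` stands in for `static _Thread_local`, and `run_pair` is a hypothetical helper. A barrier forces both threads to write before either reads, so the clobbering is deterministic rather than a rare race.

```python
import threading

SHARED_BUF = [None]        # one buffer for every caller, like a plain static array
LOCAL = threading.local()  # one buffer per thread, like static _Thread_local


def run_pair(use_thread_local):
    """Run two threads that each store a value, wait, then read it back."""
    barrier = threading.Barrier(2)
    results = {}

    def worker(value):
        if use_thread_local:
            LOCAL.buf = [None]  # each thread gets its own list
            buf = LOCAL.buf
        else:
            buf = SHARED_BUF    # both threads use the same list
        buf[0] = value
        barrier.wait()          # both writes happen before either read
        results[value] = buf[0]

    threads = [threading.Thread(target=worker, args=(v,)) for v in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


shared = run_pair(use_thread_local=False)
local = run_pair(use_thread_local=True)
print("shared buffer:", shared)  # both threads read back the same value
print("thread-local:", local)    # each thread reads back its own value
```

With the shared buffer, at least one thread reads back the other thread's value; with per-thread storage each thread sees its own. Whether the per-thread lookup is cheap enough in C is a separate question that only measurement can answer.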

@ZeroIntensity
Member

ZeroIntensity commented Jul 29, 2025

Hm, you could try it and see what the test suite says. To my knowledge, thread local lookups are generally slower, so we might not see a speedup.

@skirpichev skirpichev reopened this Jul 29, 2025
@skirpichev
Contributor Author

Well, CI tests pass, but that might be just an accident.

> thread local lookups are generally slower, so we might not see a speedup.

Here are my quick measurements with default configure arguments on a Linux box. (A free-threading build might change the picture.) The micro-benchmarks are for math.fmin(): with positional-only arguments (as in main) vs. with positional-or-keyword arguments.

On main:

| Benchmark | posonly-ref | posorkw-ref |
|---|:---:|:---:|
| fmin(1.0, 2.0) | 169 ns | 188 ns: 1.11x slower |
| fmin(1.0, 2.0) x 2 times | 969 ns | 1.00 us: 1.03x slower |
| fmin(1.0, 2.0) x 10 times | 2.09 us | 2.27 us: 1.08x slower |
| fmin(1.0, 2.0) x 100 times | 14.8 us | 16.5 us: 1.12x slower |
| Geometric mean | (ref) | 1.09x slower |

With the patch:

| Benchmark | posonly-patch | posorkw-patch |
|---|:---:|:---:|
| fmin(1.0, 2.0) | 170 ns | 173 ns: 1.02x slower |
| fmin(1.0, 2.0) x 10 times | 2.06 us | 2.08 us: 1.01x slower |
| fmin(1.0, 2.0) x 100 times | 14.8 us | 15.0 us: 1.01x slower |
| Geometric mean | (ref) | 1.01x slower |

Benchmark hidden because not significant (1): fmin(1.0, 2.0) x 2 times
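
As a rough cross-check of the "Geometric mean" row for main, the slowdown ratios can be recomputed from the rounded timings shown in the first table. This is only a sketch: pyperf works from the unrounded measurements, so its figure can differ slightly, and the variable names here are made up for illustration.

```python
from statistics import geometric_mean

# (posonly-ref, posorkw-ref) timings in nanoseconds, copied from the table for main.
ref_timings_ns = [
    (169, 188),
    (969, 1000),
    (2090, 2270),
    (14800, 16500),
]

# Each ratio says how much slower the positional-or-keyword variant is.
ratios = [posorkw / posonly for posonly, posorkw in ref_timings_ns]
mean_ratio = geometric_mean(ratios)
print(f"geometric mean: {mean_ratio:.2f}x slower")
```

The result lands close to the 1.09x reported above. The patched table cannot be reproduced the same way, because one of its rows is hidden as not significant.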

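import pyperf
from math import fmin

def f(n):
    # Call fmin() n times in a row to amortize the benchmark harness overhead.
    for _ in range(n):
        fmin(1.0, 2.0)

runner = pyperf.Runner()
# Single direct call.
runner.bench_func('fmin(1.0, 2.0)', fmin, 1.0, 2.0)
# Repeated calls inside the Python loop above.
for n in [2, 10, 100]:
    s = f'fmin(1.0, 2.0) x {n:3} times'
    runner.bench_func(s, f, n)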