ZJIT: Profile each instruction at most num_profiles times #13903

Merged: 2 commits merged on Jul 16, 2025

Conversation

k0kubun
Member

@k0kubun k0kubun commented Jul 15, 2025

This PR changes ZJIT to profile each instruction at most --zjit-num-profiles times.

ZJIT currently profiles every instruction during the first num_profiles calls of an ISEQ. When the ISEQ contains a while loop, however, each instruction in the loop body can be profiled more than num_profiles times, which I didn't expect. This PR optimizes such cases. We rarely write while loops directly in real code, but loop methods themselves use a while loop, so this will help real-world code too.

The trade-off is that the per-insn counters consume more memory, but the performance impact seems significant enough to justify it. We could also make the counters more compact and/or restrict some instructions to a single profile to save memory, but that's out of scope for this PR.
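To make the difference concrete, here is a toy simulation of the two policies described above. It is only a sketch: the instruction names, the loop trip count, and the helpers `run_iseq`, `profile_per_call`, and `profile_per_insn` are all hypothetical and do not mirror ZJIT's actual data structures.

```ruby
# Toy model, not ZJIT's real implementation. All names here are hypothetical.
NUM_PROFILES = 5        # stands in for --zjit-num-profiles (arbitrary value)
LOOP_ITERATIONS = 1_000 # assumed loop trip count per call

INSNS = %i[setup loop_getivar loop_branch leave]
LOOP_BODY = %i[loop_getivar loop_branch]

# Execute one call of the toy ISEQ, yielding each instruction as it "runs".
def run_iseq
  yield :setup
  LOOP_ITERATIONS.times do
    LOOP_BODY.each { |insn| yield insn }
  end
  yield :leave
end

# Before: profiling is gated only by how many times the ISEQ has been called.
def profile_per_call(calls)
  profiled = Hash.new(0)
  calls.times do |call_index|
    run_iseq { |insn| profiled[insn] += 1 if call_index < NUM_PROFILES }
  end
  profiled
end

# After: every instruction carries its own remaining-profiles counter.
def profile_per_insn(calls)
  remaining = INSNS.to_h { |insn| [insn, NUM_PROFILES] }
  profiled = Hash.new(0)
  calls.times do
    run_iseq do |insn|
      next if remaining[insn].zero?
      remaining[insn] -= 1
      profiled[insn] += 1
    end
  end
  profiled
end

before = profile_per_call(10)
after = profile_per_insn(10)
```

In this model the loop-body instructions are profiled `NUM_PROFILES * LOOP_ITERATIONS` times under the per-call policy, while the per-insn policy caps every instruction at `NUM_PROFILES`. The cost is one counter per instruction instead of one per ISEQ, which is the memory trade-off mentioned above.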

$ ./run_benchmarks.rb getivar --chruby 'before --zjit;after --zjit' --once
Running benchmark "getivar" (1/1)
+ setarch x86_64 -R taskset -c 10 /opt/rubies/before/bin/ruby --zjit -I harness /home/k0kubun/src/github.com/Shopify/yjit-bench/benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-15T20:31:58Z master 3cf32e9364) +ZJIT +PRISM [x86_64-linux]
itr:   time
 #1:  684ms
RSS: 14.9MiB
MAXRSS: 16.7MiB
Running benchmark "getivar" (1/1)
+ setarch x86_64 -R taskset -c 10 /opt/rubies/after/bin/ruby --zjit -I harness /home/k0kubun/src/github.com/Shopify/yjit-bench/benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-15T22:07:56Z zjit-profile-count b0e903fad8) +ZJIT +PRISM [x86_64-linux]
itr:   time
 #1:  128ms
RSS: 15.0MiB
MAXRSS: 16.7MiB
Total time spent benchmarking: 1s

before: ruby 3.5.0dev (2025-07-15T20:31:58Z master 3cf32e9364) +ZJIT +PRISM [x86_64-linux]
after: ruby 3.5.0dev (2025-07-15T22:07:56Z zjit-profile-count b0e903fad8) +ZJIT +PRISM [x86_64-linux]

-------  -----------  ----------  ----------  ----------  -------------  ------------
bench    before (ms)  stddev (%)  after (ms)  stddev (%)  after 1st itr  before/after
getivar  684.6        0.0         128.6       0.0         5.324          5.324
-------  -----------  ----------  ----------  ----------  -------------  ------------
Legend:
- after 1st itr: ratio of before/after time for the first benchmarking iteration.
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.

@tekknolagi
Contributor

What do you think about having num profiles correspond to the sum of:

  • function call
  • backward jump

This way we can still keep one count per iseq but we don't have a profile-time explosion with loops.
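One way to read this suggestion is a single budget per ISEQ that is spent on both events. The sketch below is an assumption about how that could look; the class `IseqProfileBudget` and its methods are hypothetical names, not anything in ZJIT.

```ruby
# Hypothetical sketch: one profile budget per ISEQ, spent on every
# function call and every backward jump. Not ZJIT's actual code.
class IseqProfileBudget
  def initialize(num_profiles)
    @remaining = num_profiles
  end

  # Called on ISEQ entry. Returns true if the following code should be profiled.
  def on_call
    take
  end

  # Called when a loop jumps back. Returns true if profiling should continue.
  def on_backward_jump
    take
  end

  private

  # Spend one unit of budget if any is left.
  def take
    return false if @remaining.zero?
    @remaining -= 1
    true
  end
end

budget = IseqProfileBudget.new(3)
events = [budget.on_call, budget.on_backward_jump, budget.on_backward_jump,
          budget.on_backward_jump, budget.on_call]
```

With a budget of 3, the first call and two loop iterations use it up, so `events` ends with two `false` values. This keeps the memory cost at one counter per ISEQ while bounding total profiling work.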

Contributor

@tekknolagi tekknolagi left a comment

This seems like a reasonable solution for this approach, but I'm curious what you think of an alternative.

@k0kubun
Member Author

k0kubun commented Jul 16, 2025

> What do you think about having num profiles correspond to the sum of:
>
>   • function call
>   • backward jump
>
> This way we can still keep one count per iseq but we don't have a profile-time explosion with loops.

With that implementation, when there's a loop, non-looped instructions or instructions in a different loop will have a significantly smaller number of profiles. While it satisfies "we don't have a profile-time explosion with loops", I don't think it's the intended behavior.
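The skew can be illustrated with a small simulation of a shared budget. This is a sketch under assumptions: the three program points, the trip count, and the rule that budget is spent at call entry and at each backward jump are all hypothetical modeling choices, not the behavior of any actual implementation.

```ruby
# Toy model of a shared per-ISEQ budget. All names and numbers are hypothetical.
NUM_PROFILES = 5
LOOP_ITERATIONS = 1_000

def simulate_shared_budget(calls)
  remaining = NUM_PROFILES
  profiled = Hash.new(0)

  # Spend one unit of budget if any is left; report whether profiling is on.
  take = lambda do
    next false if remaining.zero?
    remaining -= 1
    true
  end

  calls.times do
    profiling = take.call                              # function call
    profiled[:before_loop] += 1 if profiling
    LOOP_ITERATIONS.times do |i|
      profiled[:loop_body] += 1 if profiling
      profiling = take.call if i < LOOP_ITERATIONS - 1 # backward jump
    end
    profiled[:after_loop] += 1 if profiling
  end
  profiled
end

counts = simulate_shared_budget(10)
```

In this model the first call's loop consumes the whole budget: the loop body collects all 5 profiles, the code before the loop gets 1, and the code after the loop gets none, no matter how many more times the ISEQ is called.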

@k0kubun k0kubun merged commit acc3172 into ruby:master Jul 16, 2025
81 checks passed
@k0kubun k0kubun deleted the zjit-profile-count branch July 16, 2025 16:53
@tekknolagi
Contributor

> With that implementation, when there's a loop, non-looped instructions or instructions in a different loop will have a significantly smaller number of profiles.

I think that's fine, actually, and mirrors what we have today as well. It's kind of sampling-profiler-esque.

@k0kubun
Member Author

k0kubun commented Jul 16, 2025

I think many instructions will need only one profile (as suggested in the PR description), so I'm fine with that for such instructions. I feel uncomfortable having only one profile for other instructions.

However, if the intention is that we assume it never happens in reality because people don't use while loops and it doesn't matter for loop methods themselves, I'm open to merging such a change.

Development

Successfully merging this pull request may close these issues.

ZJIT: ZJIT instructions can be profiled more than --zjit-num-profiles times
2 participants