[NO MERGE] is no_x_dim really faster? #159048

Draft
wants to merge 1 commit into
base: main

Conversation

jataylo
Collaborator

@jataylo jataylo commented Jul 24, 2025

@jataylo jataylo added the ciflow/inductor-perf-test-nightly-rocm Trigger inductor perf tests on ROCm label Jul 24, 2025

pytorch-bot bot commented Jul 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159048

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unrelated Failure

As of commit fdfc7c0 with merge base bcf34d2 (image):

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@jataylo jataylo added the ciflow/inductor-perf-test-nightly Trigger nightly inductor perf tests label Jul 25, 2025

pytorch-bot bot commented Jul 25, 2025

Warning: Unknown label ciflow/inductor-perf-test-nightly.
Currently recognized labels are

  • ciflow/binaries
  • ciflow/binaries_libtorch
  • ciflow/binaries_wheel
  • ciflow/triton_binaries
  • ciflow/inductor
  • ciflow/inductor-periodic
  • ciflow/inductor-rocm
  • ciflow/inductor-perf-test-nightly-rocm
  • ciflow/inductor-perf-compare
  • ciflow/inductor-micro-benchmark
  • ciflow/inductor-micro-benchmark-cpu-x86
  • ciflow/inductor-perf-test-nightly-x86-zen
  • ciflow/inductor-cu126
  • ciflow/linux-aarch64
  • ciflow/mps
  • ciflow/nightly
  • ciflow/periodic
  • ciflow/periodic-rocm-mi300
  • ciflow/rocm
  • ciflow/rocm-mi300
  • ciflow/s390
  • ciflow/slow
  • ciflow/trunk
  • ciflow/unstable
  • ciflow/xpu
  • ciflow/torchbench
  • ciflow/op-benchmark
  • ciflow/pull
  • ciflow/h100
  • ciflow/h100-distributed
  • ciflow/win-arm64
  • ciflow/h100-symm-mem
  • ciflow/h100-cutlass-backend

Please add the new label to .github/pytorch-probot.yml

@jataylo
Collaborator Author

jataylo commented Jul 25, 2025

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased no_x_dim_test onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout no_x_dim_test && git pull --rebase)

@jataylo jataylo added pt2-pass-rate-regression Track regression of PT2 dashboard pass rate and removed pt2-pass-rate-regression Track regression of PT2 dashboard pass rate labels Jul 25, 2025
@jataylo
Collaborator Author

jataylo commented Jul 25, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased no_x_dim_test onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout no_x_dim_test && git pull --rebase)

jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Jul 30, 2025
…im removal (#2417)

We noticed that persistent reduction kernels can perform extremely
poorly:
https://ontrack-internal.amd.com/browse/SWDEV-539215

The root cause is that, for certain sizes and kernels, "no_x_dim" mode
is enabled, which embeds a static XBLOCK=1 into the kernel. This means
tuning is not optimal. Removing this mode and enabling autotune gives a
2x performance improvement, which shows that new heuristics are needed.
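
To illustrate the reasoning above, here is a minimal pure-Python sketch of why a hard-coded XBLOCK=1 limits tuning. It is not Inductor code: `candidate_configs`, `autotune`, and `toy_cost` are hypothetical names, the XBLOCK candidate list is an assumption, and the cost model is invented purely for illustration, so the numbers say nothing about real kernels.

```python
from typing import Callable, Dict, List

Config = Dict[str, int]


def candidate_configs(rnumel: int, no_x_dim: bool) -> List[Config]:
    """Hypothetical config generator for a persistent reduction.

    RBLOCK is the next power of two >= rnumel, so one program holds the
    whole reduction dimension. With no_x_dim, XBLOCK is pinned to 1.
    """
    rblock = 1
    while rblock < rnumel:
        rblock *= 2
    # Assumed candidate list; a real autotuner would choose its own.
    xblocks = [1] if no_x_dim else [1, 2, 4, 8, 16, 32]
    return [{"XBLOCK": x, "RBLOCK": rblock} for x in xblocks]


def autotune(configs: List[Config], benchmark: Callable[[Config], float]) -> Config:
    """Pick the config with the lowest measured (here: modelled) cost."""
    return min(configs, key=benchmark)


def toy_cost(cfg: Config, xnumel: int = 4096) -> float:
    """Invented cost model: per-program overhead plus tile work, with a
    penalty once the tile exceeds 4096 elements."""
    programs = -(-xnumel // cfg["XBLOCK"])  # ceiling division
    tile = cfg["XBLOCK"] * cfg["RBLOCK"]
    penalty = 1 + max(0, tile - 4096) / 4096
    return programs * (1.0 + tile / 1024) * penalty


fixed = autotune(candidate_configs(256, no_x_dim=True), toy_cost)
tuned = autotune(candidate_configs(256, no_x_dim=False), toy_cost)
print("no_x_dim:", fixed, toy_cost(fixed))
print("autotuned:", tuned, toy_cost(tuned))
```

With `no_x_dim=True` the search space has a single entry, so the tuner can only ever return XBLOCK=1, whatever the cost function reports.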

We will bring this into 2.7 for a perf uplift. Discussion with upstream
on removing no_x_dim is ongoing; they agree to the removal if there is
no perf regression. The draft PR pytorch#159048 shows no perf loss on
ROCm for any inductor benchmark.

Removing the tests because they are no longer relevant.
3 participants