port distributed pipeline test files for Intel GPU #159033

wincent8 · 2025-07-24T10:31:28Z

This PR ports all distributed pipeline test files to Intel GPU.
Intel GPU is enabled with the following methods, keeping the original code style as far as possible:

use instantiate_device_type_tests()
use "torch.accelerator.current_accelerator()" to determine the accelerator backend
use "requires_accelerator_dist_backend()" to replace requires_nccl()
use "get_default_backend_for_device()" to get the backend
enable XPU for some test paths
add TEST_MULTIACCELERATOR in common_utils for all backends
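
For reviewers unfamiliar with the device-agnostic helpers, the selection logic behind the list above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the names `pick_device_type`, `pick_backend`, `has_multiple_accelerators`, and `_FALLBACK_BACKENDS` are hypothetical, the sketch assumes PyTorch 2.6 or newer when torch is installed, and the `device_count() >= 2` check is an assumption about what a flag like TEST_MULTIACCELERATOR expresses.

```python
# Sketch only; assumes PyTorch >= 2.6 when torch is installed.
try:
    import torch
    import torch.distributed as dist
except ImportError:  # keep the sketch importable without PyTorch
    torch = None
    dist = None

# Hypothetical fallback table, used only when the torch helper is unavailable.
_FALLBACK_BACKENDS = {"cuda": "nccl", "xpu": "xccl", "cpu": "gloo"}


def pick_device_type() -> str:
    """Return the current accelerator type (e.g. "cuda" or "xpu"), else "cpu"."""
    accel = getattr(torch, "accelerator", None)
    acc = accel.current_accelerator() if accel is not None else None
    return acc.type if acc is not None else "cpu"


def pick_backend(device_type: str) -> str:
    """Return the default process-group backend for a device type."""
    helper = getattr(dist, "get_default_backend_for_device", None)
    if helper is not None:
        return helper(device_type)
    return _FALLBACK_BACKENDS.get(device_type, "gloo")


def has_multiple_accelerators() -> bool:
    """Assumed meaning of a multi-accelerator flag: at least two devices."""
    accel = getattr(torch, "accelerator", None)
    return accel is not None and accel.device_count() >= 2
```

A test would then call `pick_backend(pick_device_type())` instead of hard-coding "nccl", and skip multi-device cases when `has_multiple_accelerators()` is false.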

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @gujinghui @EikanWang @fengyuan14 @guangyey

pytorch-bot · 2025-07-24T10:31:32Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159033

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 909fcf7 with merge base f636736:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 3, 6, linux.idc.xpu) (gh) (disabled by #159331)
export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cuda_backend_aot_eager

BROKEN TRUNK - The following jobs failed but were also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 4, 6, linux.idc.xpu) (gh) (trunk failure)
The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 6, 6, linux.idc.xpu) (gh) (trunk failure)
test_testing.py::TestImports::test_circular_dependencies

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable) (gh) (#158876)
sccache: error: couldn't connect to server

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2025-07-24T10:31:35Z

The committers listed above are authorized under a signed CLA.

✅ login: wincent8 / name: wliao2 (909fcf7, c95cb7f, b966018, 27aee57, 7483fee, cb04674, 7feb8bd, 123fb49, 86812f4, 04fc4ec, 2699fb0, 5b9f224, b07b5d7, 59866c2)
✅ login: daisyden / name: Daisy Deng (9760c99)

wincent8 · 2025-07-25T02:59:30Z

@pytorchbot label "module: xpu"
@pytorchbot label "triaged"

wincent8 · 2025-07-25T03:00:41Z

@pytorchbot label "triaged"

pytorch-bot · 2025-07-25T03:05:31Z

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

wincent8 · 2025-07-25T08:15:27Z

@pytorchbot label "module: xpu"

wincent8 · 2025-07-25T08:15:44Z

@pytorchbot label "triaged"

guangyey

Overall LGTM. I recommend changing TEST_MULTIGPU to TEST_MULTIACCELERATOR.

guangyey · 2025-07-28T06:56:29Z

Thanks for the update!

wincent8 and others added 8 commits July 23, 2025 15:43

add xpu for test_stage

7feb8bd

Merge remote-tracking branch 'origin/wliao2/baseline' into wliao2/add…

7483fee

…_pipeline

enable xpu for pipeline cases

27aee57

Merge remote-tracking branch 'origin/wliao2/baseline' into wliao2/add…

b966018

…_pipeline

remove debug and fix some bug

c95cb7f

Merge branch 'pytorch:main' into wliao2/add_pipeline

123fb49

Merge pull request pytorch#6 from Kanya-Mo/patch-2

9760c99

Adjust error tolerance for oneDNN nondeterministic

Merge branch 'wliao2/add_pipeline' of https://github.com/wincent8/pyt…

cb04674

…orch into wliao2/add_pipeline

wincent8 requested a review from a team as a code owner July 24, 2025 10:31

pytorch-bot bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Jul 24, 2025

pytorchbot added the open source label Jul 24, 2025

pytorch-bot bot added the module: xpu Intel XPU related issues label Jul 25, 2025

pytorch-bot bot added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jul 25, 2025

guangyey reviewed Jul 25, 2025

test/distributed/pipelining/test_backward.py (outdated, resolved)

guangyey added this to PyTorch Intel Jul 25, 2025

guangyey added the ciflow/xpu Run XPU CI tasks label Jul 25, 2025

pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Jul 25, 2025

guangyey added the topic: not user facing topic category label Jul 25, 2025

daisyden reviewed Jul 25, 2025

test/distributed/pipelining/test_backward.py (outdated, resolved)

guangyey added the ciflow/xpu Run XPU CI tasks label Jul 25, 2025

daisyden reviewed Jul 25, 2025

test/distributed/pipelining/test_schedule_multiproc.py (outdated, resolved)

Merge remote-tracking branch 'origin/wliao2/baseline' into wliao2/add…

5b9f224

…_pipeline

pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Jul 25, 2025

wincent8 added 2 commits July 25, 2025 15:22

revert change of these two files to split the pr

59866c2

update device_type as suggested and make lintrunner happy

86812f4

wincent8 mentioned this pull request Jul 25, 2025

port 2 distributed pipeline test files for Intel GPU #159140 (Open)

guangyey reviewed Jul 26, 2025

test/distributed/pipelining/test_schedule.py (outdated, resolved)

guangyey reviewed Jul 26, 2025

torch/testing/_internal/common_utils.py (outdated, resolved)

guangyey reviewed Jul 26, 2025

test/distributed/pipelining/test_unflatten.py (resolved)

guangyey added the ciflow/xpu Run XPU CI tasks label Jul 26, 2025

guangyey reviewed Jul 26, 2025

Merge branch 'wliao2/baseline' into wliao2/add_pipeline

b07b5d7

pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Jul 28, 2025

wincent8 added 2 commits July 28, 2025 10:46

introduce TEST_MULTIACCELERATOR

04fc4ec

remove unused variable

2699fb0

guangyey approved these changes Jul 28, 2025

guangyey changed the title ~~[WIP]port distributed pipeline test files for Intel GPU~~ port distributed pipeline test files for Intel GPU Jul 28, 2025

guangyey moved this to Review Required in PyTorch Intel Jul 28, 2025

guangyey added the ciflow/xpu Run XPU CI tasks label Jul 28, 2025

guangyey requested review from kwen2501, d4l3k and albanD July 28, 2025 06:58

pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Jul 28, 2025

guangyey added the ciflow/xpu Run XPU CI tasks label Jul 28, 2025

guangyey removed the status in PyTorch Intel Jul 28, 2025

guangyey moved this to Review Required in PyTorch Intel Jul 28, 2025

remove unused import

909fcf7
