[NATIVECPU] Emit Native CPU properties (correctness) #19429

Open
wants to merge 36 commits into
base: sycl

Conversation

uwedolinsky
Contributor

@uwedolinsky uwedolinsky commented Jul 14, 2025

Extends clang-offload-wrapper and SYCLOffloadWrapper (clang-linker-wrapper) so that kernel properties specific to Native CPU can be added. Adds a compiler pass that checks whether a kernel is launched over a sycl::nd_range and, if so, adds a Native CPU-only property for it.

This PR fixes at least test_handler from the SYCL-CTS on NativeCPU: the NativeCPU adapter now uses the nd_range attribute to combine multiple work-groups into one invocation only for non-nd_range kernels.
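As a rough illustration of the adapter-side decision described above, the following sketch mirrors the condition from the urEnqueueKernelLaunch change discussed in this review. `KernelInfo` and `canCombineWorkGroups` are hypothetical names introduced only for this example; they are not part of the adapter.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the adapter's kernel handle (assumption).
struct KernelInfo {
  bool hasLocalArgs;    // kernel takes local-memory arguments
  bool isNDRangeKernel; // derived from the Native CPU-only kernel property
};

// Returns true when several work-groups may be combined into a single
// kernel invocation: local size is 1x1x1, there is more work than threads,
// and the kernel neither uses local args nor comes from an nd_range launch.
bool canCombineWorkGroups(const std::array<std::size_t, 3> &localSize,
                          std::size_t globalSize0,
                          std::size_t numParallelThreads,
                          const KernelInfo &kernel) {
  const bool isLocalSizeOne =
      localSize[0] == 1 && localSize[1] == 1 && localSize[2] == 1;
  return isLocalSizeOne && globalSize0 > numParallelThreads &&
         !kernel.hasLocalArgs && !kernel.isNDRangeKernel;
}
```

With this shape, an nd_range kernel whose local size happens to be 1x1x1 is still launched one work-group per invocation.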

This new "kernel property infrastructure" will be extended in the future to encode other kernel capabilities. One example is the applied vector width, which determines how many work-groups could be executed in one kernel invocation; this could help make kernel launches more efficient. Another property could encode whether the kernel supports peeling: without peeling the kernel would have fewer branches, and the NativeCPU adapter could schedule peeling invocations (of the scalar kernel) in separate threads, which might benefit some performance scenarios.
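To make the idea of per-kernel properties more concrete, here is a minimal sketch of how an adapter might consume such properties. Everything in it is an assumption made for illustration: `NativeCPUKernelProps`, `KernelPropTable` and `workGroupsPerInvocation` are hypothetical names, and the vector-width and peeling fields are the possible future properties mentioned above, not something this PR adds.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-kernel property record (not the PR's actual encoding).
struct NativeCPUKernelProps {
  bool isNDRange = false;        // the property this PR introduces
  std::uint32_t vectorWidth = 1; // possible future property (assumption)
  bool supportsPeeling = false;  // possible future property (assumption)
};

using KernelPropTable = std::map<std::string, NativeCPUKernelProps>;

// Decide how many work-groups one invocation could cover. Unknown kernels
// and nd_range kernels conservatively get one work-group per invocation.
std::uint32_t workGroupsPerInvocation(const KernelPropTable &table,
                                      const std::string &kernelName) {
  auto it = table.find(kernelName);
  if (it == table.end() || it->second.isNDRange)
    return 1;
  return it->second.vectorWidth;
}
```

Falling back to one work-group when no property is found keeps kernels built without the new properties on the conservative path.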

This PR replaces and extends #16152, which still used the old UR API/repo.

This PR also adds testing to ensure the new clang-linker-wrapper integration produces the expected IR on SYCL code.

@uwedolinsky uwedolinsky requested review from a team as code owners July 14, 2025 12:38
Comment on lines 1225 to 1226
if (LangOpts.SYCLIsNativeCPU)
  llvm::sycl::utils::addSYCLNativeCPUEarlyPasses(MPM);
Contributor

As discussed internally, ideally we'd get rid of both LangOpts.SYCLIsNativeCPU and SYCLNativeCPUBackend and instead check what target we're building for, but this is blocked on #19344; without that we don't have enough information.

Contributor Author

Since #19344 is now merged, I've updated the PR and removed the usage of LangOpts.SYCLIsNativeCPU.

@@ -149,7 +149,8 @@ UR_APIEXPORT ur_result_t UR_APICALL urEnqueueKernelLaunch(
   bool isLocalSizeOne =
       ndr.LocalSize[0] == 1 && ndr.LocalSize[1] == 1 && ndr.LocalSize[2] == 1;
   if (isLocalSizeOne && ndr.GlobalSize[0] > numParallelThreads &&
-      !kernel->hasLocalArgs()) {
+      !kernel->hasLocalArgs() && !hKernel->isNDRangeKernel()) {
+    // TODO: Check if !kernel->hasLocalArgs() is needed
Contributor

You made the logic say that if a kernel has local args, it is necessarily an NDRangeKernel. I'm not sure yet whether that's 100% accurate, but if it is, hasLocalArgs() becomes redundant.

Contributor

@maksimsab maksimsab left a comment

Please add or modify a test related to ClangLinkerWrapper. It could be clang/test/Driver/linker-wrapper-image.c.

  auto FCalle = M.getOrInsertFunction(
      sycl::utils::addSYCLNativeCPUSuffix(Name).str(), FTy);
  Function *F = dyn_cast<Function>(FCalle.getCallee());
  if (F == nullptr)
    report_fatal_error("Unexpected callee");
  return F;
}
std::optional<util::PropertySet> SYCLNativeCPUPropSet = std::nullopt;
Contributor

Please make this simpler, without a semi-global variable.

Contributor Author

I've made this a local variable that is passed into Wrapper::addPropertySetRegistry instead. Would that be acceptable?

   auto *NullPtr = llvm::ConstantPointerNull::get(PointerType::getUnqual(C));
   if (Entries.empty())
     return {NullPtr, NullPtr};

   std::unique_ptr<MemoryBuffer> MB = MemoryBuffer::getMemBuffer(Entries);
-  // the Native CPU PI Plug-in expects the BinaryStart field to point to an
-  // array of struct nativecpu_entry {
+  // the Native CPU UR adapter expects the BinaryStart field to point to
Contributor

Please move all important details into the function's documentation.

Contributor Author

I've added documentation comments to the function definition (or do you mean documentation elsewhere?), but left the original comments describing the details in place. Would that be acceptable?

Contributor

@premanandrao premanandrao left a comment

FE changes in BackendUtil.cpp look good to me.

6 participants