Skip to content

[SYCL][NFC] Mark external libc functions. #19368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 23, 2025

Conversation

hvdijk
Copy link
Contributor

@hvdijk hvdijk commented Jul 9, 2025

<sycl/detail/defines_elementary.hpp> defines a __DPCPP_SYCL_EXTERNAL macro for functions that are defined elsewhere. This macro is used both for functions defined by the platform's system libraries and for functions defined in libclc/libdevice. For Native CPU, these should be treated differently: functions defined by the platform's system libraries should be processed according to the platform's ABI, functions defined by libclc/libdevice should be processed in a host-target-independent way. Therefore, this PR introduces a __DPCPP_SYCL_EXTERNAL_LIBC macro so that libc functions can be tagged as such.

This PR does not yet make the changes to Native CPU to define __DPCPP_SYCL_EXTERNAL and __DPCPP_SYCL_EXTERNAL_LIBC differently.

<sycl/detail/defines_elementary.hpp> defines a __DPCPP_SYCL_EXTERNAL
macro for functions that are defined elsewhere. This macro is used both
for functions defined by the platform's system libraries and for
functions defined in libclc/libdevice. For Native CPU, these should be
treated differently: functions defined by the platform's system
libraries should be processed according to the platform's ABI, functions
defined by libclc/libdevice should be processed in a target-independent
way. Therefore, this PR introduces a __DPCPP_SYCL_EXTERNAL_LIBC macro so
that libc functions can be tagged as such.

This PR does not yet make the changes to Native CPU to define
__DPCPP_SYCL_EXTERNAL and __DPCPP_SYCL_EXTERNAL_LIBC differently.
@hvdijk hvdijk requested a review from a team as a code owner July 9, 2025 14:52
@hvdijk hvdijk requested a review from aelovikov-intel July 9, 2025 14:52
@hvdijk hvdijk temporarily deployed to WindowsCILock July 9, 2025 14:53 — with GitHub Actions Inactive
@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 9, 2025

The naming is not ideal, I'm happy to take alternative suggestions for __DPCPP_SYCL_EXTERNAL_LIBC.

@aelovikov-intel
Copy link
Contributor

@bader , @AlexeySachkov , any input from you?

@hvdijk hvdijk temporarily deployed to WindowsCILock July 9, 2025 16:38 — with GitHub Actions Inactive
@hvdijk hvdijk temporarily deployed to WindowsCILock July 9, 2025 16:38 — with GitHub Actions Inactive
@bader
Copy link
Contributor

bader commented Jul 9, 2025

@bader , @AlexeySachkov , any input from you?

I know almost nothing about Native CPU, so I'm unsure the input I can provide is valuable. From the description alone, it's not clear what is the problem we are trying to solve. Follow-up patches are supposed to provide the justification for another macro.

For Native CPU, these should be treated differently: functions defined by the platform's system libraries should be processed according to the platform's ABI, functions defined by libclc/libdevice should be processed in a target-independent way.

According to my understanding libclc/libdevice are not target independent. We build dedicated version for each target separately.

@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 9, 2025

Follow-up patches are supposed to provide the justification for another macro.

What we are currently applying in our internal Native CPU testing, and what I am working on cleaning up in order to hopefully be suitable to merge:

https://raw.githubusercontent.com/uxlfoundation/oneapi-construction-kit/65e886ce0cb7c1849601637ad87ee8a0d6b7e881/scripts/testing/patches/DPCPP-0003-SYCL-NativeCPU-Limit-generic-ABI-to-builtins.patch

What this PR prepares for is

diff --git a/sycl/include/sycl/detail/defines_elementary.hpp b/sycl/include/sycl/detail/defines_elementary.hpp
index 17107c9216b3..ba3401496849 100644
--- a/sycl/include/sycl/detail/defines_elementary.hpp
+++ b/sycl/include/sycl/detail/defines_elementary.hpp
@@ -25,7 +25,12 @@
 #endif // __SYCL_ALWAYS_INLINE
 
 #ifdef SYCL_EXTERNAL
+#ifdef __NativeCPU__
+#define __DPCPP_SYCL_EXTERNAL SYCL_EXTERNAL __attribute__((intel_ocl_bicc))
+#define __DPCPP_SYCL_EXTERNAL_LIBC SYCL_EXTERNAL
+#else
 #define __DPCPP_SYCL_EXTERNAL SYCL_EXTERNAL
+#endif
 #else
 #ifdef __SYCL_DEVICE_ONLY__
 #define __DPCPP_SYCL_EXTERNAL __attribute__((sycl_device))

where whether we actually use __attribute__((intel_ocl_bicc)) or switch to a Native CPU-specific new target attribute is still to be decided and out of scope for this PR.

According to my understanding libclc/libdevice are not target independent. We build dedicated version for each target separately.

Apologies, I should have phrased that differently. I'm aware we build libclc/libdevice for each target separately, but we don't build for each aux (host) target separately, nor should we. We just build libclc once for Native CPU, and that version can be used with any host target. That is, the same libclc files are used for clang++ --target=x86_64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for x86_64, as for clang++ --target=aarch64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for AArch64.

In order for this to work, we either need to default to the platform-independent ABI (calling into libclc) for translating to LLVM IR and explicitly mark those functions where we do need the host ABI (calling into libc), or we default to the host ABI and explicitly mark those functions where we need the platform-independent ABI. Either would work, and both approaches require this PR or something equivalent.

@bader
Copy link
Contributor

bader commented Jul 9, 2025

whether we actually use __attribute__((intel_ocl_bicc)) or switch to a Native CPU-specific new target attribute is still to be decided and out of scope for this PR.

intel_ocl_bicc was created for one very specific use case, which is not relevant to SYCL. Modifying SYCL headers to use calling convention for "Intel OpenCL CPU builtins" doesn't sound right me and goes against the purpose of Native CPU as I understand it. This calling convention works only for x86 processors, moreover it's supposed to be used only for SVML functions implementing OpenCL built-in functions.

Apologies, I should have phrased that differently. I'm aware we build libclc/libdevice for each target separately, but we don't build for each aux (host) target separately, nor should we.

I'm lost here. libclc is written in OpenCL. OpenCL compiler doesn't have the concept of "aux-target".
libdevice is written in SYCL, but AFAIK it's so just to simplify the integration with the SYCL code. It's device-only library with zero host code.

We just build libclc once for Native CPU, and that version can be used with any host target. That is, the same libclc files are used for clang++ --target=x86_64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for x86_64, as for clang++ --target=aarch64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for AArch64.

When you say "build", what exactly are we producing? My assumption it's LLVM IR.
To be honest, I don't understand how Native CPU is supposed to work as LLVM IR is target dependent by design. For instance, what is the size of the pointer for Native CPU (note: some libclc functions have pointer parameters)? In your example all targets are using 64-bit pointer size, but I don't think you can re-use "same libclc files" for CPUs with 32-bit pointers.

I would prefer this patch to be reviewed by someone who understands Native CPU architecture and this issue. It will take me some time to dig deep into this issue and I don't want @hvdijk to waste his time on dealing with all my questions.

@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 10, 2025

whether we actually use __attribute__((intel_ocl_bicc)) or switch to a Native CPU-specific new target attribute is still to be decided and out of scope for this PR.

intel_ocl_bicc was created for one very specific use case, which is not relevant to SYCL.

Agreed, it is a bit of a misuse of the calling convention, that's why I'm looking at changing that before submitting that as a PR. We need some calling convention that we give special treatment, and in our local patches we just went with intel_ocl_bicc because it was available and was convenient to get things working, it meant we could avoid adding a new keyword. In what we will end up submitting as a PR, we may want to do that differently.

Modifying SYCL headers to use calling convention for "Intel OpenCL CPU builtins" doesn't sound right me and goes against the purpose of Native CPU as I understand it. This calling convention works only for x86 processors, moreover it's supposed to be used only for SVML functions implementing OpenCL built-in functions.

It's a calling convention that in that patch is handled entirely by the Native CPU target, not by the x86 target: we bypass the host target's regular handling for it. This really does work with Native CPU on x86_64, on AArch64, and on RISC-V. Even if in what we end up submitting as a PR, we end up using a different calling convention, the approach will be the same. In that patch, that is handled by the change to clang/lib/CodeGen/Targets/NativeCPU.cpp.

Apologies, I should have phrased that differently. I'm aware we build libclc/libdevice for each target separately, but we don't build for each aux (host) target separately, nor should we.

I'm lost here. libclc is written in OpenCL. OpenCL compiler doesn't have the concept of "aux-target". libdevice is written in SYCL, but AFAIK it's so just to simplify the integration with the SYCL code. It's device-only library with zero host code.

The idea of Native CPU is that SYCL device code runs directly on the host CPU, so any device code is host code.

We just build libclc once for Native CPU, and that version can be used with any host target. That is, the same libclc files are used for clang++ --target=x86_64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for x86_64, as for clang++ --target=aarch64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for AArch64.

When you say "build", what exactly are we producing? My assumption it's LLVM IR.

For the initial device compilation, we indeed produce LLVM IR. Then, at link time, when we embed that LLVM IR in the host executable, we translate that device LLVM IR using the host backend to produce machine code for whatever target is selected.

To be honest, I don't understand how Native CPU is supposed to work as LLVM IR is target dependent by design. For instance, what is the size of the pointer for Native CPU (note: some libclc functions have pointer parameters)?

Native CPU does not have things like pointer size, it copies all those attributes from the host. It cannot generally be used standalone. There are a few other targets in LLVM upstream already that similarly copy a lot of attributes from the host, that handling was taken from the NVPTX and SPIR targets.

In your example all targets are using 64-bit pointer size, but I don't think you can re-use "same libclc files" for CPUs with 32-bit pointers.

In general, you're right, Native CPU LLVM IR is not host-target-independent, nor is it meant to be. In libclc, however, we avoid anything that results in LLVM IR that varies between host targets and we have nothing that depends on the size of pointers, and those LLVM IR files are meant to be host-target-independent.

As far as I am aware, DPC++ does not support SYCL on CPUs with 32-bit pointers. In a local build, months ago, I tried it and used the compiler to build SYCL-CTS (for spir32). It did not go well. There is also no CI to test this, and no documentation that suggests this should work. If that ever changes, though, it should also work for Native CPU as well. I can double-check later this week that we have no libclc/libdevice bugs in this area just to be sure (separate from this PR).

I would prefer this patch to be reviewed by someone who understands Native CPU architecture and this issue. It will take me some time to dig deep into this issue and I don't want @hvdijk to waste his time on dealing with all my questions.

I'm happy to wait for someone else to review but they're good questions to ask to get answers written out for later reference.

@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 10, 2025

I can double-check later this week that we have no libclc/libdevice bugs in this area just to be sure (separate from this PR).

I might as well update here anyway: we don't have problems with pointers directly, we do have problems with assuming size_t is 64 bits. But that and pointer size (which are going to be the same) have become a general assumption across DPC++ nowadays: trying a 32-bit build I get errors because of that in unified-memory-framework, in libdevice, in libsycl, in unified-runtime, in sycl-jit. If there is any desire to fix this, whatever gets done to fix this for other parts of libdevice will also be possible to do for Native CPU.

@hvdijk hvdijk temporarily deployed to WindowsCILock July 16, 2025 11:23 — with GitHub Actions Inactive
@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 16, 2025

Updated to resolve a merge conflict. For more background, the next PR will be https://github.com/uxlfoundation/oneapi-construction-kit/blob/fe6edb51f22c5e532942b32168f8a02ac6295d01/scripts/testing/patches/DPCPP-0002-SYCL-NativeCPU-Limit-generic-ABI-to-builtins.patch (now updated to not repurpose intel_ocl_bicc), that is the motivation for this.

@hvdijk hvdijk temporarily deployed to WindowsCILock July 16, 2025 12:26 — with GitHub Actions Inactive
@hvdijk hvdijk temporarily deployed to WindowsCILock July 16, 2025 12:26 — with GitHub Actions Inactive
@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 21, 2025

ping @intel/llvm-reviewers-runtime, I will be on holiday after this week, so I would like to address any feedback for this PR and also leave enough time for feedback on that followup PR that I would create next.

@aelovikov-intel
Copy link
Contributor

ping @intel/llvm-reviewers-runtime, I will be on holiday after this week, so I would like to address any feedback for this PR and also leave enough time for feedback on that followup PR that I would create next.

Textual diff if fine, but as I mentioned above, I don't have knowledge to judge if the direction of the entire patch series is solid and it looks like you haven't convinced @bader that it's the right way to go...

@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 22, 2025

Textual diff if fine, but as I mentioned above, I don't have knowledge to judge if the direction of the entire patch series is solid and it looks like you haven't convinced @bader that it's the right way to go...

He wrote "I would prefer this patch to be reviewed by someone who understands Native CPU architecture and this issue", this needs to be reviewed by someone else. I don't know who else would be suitable as a reviewer.

@dm-vodopyanov dm-vodopyanov requested a review from a team July 22, 2025 15:34
@aelovikov-intel
Copy link
Contributor

+ enabling team representatives.

@steffenlarsen
Copy link
Contributor

Because libclc is written in OpenCL C, ABI compatibility is always a concern and something we work around for other targets by producing different variants of libclc at the moment. It is a little tough to say how stable it is going to be, but if we have confidence that this works for x86_64 and AArch64 (did we run the CTS for this?) then I am not overly concerned about this approach.

@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 23, 2025

(did we run the CTS for this?)

Yes, we run SYCL-CTS for Native CPU for x86_64, AArch64, and riscv64. We have three tests that are (sometimes) failing, but not because of ABI issues, and they're not architecture-specific. You can see the logs at https://github.com/uxlfoundation/oneapi-construction-kit/actions/runs/16378383696

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I trust this solution! Maybe it would be great with a comment in the defines_elementary.hpp header and/or the design document, but I think it can be with the follow-up, given this is NFC.

@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 23, 2025

@aelovikov-intel With these approvals, are you okay with this being merged?

@aelovikov-intel aelovikov-intel merged commit 7dd93c9 into intel:sycl Jul 23, 2025
25 checks passed
@hvdijk hvdijk deleted the sycl-external-libc branch July 23, 2025 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants