[CIR][ThroughMLIR] Lower simple SwitchOp #1742

terapines-osc-cir · 2025-07-13T06:41:19Z

This deals with fall-through by copying the body of the next cir.case to the previous case. This is needed because scf.index_switch does not support falling through.
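The copying strategy described above can be illustrated with a small standalone model. This is only a sketch: `Case`, its fields, and `flattenFallThrough` are hypothetical names invented for illustration and are not the data structures used in the patch. It assumes that a chain of fall-throughs is resolved by walking the cases backwards, so each case picks up the already-flattened body of its successor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one switch case: the statements in its body and a
// flag saying whether control falls through into the next case.
struct Case {
  int value;
  std::vector<std::string> body;
  bool fallsThrough;
};

// Rewrites the cases so that none of them falls through: a falling-through
// case gets a copy of its successor's body appended to its own.
std::vector<Case> flattenFallThrough(std::vector<Case> cases) {
  // Walk backwards so that a chain such as 1 -> 2 -> 3 is resolved in one
  // pass: case 2 is already flattened when case 1 copies from it.
  for (std::size_t i = cases.size(); i-- > 0;) {
    if (!cases[i].fallsThrough)
      continue;
    if (i + 1 < cases.size()) {
      const std::vector<std::string> &next = cases[i + 1].body;
      cases[i].body.insert(cases[i].body.end(), next.begin(), next.end());
    }
    cases[i].fallsThrough = false;
  }
  assert(cases.empty() || !cases.back().fallsThrough);
  return cases;
}
```

After flattening, every case is self-contained, which is the shape a switch without fall-through (such as `scf.index_switch`) needs. The cost is duplicated code for long fall-through chains.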

This implements the missing feature `cir::setTargetAttributes`. Although other targets might also need attributes, this PR focuses on the CUDA-specific ones. CUDA kernels (on the device side, not stubs) must have the `ptx_kernel` calling convention, which is added here. CUDA kernels, as well as global variables, also involve a lot of NVVM metadata, which is intended to be handled in the same place; that part is marked with a new missing feature here.


This is part 2 of CUDA lowering. Still more to come!

This PR generates `__cuda_register_globals` for functions only, without touching variables. It also fixes two discrepancies mentioned in Part 1, namely:

- CIR no longer generates registration code if there's nothing to register;
- `__cuda_fatbin_wrapper` is now a constant.

This PR deals with several issues currently present in CUDA CodeGen. Each of them requires only a few lines to fix, so they're combined in a single PR.

**Bug 1.** Suppose we write

```cpp
__global__ void kernel(int a, int b);
```

Then when we call this kernel with `cudaLaunchKernel`, the 4th argument to that function is something of the form `void *kernel_args[2] = {&a, &b}`. OG allocates the space for it with `alloca ptr, i32 2`, but that doesn't seem to be feasible in CIR, so we allocate `alloca [2 x ptr], i32 1` instead. This means there must be an extra GEP compared to OG. In CIR, it means we must add an `array_to_ptrdecay` cast before trying to access the array elements. I missed that in llvm#1332.

**Bug 2.** We missed a load instruction for the 6th argument to `cudaLaunchKernel`. It's added back in this PR.

**Bug 3.** When we launch a kernel, we first retrieve the return value of `__cudaPopCallConfiguration`. If it's zero, then the call succeeded and we should proceed to call the device stub. In llvm#1348 we did exactly the opposite, calling the device stub only if it's not zero. It's fixed here.

**Issue 4.** CallConvLowering is required to make `cudaLaunchKernel` correct. The codepath is unblocked by adding a `getIndirectResult` at the same place as OG does -- the function is already implemented, so we can just call it.

After this (and other pending PRs), CIR is now able to compile real CUDA programs. There are still missing features, which will be followed up later.
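Bugs 1 and 3 can be pictured with a host-only mock. This is a sketch under stated assumptions, not the generated code: `StubLog`, `mockDeviceStub`, `launchKernel`, and `configStatus` are hypothetical stand-ins so the example compiles without the CUDA runtime, and `configStatus` simply plays the role of the status value described in Bug 3.

```cpp
#include <cassert>

// Hypothetical record of what the mocked device stub received.
struct StubLog {
  int calls = 0;
  int sum = 0;
};

// Mock device stub: reads the two kernel arguments back through the
// array of void pointers, the way a launch routine would receive them.
void mockDeviceStub(void **kernelArgs, StubLog &log) {
  log.calls += 1;
  log.sum = *static_cast<int *>(kernelArgs[0]) +
            *static_cast<int *>(kernelArgs[1]);
}

// configStatus stands in for the status value checked before the stub call.
void launchKernel(int a, int b, int configStatus, StubLog &log) {
  // One array object holding two pointers. Passing or indexing it requires
  // the array to decay to a pointer first, which mirrors the cast that was
  // missing in Bug 1.
  void *kernelArgs[2] = {&a, &b};
  // Zero means success, so the stub is called only in that case (Bug 3).
  if (configStatus == 0)
    mockDeviceStub(kernelArgs, log);
}
```

With a zero status the stub runs once and sees both arguments; with a non-zero status it is skipped entirely.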


This is Part 3 of registration function generation. This generates `__cuda_module_dtor`. It cannot be placed in global dtors list, as treating it as a normal destructor will result in double-free in recent CUDA versions (see comments in OG). Rather, the function is passed as callback of `atexit`, which is called at the end of `__cuda_module_ctor`.

@Lancern

Traditional clang implementation: https://github.com/llvm/clangir/blob/a1ab6bf6cd3b83d0982c16f29e8c98958f69c024/clang/lib/CodeGen/CGBuiltin.cpp#L3618-L3632

The problem here is that `__builtin_clz` allows an undefined result, while `__lzcnt` doesn't. As a result, I have to create a new CIR operation for `__lzcnt`. Since the return types of those two builtins differ, I decided to change the return type of the current `CIR_BitOp` to allow the new `CIR_LzcntOp` to inherit from it.

I would like to hear your suggestions. cc @Lancern
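To make the semantic difference concrete, here is a minimal sketch of the `__lzcnt` behaviour written in terms of `__builtin_clz`. It assumes a GCC- or Clang-compatible compiler and a 32-bit `unsigned int`; `lzcnt32` is a hypothetical helper name and is not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Models __lzcnt semantics: the result is well defined for every input,
// including zero, and is unsigned.
unsigned lzcnt32(std::uint32_t x) {
  if (x == 0)
    return 32; // __builtin_clz(0) is undefined, so zero is handled here.
  unsigned n = static_cast<unsigned>(__builtin_clz(x));
  assert(n < 32);
  return n;
}
```

The explicit zero check and the cast to `unsigned` are the two places where the builtins differ: defined behaviour at zero, and the return type.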


There are some subtleties here. This is the code in OG:

```cpp
// note: this is different from default ABI
if (!RetTy->isScalarType())
  return ABIArgInfo::getDirect();
```

which says we should return structs directly. It's correct, has the same behaviour as `nvcc`, and obeys the PTX ABI as well. The comment dates back to 2013 (see [this commit](llvm/llvm-project@f9329ff) -- it didn't provide any explanation either), so I believe it's outdated. I didn't include this comment in the PR.


…1487) This PR resolves an assertion failure in `CIRGenTypes::isFuncParamTypeConvertible`, which is involved when trying to emit a vtable entry to a virtual function whose type includes a pointer-to-member-function.


…lvm#1431) Implements `::verify` for operations cir.atomic.xchg and cir.atomic.cmp_xchg.

I believe the existing regression tests don't get to the CIR level type check failure and I was not able to implement a case that does. Most attempts at reproducing the cir.atomic.xchg type check failure were along the lines of:

```
int a;
long long b,c;
__atomic_exchange(&a, &b, &c, memory_order_seq_cst);
```

And they seem to never trigger the failure on `::verify` because they fail earlier in function parameter checking:

```
exmp.cpp:7:27: error: cannot initialize a parameter of type 'int *' with an rvalue of type 'long long *'
    7 |     __atomic_exchange(&a, &b, &c, memory_order_seq_cst);
      |                           ^~
```

Closes llvm#1378.


This PR adds a new boolean flag to the `cir.load` and `cir.store` operations that distinguishes nontemporal loads and stores. It also adds support for the `__builtin_nontemporal_load` and `__builtin_nontemporal_store` intrinsic functions.


This PR adds an insertion guard for the try body scope for try-catch. Currently, the following code snippet fails during CodeGen:

```
void foo() {
  int r = 1;
  try {
    ++r;
    return;
  } catch (...) {
  }
}
```

The insertion point doesn't get reset properly, and the cleanup is run for a wrong/deleted block, causing a segmentation fault. I also added a test.

The comments suggested that we should use TableGen to generate the recognizing functions. However, I think templates might be more suitable for generating them -- and I can't find any existing TableGen backends that let us generate arbitrary functions.

My choice of design is to offer a template to match standard library functions:

```cpp
// matches std::find with 3 arguments, and raises it into StdFindOp
StdRecognizer<3, StdFindOp, StdFuncsID::Find>
```

I have to use a TableGen'd enum to map names to IDs, as we can't easily pass string literals as template arguments in C++17. This also constrains the design of future `StdXXXOp`s: they must take operands the same way as StdFindOp, where the first one is the original function and the rest are function arguments.

I'm not sure if this approach is the best way. Please tell me if you have concerns or any alternative ways.


bcardosolopes

Overall looks good, mostly nits

clang/lib/CIR/Lowering/ThroughMLIR/LowerCIRToMLIR.cpp

clang/include/clang/CIR/Passes.h

clang/lib/CIR/Lowering/ThroughMLIR/MLIRLoweringPrepare.cpp

This patch bumps the Windows CI container from Windows Server 2019 to Windows Server 2022. This is necessary because GitHub has sunset support for Server 2019, so we cannot build the container through GHA without updating. Using more recent versions is good practice anyway.

This will not roll out immediately, and we'll have to make some TF changes to get it deployed, but some additional validation first will be good.

Reviewers: lnihlen, tstellar, cmtice

Reviewed By: cmtice

Pull Request: llvm/llvm-project#148318

(cherry picked from commit 3e43915)


…lvm#1740) This PR has two parts:

1. Mimicking the OG [special case](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/lib/CodeGen/CGException.cpp#L690) for a single catch-all when getting dispatch blocks. The huge testcase I added, obtained by running [creduce](https://github.com/csmith-project/creduce) on a C++ file, crashed at this point [in our version](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/lib/CIR/CodeGen/CIRGenException.cpp#L789).
2. Fixing multiple destructor calls for the same object. For example, there were tests like [llvm#1](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/test/CIR/CodeGen/try-catch-dtors.cpp#L370C1-L372C80) and [llvm#2](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/test/CIR/CodeGen/conditional-cleanup.cpp#L217C1-L224C25) that had a second destructor call on an already destroyed object. This PR fixes these, and I have updated the tests. I also added `"CIR-NEXT"` at some points to confirm the destructors are indeed called once.

As usual, please let me know if you have any concerns.

This patch backports changes made to the bit operations in the upstream PR llvm/llvm-project#148378. Namely, this patch includes the following changes:

- This patch removes the `bit.` prefix in the op mnemonic. The operation names now directly correspond to the builtin function names, except for bswap, which is represented by `cir.byte_swap` for more clarity.
- Since all bit operations are `SameOperandsAndResultType`, this patch updates their assembly format and avoids spelling out the operand type twice.

clang/lib/CIR/Lowering/ThroughMLIR/MLIRCoreDialectsLoweringPrepare.cpp


…ent (llvm#1748)

## Overview

Currently, getting the pointer to an element of an array requires a pointer decay and a (possible) pointer stride. A similar pattern for records has been eliminated with the `cir.get_member` operation. This PR provides a similar level of abstraction for arrays with the `get_element` operation. `get_element` replaces the above pattern with a single operation, which takes a pointer to an array and an index, and produces a pointer to the element at that index.

There are many places in CIR analysis and lowering where the `ptr_stride(array_to_ptrdecay(x), i)` pattern is handled as a special case. By subsuming the special case pattern with an explicit operation, we make these analyses and lowering more robust.

## Changes

- Adds the `cir.get_element` operation.
- Extends CIRGen to emit `cir.get_element` for array subscript expressions.
- Updates LifetimeCheck to handle the `get_element` operation, subsuming the special case analysis of the `cir.ptr_stride` operation (did not remove the special case).
- Extends CIR-to-LLVM lowering to lower `cir.get_element` to `llvm.getelementptr`.
- Extends CIR-to-MLIR lowering to lower `cir.get_element` to `memref` operations, matching the existing special case `cir.ptr_stride` lowering.

## Additional Notes

Currently, 47.6% of `cir.ptr_stride` operations in the llvm-test-suite (SingleSource and MultiSource) can be replaced by `cir.get_element` operations.

### Operator Breakdown (current)

name | count | %
-- | -- | --
cir.load | 825221 | 22.27%
cir.br | 429822 | 11.60%
cir.const | 380381 | 10.26%
cir.cast | 325646 | 8.79%
cir.store | 309586 | 8.35%
cir.get_member | 226895 | 6.12%
cir.get_global | 186851 | 5.04%
cir.ptr_stride | 158094 | 4.27%
cir.call | 144522 | 3.90%
cir.binop | 141142 | 3.81%
cir.alloca | 134346 | 3.63%
cir.brcond | 112864 | 3.05%
cir.cmp | 83532 | 2.25%

### Operator Breakdown (with `get_element`)

name | count | %
-- | -- | --
cir.load | 825221 | 22.74%
cir.br | 429822 | 11.84%
cir.const | 380381 | 10.48%
cir.store | 309586 | 8.53%
cir.cast | 248645 | 6.85%
cir.get_member | 226895 | 6.25%
cir.get_global | 186851 | 5.15%
cir.call | 144522 | 3.98%
cir.binop | 141142 | 3.89%
cir.alloca | 134346 | 3.70%
cir.brcond | 112864 | 3.11%
cir.cmp | 83532 | 2.30%
cir.ptr_stride | 81093 | 2.23%
cir.get_elem | 77001 | 2.12%

---------

Co-authored-by: Andy Kaylor <[email protected]>
Co-authored-by: Henrich Lauko <[email protected]>
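As a source-level analogy for the two forms discussed in the `get_element` description, the sketch below shows the two-step decay-then-stride computation next to the direct element access. It is plain C++ rather than CIR, and the function names are hypothetical; the point is only that both compute the same address.

```cpp
#include <cassert>
#include <cstddef>

// Two-step form: decay the array to a pointer to its first element, then
// stride by the index (analogous to ptr_stride(array_to_ptrdecay(x), i)).
int *viaDecayAndStride(int (*arr)[4], std::size_t i) {
  assert(i < 4);
  int *first = *arr; // array-to-pointer decay
  return first + i;  // pointer stride
}

// One-step form: take a pointer to the array and an index, and produce a
// pointer to the element directly (analogous to get_element).
int *viaGetElement(int (*arr)[4], std::size_t i) {
  assert(i < 4);
  return &(*arr)[i];
}
```

Because the one-step form keeps the array type visible at the access, an analysis does not have to pattern-match a cast followed by a stride to recognise an array subscript.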

AdUhTkJm · 2025-07-25T13:24:23Z

Rebase conflicts are now resolved.

bcardosolopes · 2025-08-07T16:05:12Z

clang/lib/CIR/Lowering/ThroughMLIR/LowerCIRToMLIR.cpp

+        break;
+      case CaseOpKind::Range:
+      case CaseOpKind::Anyof:
+        mlir::emitError(op.getLoc(), "not yet implemented");


Why not return here and in all other places?

AdUhTkJm and others added 30 commits April 9, 2025 14:59

[CIR][CIRGen][Builtin][Neon] Lower neon vmaxv_f32 (llvm#1460)

5f03b07

Lower neon vmaxv_f32

[CIR][CUDA] implement cuda constant variables (llvm#1444)

cadb738

This PR implements \_\_constant\_\_ variables. llvm#1438 only implements \_\_device\_\_ and \_\_shared\_\_ variables. ~~This PR depends on llvm#1445~~

[CIR][CIRGen][Builtin][Neon] Lower neon vaddlv_s32 (llvm#1464)

49383ee

Lower neon vaddlv_s32

[CIR][CUDA] Support builtin CUDA variables (llvm#1458)

fe2206a

This PR adds support for compiling builtin variables like `threadIdx` down to the appropriate intrinsic. --------- Co-authored-by: Aidan Wong <[email protected]> Co-authored-by: anominos <[email protected]>

[CIR][CUDA] Support for inbuilt texture types (llvm#1469)

bf52b48

I have now fixed the test. Earlier I made some commits with other changes because we were testing something on my fork. This should be resolved now

[CIR] Lower signext and zeroext attributes (llvm#1473)

9b0b837

CIR is currently ignoring the `signext` and `zeroext` for function arguments and return types produced by CallConvLowering. This PR lowers them to LLVM IR.

[CIR][CIRGen][TBAA] Add support for vtable pointer (llvm#1463)

f896192

[CIR][CIRGen][builtin][X86] handle _mm_lfence (llvm#1474)

6e31c6e

I realized I committed a new file with CRLF before. Really sorry about that >_< Related: llvm#1404

[CIR][CUDA] Support device-side printf (llvm#1475)

0aef9ed

The choice of adding a separate file imitates that of OG.

[CIR][CIRGen][builtin] handle __popcnt (llvm#1479)

867d736

This PR removes a useless argument `convertToInt` and removes hardcoded `Sint32Type`. I realized I committed a new file with CRLF before. Really sorry about that >_<

[CIR][NFC] Fix test failures caused by double spaces in check line (llvm#1486)

9e8806a

The pattern `call {{.*}} i32` mismatches `call i32` due to double spaces surrounding `{{.*}}`. This patch removes the first space to fix the failure.

[CIR][CIRGen][Builtin][Neon] Lower neon vabsd_s64 (llvm#1489)

bce7507

Lower neon vabsd_s64

[CIR][CIRGen][Builtin][Neon] Lower neon vcaled_f64 (llvm#1495)

e74e226

Lower neon vcaled_f64

[CIR][CIRGen][builtin] handle _mm_pause (llvm#1493)

dd37e38

[CIR] Emit nsw flag for unary integer operations (llvm#1485)

6a3e881

Resolves llvm#1477 .

[CIR][CIRGen][Builtin][Neon] Lower vcales_f32 (llvm#1500)

8f04109

Lower vcales_f32

[CIR][NFC] Fix a wrong test case in fc293bb (llvm#1503)

126956d

cir-translate: Use default target triple instead of x86 if no target was set explicitly (llvm#1482)

778bedb

This is backported from a change made in llvm/llvm-project#131181

---------

Co-authored-by: Morris Hafner <[email protected]>

AmrDeveloper and others added 2 commits July 14, 2025 15:53

[CIR] Backport functional cast to ComplexType (llvm#1737)

5d27d15

Backport functional cast to ComplexType

[CIR][CIRGen][Builtin][X86] Lower AVX mask-to-vector conversion intrinsics (llvm#1738)

577e995

bcardosolopes reviewed Jul 14, 2025


boomanaiden154 and others added 7 commits July 14, 2025 16:33

[CIR] Reformat Ops to use common CIR_ prefix and definition traits style (llvm#1741)

17aa78d

- This adds a common `CIR_` prefix to all operations, disambiguating them when used with other dialects.
- Unifies traits style in operation definitions.

[CIR] Fix Global Ctor/Dtor priority attributes (llvm#1743)

ee785a0

- Fixes the default value to be the expected 65535.
- Introduces the `DefaultGlobalCtorDtorPriority` constant.
- Makes the function use `I32Attr` for priority instead of an unnecessary attribute with a reference to the function.

Revert "[CI][Github] Bump Windows Container to Server 2022"

b0901d3

Seems like this is the wrong approach. This reverts commit bc91ef4.

[CIR] Implement function alias lowering (llvm#1739)

504714a

This updates the lowering of CIR function aliases in such a way that they now actually become aliases in the final LLVM IR.

AdUhTkJm force-pushed the cir-switch branch from 56b69e6 to 1a7f9c9 Compare July 17, 2025 13:25

bcardosolopes reviewed Jul 17, 2025


[CIR] Reformat Attr to use common CIR_ prefix and traits style (llvm#1746)

ae25175

AdUhTkJm force-pushed the cir-switch branch from 1a7f9c9 to c05d426 Compare July 18, 2025 12:54

[CIR][NFC] Add placeholders for remaining x86 intrinsics (llvm#1754)

8f89224

lanza force-pushed the main branch 2 times, most recently from d2c4ab8 to 8f89224 Compare July 23, 2025 17:04

andykaylor and others added 3 commits July 24, 2025 09:24

[CIR] Fix array init loop condition (llvm#1758)

2a126d2

The LoweringPrepare pass was generating the wrong condition for loops when lowering the ArrayCtor op, causing only one element in an array of objects to be constructed. This fixes that problem.
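For intuition, the intended behaviour of an array construction loop can be sketched in plain C++. This assumes a pointer-walking loop that runs until the one-past-the-end pointer is reached; `Counted` and `constructAll` are hypothetical names, and the sketch does not reproduce the code the pass emits.

```cpp
#include <cassert>
#include <new>

// Hypothetical element type that counts how many times it is constructed.
struct Counted {
  static int constructed;
  Counted() { ++constructed; }
};
int Counted::constructed = 0;

// Constructs every element in [begin, end). The loop condition compares the
// current pointer against the end pointer, so all elements are visited.
void constructAll(Counted *begin, Counted *end) {
  assert(begin <= end);
  for (Counted *cur = begin; cur != end; ++cur)
    new (cur) Counted();
}
```

A loop whose condition fails after the first iteration would leave the counter at one for an array of any length, which matches the symptom described in this commit.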

[CIR][NFC] Pass enum kind directly to complex cast helpers (llvm#1757)

754a11a

Backporting passing enum kind directly to complex cast helpers

AdUhTkJm force-pushed the cir-switch branch from c05d426 to 3fa7128 Compare July 25, 2025 12:57

AdUhTkJm force-pushed the cir-switch branch 2 times, most recently from 04ef3d6 to d8968f9 Compare July 30, 2025 04:48

[CIR][ThroughMLIR] Lower simple SwitchOp

7dcc0ac

AdUhTkJm force-pushed the cir-switch branch from d8968f9 to 7dcc0ac Compare July 30, 2025 04:49

bcardosolopes reviewed Aug 7, 2025


lanza force-pushed the main branch from 942008c to aeac352 Compare August 11, 2025 06:15

lanza requested review from xlauko and andykaylor as code owners August 11, 2025 06:15
