Skip to content

Conversation

xlauko
Copy link
Collaborator

@xlauko xlauko commented Aug 18, 2025

No description provided.

xlauko and others added 30 commits April 9, 2025 14:28
To give LoweringPrepare type information from `CIRGenTypeCache`, this PR
adds two attributes to ModuleOp:
```mlir
module attributes {
  cir.int_size = #cir.int_size<32>,
  cir.size_type_size = #cir.size_type_size<64>,
  ...
} {}
```
The `CIRDataLayout` class is also extended to have `getPtrDiffTy` and so
on.

Some tests that only expects `cir.lang` and `cir.sob` are also changed
to take this into account.
If type of operand is not integer, it can be handled like what I do in
`__builtin_elementwise_exp`.
This patch adds support for simple cast operations on pointers to member
functions, including: 1) casting pointers to member function values to
boolean values; 2) reinterpret casts between pointers to member
functions.
This uses the assembly format for the optional return type and keeps a
custom printer/parser only for function parameters, which still require
a custom form for ellipses.
for example, lower 
`cir.alloca !cir.array<!s32i x N>, !cir.ptr<!cir.array<!s32i x N>>` 
to `memref.alloca() : memref<Nxi32>`

see #1405
Get rid of the function `FuncOp::verifyType`. The function performed
three checks:

1. Check that `isa<cir::FuncType>(getFunctionType())`. This is a
tautology that is always true, since the return type of
`getFunctionType()` is already `cir::FuncType`.

2. Report an error if `type.isVarArg() && type.getNumInputs() == 0`,
i.e. when a variadic function has no named parameters. That check is
incorrect. In C++, variadic functions don't need to have any named
parameters. `void f(...) { }` is legal in C++ and ClangIR needs to be
able to compile it.

3. Report an error when the return type is `void`. This check is correct
(`void` return is represented as no return in MLIR), but it is
redundant. This is already checked in `FuncType::verify`.

Since `FuncOp::verifyType` serves no useful purpose, delete it, along
with the test for `int variadic(...)` that was in
`clang/test/CIR/IR/invalid.cir`.
Currently, the following code snippet fails during CIR codegen with
exceptions enabled:
```
#include <string>

void foo(const char *path) {
  std::string str = path;
  str = path;
  str = path;
}
```
using `bin/clang++ tmp.cpp -fclangir -Xclang -emit-cir -S`, the error: 
```
error: empty block: expect at least a terminator
```
Relevant part of the CIR before verification looks like: 
```
%118 = "cir.load"(%114) : (!cir.ptr<!cir.ptr<!cir.int<s, 8>>>) -> !cir.ptr<!cir.int<s, 8>>
"cir.try"() <{catch_types = [#cir.unwind], cleanup, synthetic}> ({
    %123 = "cir.call"(%115, %118) <{ast = #cir.call.expr.ast, callee = @_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEPKc, calling_conv = 1 : i32, exception, extra_attrs = #cir<extra({})>, side_effect = 1 : i32}> ({
    "cir.call"(%115) <{callee = @_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev, calling_conv = 1 : i32, extra_attrs = #cir<extra({nothrow = #cir.nothrow})>, side_effect = 1 : i32}> ({
    }) : (!cir.ptr<!cir.struct<class "std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>" {!cir.struct<struct "std::__cxx11::basic_string<char>::_Alloc_hider" {!cir.ptr<!cir.int<s, 8>>} #cir.record.decl.ast>, !cir.int<u, 64>, !cir.struct<union "anon.0" padded {!cir.array<!cir.int<s, 8> x 16>, !cir.int<u, 64>, !cir.array<!cir.int<u, 8> x 8>} #cir.record.decl.ast>} #cir.record.decl.ast>>) -> ()
    "cir.yield"() : () -> ()
    }) : (!cir.ptr<!cir.struct<class "std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>" {!cir.struct<struct "std::__cxx11::basic_string<char>::_Alloc_hider" {!cir.ptr<!cir.int<s, 8>>} #cir.record.decl.ast>, !cir.int<u, 64>, !cir.struct<union "anon.0" padded {!cir.array<!cir.int<s, 8> x 16>, !cir.int<u, 64>, !cir.array<!cir.int<u, 8> x 8>} #cir.record.decl.ast>} #cir.record.decl.ast>>, !cir.ptr<!cir.int<s, 8>>) -> !cir.ptr<!cir.struct<class "std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>" {!cir.struct<struct "std::__cxx11::basic_string<char>::_Alloc_hider" {!cir.ptr<!cir.int<s, 8>>} #cir.record.decl.ast>, !cir.int<u, 64>, !cir.struct<union "anon.0" padded {!cir.array<!cir.int<s, 8> x 16>, !cir.int<u, 64>, !cir.array<!cir.int<u, 8> x 8>} #cir.record.decl.ast>} #cir.record.decl.ast>>
    "cir.store"(%123, %116) : (!cir.ptr<!cir.struct<class "std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>" {!cir.struct<struct "std::__cxx11::basic_string<char>::_Alloc_hider" {!cir.ptr<!cir.int<s, 8>>} #cir.record.decl.ast>, !cir.int<u, 64>, !cir.struct<union "anon.0" padded {!cir.array<!cir.int<s, 8> x 16>, !cir.int<u, 64>, !cir.array<!cir.int<u, 8> x 8>} #cir.record.decl.ast>} #cir.record.decl.ast>>, !cir.ptr<!cir.ptr<!cir.struct<class "std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>" {!cir.struct<struct "std::__cxx11::basic_string<char>::_Alloc_hider" {!cir.ptr<!cir.int<s, 8>>} #cir.record.decl.ast>, !cir.int<u, 64>, !cir.struct<union "anon.0" padded {!cir.array<!cir.int<s, 8> x 16>, !cir.int<u, 64>, !cir.array<!cir.int<u, 8> x 8>} #cir.record.decl.ast>} #cir.record.decl.ast>>>) -> ()
    "cir.yield"() : () -> ()
}, {
^bb0:              <--- EMPTY BLOCK
}) : () -> ()
```
There is an empty block! If you extend the snippet with more `str =
path;`, you get more empty blocks...

The issue is the `cir.resume` ops which should be in those empty blocks
from synthetic TryOp's aren't linked properly during the cleanup.

My suggestion: We should explicitly add `cir.resume` for synthetic
tryOp's, because we already know they have just an [unwind
handler](https://github.com/llvm/clangir/blob/8746bd4bbe777352c2935e9937449637a8943767/clang/lib/CIR/CodeGen/CIRGenCall.cpp#L506).
So, during
[CIRGenCleanup](https://github.com/llvm/clangir/blob/8746bd4bbe777352c2935e9937449637a8943767/clang/lib/CIR/CodeGen/CIRGenCleanup.cpp#L667)
we don't need to add `cir.resume` for synthetic TryOp's. This PR adds
this and a test.
Run clang-tidy on `clang/lib/CIR/CodeGen/CIRGenModule.cpp`. Accept all
of the recommended fixes, except for one suggestion to use
`std::any_of`. The vast majority of the changes had to do with the case
of identifiers, changing variables and parameters from `VarName` to
`varName`.
I noticed that `AtomicFenceOp` doesn't use `OptionalAttr` like mlir
llvmir. As a result, `getLLVMSyncScope` does't return `std::optional`.
Should I use `Arg` instead?
Cleans up default linkage query implementations. Removes duplicities
from `extraClassDeclaration` that are now introduced through
`CIRGlobalValueInterface`. This makes it more consistent with the
`llvm::GlobalValue` methods.
… functions (#1424)

This PR adds CIRGen and LLVM lowering support for base-to-derived and
derived-to-base cast operations on pointers to member functions.

This PR includes a new operation `cir.update_member` to help the LLVM
lowering procedure of such cast operations.

Resolve #973 .
CIR didn't work on structs with destructor but without constructor. Now
it is fixed.

Moreover, CUDA kernels must be emitted if it was referred to in the
destructor of a non-device variable. It seems already working, so I just
unblocked the code path.
Currently, the following code snippet fails during CodeGen, using
`clang++ tmp.cpp -fclangir -Xclang -emit-cir -S`:
```
#include <fstream>

void foo(const char *path) {
  std::ofstream fout1(path);
  fout1 << path;
  std::ofstream fout2(path);
  fout2 << path;
}
```
It fails with: 
```
error: 'cir.yield' op expects parent op to be one of 'cir.if, cir.scope, cir.switch, cir.while, cir.for, cir.await, cir.ternary, cir.global, cir.do, cir.try, cir.array.ctor, cir.array.dtor, cir.call, cir.case'
```
The relevant part of the CIR dump before verification looks like: 
```
    "cir.br"()[^bb1] : () -> ()
  ^bb1:  // pred: ^bb0
    "cir.yield"() : () -> ()
    "cir.return"() : () -> ()
  }) : () -> ()
```
Two things are wrong: the YieldOp has `cir.func` as a parent and there
is a `cir.return` too. These come right after the second destructor for
`basic_ofstream`.

This PR fixes this by checking if there is a terminator and removing (if
it exists) before adding an implicit return. I have also added a test
that mimics the behavior of `std::basic_ofstream`.
Currently `__shared__` and `__constant__` variables are ignored by
CodeGen. This patch fixes this.
(It is also fixed in #1436 .)

Device and constant variables should be marked as
`externally_initialized`, as they might be initialized by host, rather
than on device. We can't identify which variables are device ones at
lowering stage, so this patch adds a new attribute for it in CodeGen.

Similar to `__global__` functions, global variables on device
corresponds to "shadow" variables on host, and they must be registered
to their counterpart. I added a `CUDAShadowNameAttr` in this patch for
later use, but I didn't insert code to actually generate it.
The generation is quite complicated so I plan to separate it into
several parts.

The registration function should be like:
```cpp
const char *__cuda_fatbin_str = /* Raw content of file in -fcuda-include-gpubinary */;
struct {
  int magicNum, version;
  void *binaryData, *unused;
} __cuda_fatbin_wrapper = { /*CUDA Magic Num*/, 1, __cuda_fatbin_str, nullptr };

void __cuda_module_ctor() {
  handle = __cudaRegisterFatBinary(&wrapper);
  __cuda_register_globals();
}
```
In this PR, we generate everything except the `__cuda_register_globals`
function.

OG doesn't give a name to `__cuda_fatbin_str`, which isn't allowed for
cir::GlobalOp, so I invented a name for it. Other names are kept
consistent with OG.
This PR adds the flag `-emit-mlir-llvm` to allow emitting of MLIR in the
LLVM dialect (cc @xlauko who asked me to do this).
I'm not sure if the naming of the flag is the best and maybe someone
will have a better idea.
Another solution would be to make the `-emit-mlir` flag have a value,
that specifies the target dialect (CIR/MLIR std dialects/LLVM Dialect).
GCC, unlike clang, issues a warning when one virtual function is
overridden in a derived class but one or more other virtual functions
with the same name and different signature from a base class are not
overridden. This leads to many warnings in the MLIR and ClangIR code
when using the OpenConversionPattern<>::matchAndRewrite() function in
the ordinary way. The "hiding" behavior is what we want.
terapines-osc-cir and others added 28 commits June 17, 2025 16:43
As the scf dialect does not support early exits, it might be necessary
to change the body of WhileOp to implement the semantics of ContinueOp.
I choose to add a guard `if (!cond)` for everything following the
`continue`.

Co-authored-by: Yue Huang <[email protected]>
This PR is related to #1685 and adds some basic support for the printf
function.

Limitations:
1. It only works if all variadic params are of basic interger/float type
(for more info why memref type operands don't work see #1685)
2. Only works if the format string is definied directly inside the
printf function

The downside of this PR is also that the handling this edge case adds
significant code bloat and reduces readability for the cir.call op
lowering (I tried to insert some meanigful comments to improve the
readability), but I think its worth to do this so we have some basic
printf support (without adding an extra cir operation) until upstream
support for variadic functions is added to the func dialect. Also a few
more test (which use such a basic form of printf) in the llvm Single
Source test suite are working with this PR:

before this PR:

Testing Time: 4.00s

Total Discovered Tests: 1833
Passed : 420 (22.91%)
Failed : 10 (0.55%)
Executable Missing: 1403 (76.54%)

with this PR:

Testing Time: 10.29s

Total Discovered Tests: 1833
  Passed            :  458 (24.99%)
  Failed            :    6 (0.33%)
  Executable Missing: 1369 (74.69%)
This PR addresses the feedback from
llvm/llvm-project#142041 (comment).
Our algorithm for accumulating bitfields has diverged from CodeGen since
Clang 19.

There is one key difference: in CIR, we use the function
`getBitfieldStorageType`, which checks whether the bit width of the
current accumulation run is a valid fundamental width (i.e., a power of
two: 8, 16, 32, 64). If it is, it returns a CIR type of that size
otherwise, it returns an array with the closest alignment.

For example, given the following struct:

```c
struct S {
  int a : 4;
  int b : 27;
  int c : 17;
  int d : 2;
  int e : 15;
  unsigned f;
};
```

The CodeGen output is:

```llvm
%struct.S = type { i64, i16, i32 }
```

Whereas the new CIR algorithm produces:

```mlir
!cir.record<struct "S" {!cir.array<!u8i x 7>, !u16i, !u32i}>
```
In CIR, the algorithm accumulates up to field `d`, resulting in 50
accumulated bits. Since 50 is not a fundamental width, the closest
alignment is 56 bits, which leads to the type `!cir.array<!u8i x 7>`.
The algorithm stops before accumulating field `e` because including it
would exceed the register size (64), which is not ideal.

At this point, it's unclear whether this divergence from CodeGen
represents an improvement. If we wanted to match CodeGen exactly, we
would need to replace the use of `getBitfieldStorageType` with
`getUIntNType`. The difference is that `getUIntNType` always returns the
closest power-of-two integer type instead of falling back to an array
when the size is not a fundamental width.

With this change, CIR would match CodeGen's layout exactly. It would
require the following small code change:
```diff
diff --git a/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp b/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp
index 7c1802b..17538b191738 100644
--- a/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp
+++ b/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp
@@ -616,7 +616,7 @@ CIRRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
       if (!InstallBest) {
         // Determine if accumulating the just-seen span will create an expensive
         // access unit or not.
-        mlir::Type Type = getBitfieldStorageType(astContext.toBits(AccessSize));
+        mlir::Type Type = getUIntNType(astContext.toBits(AccessSize));
         if (!astContext.getTargetInfo().hasCheapUnalignedBitFieldAccess())
           llvm_unreachable("NYI");

@@ -674,12 +674,12 @@ CIRRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
         // remain there after a stable sort.
         mlir::Type Type;
         if (BestClipped) {
-          assert(getSize(getBitfieldStorageType(
+          assert(getSize(getUIntNType(
                      astContext.toBits(AccessSize))) > AccessSize &&
                  "Clipped access need not be clipped");
           Type = getByteArrayType(AccessSize);
         } else {
-          Type = getBitfieldStorageType(astContext.toBits(AccessSize));
+          Type = getUIntNType(astContext.toBits(AccessSize));
           assert(getSize(Type) == AccessSize &&
                  "Unclipped access must be clipped");
         }
```
You can see a comparison between the two functions
https://godbolt.org/z/qjx1MaEWG.

I'm currently unsure whether using one function over the other has
performance implications. Regarding the **ARM error I mentioned in the
meeting: it was an `assert` I had forgotten to update. It's now fixed
sorry for the confusion.**
- Create CIR specific EnumAttr bases and prefix enum attributes with `CIR_` that automatically puts enum to `cir` namespace

- Removes unnecessary enum case definitions

- Unifies naming of enum values to use capitals consistently and make enumerations to start from 0

- Remove now unnecessary printers/parsers that gets to be generated automatically
)

No test cases provided for `lzcnt_u16` as presented in the OG codegen
equivalent: `test/CodeGen/X86/lzcnt-builtins.c`.

related: #1404
Implement base-2 exponential intrinsic as part of #1192
…1671)

Hi, This is my first here! Tried to mirror some of the patterns already
presented in both the codegen lib and its tests

I'm very excited to start contributing and potentially making an impact
in this project! feedback is much appreciated.
convert from codegen
```c++
      assert(!Base.isVirtual() && "should not see vbases here");
      auto *BaseRD = Base.getType()->getAsCXXRecordDecl();
      Address V = CGF.GetAddressOfDirectBaseInCompleteClass(
          Dest.getAddress(), CXXRD, BaseRD,
          /*isBaseVirtual*/ false);
      AggValueSlot AggSlot = AggValueSlot::forAddr(
          V, Qualifiers(),
          AggValueSlot::IsDestructed,
          AggValueSlot::DoesNotNeedGCBarriers,
          AggValueSlot::IsNotAliased,
          CGF.getOverlapForBaseInit(CXXRD, BaseRD, Base.isVirtual()));
      CGF.EmitAggExpr(InitExprs[curInitIndex++], AggSlot);

      if (QualType::DestructionKind dtorKind =
              Base.getType().isDestructedType())
        CGF.pushDestroyAndDeferDeactivation(dtorKind, V, Base.getType());
```
Moved rd related intrinsic tests, to a different file similar to
`clang/test/CodeGen/X86/rd-builtins.c`. Let me know if that's the right
call.

related: #1404
Update `__real__` operation to use ComplexRealOp and act directly on the
complex value.


Ref:
llvm/llvm-project#144235 (review)
Update `__imag__` operation to use ComplexRealOp and act directly on the
complex value.

Ref:
llvm/llvm-project#144235 (review)
… tzcnt_u64 (#1691)

Related: #1404

Implements codegen for the X86 builtins `tzcnt_u16`, `tzcnt_u32`, and
`tzcnt_u64`.

While adding tests for both the Intel and AMD variants of BMI
intrinsics, I ran into issues when placing them in the same file.

Both `_tzcnt_u16` (Intel) and `__tzcnt_u16`(AMD) map to the same inline
wrapper in <immintrin.h>. Whether they're isolated or both are present
in a test file, Clang emits only one definition (`__tzcnt_u16`) which I
think causes FileCheck mismatches i.e., the CHECK lines for the Intel
version (`test_tzcnt_u16`) would fail when looking for `_tzcnt_u16`.

I tried updating the CHECK lines for the Intel test to match the emitted
symbol (`__tzcnt_u16`), but it still failed unless the Intel test was
run in isolation, and only when CHECK was updated to `_tzcnt_u16` even
though `__tzcnt_u16` is what is emitted. I also experimented with
split-file to isolate the tests, but that didn’t resolve the issue
either.

To keep the tests independent, I split the Intel and AMD tests into
separate files. Was wondering if this was fine as in OG clang, both
Intel and AMD variants are in the same file
(https://github.com/llvm/clangir/blob/main/clang/test/CodeGen/X86/bmi-builtins.c)
As we need to preserve the ContinueOp for inner loops when we convert
for outer while-loops, we must not mark cir dialect as illegal.
Otherwise, MLIR rejects this kind of preservation and considers it as a
pass failure.
It seems we need another way to check whether the CIR is fully lowered.

Co-authored-by: Yue Huang <[email protected]>
Backport ChooseExpr for Scalar expr
Backporting the VecCreateOp Folder from the upstream
Backporting the VecSplatOp simplifier from the upstream
Implement ChooseExpr for ComplexType
Backporting the VecTernaryOp folder
In [libstdc++ std::variant
implementation](https://github.com/gcc-mirror/gcc/blob/b0419798447ae25de2f58d1a695db6dadb5d8547/libstdc%2B%2B-v3/include/std/variant#L387-L394),
union without any fields is used.

According to current CodeGen logic, append 1 byte padding for this kind
of union.
Handle this union in `mlir::RecordType` for getLargestMember` return
nullptr also.

The original LLVM IR
```llvm
%union.EmptyUnion = type { i8 }
@__const._Z2f0v.e = private unnamed_addr constant %union.EmptyUnion undef, align 1

define dso_local void @_Z2f0v() #0 {
entry:
  %e = alloca %union.EmptyUnion, align 1
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %e, ptr align 1 @__const._Z2f0v.e, i64 1, i1 false)
  ret void
}
```
The CIR lowered LLVM IR
```llvm
%union.EmptyUnion = type { i8 }

define dso_local void @_Z2f0v() #0 {
  %1 = alloca %union.EmptyUnion, i64 1, align 1
  store %union.EmptyUnion undef, ptr %1, align 1
  ret void
}
```
The major different is original use global const and memcpy, the current
use store. The difference between the two is not related to this
revision.
Two things:

1. Added some NYI placeholders
2. Tests for i386(x86) are pending as we haven't dealt with that triple
yet as compared to CG.
- Generalizes CIRFPTypeInterface files to CIRTypeInterfaces for future type interfaces additions.
- Renames CIRFPTypeInterface to FPTypeInterface.
- Fixes FPTypeInterface tablegen prefix.
@xlauko
Copy link
Collaborator Author

xlauko commented Aug 18, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@xlauko xlauko changed the title XXX [CIR] Refactor record type Aug 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.