-
Notifications
You must be signed in to change notification settings - Fork 165
[CIR] Refactor record type #1835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
xlauko
wants to merge
2,580
commits into
main
Choose a base branch
from
users/xlauko/cir-record-type
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Lower vrnd64x and vrnd64xq
To give LoweringPrepare type information from `CIRGenTypeCache`, this PR adds two attributes to ModuleOp: ```mlir module attributes { cir.int_size = #cir.int_size<32>, cir.size_type_size = #cir.size_type_size<64>, ... } {} ``` The `CIRDataLayout` class is also extended to have `getPtrDiffTy` and so on. Some tests that only expects `cir.lang` and `cir.sob` are also changed to take this into account.
If type of operand is not integer, it can be handled like what I do in `__builtin_elementwise_exp`.
This patch adds support for simple cast operations on pointers to member functions, including: 1) casting pointers to member function values to boolean values; 2) reinterpret casts between pointers to member functions.
This uses the assembly format for the optional return type and keeps a custom printer/parser only for function parameters, which still require a custom form for ellipses.
Lower vrnd64z and vrnd64zq
for example, lower `cir.alloca !cir.array<!s32i x N>, !cir.ptr<!cir.array<!s32i x N>>` to `memref.alloca() : memref<Nxi32>` see #1405
Get rid of the function `FuncOp::verifyType`. The function performed three checks: 1. Check that `isa<cir::FuncType>(getFunctionType())`. This is a tautology that is always true, since the return type of `getFunctionType()` is already `cir::FuncType`. 2. Report an error if `type.isVarArg() && type.getNumInputs() == 0`, i.e. when a variadic function has no named parameters. That check is incorrect. In C++, variadic functions don't need to have any named parameters. `void f(...) { }` is legal in C++ and ClangIR needs to be able to compile it. 3. Report an error when the return type is `void`. This check is correct (`void` return is represented as no return in MLIR), but it is redundant. This is already checked in `FuncType::verify`. Since `FuncOp::verifyType` serves no useful purpose, delete it, along with the test for `int variadic(...)` that was in `clang/test/CIR/IR/invalid.cir`.
Currently, the following code snippet fails during CIR codegen with exceptions enabled: ``` #include <string> void foo(const char *path) { std::string str = path; str = path; str = path; } ``` using `bin/clang++ tmp.cpp -fclangir -Xclang -emit-cir -S`, the error: ``` error: empty block: expect at least a terminator ``` Relevant part of the CIR before verification looks like: ``` %118 = "cir.load"(%114) : (!cir.ptr<!cir.ptr<!cir.int<s, 8>>>) -> !cir.ptr<!cir.int<s, 8>> "cir.try"() <{catch_types = [#cir.unwind], cleanup, synthetic}> ({ %123 = "cir.call"(%115, %118) <{ast = #cir.call.expr.ast, callee = @_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEaSEPKc, calling_conv = 1 : i32, exception, extra_attrs = #cir<extra({})>, side_effect = 1 : i32}> ({ "cir.call"(%115) <{callee = @_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev, calling_conv = 1 : i32, extra_attrs = #cir<extra({nothrow = #cir.nothrow})>, side_effect = 1 : i32}> ({ }) : (!cir.ptr<!cir.struct<class "std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>" {!cir.struct<struct "std::__cxx11::basic_string<char>::_Alloc_hider" {!cir.ptr<!cir.int<s, 8>>} #cir.record.decl.ast>, !cir.int<u, 64>, !cir.struct<union "anon.0" padded {!cir.array<!cir.int<s, 8> x 16>, !cir.int<u, 64>, !cir.array<!cir.int<u, 8> x 8>} #cir.record.decl.ast>} #cir.record.decl.ast>>) -> () "cir.yield"() : () -> () }) : (!cir.ptr<!cir.struct<class "std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>" {!cir.struct<struct "std::__cxx11::basic_string<char>::_Alloc_hider" {!cir.ptr<!cir.int<s, 8>>} #cir.record.decl.ast>, !cir.int<u, 64>, !cir.struct<union "anon.0" padded {!cir.array<!cir.int<s, 8> x 16>, !cir.int<u, 64>, !cir.array<!cir.int<u, 8> x 8>} #cir.record.decl.ast>} #cir.record.decl.ast>>, !cir.ptr<!cir.int<s, 8>>) -> !cir.ptr<!cir.struct<class "std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>" {!cir.struct<struct "std::__cxx11::basic_string<char>::_Alloc_hider" {!cir.ptr<!cir.int<s, 8>>} #cir.record.decl.ast>, !cir.int<u, 64>, !cir.struct<union "anon.0" padded {!cir.array<!cir.int<s, 8> x 16>, !cir.int<u, 64>, !cir.array<!cir.int<u, 8> x 8>} #cir.record.decl.ast>} #cir.record.decl.ast>> "cir.store"(%123, %116) : (!cir.ptr<!cir.struct<class "std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>" {!cir.struct<struct "std::__cxx11::basic_string<char>::_Alloc_hider" {!cir.ptr<!cir.int<s, 8>>} #cir.record.decl.ast>, !cir.int<u, 64>, !cir.struct<union "anon.0" padded {!cir.array<!cir.int<s, 8> x 16>, !cir.int<u, 64>, !cir.array<!cir.int<u, 8> x 8>} #cir.record.decl.ast>} #cir.record.decl.ast>>, !cir.ptr<!cir.ptr<!cir.struct<class "std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>" {!cir.struct<struct "std::__cxx11::basic_string<char>::_Alloc_hider" {!cir.ptr<!cir.int<s, 8>>} #cir.record.decl.ast>, !cir.int<u, 64>, !cir.struct<union "anon.0" padded {!cir.array<!cir.int<s, 8> x 16>, !cir.int<u, 64>, !cir.array<!cir.int<u, 8> x 8>} #cir.record.decl.ast>} #cir.record.decl.ast>>>) -> () "cir.yield"() : () -> () }, { ^bb0: <--- EMPTY BLOCK }) : () -> () ``` There is an empty block! If you extend the snippet with more `str = path;`, you get more empty blocks... The issue is the `cir.resume` ops which should be in those empty blocks from synthetic TryOp's aren't linked properly during the cleanup. My suggestion: We should explicitly add `cir.resume` for synthetic tryOp's, because we already know they have just an [unwind handler](https://github.com/llvm/clangir/blob/8746bd4bbe777352c2935e9937449637a8943767/clang/lib/CIR/CodeGen/CIRGenCall.cpp#L506). So, during [CIRGenCleanup](https://github.com/llvm/clangir/blob/8746bd4bbe777352c2935e9937449637a8943767/clang/lib/CIR/CodeGen/CIRGenCleanup.cpp#L667) we don't need to add `cir.resume` for synthetic TryOp's. This PR adds this and a test.
Run clang-tidy on `clang/lib/CIR/CodeGen/CIRGenModule.cpp`. Accept all of the recommended fixes, except for one suggestion to use `std::any_of`. The vast majority of the changes had to do with the case of identifiers, changing variables and parameters from `VarName` to `varName`.
I noticed that `AtomicFenceOp` doesn't use `OptionalAttr` like mlir llvmir. As a result, `getLLVMSyncScope` does't return `std::optional`. Should I use `Arg` instead?
Cleans up default linkage query implementations. Removes duplicities from `extraClassDeclaration` that are now introduced through `CIRGlobalValueInterface`. This makes it more consistent with the `llvm::GlobalValue` methods.
CIR didn't work on structs with destructor but without constructor. Now it is fixed. Moreover, CUDA kernels must be emitted if it was referred to in the destructor of a non-device variable. It seems already working, so I just unblocked the code path.
…e.cpp (#1426) Makes it consistent with C++ conventions and [OG](https://github.com/advay168/clangir/blob/436c635af6c7ec3d184a1f7e92a624acdf856991/clang/lib/CodeGen/CodeGenModule.cpp#L108).
This PR make CIR's AtomicFenceOp similar to MLIR LLVMIR's FenceOp. MLIR LLVMIR FenceOp: https://github.com/llvm/clangir/blob/0bedc285dc6fbd8486877887939c742c2ddaecfa/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td#L1925-L1947
Lower vcaged_f64
Currently, the following code snippet fails during CodeGen, using `clang++ tmp.cpp -fclangir -Xclang -emit-cir -S`: ``` #include <fstream> void foo(const char *path) { std::ofstream fout1(path); fout1 << path; std::ofstream fout2(path); fout2 << path; } ``` It fails with: ``` error: 'cir.yield' op expects parent op to be one of 'cir.if, cir.scope, cir.switch, cir.while, cir.for, cir.await, cir.ternary, cir.global, cir.do, cir.try, cir.array.ctor, cir.array.dtor, cir.call, cir.case' ``` The relevant part of the CIR dump before verification looks like: ``` "cir.br"()[^bb1] : () -> () ^bb1: // pred: ^bb0 "cir.yield"() : () -> () "cir.return"() : () -> () }) : () -> () ``` Two things are wrong: the YieldOp has `cir.func` as a parent and there is a `cir.return` too. These come right after the second destructor for `basic_ofstream`. This PR fixes this by checking if there is a terminator and removing (if it exists) before adding an implicit return. I have also added a test that mimics the behavior of `std::basic_ofstream`.
Currently `__shared__` and `__constant__` variables are ignored by CodeGen. This patch fixes this. (It is also fixed in #1436 .) Device and constant variables should be marked as `externally_initialized`, as they might be initialized by host, rather than on device. We can't identify which variables are device ones at lowering stage, so this patch adds a new attribute for it in CodeGen. Similar to `__global__` functions, global variables on device corresponds to "shadow" variables on host, and they must be registered to their counterpart. I added a `CUDAShadowNameAttr` in this patch for later use, but I didn't insert code to actually generate it.
The generation is quite complicated so I plan to separate it into several parts. The registration function should be like: ```cpp const char *__cuda_fatbin_str = /* Raw content of file in -fcuda-include-gpubinary */; struct { int magicNum, version; void *binaryData, *unused; } __cuda_fatbin_wrapper = { /*CUDA Magic Num*/, 1, __cuda_fatbin_str, nullptr }; void __cuda_module_ctor() { handle = __cudaRegisterFatBinary(&wrapper); __cuda_register_globals(); } ``` In this PR, we generate everything except the `__cuda_register_globals` function. OG doesn't give a name to `__cuda_fatbin_str`, which isn't allowed for cir::GlobalOp, so I invented a name for it. Other names are kept consistent with OG.
This PR adds the flag `-emit-mlir-llvm` to allow emitting of MLIR in the LLVM dialect (cc @xlauko who asked me to do this). I'm not sure if the naming of the flag is the best and maybe someone will have a better idea. Another solution would be to make the `-emit-mlir` flag have a value, that specifies the target dialect (CIR/MLIR std dialects/LLVM Dialect).
GCC, unlike clang, issues a warning when one virtual function is overridden in a derived class but one or more other virtual functions with the same name and different signature from a base class are not overridden. This leads to many warnings in the MLIR and ClangIR code when using the OpenConversionPattern<>::matchAndRewrite() function in the ordinary way. The "hiding" behavior is what we want.
As the scf dialect does not support early exits, it might be necessary to change the body of WhileOp to implement the semantics of ContinueOp. I choose to add a guard `if (!cond)` for everything following the `continue`. Co-authored-by: Yue Huang <[email protected]>
This PR is related to #1685 and adds some basic support for the printf function. Limitations: 1. It only works if all variadic params are of basic interger/float type (for more info why memref type operands don't work see #1685) 2. Only works if the format string is definied directly inside the printf function The downside of this PR is also that the handling this edge case adds significant code bloat and reduces readability for the cir.call op lowering (I tried to insert some meanigful comments to improve the readability), but I think its worth to do this so we have some basic printf support (without adding an extra cir operation) until upstream support for variadic functions is added to the func dialect. Also a few more test (which use such a basic form of printf) in the llvm Single Source test suite are working with this PR: before this PR: Testing Time: 4.00s Total Discovered Tests: 1833 Passed : 420 (22.91%) Failed : 10 (0.55%) Executable Missing: 1403 (76.54%) with this PR: Testing Time: 10.29s Total Discovered Tests: 1833 Passed : 458 (24.99%) Failed : 6 (0.33%) Executable Missing: 1369 (74.69%)
This PR addresses the feedback from llvm/llvm-project#142041 (comment). Our algorithm for accumulating bitfields has diverged from CodeGen since Clang 19. There is one key difference: in CIR, we use the function `getBitfieldStorageType`, which checks whether the bit width of the current accumulation run is a valid fundamental width (i.e., a power of two: 8, 16, 32, 64). If it is, it returns a CIR type of that size otherwise, it returns an array with the closest alignment. For example, given the following struct: ```c struct S { int a : 4; int b : 27; int c : 17; int d : 2; int e : 15; unsigned f; }; ``` The CodeGen output is: ```llvm %struct.S = type { i64, i16, i32 } ``` Whereas the new CIR algorithm produces: ```mlir !cir.record<struct "S" {!cir.array<!u8i x 7>, !u16i, !u32i}> ``` In CIR, the algorithm accumulates up to field `d`, resulting in 50 accumulated bits. Since 50 is not a fundamental width, the closest alignment is 56 bits, which leads to the type `!cir.array<!u8i x 7>`. The algorithm stops before accumulating field `e` because including it would exceed the register size (64), which is not ideal. At this point, it's unclear whether this divergence from CodeGen represents an improvement. If we wanted to match CodeGen exactly, we would need to replace the use of `getBitfieldStorageType` with `getUIntNType`. The difference is that `getUIntNType` always returns the closest power-of-two integer type instead of falling back to an array when the size is not a fundamental width. With this change, CIR would match CodeGen's layout exactly. It would require the following small code change: ```diff diff --git a/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp b/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp index 7c1802b..17538b191738 100644 --- a/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp +++ b/clang/lib/CIR/CodeGen/CIRRecordLayoutBuilder.cpp @@ -616,7 +616,7 @@ CIRRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, if (!InstallBest) { // Determine if accumulating the just-seen span will create an expensive // access unit or not. - mlir::Type Type = getBitfieldStorageType(astContext.toBits(AccessSize)); + mlir::Type Type = getUIntNType(astContext.toBits(AccessSize)); if (!astContext.getTargetInfo().hasCheapUnalignedBitFieldAccess()) llvm_unreachable("NYI"); @@ -674,12 +674,12 @@ CIRRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, // remain there after a stable sort. mlir::Type Type; if (BestClipped) { - assert(getSize(getBitfieldStorageType( + assert(getSize(getUIntNType( astContext.toBits(AccessSize))) > AccessSize && "Clipped access need not be clipped"); Type = getByteArrayType(AccessSize); } else { - Type = getBitfieldStorageType(astContext.toBits(AccessSize)); + Type = getUIntNType(astContext.toBits(AccessSize)); assert(getSize(Type) == AccessSize && "Unclipped access must be clipped"); } ``` You can see a comparison between the two functions https://godbolt.org/z/qjx1MaEWG. I'm currently unsure whether using one function over the other has performance implications. Regarding the **ARM error I mentioned in the meeting: it was an `assert` I had forgotten to update. It's now fixed sorry for the confusion.**
- Create CIR specific EnumAttr bases and prefix enum attributes with `CIR_` that automatically puts enum to `cir` namespace - Removes unnecessary enum case definitions - Unifies naming of enum values to use capitals consistently and make enumerations to start from 0 - Remove now unnecessary printers/parsers that gets to be generated automatically
) No test cases provided for `lzcnt_u16` as presented in the OG codegen equivalent: `test/CodeGen/X86/lzcnt-builtins.c`. related: #1404
Implement base-2 exponential intrinsic as part of #1192
…1671) Hi, This is my first here! Tried to mirror some of the patterns already presented in both the codegen lib and its tests I'm very excited to start contributing and potentially making an impact in this project! feedback is much appreciated.
convert from codegen ```c++ assert(!Base.isVirtual() && "should not see vbases here"); auto *BaseRD = Base.getType()->getAsCXXRecordDecl(); Address V = CGF.GetAddressOfDirectBaseInCompleteClass( Dest.getAddress(), CXXRD, BaseRD, /*isBaseVirtual*/ false); AggValueSlot AggSlot = AggValueSlot::forAddr( V, Qualifiers(), AggValueSlot::IsDestructed, AggValueSlot::DoesNotNeedGCBarriers, AggValueSlot::IsNotAliased, CGF.getOverlapForBaseInit(CXXRD, BaseRD, Base.isVirtual())); CGF.EmitAggExpr(InitExprs[curInitIndex++], AggSlot); if (QualType::DestructionKind dtorKind = Base.getType().isDestructedType()) CGF.pushDestroyAndDeferDeactivation(dtorKind, V, Base.getType()); ```
Moved rd related intrinsic tests, to a different file similar to `clang/test/CodeGen/X86/rd-builtins.c`. Let me know if that's the right call. related: #1404
Update `__real__` operation to use ComplexRealOp and act directly on the complex value. Ref: llvm/llvm-project#144235 (review)
Update `__imag__` operation to use ComplexRealOp and act directly on the complex value. Ref: llvm/llvm-project#144235 (review)
… tzcnt_u64 (#1691) Related: #1404 Implements codegen for the X86 builtins `tzcnt_u16`, `tzcnt_u32`, and `tzcnt_u64`. While adding tests for both the Intel and AMD variants of BMI intrinsics, I ran into issues when placing them in the same file. Both `_tzcnt_u16` (Intel) and `__tzcnt_u16`(AMD) map to the same inline wrapper in <immintrin.h>. Whether they're isolated or both are present in a test file, Clang emits only one definition (`__tzcnt_u16`) which I think causes FileCheck mismatches i.e., the CHECK lines for the Intel version (`test_tzcnt_u16`) would fail when looking for `_tzcnt_u16`. I tried updating the CHECK lines for the Intel test to match the emitted symbol (`__tzcnt_u16`), but it still failed unless the Intel test was run in isolation, and only when CHECK was updated to `_tzcnt_u16` even though `__tzcnt_u16` is what is emitted. I also experimented with split-file to isolate the tests, but that didn’t resolve the issue either. To keep the tests independent, I split the Intel and AMD tests into separate files. Was wondering if this was fine as in OG clang, both Intel and AMD variants are in the same file (https://github.com/llvm/clangir/blob/main/clang/test/CodeGen/X86/bmi-builtins.c)
As we need to preserve the ContinueOp for inner loops when we convert for outer while-loops, we must not mark cir dialect as illegal. Otherwise, MLIR rejects this kind of preservation and considers it as a pass failure. It seems we need another way to check whether the CIR is fully lowered. Co-authored-by: Yue Huang <[email protected]>
Backport ChooseExpr for Scalar expr
Backporting the VecCreateOp Folder from the upstream
Backporting the VecSplatOp simplifier from the upstream
Implement ChooseExpr for ComplexType
Backporting the VecTernaryOp folder
In [libstdc++ std::variant implementation](https://github.com/gcc-mirror/gcc/blob/b0419798447ae25de2f58d1a695db6dadb5d8547/libstdc%2B%2B-v3/include/std/variant#L387-L394), union without any fields is used. According to current CodeGen logic, append 1 byte padding for this kind of union. Handle this union in `mlir::RecordType` for getLargestMember` return nullptr also. The original LLVM IR ```llvm %union.EmptyUnion = type { i8 } @__const._Z2f0v.e = private unnamed_addr constant %union.EmptyUnion undef, align 1 define dso_local void @_Z2f0v() #0 { entry: %e = alloca %union.EmptyUnion, align 1 call void @llvm.memcpy.p0.p0.i64(ptr align 1 %e, ptr align 1 @__const._Z2f0v.e, i64 1, i1 false) ret void } ``` The CIR lowered LLVM IR ```llvm %union.EmptyUnion = type { i8 } define dso_local void @_Z2f0v() #0 { %1 = alloca %union.EmptyUnion, i64 1, align 1 store %union.EmptyUnion undef, ptr %1, align 1 ret void } ``` The major different is original use global const and memcpy, the current use store. The difference between the two is not related to this revision.
Two things: 1. Added some NYI placeholders 2. Tests for i386(x86) are pending as we haven't dealt with that triple yet as compared to CG.
…r calls to memcpy (#1677)
- Generalizes CIRFPTypeInterface files to CIRTypeInterfaces for future type interfaces additions. - Renames CIRFPTypeInterface to FPTypeInterface. - Fixes FPTypeInterface tablegen prefix.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.