[pull] main from llvm:main #693

pull · 2025-11-12T23:51:04Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


Our driver warns under various circumstances if SDK directories can't be found. This warning is not applicable in ThinLTO codegen mode (-fthinlto-index=). Suppress it. The motivation for doing this is that we sometimes see this warning emitted when DTLTO invokes the compiler on a remote machine to do the LTO backend compilations (with -fthinlto-index=). Internal Ref: TOOLCHAIN-20592
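A minimal sketch of the decision, assuming the driver can inspect the raw argument list. `isThinLTOBackendInvocation` and `shouldWarnAboutMissingSDK` are hypothetical helpers for illustration, not Clang's driver API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: true if any argument selects ThinLTO backend codegen
// mode, i.e. starts with "-fthinlto-index=".
bool isThinLTOBackendInvocation(const std::vector<std::string> &args) {
  const std::string prefix = "-fthinlto-index=";
  for (const std::string &arg : args)
    if (arg.compare(0, prefix.size(), prefix) == 0)
      return true;
  return false;
}

// Hypothetical policy: warn about a missing SDK directory only when it is
// actually missing and we are not doing ThinLTO backend compilation, where
// the SDK is not needed.
bool shouldWarnAboutMissingSDK(const std::vector<std::string> &args,
                               bool sdkDirFound) {
  if (sdkDirFound)
    return false;
  return !isThinLTOBackendInvocation(args);
}
```

This matches the DTLTO scenario: a remote backend compile with `-fthinlto-index=` no longer emits the warning even if the remote machine has no SDK installed.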

Follow up on c2d4c7c ([VPlan] Permit more users in narrowToSingleScalars) to fix an assert related to WidenStore users of the recipe being narrowed in narrowToSingleScalars.

Previously, we had two levels of attributes:
- HLSLUnparsedSemantic
- N attributes, one for each known system semantic.

The first was assigned during parsing, and carried no other meaning than "there is a semantic token". It was then converted to one of the N attributes later during Sema. Those attributes also carried information like "is indexable" or "is index explicit".

This had a few issues:
- there was no difference between a semantic attribute applied to a decl, and the effective semantic in the entrypoint use context.
- having the indexable bit was not useful.
- semantic constraint checks were split between .td files and Sema.

Also, the existing implementation attached effective attributes to the type decl or parameters, meaning that reusing a struct decl across entrypoints or in a nested type was not supported, even though it is legal in HLSL.

This PR simplifies semantic attributes by having 3 attributes:
- HLSLUnparsedSemantic
- HLSLParsedSemantic
- HLSLAppliedSemantic

Initial parsing emits an `HLSLUnparsedSemantic`. We simply say "here is an HLSL semantic token", but we don't do any semantic check. Then, Sema does initial validation and transforms an UnparsedSemantic into a ParsedSemantic. This validates that a system semantic is known, or that the associated type is valid (like uint3 for a ThreadIndex). Then, once we parse an actual shader entrypoint, we can know how semantics are used in a real context. This step emits a list of AppliedSemantic attributes. Those are the actual semantics in use for this specific entrypoint. They are attached to each entrypoint parameter, as a flat list matching the semantic structure flattening that HLSL defines. At this stage of Sema, index collision and other stage compatibility checks are carried out. This allows codegen to simply iterate over this list and emit the proper DXIL or SPIR-V code.
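To make the last stage concrete, here is a small hypothetical model: a struct's leaf semantics are flattened depth-first into a per-entrypoint list, and index collisions are checked on that flat list. `ParsedSemantic`, `Field`, `AppliedSemantic`, `flatten` and `hasIndexCollision` are illustrative stand-ins, not Clang's attribute classes, and HLSL's exact flattening and index-assignment rules are not modelled:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Stage 2 (hypothetical): a validated semantic on a declaration.
struct ParsedSemantic {
  std::string name; // e.g. "TEXCOORD"
  unsigned index;   // e.g. 1 for TEXCOORD1
};

// A struct member: a leaf carrying a semantic, or a nested struct.
struct Field {
  ParsedSemantic semantic;     // meaningful only when `children` is empty
  std::vector<Field> children; // non-empty for nested structs
};

// Stage 3 (hypothetical): the semantics actually used by one entrypoint.
using AppliedSemantic = ParsedSemantic;

// Flatten one entrypoint parameter depth-first, in declaration order.
void flatten(const Field &f, std::vector<AppliedSemantic> &out) {
  if (f.children.empty()) {
    out.push_back(f.semantic);
    return;
  }
  for (const Field &child : f.children)
    flatten(child, out);
}

// Per-entrypoint check: the same (name, index) pair must not appear twice.
bool hasIndexCollision(const std::vector<AppliedSemantic> &applied) {
  std::set<std::pair<std::string, unsigned>> seen;
  for (const AppliedSemantic &s : applied)
    if (!seen.insert({s.name, s.index}).second)
      return true;
  return false;
}
```

Because the applied list is computed per entrypoint rather than stored on the struct declaration, the same `Field` tree can be reused by several entrypoints.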

…am (#167724) This got exposed by `09262656f32ab3f2e1d82e5342ba37eecac52522`. The underlying stream of `m_os` is referenced by the `TextDiagnostic` member of `TextDiagnosticPrinter`. It got turned into a `llvm::formatted_raw_ostream` in the commit above. When `~TextDiagnosticPrinter` (and thus `~TextDiagnostic`) is invoked, we now call `~formatted_raw_ostream`, which tries to access the underlying stream. But `m_os` was already deleted, because it comes before the `TextDiagnosticPrinter` in destruction order. Move the `m_os` member before the `TextDiagnosticPrinter` to avoid a use-after-free.

Drive-by:
* Also move the `m_output` member, which `m_os` holds a reference to. The fact that it's a reference indicates the expectation is most likely that the string outlives the stream.

The ASAN macOS bot is currently failing with this:

```
08:15:39 =================================================================
08:15:39 ==61103==ERROR: AddressSanitizer: heap-use-after-free on address 0x60600012cf40 at pc 0x00012140d304 bp 0x00016eecc850 sp 0x00016eecc848
08:15:39 READ of size 8 at 0x60600012cf40 thread T0
08:15:39 #0 0x00012140d300 in llvm::formatted_raw_ostream::releaseStream() FormattedStream.h:205
08:15:39 #1 0x00012140d3a4 in llvm::formatted_raw_ostream::~formatted_raw_ostream() FormattedStream.h:145
08:15:39 #2 0x00012604abf8 in clang::TextDiagnostic::~TextDiagnostic() TextDiagnostic.cpp:721
08:15:39 #3 0x00012605dc80 in clang::TextDiagnosticPrinter::~TextDiagnosticPrinter() TextDiagnosticPrinter.cpp:30
08:15:39 #4 0x00012605dd5c in clang::TextDiagnosticPrinter::~TextDiagnosticPrinter() TextDiagnosticPrinter.cpp:27
08:15:39 #5 0x0001231fb210 in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39 #6 0x0001231fb3bc in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39 #7 0x000129aa9d70 in clang::DiagnosticsEngine::~DiagnosticsEngine() Diagnostic.cpp:91
08:15:39 #8 0x0001230436b8 in llvm::RefCountedBase<clang::DiagnosticsEngine>::Release() const IntrusiveRefCntPtr.h:103
08:15:39 #9 0x0001231fe6c8 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
08:15:39 #10 0x0001231fe858 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
...
08:15:39
08:15:39 0x60600012cf40 is located 32 bytes inside of 56-byte region [0x60600012cf20,0x60600012cf58)
08:15:39 freed by thread T0 here:
08:15:39 #0 0x0001018abb88 in _ZdlPv+0x74 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4bb88)
08:15:39 #1 0x0001231fb1c0 in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39 #2 0x0001231fb3bc in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39 #3 0x000129aa9d70 in clang::DiagnosticsEngine::~DiagnosticsEngine() Diagnostic.cpp:91
08:15:39 #4 0x0001230436b8 in llvm::RefCountedBase<clang::DiagnosticsEngine>::Release() const IntrusiveRefCntPtr.h:103
08:15:39 #5 0x0001231fe6c8 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
08:15:39 #6 0x0001231fe858 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
...
08:15:39
08:15:39 previously allocated by thread T0 here:
08:15:39 #0 0x0001018ab760 in _Znwm+0x74 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4b760)
08:15:39 #1 0x0001231f8dec in lldb_private::ClangModulesDeclVendor::Create(lldb_private::Target&) ClangModulesDeclVendor.cpp:732
08:15:39 #2 0x00012320af58 in lldb_private::ClangPersistentVariables::GetClangModulesDeclVendor() ClangPersistentVariables.cpp:124
08:15:39 #3 0x0001232111f0 in lldb_private::ClangUserExpression::PrepareForParsing(lldb_private::DiagnosticManager&, lldb_private::ExecutionContext&, bool) ClangUserExpression.cpp:536
08:15:39 #4 0x000123213790 in lldb_private::ClangUserExpression::Parse(lldb_private::DiagnosticManager&, lldb_private::ExecutionContext&, lldb_private::ExecutionPolicy, bool, bool) ClangUserExpression.cpp:647
08:15:39 #5 0x00012032b258 in lldb_private::UserExpression::Evaluate(lldb_private::ExecutionContext&, lldb_private::EvaluateExpressionOptions const&, llvm::StringRef, llvm::StringRef, std::__1::shared_ptr<lldb_private::ValueObject>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, lldb_private::ValueObject*) UserExpression.cpp:280
08:15:39 #6 0x000120724010 in lldb_private::Target::EvaluateExpression(llvm::StringRef, lldb_private::ExecutionContextScope*, std::__1::shared_ptr<lldb_private::ValueObject>&, lldb_private::EvaluateExpressionOptions const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, lldb_private::ValueObject*) Target.cpp:2905
08:15:39 #7 0x00011fc7bde0 in lldb::SBTarget::EvaluateExpression(char const*, lldb::SBExpressionOptions const&) SBTarget.cpp:2305
08:15:39 ==61103==ABORTING
...
```
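The root cause is the C++ rule that non-static data members are destroyed in reverse declaration order. This self-contained sketch uses hypothetical `Stream`, `Printer` and `Consumer` types as stand-ins for the stream, `TextDiagnosticPrinter` and `StoringDiagnosticConsumer`, and shows the fixed layout, in which the referenced stream is declared first and therefore outlives the object that uses it:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which destructors run.
std::vector<std::string> g_log;

struct Stream {
  ~Stream() { g_log.push_back("~Stream"); }
};

// Holds a reference to a stream it does not own, like the TextDiagnostic
// inside TextDiagnosticPrinter; its destructor may still touch the stream.
struct Printer {
  Stream &os;
  explicit Printer(Stream &s) : os(s) {}
  ~Printer() { g_log.push_back("~Printer"); }
};

// Fixed layout: `os` is declared before `printer`, so `printer` is
// destroyed first, while the stream is still alive.
struct Consumer {
  Stream os;
  Printer printer{os};
};
```

Swapping the two member declarations would reverse the destruction order, which is exactly the use-after-free reported by ASAN above.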

…run for any region. (#162025) During initialization of the unclustered schedule stage, minOccupancy may be temporarily increased. But if none of the regions are subsequently scheduled, because they don't meet the conditions of initGCNRegion, minOccupancy remains incorrectly set. This patch fixes this by delaying the change of minOccupancy until a region is about to be scheduled.
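A hypothetical sketch of the "delay the change" idea: the raised occupancy target is committed only when a region actually passes the initialization checks. `Region`, `UnclusteredStage` and `tentativeOccupancy` are illustrative names, not the GCN scheduler's real types:

```cpp
#include <cassert>
#include <vector>

// Stand-in for a scheduling region; `meetsInitConditions` models the
// outcome of initGCNRegion's checks.
struct Region {
  bool meetsInitConditions;
};

struct UnclusteredStage {
  unsigned minOccupancy;       // value visible to the rest of the scheduler
  unsigned tentativeOccupancy; // value the stage would like to use

  void run(const std::vector<Region> &regions) {
    for (const Region &r : regions) {
      if (!r.meetsInitConditions)
        continue; // region skipped: leave minOccupancy untouched
      minOccupancy = tentativeOccupancy; // commit only when scheduling
      // ... schedule the region here ...
    }
  }
};
```

If every region is skipped, `minOccupancy` keeps its original value instead of the temporarily increased one.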

This adds handling for null base class initialization, but only for the trivial case where the class is empty. This also moves emitCXXConstructExpr to CIRGenExprCXX.cpp for consistency with classic codegen and the incubator repo.

…Dimensions from Affine Maps (#167587) This PR exposes `linalg::inferContractionDims(ArrayRef<AffineMap>)` to Python, allowing users to infer contraction dimensions (batch/m/n/k) directly from a list of affine maps without needing an operation.

---------

Signed-off-by: Bangtian Liu
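As an illustration of what such an inference does, here is a simplified C++ sketch that classifies loop dimensions by which of the three maps (lhs, rhs, output) use them. It assumes each map is a plain projection of loop dimensions, given as a list of dimension positions, and it ignores iterator types; `inferDims` is a hypothetical name and this is not the MLIR implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct ContractionDims {
  std::vector<unsigned> batch, m, n, k;
};

// Classify each loop dimension d in [0, numLoops):
//   batch: used by lhs, rhs and output
//   m:     used by lhs and output only
//   n:     used by rhs and output only
//   k:     used by lhs and rhs only (reduced away in the output)
ContractionDims inferDims(const std::vector<unsigned> &lhs,
                          const std::vector<unsigned> &rhs,
                          const std::vector<unsigned> &out,
                          unsigned numLoops) {
  auto uses = [](const std::vector<unsigned> &map, unsigned d) {
    return std::find(map.begin(), map.end(), d) != map.end();
  };
  ContractionDims r;
  for (unsigned d = 0; d < numLoops; ++d) {
    bool inL = uses(lhs, d), inR = uses(rhs, d), inO = uses(out, d);
    if (inL && inR && inO)
      r.batch.push_back(d);
    else if (inL && !inR && inO)
      r.m.push_back(d);
    else if (!inL && inR && inO)
      r.n.push_back(d);
    else if (inL && inR && !inO)
      r.k.push_back(d);
  }
  return r;
}
```

For a plain matmul with maps `(d0, d2)`, `(d2, d1)` and `(d0, d1)`, this yields m = {d0}, n = {d1}, k = {d2} and no batch dimension.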

…m growth (#167579) On Apple's platforms, the size of the shared cache grows steadily. As it grows, so does its list of ObjC classes. LLDB currently accepts an upper limit on the number of classes when it extracts the class information. Every few years we hit the limit and have to raise it. This approach is fundamentally unsustainable. On top of needing to manually adjust the number every few years, our current method requires us to allocate memory in the inferior process. On macOS this is usually not a problem, but on embedded devices there is usually a limit to how much memory a process can allocate before it is killed by the OS. My solution involves running the metadata extraction logic multiple times. I've added a new parameter, `start_idx`, to our utility function; it keeps track of where the previous run stopped so that the next run can pick up from there. rdar://91398396
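A minimal sketch of the resumable pattern, assuming a fixed-capacity buffer per call. `extractBatch` and `extractAll` are hypothetical names, and the real utility function runs in the inferior process rather than over a `std::vector`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: copies at most `capacity` entries, starting at `startIdx`,
// into `buffer` and returns the index the next call should resume from
// (the role played by `start_idx`).
std::size_t extractBatch(const std::vector<int> &classes, std::size_t startIdx,
                         std::size_t capacity, std::vector<int> &buffer) {
  buffer.clear();
  std::size_t i = startIdx;
  for (; i < classes.size() && buffer.size() < capacity; ++i)
    buffer.push_back(classes[i]);
  return i;
}

// Calls extractBatch until every entry has been read, so the buffer size
// stays bounded regardless of the total number of classes. `numCalls`
// reports how many passes were needed.
std::vector<int> extractAll(const std::vector<int> &classes,
                            std::size_t capacity, std::size_t &numCalls) {
  std::vector<int> result, buffer;
  numCalls = 0;
  if (capacity == 0)
    return result; // a zero-sized buffer could never make progress
  std::size_t idx = 0;
  do {
    idx = extractBatch(classes, idx, capacity, buffer);
    ++numCalls;
    result.insert(result.end(), buffer.begin(), buffer.end());
  } while (idx < classes.size());
  return result;
}
```

The trade-off is more round trips in exchange for a memory footprint that no longer grows with the shared cache.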

Adds `transform.xegpu.convert_layout` transform op that inserts an `xegpu.convert_layout` op for a given `Value`.

This test attempts to disassemble every Code symbol in Foundation. There's no need to disassemble every code symbol and this certainly does not scale. In some cases, this test can take multiple minutes to run or even time out.

…iagnosticManagerAdapter (#167731) This aligns `ClangDiagnosticManagerAdapter` with how we set up the diagnostics in `ClangModulesDeclVendor`. We fixed lifetime issues around the same kind of setup here: #167724 This class didn't suffer from the same lifetime issue because it used `shared_ptr`s, so the stream wasn't freed before `~TextDiagnosticPrinter` accessed it. But that raised the question of why these are `shared_ptr`s in the first place. This patch makes them `unique_ptr`s and fixes the destruction order, which would otherwise now be an issue.

…ing in `ExpandStridedMetadata` (#167615) `RewriteExtractAlignedPointerAsIndexOfViewLikeOp` tries to propagate `extract_aligned_pointer_as_index` through the view ops. `ViewLikeOpInterface` by itself doesn't guarantee that the base pointer is preserved, and `memref.view` is one such example, so limit the pattern to a few specific ops.

On some systems (probably those with a more recent clang), building :Host errors out with a layering check violation due to the histedit.h system include. Opt it out of layering checks for now, similar to other targets that depend on system includes outside the standard library.

When the backend for the host target isn't enabled, Clang would report the default target as `unknown`. This currently breaks the libc CMake build, but shouldn't in the case where we're cross-compiling since we're given an explicit target and the default one isn't being used.

Fixed the polymorphic check for copy-in/copy-out, and added regression tests. Changed MayNeedCopy() to return std::optional<bool> and renamed it to ActualArgNeedsCopy(). This function now returns true/false when it's known whether an actual argument needs copy-in/copy-out, or std::nullopt to signify that it's not known whether copy-in/copy-out is needed. Fixes #159149
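A sketch of the tri-state return convention. `ActualArg`, `actualArgNeedsCopy` and the decision rules (based only on whether contiguity is known) are hypothetical; flang's real checks, including the polymorphic case, are more involved:

```cpp
#include <cassert>
#include <optional>

// Hypothetical description of an actual argument.
struct ActualArg {
  bool contiguityKnown; // is contiguity known at compile time?
  bool contiguous;      // meaningful only if contiguityKnown
};

// Tri-state result in the style of ActualArgNeedsCopy(): true/false when
// the answer is known, std::nullopt when it is not. Rules are illustrative.
std::optional<bool> actualArgNeedsCopy(const ActualArg &arg,
                                       bool dummyRequiresContiguity) {
  if (!dummyRequiresContiguity)
    return false;         // known: no copy-in/copy-out needed
  if (!arg.contiguityKnown)
    return std::nullopt;  // unknown: cannot decide statically
  return !arg.contiguous; // known: copy only if non-contiguous
}
```

Callers have to distinguish presence from value: `if (actualArgNeedsCopy(a, true))` only tests whether the answer is known, not whether a copy is needed.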

- A dangling pointer (from a std::string) is created, which triggers a crash on some Linux distributions under different build types.

No tests were modified, as there are none that explicitly stop at DynAllocaExpander, and we do not have enough of a pipeline to run those yet anyway.

Reviewers: phoebewang, RKSimon, paperchalice, arsenm

Reviewed By: arsenm

Pull Request: #167740

Test changes are mostly noise. There are a few improvements and a few regressions.

`llvm::TypeSize` uses 64-bit integers, so we should cast the `recordSize` before multiplying by 8 to prevent an overflow.
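To see why the cast has to come first, here is a small example assuming `recordSize` is a 32-bit unsigned value (its actual type in the CIR code is not shown here). The function names are hypothetical. Without the cast, the multiplication is performed in 32 bits and wraps before the result is widened:

```cpp
#include <cassert>
#include <cstdint>

// Multiplies in 32-bit unsigned arithmetic, then widens: may wrap.
uint64_t sizeInBitsWrong(uint32_t recordSize) {
  return recordSize * 8;
}

// Widens first, so the multiplication happens in 64 bits.
uint64_t sizeInBitsRight(uint32_t recordSize) {
  return static_cast<uint64_t>(recordSize) * 8;
}
```

For a 512 MiB record (`0x20000000` bytes), the first version returns 0 because 2^32 wraps to zero, while the second returns the correct 2^32 bits.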

…167682)

Now that the caching seems to be working reasonably well, enable building and testing the entirety of the project to actually catch most of the build configuration issues that this workflow is intended to catch.

After my previous change (#167579), the string exceeded 16380 single-byte characters. MSVC did not like this, so I'm splitting it up into two strings.

Enables the terminal rule for the remaining targets.

Add the following `FEAT_MOPS_GO` instructions:
* `SETGOP`, `SETGOM`, `SETGOE`
* `SETGOPN`, `SETGOMN`, `SETGOEN`
* `SETGOPT`, `SETGOMT`, `SETGOET`
* `SETGOPTN`, `SETGOMTN`, `SETGOETN`

as blogged about here:
* https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/future-architecture-technologies-poe2-and-vmte

and as documented here:
* https://developer.arm.com/documentation/109697/2025_09/Future-Architecture-Technologies

…be directly evaluated into the destination even when it might alias the source (#167344) Evaluate all aggregate compound literals into a temporary and then copy it to the destination if aliasing is possible. This fixes a latent issue exposed by #154490, where evaluating the RHS directly into the destination could ignore potential aliasing. rdar://164094548
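The hazard can be modelled in C++ with a self-assignment of the form `p = Pair{p.b, p.a}` (the C equivalent being `p = (struct Pair){ p.b, p.a };`). The two hypothetical functions below contrast evaluating fields directly into the destination with building the value in a temporary first, which is the strategy the fix adopts when aliasing is possible:

```cpp
#include <cassert>

struct Pair {
  int a, b;
};

// Evaluates each field straight into the destination. If `dst` and `src`
// are the same object, src.a is overwritten before it is read.
void assignSwappedInPlace(Pair &dst, const Pair &src) {
  dst.a = src.b;
  dst.b = src.a;
}

// Builds the aggregate in a temporary, then copies it into the destination.
void assignSwappedViaTemp(Pair &dst, const Pair &src) {
  Pair tmp{src.b, src.a};
  dst = tmp;
}
```

With `p = {1, 2}`, the in-place version produces `{2, 2}`, while the temporary-based version produces the intended `{2, 1}`.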

Revert two commits from mainline: the origin of the issue and the tentative fix-forward.

This adds handling in CIR's ScalarExprEmitter for CK_DerivedToBase cast expressions.

This PR adds support for emitting the promise declaration in coroutines and obtaining the `get_return_object()`.

…166213) This is in preparation for future changes in AMDGPU that will make more substantial use of bundles pre-RA. For now, simply test this with degenerate (single-instruction) bundles.

Directly update induction increments with step value created for wide inductions in createWidenInductionRecipes, which does not require looking up via RecipeBuilder.

This PR extracts visitation of paths stored in `CompilerInvocation` into a member function. We already have a second copy of this downstream, and I need to add a third one.

Replace direct access to underlying IR instructions with VPlan-level equivalents, i.e. VPTypeAnalysis and pattern matching on the recipe. Removes a few uses of accessing underlying IR.

Add support for `nvvm.barrier0.[popc|and|or]` operations. They are added as a separate operation since `Barrier0Op` has no result. https://docs.nvidia.com/cuda/nvvm-ir-spec/#barrier-and-memory-fence

This will be used in CUDA Fortran lowering: https://github.com/llvm/llvm-project/blob/49f55f4991227f3c7a2b8161bbf45c74b7023944/flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp#L1081

It could also be used later in CUDA C/C++ with CIR: https://github.com/llvm/llvm-project/blob/49f55f4991227f3c7a2b8161bbf45c74b7023944/clang/lib/Headers/__clang_cuda_device_functions.h#L524

---------

Co-authored-by: Guray Ozen
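For reference, the reduction these barrier variants perform can be modelled on the host, assuming they follow the semantics of CUDA's `__syncthreads_count`, `__syncthreads_and` and `__syncthreads_or`. The model only computes the result; it does not synchronize anything, and the function names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One predicate per thread of the block; non-zero means "true".
int barrierPopc(const std::vector<int> &preds) {
  return static_cast<int>(
      std::count_if(preds.begin(), preds.end(), [](int p) { return p != 0; }));
}

// Non-zero iff every thread's predicate is non-zero (modelled as 1).
int barrierAnd(const std::vector<int> &preds) {
  return std::all_of(preds.begin(), preds.end(),
                     [](int p) { return p != 0; }) ? 1 : 0;
}

// Non-zero iff at least one thread's predicate is non-zero (modelled as 1).
int barrierOr(const std::vector<int> &preds) {
  return std::any_of(preds.begin(), preds.end(),
                     [](int p) { return p != 0; }) ? 1 : 0;
}
```

Unlike the plain `Barrier0Op`, each of these produces a value, which is why they need their own operation.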

Add support for handling parenthesized expressions in lifetime safety analysis. Modified the `OriginManager::get` method to ignore parentheses when retrieving origins by recursively calling itself on the unparenthesized expression. This ensures that expressions with extra parentheses are properly analyzed for lifetime safety issues.
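A hypothetical miniature of the recursive lookup, in which an `Expr` with a non-null `parenSubExpr` plays the role of a `ParenExpr`. The real `OriginManager` works on Clang AST nodes and may differ in detail:

```cpp
#include <cassert>
#include <map>

// Hypothetical mini-AST: non-null parenSubExpr means "this is a ParenExpr".
struct Expr {
  const Expr *parenSubExpr = nullptr;
};

struct Origin {
  int id;
};

struct OriginManager {
  std::map<const Expr *, Origin> origins;

  // Parentheses never introduce a new origin, so look through them
  // recursively before consulting the map. Returns nullptr if unknown.
  const Origin *get(const Expr &e) const {
    if (e.parenSubExpr)
      return get(*e.parenSubExpr);
    auto it = origins.find(&e);
    return it == origins.end() ? nullptr : &it->second;
  }
};
```

With this, `((x))` resolves to the same origin as `x`, so extra parentheses no longer hide lifetime issues from the analysis.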

Lit has a number of options controlling the output, but they don't compose very well. This breaks the existing options down into smaller, orthogonal options, and makes the existing options aliases of the new ones.

This introduces the following options:
- --test-output {off,failed,all}
- --print-result-after {off,failed,all}
- --diagnostic-level {error,warning,note}
- --terse-summary
- --no-terse-summary
- --progress-bar (mirroring --no-progress-bar)

--test-output and --print-result-after are not entirely orthogonal: '--test-output X' requires that --print-result-after is set to at least X, and implicitly raises it if it isn't already. Conversely, '--print-result-after Y' requires that --test-output is at most Y, and implicitly lowers it if it is higher. Because options are applied in order, the following invocations have different end results:
- '--test-output all --print-result-after off'
- '--print-result-after off --test-output all'

The following existing options are now aliases:
- -q, --quiet: '--diagnostic-level error --test-output off --terse-summary'
- -s, --succinct: '--progress-bar --print-result-after failed'
- -v, --verbose: '--test-output failed'
- -a, --show-all: '--test-output all'

These were all completely separate options and would override each other in ad-hoc ways, with no regard to the order in which they were given.

This fixes #106643

This is based on the RFC https://discourse.llvm.org/t/rfc-new-command-line-options-for-controlling-llvm-lit-output/ with the addition of --terse-summary, which was a behaviour of -q that was not captured by the original RFC. This also diverges from the RFC in that --debug is NOT folded into --diagnostic-level, because it can be useful to debug any configuration, including those specifying --diagnostic-level.

Example combination that is possible now but wasn't before: '--diagnostic-level error --test-output all --progress-bar'

Another use case is aliases: you can define e.g. `alias lit=llvm-lit --quiet` and still override the default options it specifies.
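The order-dependent interaction between `--test-output` and `--print-result-after` can be modelled as below. lit itself is written in Python; this C++ sketch only mirrors the rules stated above, and `OutputOptions` and its default levels are assumptions for illustration:

```cpp
#include <algorithm>
#include <cassert>

// Levels are ordered off < failed < all.
enum class Level { Off = 0, Failed = 1, All = 2 };

struct OutputOptions {
  Level testOutput = Level::Off;          // assumed default
  Level printResultAfter = Level::Failed; // assumed default

  // --test-output X: requires --print-result-after >= X, raising it if needed.
  void setTestOutput(Level x) {
    testOutput = x;
    printResultAfter = std::max(printResultAfter, x);
  }

  // --print-result-after Y: requires --test-output <= Y, lowering it if needed.
  void setPrintResultAfter(Level y) {
    printResultAfter = y;
    testOutput = std::min(testOutput, y);
  }
};
```

Applying '--test-output all --print-result-after off' ends with both levels at `off`, while the reverse order ends with both at `all`.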


For tail calls we want to reuse the caller's stack frame, and may need to copy stack arguments. For large stack arguments, such as byval structs, this can overwrite incoming stack arguments while the outgoing ones are being prepared by copying. E.g., in cases like the following, which swap arguments:

```
%"struct.s1" = type { [19 x i32] }

define void @f0(ptr byval(%"struct.s1") %0, ptr %1) {
  tail call void @F1(ptr %1, ptr byval(%"struct.s1") %0)
  ret void
}

declare void @F1(ptr, ptr)
```

the last bytes of %0 are on the stack, followed by %1. To prepare the outgoing arguments, %0 needs to be copied and %1 needs to be loaded into r0. However, the copy of %0 currently overwrites the location of %1, so garbage is loaded into r0. We fix that by forcing the load of the pointer stack argument to happen before the copy.
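A simplified model of the ordering problem, using a hypothetical slot layout rather than the real ARM frame: the byval tail occupies slots 0 to 2, the pointer argument sits in slot 3, and the outgoing copy shifts the tail into slots 1 to 3, overlapping the pointer's incoming slot. `lowerTailCall` and `Result` are illustrative names:

```cpp
#include <cassert>
#include <cstring>
#include <vector>

struct Result {
  int r0;                 // value that ends up in the register
  std::vector<int> stack; // outgoing stack slots
};

Result lowerTailCall(bool loadPointerFirst) {
  // Incoming: byval tail in slots 0..2, pointer argument (99) in slot 3.
  std::vector<int> stack{10, 11, 12, 99};
  int r0 = 0;
  if (loadPointerFirst)
    r0 = stack[3]; // fixed order: read the pointer before the copy
  // Copy the byval tail into its outgoing slots 1..3 (overlapping ranges).
  std::memmove(&stack[1], &stack[0], 3 * sizeof(int));
  if (!loadPointerFirst)
    r0 = stack[3]; // buggy order: slot 3 now holds struct data
  return Result{r0, stack};
}
```

In the buggy order `r0` receives the last struct word (12) instead of the pointer (99); loading first preserves the pointer while producing the same outgoing stack.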

s-barannikov and others added 30 commits November 12, 2025 20:52

[CodeGen] Use MCRegUnit in two more TRI methods (NFC) (#167680)

905c7aa

[VPlan] Fix assert in store-user in narrowToSingleScalars (#167686)

9ba738a

[Docs] Fix typo in vp.load.ff intrinsic documentation. NFC (#167721)

830f690

DAG: Fix assert on nofpclass call with aggregate return (#167725)

24be0ba

[CIR] Handle null base class initialization (#167023)

a22834a

[X86] Remove Redundant Default Destructor

4d772de

AArch64: Add baseline test for treating exp as known positive (#167603)

201a461

[MLIR][XeGPU][TransformOps] Add convert_layout op (#167342)

7f4a3a9

[libc++] Simplify the implementation of aligned_storage (#162459)

43ca08d

DAG: exp opcodes cannotBeOrderedNegativeFP (#167604)

0385a18

[clang][HLSL] Fix crash issue due to Twine usage

cc54ee8

[X86][NewPM] Port DynAllocaExpander to New PM

0c0c1a7

DAG: Use poison when widening build_vector (#167631)

782759b

[libunwind] Fix build error because of wrong register size (#167743)

e5e9c3b

[CIR] Cast record size to uint64 to prevent overflow (#167525)

a799a8e

[AsmPrinter] Replace improper use of Register with MCRegUnit (NFC) (#167682)

47cef55

[Github] Make bazel workflow run all tests (#167576)

919bff7

[lldb] Split up shared cache objc metadata extractor body (#167761)

6806349

arsenm and others added 17 commits November 12, 2025 21:12

CodeGen: Remove target hook for terminal rule (#165962)

dfdada1

Revert "[HLSL] Rework semantic handling as attributes #166796" (#167759)

1d2429b

[CIR] Handle scalar DerivedToBase cast expressions (#167370)

260df80

[CIR] Emit promise declaration in coroutine (#166683)

cf9cb54

[flang][OpenMP] Delete include of unused header, NFC (#167762)

66da12a

CodeGen/AMDGPU: Allow 3-address conversion of bundled instructions (#166213)

6636659

[VPlan] Don't look up recipe for IV step via RecipeBuilder. (NFC)

53a65ba

[clang] Extract CompilerInvocation::visitPaths() (#167420)

71763a5

[VPlan] Get opcode & type from recipe in adjustRecipesForReduction (NFC)

b6bcfde

CodeGen: Fix CodeView crashes with empty llvm.dbg.cu (#163286)

2e489f7

[AMDGPU] Regenerate gfx1250 wmma MC test. NFC (#167773)

4b05581

pull bot locked and limited conversation to collaborators Nov 12, 2025

pull bot added the ⤵️ pull label Nov 12, 2025

pull bot merged commit a01a921 into optimizecompile:main Nov 12, 2025
26 checks passed

32 participants
