Skip to content

Conversation

@pull
Copy link

@pull pull bot commented Nov 12, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

topperc and others added 30 commits November 11, 2025 22:32
…c. (#167417)

ctype_utils/wctype_utils were chaged in
120689e and
e7f7973, respectively to operate on
char/wchar_t. Now we can switch to the overloaded names (e.g. have noth
`isspace(char` and `isspace(wchar_t)`) to simplify the templatized
strtointeger implementation from
315dfe5 and make it easier to
potentially add templatized strtofloat implementation.
I'm considering a operator>(MCRegister, unsigned) and
operator<(MCRegister, unsigned) so I have not updated those lines. Such
comparisons are common on MCRegister.
We don't actually support Windows builds at this time, so this is not
needed. I plan to add a different implementation once the
release-binaries workflow supports Windows again.
This patch adds `printOperation()` functions for deferred emission ops
in order to unify the API used for emitting operations.
No functional change intended.
…div/rem (#154072)

Since div/rem operations don’t support a mask operand, the lanes of the
divisor that are masked out are currently replaced with 1 using
VPInstruction::Select before the predicated div/rem operation.
This patch replaces
```
  VPInstruction::Select(logical_and(header_mask, conditional_mask), LHS, RHS)
```
with
```
  vp.merge(conditional_mask, LHS, RHS, EVL)
```
so that the header mask can be replaced by EVL in this usage scenario
when tail folding with EVL.
…5573)

Support for multi-image features has begun to be integrated into LLVM
with the MIF dialect.
In this PR, you will find lowering and operations related to the TEAM
features (`SYNC TEAM`, `GET_TEAM`, `FORM TEAM`, `CHANGE TEAM`,
`TEAM_NUMBER`).

Note regarding the operation for `CHANGE TEAM` : This operation is
partial because it does not support the associated list of coarrays
because the allocation of a coarray and the lowering of PRIF's
`prif_alias_{create|destroy}` procedures are not yet supported in Flang.
This will be integrated later.

Any feedback is welcome.
This is a follow-up from #166926 that ensures the hints are only added
once, and ensures that hints inserted by the register allocator take
priority over hints to reduce movprfx.
…ement array to a 1-element array (#166950)

When compiling with `-fembed-bitcode-marker`, Clang inserts a
placeholder
for the bitcode. This placeholder is a `[0 x i8]` array, which we cannot
represent in SPIRV.

For AMD flavored SPIRV, we extend the `llvm.embedded.module` global to a
`zeroinitializer [1 x i8]` array.

To achieve this, this patch adds a new pass, `SPIRVPrepareGlobals`, that
we can use to write global variable's _non-trivial-to-lower-IR_ ->
_trivial-to-lower-IR_ mappings.

This is a second attempt at
#162082, but cleaner.

In the translator something similar is done for every 0-element array
since KhronosGroup/SPIRV-LLVM-Translator#2743 .
But I don't think we want to do this mapping for all cases.
…esses (#167649)

Revert #166005 due to breaking x86 iOS sims

We're sometimes hitting a allocator assert when running x86 iOS sim
tests. I don't believe this PR is at fault, but there's probably a
memory safety / allocator issue somewhere which the allocation pattern
here is exposing.
This applies `[[nodiscard]]` according to our coding guidelines to
`basic_string`.
…7662)

The new test fails on x86 and arm64 public macOS bots:
```
09:27:59  ======================================================================
09:27:59  FAIL: test_append_frames (TestScriptedFrameProvider.ScriptedFrameProviderTestCase)
09:27:59     Test that we can add frames after real stack.
09:27:59  ----------------------------------------------------------------------
09:27:59  Traceback (most recent call last):
09:27:59    File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/functionalities/scripted_frame_provider/TestScriptedFrameProvider.py", line 122, in test_append_frames
09:27:59      self.assertEqual(new_frame_count, original_frame_count + 1)
09:27:59  AssertionError: 5 != 6
09:27:59  Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/bin/clang
09:27:59  ======================================================================
09:27:59  FAIL: test_applies_to_thread (TestScriptedFrameProvider.ScriptedFrameProviderTestCase)
09:27:59     Test that applies_to_thread filters which threads get the provider.
09:27:59  ----------------------------------------------------------------------
09:27:59  Traceback (most recent call last):
09:27:59    File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/functionalities/scripted_frame_provider/TestScriptedFrameProvider.py", line 218, in test_applies_to_thread
09:27:59      self.assertEqual(
09:27:59  AssertionError: 5 != 1 : Thread with ID 1 should have 1 synthetic frame
09:27:59  Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/bin/clang
09:27:59  ======================================================================
09:27:59  FAIL: test_prepend_frames (TestScriptedFrameProvider.ScriptedFrameProviderTestCase)
09:27:59     Test that we can add frames before real stack.
09:27:59  ----------------------------------------------------------------------
09:27:59  Traceback (most recent call last):
09:27:59    File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/functionalities/scripted_frame_provider/TestScriptedFrameProvider.py", line 84, in test_prepend_frames
09:27:59      self.assertEqual(new_frame_count, original_frame_count + 2)
09:27:59  AssertionError: 5 != 7
09:27:59  Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/bin/clang
09:27:59  ======================================================================
09:27:59  FAIL: test_remove_frame_provider_by_id (TestScriptedFrameProvider.ScriptedFrameProviderTestCase)
09:27:59     Test that RemoveScriptedFrameProvider removes a specific provider by ID.
09:27:59  ----------------------------------------------------------------------
09:27:59  Traceback (most recent call last):
09:27:59    File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/functionalities/scripted_frame_provider/TestScriptedFrameProvider.py", line 272, in test_remove_frame_provider_by_id
09:27:59      self.assertEqual(thread.GetNumFrames(), 3, "Should have 3 synthetic frames")
09:27:59  AssertionError: 5 != 3 : Should have 3 synthetic frames
09:27:59  Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/bin/clang
09:27:59  ======================================================================
09:27:59  FAIL: test_replace_all_frames (TestScriptedFrameProvider.ScriptedFrameProviderTestCase)
09:27:59     Test that we can replace the entire stack.
09:27:59  ----------------------------------------------------------------------
09:27:59  Traceback (most recent call last):
09:27:59    File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/functionalities/scripted_frame_provider/TestScriptedFrameProvider.py", line 41, in test_replace_all_frames
09:27:59      self.assertEqual(thread.GetNumFrames(), 3, "Should have 3 synthetic frames")
09:27:59  AssertionError: 5 != 3 : Should have 3 synthetic frames
09:27:59  Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/bin/clang
09:27:59  ======================================================================
09:27:59  FAIL: test_scripted_frame_objects (TestScriptedFrameProvider.ScriptedFrameProviderTestCase)
09:27:59     Test that provider can return ScriptedFrame objects.
09:27:59  ----------------------------------------------------------------------
09:27:59  Traceback (most recent call last):
09:27:59    File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/functionalities/scripted_frame_provider/TestScriptedFrameProvider.py", line 159, in test_scripted_frame_objects
09:27:59      self.assertEqual(frame0.GetFunctionName(), "custom_scripted_frame_0")
09:27:59  AssertionError: 'thread_func(int)' != 'custom_scripted_frame_0'
09:27:59  - thread_func(int)
09:27:59  + custom_scripted_frame_0
09:27:59  
09:27:59  Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/lldb-build/bin/clang
09:27:59  ----------------------------------------------------------------------
09:27:59  Ran 6 tests in 14.242s
09:27:59  
09:27:59  FAILED (failures=6)
```

Reverts #161870
Fold
 or (fcmp uno %A, %A), (fcmp uno %B, %B), ... ->
 or (fcmp uno %A, %B), ...

This pattern is generated to check if any vector lane is NaN, and
combining multiple compares is beneficial on architectures that have
dedicated instructions.

Alive2 Proof: https://alive2.llvm.org/ce/z/vA_aoM

Combine suggested as part of #161735

PR: #167251
This commit extends the tosa.intdiv operand/result types to allow int64
tensors.
CHARACTER dummy arguments were treated as local variables in debug info.
This happened because our method to get the argument number was not
robust. It relied on `DeclareOp` having a direct reference to arguments
which was not the case for character arguments. This is fixed by storing
source-level argument positions in `DeclareOp`.

Fixes #112886
Adds `transform.xegpu.insert_prefetch` transform op that inserts
`xegpu.prefetch_nd` ops for the given `Value` in an `scf.for` loop.
This is a follow-up of PR #165558 and #165993.

This patch updates the remaining two Ops to use the AnyTypeOf[]
construct, completing the migration for the mbarrier family of Ops.
```
mbarrier.arrive.expect_tx
mbarrier.try_wait.parity
```

Signed-off-by: Durgadoss R <[email protected]>
…ultiversion resolvers (#167516)

- Fixes #163369
- Segmentation fault occurred because resolver was calling TSan
instrumentation functions (__tsan_func_entry, __tsan_func_exit) but as
the resolver is run by the dynamic linker at load time, TSan is not
initialized yet so the current thread pointer is null.
- This PR adds the DisableSanitizerInstrumentation attribute to the
multiversion function resolvers to avoid issues like this.
- Added regression test for TSan segfault.
…7378)

This commit fixes support for the argmax operation by allowing fp8/bf16
input operands with an int64 output type in the profile compilance such
that it aligns with the spec.
…to arrays with UINT32_MAX elements (#166952)

In HIP, dynamic LDS variables are represented using `0-element` global
arrays in the `__shared__` language address-space.

```cpp
  extern __shared__ int LDS[];
```

These are not representable in SPIRV directly.

To represent them, for AMD, we use an array with `UINT32_MAX`-elements.
These are reverse translated to 0-element arrays later in AMD's SPIRV
runtime pipeline (in
[SPIRVReader.cpp](https://github.com/ROCm/SPIRV-LLVM-Translator/blob/8cb74e264ddcde89f62354544803dc8cdbac148d/lib/SPIRV/SPIRVReader.cpp#L358)).
The LLVM-customized GTest has a dependency on LLVM to support
`llvm::raw_ostream` and hence has to link to LLVMSupport. The runtimes
use the LLVMSupport from the bootstrapping LLVM build. The problem is
that the boostrapping compiler and the runtimes target can diverge in
their ABI, even in the runtimes default build. For instance, Clang is
built using gcc which uses libstdc++, but the runtimes is built by Clang
which can be configured to use libcxx by default. Altough it does not
use gcc, this issue has caused
[flang-aarch64-libcxx](https://lab.llvm.org/buildbot/#/builders/89)) to
break, and is still (again?) broken.

This patch makes the runtimes' GTest independent from LLVMSupport so we
do not link any runtimes component with LLVM components.

Runtime projects that use GTest unittests:
 * flang-rt
 * libc
* compiler-rt: Adds `gtest-all.cpp` with
[GTEST_NO_LLVM_SUPPORT=1](https://github.com/llvm/llvm-project/blob/f801b6f67ea896d6e4d2de38bce9a79689ceb254/compiler-rt/CMakeLists.txt#L723)
to each unittest without using `llvm_gtest`. Not touched by this PR.
* openmp: Handled by #159416. Not touched for now by this PR to avoid
conflict.

The current state of this PR tries to reuse
https://github.com/llvm/llvm-project/blob/main/third-party/unittest/CMakeLists.txt
as much as possible, altough personally I would prefer to make it use
"modern CMake" style. third-party/unittest/CMakeLists.txt will detect
whether it is used in runtimes build and adjaust accordingly. It creates
a different target for LLVM (`llvm_gtest`, NFCI) and another one for the
runtimes (`runtimes_gtest`). It is not possible to reuse `llvm_gtest`
for both since `llvm_gtest` is imported using `find_package(LLVM)` if
configured using LLVM_INSTALL_GTEST. An alias `default_gtest` is used to
select between the two. `default_gtest` could also be used for openmp
which also supports standalone and
[LLVM_ENABLE_PROJECTS](#152189)
build mode.
#166561)

This patch enables `aarch64-split-sve-objects` to handle hazard padding
in functions that use the SVE CC even when there are no predicate
spills/locals.

This improves the codegen over the base hazard padding implementation,
as rather than placing the padding in the callee-save area, it is placed
at the start of the ZPR area.

E.g., Current lowering:

```
sub sp, sp, #1040
str x29, [sp, #1024] // 8-byte Folded Spill
addvl sp, sp, #-1
str z8, [sp] // 16-byte Folded Spill
sub sp, sp, #1040
```

New lowering:

```
str x29, [sp, #-16]! // 8-byte Folded Spill
sub sp, sp, #1024
addvl sp, sp, #-1
str z8, [sp] // 16-byte Folded Spill
sub sp, sp, #1040
```

This also re-enables paired stores for GPRs (as the offsets no longer
include the hazard padding).
…ars with VALUE attribute (#166682)

Scalars with VALUE attribute are likely passed in registers, so it's now
clear what lowering should do with the array actual argument in this
case. Fail this case with an error before getting to lowering.
lukel97 and others added 2 commits November 12, 2025 11:14
…167505)

On RISC-V narrowInterleaveGroups doesn't kick in because the wrong
VectorRegWidth is passed to isConsecutiveInterleaveGroup.

narrowInterleaveGroups is always passed the RGK_FixedWidthVector
register size, but on RISC-V the RGK_ScalableVector size is twice as
large because we want to use LMUL 2. This causes the `GroupSize ==
VectorRegWidth` check to fail.

This fixes it by using the scalable register size whenever the VF is
scalable and plumbing it through as a potentially scalable TypeSize.

Note that this only makes a difference when tail folding is disabled, as
narrowInterleaveGroups can't handle EVL based IVs yet.
…167340)

We're likely to get better code from custom legalisation, where we can
remove unpack instructions (plus SVE2p1 has BFMLSLB/T), but we get much
of benefit with these two small changes.

NOTE: LLVM has no support for FEAT_AFP in terms of feature detection or
ACLE builtins, so the compiler works under the assumption the feature is
not enabled.

Patch is also more aggressive when enabling bfloat fma construction
because it removes unnecessary rounding which is generally preferable
regardless of whether BFMLALB is used or not.
@pull pull bot locked and limited conversation to collaborators Nov 12, 2025
@pull pull bot added the ⤵️ pull label Nov 12, 2025
@pull pull bot merged commit 46e9d63 into optimizecompile:main Nov 12, 2025
20 of 21 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.