
@pull pull bot commented Oct 20, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


davemgreen and others added 30 commits October 20, 2025 12:53
This patch splits GPR32 and GPR64 zeroing into distinct branches to
simplify the code and improve the lowering.

Zeroing GPR moves are now handled differently from non-zeroing ones.
The zero source registers WZR and XZR do not require undef, implicit or
kill register annotations. The non-zeroing path no longer has to handle
WZR as a source, which removes the ternary expression. This patch also
moves the GPR64 logic right after GPR32 for better organization.
…4071)

Add documentation for the no-rollback conversion driver. Also improve
the documentation of the old rollback driver; in particular, document
which modifications are performed immediately and which are delayed.
Handle ptrtoaddr the same way as ptrtoint. The fold already only
operates on the index/address bits.
If the main instruction among the copyables is a div-like instruction,
the compiler cannot pack duplicates by extending them with poison
values: once vectorized, such instructions would have undefined
behavior.

Fixes #164185
`UnqualPtrTy` didn't always match `llvm::PointerType::getUnqual`:
sometimes it returned a pointer that is not in address space 0 (notably
for SPIRV).

Since `UnqualPtrTy` was used as the "generic" or "default" pointer type,
this patch renames it to `DefaultPtrTy` to avoid confusion with LLVM's
`PointerType::getUnqual`.
All the existing tests test code either in ConstantFolding or
InstSimplify, so move them to use -passes=instsimplify instead of
-passes=instcombine. This makes sure we keep InstSimplify coverage
even if there are subsuming InstCombine folds.

This requires writing some of the constant folding tests in a
different way, as InstSimplify does not try to re-fold already
existing constant expressions.
This reverts commit 1943c9e.

This took out quite a few buildbots. Some of the Z3 test cases are failing
and enabling this is causing some LLVM tests to begin failing.
Add parsing and semantic checks for DEVICE_SAFESYNC clause. No lowering.
This PR fixes a crash in the `bf_getbuffer` implementation of
`PyDenseElementsAttribute` that occurred when an element type was not
supported, such as `bf16`. I believe that supporting `bf16` is not
possible with that protocol, but that is out of the scope of this PR.
Previously, the code let a `std::exception` escape from `bf_getbuffer`,
which nanobind does not catch (see also pybind/pybind11#3336). The PR
makes the function catch all `std::exception`s and raise a Python
exception manually instead.
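The general technique is to stop C++ exceptions at the boundary of a C-style callback and translate them into that callback's error protocol. In the sketch below, `setPythonError` is a hypothetical stand-in for the CPython call that sets an exception (so the snippet builds without Python headers), `fillBuffer` is an invented worker that throws for `bf16`, and the `0`/`-1` return convention mirrors the buffer protocol's `getbufferproc`.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for setting a Python exception.
std::string LastPythonError;
void setPythonError(const char *Msg) { LastPythonError = Msg; }

// Hypothetical worker that throws for element types it cannot describe.
void fillBuffer(const std::string &ElementType) {
  if (ElementType == "bf16")
    throw std::invalid_argument("unsupported element type: bf16");
}

// C-style entry point: no C++ exception may escape, because the caller
// (the Python runtime) cannot catch it. Returns 0 on success, or -1 with
// an error recorded on failure.
extern "C" int getBufferNoThrow(const char *ElementType) noexcept {
  try {
    fillBuffer(ElementType);
    return 0;
  } catch (const std::exception &E) {
    setPythonError(E.what());
    return -1;
  }
}
```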

Signed-off-by: Ingo Müller <[email protected]>
Add test with urem guard with non-constant divisor and AddRec guards.

Extra test coverage for #163021
OpenACC 3.4 includes the ability to add an 'if' to an atomic operation.

From the change log:
`Added the if clause to the atomic construct to enable conditional
atomic operations based on the parallelism strategy employed`

In section 2.12, the C/C++ grammar is changed to:
`#pragma acc atomic [ atomic-clause ] [ if( condition ) ] new-line`

with corresponding changes to the Fortran grammar.

This patch adds support for this clause to the dialect, so that Clang
can use it soon.
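At the source level, a conditional atomic under the grammar quoted above might look like the sketch below. `accumulate` and its parameters are invented for illustration. A compiler built without OpenACC support simply ignores the pragma (possibly with an unknown-pragma warning), so the function still compiles and runs serially.

```cpp
#include <cassert>

// Hypothetical example: the `if` clause makes the atomic behaviour of the
// update conditional on `NeedAtomic`.
void accumulate(int &Total, const int *Values, int N, bool NeedAtomic) {
  for (int I = 0; I < N; ++I) {
#pragma acc atomic update if(NeedAtomic)
    Total += Values[I];
  }
}
```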
…es (#163972)

The lowering of `!$acc loop` loops with an early exit currently ends up
"duplicating" the control flow: it is expressed both by the acc.loop and,
inside it, as explicit control flow (as if each iteration executed every
iteration up to the early exit).

Add a TODO for now.
Move getPreviousSCEVDivisibleByDivisor from a lambda to a static
function and clarify the name (DividesBy -> DivisibleBy).

Split off refactoring from #163021.
…162993)

Early if conversion can create instruction sequences such as
```
mov  x1, #1
csel x0, x1, x2, eq
```
which could be simplified into the following instead
```
csinc x0, x2, xzr, ne
```

One notable example that generates code like this is `cmpxchg weak`.

This is fixed by handling an immediate value of 1 as `add(wzr, 1)` so
that the addition can be folded into CSEL by using CSINC instead.
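The equivalence can be checked by modelling both instructions as plain functions. This is a sketch of the architectural semantics only (`CSEL`: `cond ? Rn : Rm`; `CSINC`: `cond ? Rn : Rm + 1`), not of the LLVM code that performs the fold; the helper names `csel`, `csinc`, `before` and `after` are hypothetical.

```cpp
#include <cstdint>

// csel  Rd, Rn, Rm, cond  ->  Rd = cond ? Rn : Rm
constexpr std::uint64_t csel(std::uint64_t Rn, std::uint64_t Rm, bool Cond) {
  return Cond ? Rn : Rm;
}

// csinc Rd, Rn, Rm, cond  ->  Rd = cond ? Rn : Rm + 1
constexpr std::uint64_t csinc(std::uint64_t Rn, std::uint64_t Rm, bool Cond) {
  return Cond ? Rn : Rm + 1;
}

// Original sequence: mov x1, #1 ; csel x0, x1, x2, eq
constexpr std::uint64_t before(std::uint64_t X2, bool Eq) {
  const std::uint64_t X1 = 1;
  return csel(X1, X2, Eq);
}

// Folded form: csinc x0, x2, xzr, ne. XZR reads as 0, so the "else" arm
// yields 0 + 1, and the condition is inverted (ne instead of eq).
constexpr std::uint64_t after(std::uint64_t X2, bool Eq) {
  const std::uint64_t Xzr = 0;
  return csinc(X2, Xzr, /*Ne=*/!Eq);
}

// Both forms agree for either flag value.
static_assert(before(42, true) == after(42, true), "eq case");
static_assert(before(42, false) == after(42, false), "ne case");
```

The inversion of the condition is what lets the constant `1` move into the implicit `+ 1` of `CSINC`, so no register is needed to materialize it.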
…ns (#164099)

The `MLInlineAdvisor` currently skips over recursive cases, except that when we delegate to the default policy for non-cold functions, that policy could allow such inlining. The code updating internal state afterwards needs to handle that case.

Fix for https://issues.chromium.org/issues/369637577#comment14
If there is a call inside a TEAMS construct, and that call contains a
DISTRIBUTE construct, the DISTRIBUTE region is considered to be enclosed
by the TEAMS region (based on the dynamic extent of the construct).
Currently, Flang diagnoses this as an error, which is incorrect.
For example:
```
subroutine f
  integer :: i
  !$omp distribute
  do i = 1, 100
    ! loop body
  end do
end subroutine

subroutine g
  !$omp teams
  call f ! this call is ok, distribute enclosed by teams
  !$omp end teams
end subroutine
```
This patch adjusts the nesting check for the OpenMP DISTRIBUTE
directive. It retains the error for DISTRIBUTE directives that are
incorrectly nested lexically but downgrades it to a warning for orphaned
directives to allow dynamic nesting, such as when a subroutine with
DISTRIBUTE is called from within a TEAMS region.

Co-authored-by: Chandra Ghale <[email protected]>
Also replace the undef values with function arguments.
If the type of the ParmVarDecl and the parameter type from the
FunctionProtoType don't match, we're in for trouble. Just reject those
functions.

Fixes #163568
Created new OpenACC utilities library (MLIROpenACCUtils) containing
helper functions for region analysis, value usage checking, default
attribute lookup, and type categorization. Includes comprehensive unit
tests and refactors existing getEnclosingComputeOp function into the new
library.
Per-entry-point metrics are captured during path-sensitive analysis.
For that reason, it is not trivial to add the syntax-only analysis
time, as that analysis runs in a separate stage. Luckily, syntax-only
analysis is done before path-sensitive analysis.

I use a function summary field to keep the syntax-only analysis time
once syntax-only analysis is done, and then forward it to the per-EP
metrics snapshot during path-sensitive analysis.

Note that some of the entry points that were analyzed by syntax-only
rules may be missing in the CSV export if they were never analyzed by
path-sensitive rules. Conversely, if a function is analyzed with
path-sensitive analysis but not syntax-only analysis, its
`SyntaxRunningTime` will be empty.
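The hand-off described above can be pictured with two records: the syntax-only time is parked on a per-function summary and copied into a per-entry-point row only when path-sensitive analysis snapshots that function. Every type, member and function name here except `SyntaxRunningTime` is hypothetical, the millisecond unit is an assumption, and `std::optional` models the empty CSV cell.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical per-function summary, filled in after syntax-only analysis.
struct FunctionSummary {
  std::optional<unsigned> SyntaxRunningTimeMs;
};

// Hypothetical per-entry-point metrics row, captured during
// path-sensitive analysis.
struct EntryPointMetrics {
  std::string Name;
  std::optional<unsigned> SyntaxRunningTime; // empty if never recorded
};

struct MetricsCollector {
  std::map<std::string, FunctionSummary> Summaries;
  std::map<std::string, EntryPointMetrics> Rows;

  // Stage 1: syntax-only analysis finishes first and records its time.
  void recordSyntaxTime(const std::string &Fn, unsigned Ms) {
    Summaries[Fn].SyntaxRunningTimeMs = Ms;
  }

  // Stage 2: a path-sensitive snapshot forwards the saved time, if any.
  // Functions never analyzed path-sensitively never get a row.
  void snapshotPathSensitive(const std::string &Fn) {
    EntryPointMetrics Row{Fn, std::nullopt};
    if (auto It = Summaries.find(Fn); It != Summaries.end())
      Row.SyntaxRunningTime = It->second.SyntaxRunningTimeMs;
    Rows[Fn] = Row;
  }
};
```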

--

CPP-7099
jmmartinez and others added 27 commits October 20, 2025 17:55
…InteralLinkage/PrivateLinkage (#164240)

Same as #164236, but I found this one later.
#164173)

Update `.Cases` and `.CasesLower` with 4+ args to use the
`initializer_list` overload. The deprecation of these functions will
come in a separate PR.

For more context, see: #163405.
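For readers unfamiliar with the pattern, the toy C++17 class below mimics how an `initializer_list` overload lets a single call accept any number of case labels instead of needing one fixed-arity overload per count. `MiniSwitch` is a hypothetical stand-in written for illustration; it is not LLVM's `StringSwitch` and does not reproduce its interface.

```cpp
#include <initializer_list>
#include <string_view>

// Hypothetical, simplified analogue of a string-switch builder.
template <typename T> class MiniSwitch {
  std::string_view Str;
  bool Matched = false;
  T Result{};

public:
  explicit constexpr MiniSwitch(std::string_view S) : Str(S) {}

  // One overload handles any number of labels, replacing a family of
  // Cases(S0, S1, ..., Value) overloads with fixed argument counts.
  constexpr MiniSwitch &Cases(std::initializer_list<std::string_view> Labels,
                              T Value) {
    if (Matched)
      return *this; // the first matching case wins
    for (std::string_view L : Labels)
      if (L == Str) {
        Matched = true;
        Result = Value;
        break;
      }
    return *this;
  }

  constexpr T Default(T Value) const { return Matched ? Result : Value; }
};

constexpr int classify(std::string_view S) {
  return MiniSwitch<int>(S)
      .Cases({"a", "e", "i", "o", "u"}, 1)
      .Cases({"y"}, 2)
      .Default(0);
}
```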
These tests were setting environment variables, which must be done
explicitly with `env` when using the internal shell.
Per Intel Architecture Instruction Set Extensions Programming Reference
rev. 59 (https://cdrdv2.intel.com/v1/dl/getContent/671368), table 1-2,
DMR doesn't support USER_MSR (URDMSR and UWRMSR instructions)
This PR exposes `translate_module_to_llvmir` in the Python bindings.
This test has a loop iterating past (`61`) the array bounds (`58`). So
far this did not seem to matter, but since #155253, constraint
elimination in Swift has been able to figure this out and transforms
the loop into an infinite one, like this:
```
*** IR Dump After ConstraintEliminationPass on test_known_trip_count ***
define void @test_known_trip_count() local_unnamed_addr {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds nuw double, ptr @b, i64 %indvars.iv
  %0 = load double, ptr %arrayidx, align 8
  %arrayidx2 = getelementptr inbounds nuw double, ptr @c, i64 %indvars.iv
  %1 = load double, ptr %arrayidx2, align 8
  %add = fadd double %0, %1
  %arrayidx4 = getelementptr inbounds nuw double, ptr @A, i64 %indvars.iv
  store double %add, ptr %arrayidx4, align 8
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  br i1 false, label %exit, label %for.body

exit:                                             ; preds = %for.body
  ret void
}
```
This causes the test to fail. This patch addresses the root cause.
Previously, an invalid offset was set to `UINT64_MAX`. This is not right
for DWARF32 and leads to incorrect debug info in GSYM, because the
condition of the branch

```
if (StmtSeqVal != UINT64_MAX)
  StmtSeqOffset = StmtSeqVal;
```

is always true.

In this PR, [commit
1](b1983d6)
sets up a test that demonstrates the problem, [commit
2](0d58ce4)
fixes it.

[Diffing commit 1 and
2](0d58ce4#diff-019bdbc9922ad34fdfbcb524a9805f5af26c432540e76b87a6a5f73d9e0e853aL44)
in this PR shows that, after the PR, the symbolicated line number changes
from the function definition to the function body.
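The failure mode can be reproduced in isolation: once a `UINT64_MAX` marker passes through a 32-bit DWARF offset field and is read back as a 64-bit value, it no longer compares equal to `UINT64_MAX`. The sketch below is a hypothetical stand-alone model with invented function names, not the GSYM code; `isValidFixed` shows one format-aware check and is not necessarily how the patch implements it.

```cpp
#include <cstdint>

// Hypothetical model: a DWARF32 offset field only holds 32 bits, so a
// UINT64_MAX "invalid" marker is truncated to 0xFFFFFFFF when stored.
constexpr std::uint64_t readBackDwarf32(std::uint64_t Sentinel) {
  return static_cast<std::uint32_t>(Sentinel); // truncate, then zero-extend
}

// Buggy check: compares against the 64-bit marker regardless of format.
constexpr bool isValidBuggy(std::uint64_t StmtSeqVal) {
  return StmtSeqVal != UINT64_MAX;
}

// Format-aware check: the invalid marker depends on the DWARF format.
constexpr bool isValidFixed(std::uint64_t StmtSeqVal, bool IsDwarf64) {
  const std::uint64_t Invalid = IsDwarf64 ? UINT64_MAX : UINT32_MAX;
  return StmtSeqVal != Invalid;
}

constexpr std::uint64_t Stored = readBackDwarf32(UINT64_MAX);
static_assert(Stored == 0xFFFFFFFFu, "marker was truncated");
static_assert(isValidBuggy(Stored), "buggy check accepts the marker");
static_assert(!isValidFixed(Stored, false), "fixed check rejects it");
```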
Currently, when peeling the first iteration, every occurrence of the upper bound (UB) within the loop body is replaced with the new UB of the peeled-out first iteration. This introduces a bug when operations inside the loop intentionally use the original UB: they are incorrectly updated.
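Reading "UB" as the loop's upper bound, a plain C++ model of first-iteration peeling shows why blindly substituting the peeled iteration's bound breaks a body that reads the original bound. This is an illustrative sketch with hypothetical function names, not the MLIR transformation itself; the `ReplaceUb` flag models the buggy rewrite.

```cpp
#include <cstdint>

// Body that intentionally reads the loop's original upper bound `Ub`.
constexpr std::int64_t body(std::int64_t I, std::int64_t Ub) { return Ub - I; }

// Reference loop: for (i = lb; i < ub; i += step) acc += body(i, ub).
constexpr std::int64_t original(std::int64_t Lb, std::int64_t Ub,
                                std::int64_t Step) {
  std::int64_t Acc = 0;
  for (std::int64_t I = Lb; I < Ub; I += Step)
    Acc += body(I, Ub);
  return Acc;
}

// The peeled first iteration covers [Lb, Lb + Step), so its own bound is
// Lb + Step, but the body must still see the original Ub.
constexpr std::int64_t peeled(std::int64_t Lb, std::int64_t Ub,
                              std::int64_t Step, bool ReplaceUb) {
  std::int64_t Acc = 0;
  if (Lb < Ub) {
    const std::int64_t NewUb = Lb + Step; // bound of the peeled iteration
    Acc += body(Lb, ReplaceUb ? NewUb : Ub);
  }
  for (std::int64_t I = Lb + Step; I < Ub; I += Step)
    Acc += body(I, Ub);
  return Acc;
}

static_assert(peeled(0, 10, 3, /*ReplaceUb=*/false) == original(0, 10, 3),
              "correct peeling preserves the result");
static_assert(peeled(0, 10, 3, /*ReplaceUb=*/true) != original(0, 10, 3),
              "substituting the new bound changes the result");
```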
Remove support for long-unsupported Ubuntu, Debian and RHEL releases.

Add support for RHEL 8, 9 and 10 and recognize Rocky and AlmaLinux
as RHEL.
…r in SPIRVUtils (#164248)

There was some repeated code that was used to deduce the
SPIRV::LinkageType from a GlobalVariable/Function.

At several related places in the code, we also had functions taking two
parameters: a 'hasLinkage' bool and a 'LinkageType'. This is error-prone,
since the latter parameter's meaning depends on the first. This patch
also merges these two options into a single
`std::optional<SPIRV::LinkageType>`.
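The signature change can be sketched with a stand-in enum. `LinkageType` below is a hypothetical local enum rather than the real `SPIRV::LinkageType`, and the function names and the declaration/definition rule in `deduceLinkage` are invented simplifications; the point is only that `std::optional` turns "no linkage" and "linkage of kind X" into a single value that cannot be inconsistent.

```cpp
#include <optional>

// Hypothetical stand-in for SPIRV::LinkageType.
enum class LinkageType { Export, Import, LinkOnceODR };

// Before: two coupled parameters; `Type` is meaningless whenever
// `HasLinkage` is false, and nothing enforces that.
constexpr bool isExportedOld(bool HasLinkage, LinkageType Type) {
  return HasLinkage && Type == LinkageType::Export;
}

// After: one parameter; std::nullopt means "no linkage".
constexpr bool isExportedNew(std::optional<LinkageType> Linkage) {
  return Linkage.has_value() && *Linkage == LinkageType::Export;
}

// A deduction helper can return the optional directly, so callers cannot
// forget to consult a separate flag. (Hypothetical, simplified rule.)
constexpr std::optional<LinkageType> deduceLinkage(bool IsDeclaration,
                                                   bool IsExternallyVisible) {
  if (!IsExternallyVisible)
    return std::nullopt;
  return IsDeclaration ? LinkageType::Import : LinkageType::Export;
}
```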
…lasses (#163588)

Extend the CS rule of using namespace qualifiers to define previously
declared functions to variables and classes as well.
- Fix function names to conform to LLVM CS and mark a local function
static.
- Use range-based for loops to simplify code.
- Use `interleave` instead of manual loops to print lists.
- Use namespace qualifiers to define variables declared in the `llvm`
namespace.
- Move the file-local `TimeTracerRAII` struct into an anonymous namespace.
- Use explicit types in a few places.
- Convert the loop over `PassList` to a range-based for loop.
- Use nested namespace definitions in header files.
- Mark file-local functions static and enclose file-local structs in
anonymous namespaces.
- Drop some unnecessary namespace qualifiers.
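Several of the conventions listed above (defining already-declared entities with a namespace qualifier, nested namespace definitions, and anonymous namespaces for file-local code) look like this in a self-contained C++17 file. All names are hypothetical; `demo` stands in for `llvm` so the snippet compiles without LLVM headers.

```cpp
#include <cassert>

// Header-style declarations using a nested namespace definition instead
// of `namespace demo { namespace detail { ... } }`.
namespace demo::detail {
extern int Counter;
int bump();
struct Tracker;
} // namespace demo::detail

// Definitions use the qualifier rather than reopening the namespace, so a
// misspelled name is a compile error instead of a silent new declaration.
int demo::detail::Counter = 0;

int demo::detail::bump() { return ++Counter; }

struct demo::detail::Tracker {
  int Start = Counter;
};

namespace {
// File-local helpers live in an anonymous namespace.
int twice(int X) { return 2 * X; }
} // namespace
```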
Upstream support for ComplexType as a function return type

Issue #141365
This patch implements visitors for MemberExpr, UnaryDeref,
StringLiteral and CompoundLiteralExpr inside aggregate
expressions.
This hardens the unwinding logic and data structures on systems
that support pointer authentication.

The approach taken to hardening is to harden the schemas of as many
high-value fields in the myriad structs as possible, and then also
explicitly qualify local variables referencing privileged or
security-critical values.

This does introduce ABI linkage between libcxx, libcxxabi, and
libunwind, but those are in principle separate from the OS itself,
so we've kept the schema definitions in the library-specific headers
rather than in ptrauth.h.
Implement CXXDefaultArgExpr support for ComplexType

Issue #141365
… SSE41 phminposuw intrinsic to be used in constexpr (#163041)

Fix #161336
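For context, `PHMINPOSUW` finds the minimum of the eight unsigned 16-bit lanes, writes it to lane 0, writes its index to lane 1 (the lowest index wins on ties), and zeroes the remaining lanes. A constant evaluator has to reproduce exactly that. The `constexpr` function below is a scalar model of those semantics, not Clang's implementation, and `phminposuwModel` is a hypothetical name.

```cpp
#include <array>
#include <cstdint>

using V8u16 = std::array<std::uint16_t, 8>;

// Scalar model of PHMINPOSUW: lane 0 = minimum, lane 1 = its index,
// lanes 2..7 = 0. The strict '<' keeps the lowest index on ties.
constexpr V8u16 phminposuwModel(const V8u16 &Src) {
  std::uint16_t Min = Src[0];
  std::uint16_t Index = 0;
  for (std::uint16_t I = 1; I < 8; ++I) {
    if (Src[I] < Min) {
      Min = Src[I];
      Index = I;
    }
  }
  V8u16 Result{}; // zero-initialized
  Result[0] = Min;
  Result[1] = Index;
  return Result;
}

constexpr V8u16 Example = phminposuwModel({9, 4, 7, 4, 65535, 12, 5, 8});
static_assert(Example[0] == 4 && Example[1] == 1 && Example[2] == 0,
              "minimum 4 found first at lane 1");
```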
Added support for ConditionalOperator, BinaryConditionalOperator and
OpaqueValueExpr as lvalues.

Implemented support for ternary operators with one branch being a throw
expression. This required weakening the requirement that the true and
false regions of the ternary operator must terminate with a `YieldOp`.
Instead the true and false regions are now allowed to terminate with an
`UnreachableOp` and no `YieldOp` gets emitted when the block throws.
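For reference, these are the kinds of C++ source constructs involved: a conditional operator where one arm is a `throw` expression (the result then takes the type and value category of the other arm), and a conditional operator used as an lvalue. The function names are illustrative only.

```cpp
#include <cassert>
#include <stdexcept>

// One arm is a throw-expression: the result has the type of the other
// arm, and the throwing arm never yields a value.
int checkedIndex(int I, int Size) {
  return (I >= 0 && I < Size) ? I
                              : throw std::out_of_range("index out of range");
}

// Both arms are lvalues of the same type, so the conditional is itself an
// lvalue and can be assigned to.
void setLarger(int &A, int &B, int V) { (A > B ? A : B) = V; }
```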
These were all removed in #160028, but I apparently missed this one
instance in the documentation. Remove it, given that it no longer works.
This patch adds a new script, premerge_advisor_explain.py that requests
test failure explanations from the premerge advisor. For now it just
prints them out to STDOUT. This allows for testing of the entire system
by looking at failure explanations in failed jobs before we do the rest
of the wiring to enable the premerge advisor to write out comments.
… AVX/AVX512 subvector extraction intrinsics to be used in constexpr #157712 (#162836)

**This PR supersedes and replaces PR #158853**

The original branch diverged too far from the main branch, resulting in
significant merge conflicts that were difficult to resolve cleanly. To
provide a clean and reviewable history, this new PR was created by
cherry-picking the necessary commits onto a fresh branch based on the
latest `main`.

---

*(Original Description)*

This patch enables the use of AVX/AVX512 subvector extraction intrinsics
within `constexpr` functions. This is achieved by implementing the
evaluation logic for these intrinsics in
`VectorExprEvaluator::VisitCallExpr` and `InterpretBuiltin`.

The original discussion and review comments can be found in the previous
pull request for context: #158853

Fixes #157712
The primary purpose of this commit is to enable marking loads to LDS
(global.load.lds, buffer.*.load.lds) volatile (using bit 31 of the aux,
as with normal buffer loads) and to ensure that their !nontemporal
annotations translate to appropriate settings of the cache control bits.

However, in the process of implementing this feature, we also fixed:
- Incorrect handling of buffer loads to LDS in GlobalISel
- The handling of volatile on buffers in SIMemoryLegalizer:
previously, the mapping of address spaces would cause volatile on buffer
loads to be silently dropped on at least gfx10.
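The encoding detail mentioned above (volatile carried in bit 31 of the aux value) amounts to simple bit manipulation. The constant and helper names below are hypothetical; they model only that one bit and make no claim about how the other aux bits are laid out.

```cpp
#include <cstdint>

// Hypothetical name for the aux bit the commit message describes.
constexpr std::uint32_t AuxVolatileBit = 1u << 31;

constexpr std::uint32_t setVolatile(std::uint32_t Aux) {
  return Aux | AuxVolatileBit;
}

constexpr bool isVolatile(std::uint32_t Aux) {
  return (Aux & AuxVolatileBit) != 0;
}

// Strip the volatile flag before interpreting the remaining aux bits.
constexpr std::uint32_t otherAuxBits(std::uint32_t Aux) {
  return Aux & ~AuxVolatileBit;
}
```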

---------

Co-authored-by: Matt Arsenault <[email protected]>
@pull pull bot locked and limited conversation to collaborators Oct 20, 2025
@pull pull bot added the ⤵️ pull label Oct 20, 2025
@pull pull bot merged commit d371417 into optimizecompile:main Oct 20, 2025