@pull pull bot commented Nov 18, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


grypp and others added 30 commits November 18, 2025 12:56
…8427)

This adds handling for f16 and f128 lround/llround under LP64 targets,
promoting the f16 where needed and using a libcall for f128. This
codegen is now identical to the selection dag version.
This helps cases where the immediate range of FDUP is not sufficient.
…is deduced (#164440)

Previously, the handling of the `cleanup` attribute had some checks
based on the type, but we were deducing the type after handling the
attribute.
This PR fixes the way we are dealing with type checks for the `cleanup`
attribute by delaying these checks until after the type has been deduced.

The fix is written so that the solution can be adapted for other
attributes that do type-based checks.
This is the list of C/C++ attributes that do type-based checks
and will need to be fixed in additional PRs:
- CUDAShared
- MutualExclusions
- PassObjectSize
- InitPriority
- Sentinel
- AcquireCapability
- RequiresCapability
- LocksExcluded
- AcquireHandle

NB: Some attributes could have been missed in my shallow search.

Fixes #129631
This tells the build system to check and regenerate the
*GenRegisterInfo*.inc files, should any of them be missing for
whatever reason.

A follow-up from
<#167700>.
It was switched from a function pointer to std::function in

TableGen: Make 2nd arg MainFn of TableGenMain(argv0, MainFn) optional.
f675ec6

but there's no mention of any particular reason for that.
…sure amt doesn't depend on original load chain (#168400)

Relax the fix for #165755 / #165850: it doesn't matter if the amt depends on the original load value, only whether it depends on any users of the chain.
- Detect cases where LHS & RHS values will not cause overflow
(when the Hi halves are zero).
This pass aims to narrow i64 types on TOSA operations to i32. It can be
useful for legalizations from various frameworks. It comes with the
following options:
- "aggressive-rewrite" - This option is typically able to narrow more
values, but may impact numerical behaviour if not used carefully.
- "convert-function-boundaries" - If enabled, parameters/results
to/from a function may be narrowed. Otherwise, casts are inserted to
preserve the I/O of the function.

Currently the non-aggressive mode is very limited, targeting an argmax
-> cast sequence that has been observed during legalization as well as
some data layout operations that can always narrow. Support for more
operations will be added in the future.

Co-authored-by: Vitalii Shutov <[email protected]>
Co-authored-by: Shubham <[email protected]>
Co-authored-by: Declan Flavin <[email protected]>

Signed-off-by: Luke Hutton <[email protected]>
This patch fixes the only RTSan test that was broken by enabling lit's
internal shell on Darwin. This patch rewrites the test to prefix env
variables with `env` and to avoid the use of subshells.
…d() (#164392)

Use the implementation in libomptarget. If libomptarget is not
available, always return the UID / device number of the host / the
initial device.
…167519)

[andv, eorv, orv, s/uaddv, s/umaxv, s/uminv]
sve_reduce_##(none, ?) -> op's neutral value
sve_reduce_##(any, neutral) -> op's neutral value
    
[andv, orv, s/umaxv, s/uminv]
sve_reduce_##(all, splat(X)) -> X
    
[eorv]
sve_reduce_##(all, splat(X)) -> 0
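
The folds listed above can be checked against a scalar model of a predicated reduction. The following is a minimal sketch, not the LLVM implementation: `Op`, `neutral`, and `reduce` are hypothetical names, and lanes are modelled as 32-bit unsigned values only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the reduction kinds named in the commit message
// (unsigned variants only).
enum class Op { And, Or, Eor, Add, UMax, UMin };

// Neutral (identity) element of each reduction for 32-bit unsigned lanes.
uint32_t neutral(Op op) {
  switch (op) {
  case Op::And:
  case Op::UMin:
    return UINT32_MAX;
  case Op::Or:
  case Op::Eor:
  case Op::Add:
  case Op::UMax:
    return 0;
  }
  return 0;
}

// Reduce only the lanes whose predicate bit is set, starting from the
// neutral element, so an all-false predicate yields the neutral element.
uint32_t reduce(Op op, const std::vector<bool> &pred,
                const std::vector<uint32_t> &lanes) {
  uint32_t acc = neutral(op);
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    if (!pred[i])
      continue;
    switch (op) {
    case Op::And:  acc &= lanes[i]; break;
    case Op::Or:   acc |= lanes[i]; break;
    case Op::Eor:  acc ^= lanes[i]; break;
    case Op::Add:  acc += lanes[i]; break;
    case Op::UMax: acc = std::max(acc, lanes[i]); break;
    case Op::UMin: acc = std::min(acc, lanes[i]); break;
    }
  }
  return acc;
}
```

In this model, reducing a splat of `X` under an all-true predicate gives `X` for and/or/max/min, and `0` for eor when the lane count is even. The even lane count is an assumption of the sketch rather than something the message states.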
Currently, there are no diagnostics issued when including a deprecated
header, since the diagnostic is issued inside a system header. This
patch fixes that by using `#warning` instead, which also simplifies the
implementation of the deprecation warnings.
Update VPlan to populate VPIRFlags during VPInstruction construction and
use it when creating widened recipes, instead of constructing VPIRFlags
from the underlying IR instruction each time. The VPRecipeWithIRFlags
constructor taking an underlying instruction and setting the flags based
on it has been removed.

This centralizes initial VPIRFlags creation and ensures flags are
consistently available throughout VPlan transformations and makes sure
we don't accidentally re-add flags from the underlying instruction that
already got dropped during transformations.

Follow-up to #167253, which did
the same for VPIRMetadata.

Should be NFC w.r.t. the generated IR.

PR: #168450
Note that getCurrentUnwindRow does not change any state.

Identified with unused-local-non-trivial-variable.
Identified with modernize-loop-convert.
While I am at it, this patch switches to the constructor that takes
a container instead of a pair of begin/end.

Identified with readability-const-return-type.
- MemoryEffectsAttr in MLIR LLVM dialect is out of sync with LLVM
  itself.
Fixes: e1979ae ("Implement gd to ie relaxation for aarch64.")
This patch makes all tsan tests work with the internal shell on Darwin. Tests
were using various features not supported by the internal shell, mainly subshells
and not using env to set environment variables. This patch also fixes one of the
dynamiclib substitutions to not use a subshell.

Reviewers: ndrewh, DanBlackwell, fmayer, vitalybuka

Reviewed By: DanBlackwell

Pull Request: #168544
#162443)

In some cases, such as when recommending the compiler option
_FORTIFY_SOURCE, the current custom message format is clunky. Now, when
the reason starts with `>`, the replacement string is omitted, so only
the reason is shown.

`^function$,,has a custom message;` - function 'function' has a custom
message; it should not be used
`^function$,,>has a custom message and no replacement suggestion;` -
function 'function' has a custom message and no replacement suggestion
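
The two examples above can be reproduced with a small formatting routine. This is a minimal sketch and not the checker's actual code: `formatDiag` is a hypothetical helper, and the wording used when a replacement is present is an assumption, since the message only shows entries with an empty replacement.

```cpp
#include <string>

// Hypothetical helper: builds the diagnostic text for one custom entry of
// the form "regex,replacement,reason".
std::string formatDiag(const std::string &name, const std::string &replacement,
                       const std::string &reason) {
  // A leading '>' means: show only the reason, with no replacement clause.
  if (!reason.empty() && reason[0] == '>')
    return "function '" + name + "' " + reason.substr(1);

  std::string msg = "function '" + name + "' " + reason + "; ";
  if (replacement.empty())
    msg += "it should not be used";
  else
    // Assumed wording; not taken from the commit message.
    msg += "'" + replacement + "' should be used instead";
  return msg;
}
```

Calling `formatDiag("function", "", "has a custom message")` yields the first message shown above, and prefixing the reason with `>` yields the second.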

---------

Co-authored-by: Donát Nagy <[email protected]>
…#168542)

Where possible:

* notifyMatchFailure calls happen first
* then op.emitOpError
* finally assertions / op creation.

---------

Co-authored-by: Jakub Kuderski <[email protected]>
Closes #99097
Closes #99100

As ddx and ddy have nearly identical implementations, I've combined them in
this PR. This aims to unblock
#161378

---------

Co-authored-by: Alexander Johnston <[email protected]>
…d-bundler and AMD SPIR-V. (#168521)

`clang-linker-wrapper` was incorrectly calling `clang-offload-bundler`
for AMD SPIR-V. This resulted in a binary that couldn't be executed if
built using the new driver.

The runtime couldn't recognise the triple, which triggered this error at
execution time:

```
No compatible code objects found for: gfx90a:sramecc+:xnack-,
```

With this PR, this is solved:

```
Creating ISA for: gfx90a:sramecc+:xnack- from spirv
```
This is a simple translation of the current WORKSPACE file.

* External repos are replaced with `bazel_dep()`. The versions have been
bumped to newer versions.
* `maybe()` doesn't seem to be a thing, so I just removed that.
* Existing repos where we define our own BUILD file in third_party_build
have *not* been replaced due to compatibility issues. For example,
`nanobind_bazel` could replace the `nanobind` config we have, but
switching to that caused some build errors.
* These existing repos have instead been specified as module
extensions.

This should have no effect since `.bazelrc` defines `common
--enable_bzlmod=false --enable_workspace`

Tested locally: `bazel test --enable_bzlmod --noenable_workspace
--config=generic_clang @llvm-project//... //...`
ashermancinelli and others added 22 commits November 18, 2025 07:55
I missed these attributes when I added the wrapper for GPUFuncOp in
fbdd98f.
This patch makes Clang produce the crash reproducer shell script for IR
inputs as well.
…he shadow map (#167772)

The AddressSanitizer transform currently defaults to placing the shadow
map in address space 0, but it is desirable for some targets (namely
BPF) to select a different address space for the map. Add a compilation
option for specifying the address space of the target.
…PF target (#167768)

The AddressSanitizer transform does not have a default offset registered
for the shadow map. Set the default shadow map offset for BPF to be
dynamically set by the KASAN implementation.
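
For context, AddressSanitizer maps an application address to its shadow byte as `(addr >> scale) + offset`, so the offset discussed here is the base of the shadow map. A minimal sketch, assuming the usual scale of 3 (one shadow byte per 8 application bytes); `shadowAddress` is a hypothetical helper, and no BPF-specific values are implied:

```cpp
#include <cstdint>

// Common ASan shadow scale: 8 application bytes map to 1 shadow byte.
constexpr unsigned kShadowScale = 3;

// Hypothetical helper: computes the shadow address for `addr`. With a
// dynamic offset, `shadowOffset` would be a value obtained at run time
// rather than a compile-time constant.
uint64_t shadowAddress(uint64_t addr, uint64_t shadowOffset,
                       unsigned scale = kShadowScale) {
  return (addr >> scale) + shadowOffset;
}
```

For example, with the offset `0x7fff8000` commonly used on x86-64 Linux, `shadowAddress(0x1000, 0x7fff8000)` evaluates to `0x7fff8200`.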
The BPF LLVM target currently doesn't support turning on the
AddressSanitizer pass, either for userspace ASAN or KASAN. Enable the
KASAN option for the BPF target in anticipation of a KASAN
implementation for BPF.
This patch fixes most of the ASan tests that were failing on Darwin when
running under the internal shell. There are still a couple left that
are more interesting cases that I'll do in a follow-up patch. The
tests that still need to be done:
```
TestCases/Darwin/duplicate_os_log_reports.cpp
TestCases/Darwin/dyld_insert_libraries_reexec.cpp
TestCases/Darwin/interface_symbols_darwin.cpp
```

Reviewers: thetruestblue, fhahn, vitalybuka, DanBlackwell, ndrewh

Reviewed By: DanBlackwell

Pull Request: #168545
Only the Fortran source files in flang/test/Lower/PowerPC and some in
flang/test/Lower have been modified. The other files in the directory
will be cleaned up in subsequent commits.
…ific address spaces (#167770)

For some backends, e.g., BPF, it is desirable to only sanitize memory
belonging to specific address spaces. More specifically, it is sometimes
desirable to only apply address sanitization for arena memory belonging
to address space 1. However, AddressSanitizer currently does not support
selectively sanitizing address spaces. Add a new option to select which
address spaces to apply AddressSanitizer to.

No functional change for existing targets (namely AMD GPU) that hardcode
which address spaces to sanitize.
This PR proposes changing the LLDB codebase so that LLDB can print the
values of integer registers wider than 64 bits (even if the width is not
exactly 128 bits).
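
To illustrate what printing such a value involves, here is a minimal sketch that formats a register of arbitrary byte width as hexadecimal. It is not LLDB code: `formatWideRegister` is a hypothetical helper, and the little-endian byte order is an assumption of the example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: renders little-endian register bytes of any length
// as a hex string, most significant byte first.
std::string formatWideRegister(const std::vector<uint8_t> &leBytes) {
  static const char digits[] = "0123456789abcdef";
  std::string out = "0x";
  for (auto it = leBytes.rbegin(); it != leBytes.rend(); ++it) {
    out.push_back(digits[*it >> 4]);
    out.push_back(digits[*it & 0xF]);
  }
  return out;
}
```

Because the routine walks bytes rather than relying on a fixed-width integer type, a 72-bit or 80-bit value is handled the same way as a 128-bit one.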

---------

Co-authored-by: Matej Košík <[email protected]>
Co-authored-by: Jonas Devlieghere <[email protected]>
…to non-vectors (#168081)

Updates the demanded elements before recursing through copies in case
the type of the source register changes from a non-vector register to a
vector register.

Fixes #167842.
* original change #162730
* with windows fix #164843
* remove timeout that was pointed out in the comment above
* Remove test that starts and listens on a socket to avoid timeout
issues
…68165)

and make (#165264)

Truly recover Executor::getDefaultExecutor. The previous change missed
std::unique_ptr, which is needed for a normal program exit: only with it
is the ThreadPoolExecutor destructor called on a normal program exit,
where it ensures the executor has been stopped and waits for worker
threads to finish. The wait is important as it prevents intermittent
crashes on Windows when the process is doing a full exit.
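
The role of the destructor can be sketched with a toy thread pool. This is an illustration under stated assumptions, not LLVM's ThreadPoolExecutor: `TinyExecutor`, `defaultExecutor`, and `runAndCount` are hypothetical names.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical toy executor whose destructor stops and joins its workers.
class TinyExecutor {
public:
  explicit TinyExecutor(unsigned n) {
    for (unsigned i = 0; i < n; ++i)
      workers.emplace_back([this] { run(); });
  }
  ~TinyExecutor() {
    {
      std::lock_guard<std::mutex> lock(m);
      stopping = true;
    }
    cv.notify_all();
    for (auto &t : workers)
      t.join(); // Wait for workers so none outlive the executor.
  }
  void add(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(m);
      tasks.push(std::move(task));
    }
    cv.notify_one();
  }

private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return stopping || !tasks.empty(); });
        if (tasks.empty())
          return; // Stopping and the queue is drained.
        task = std::move(tasks.front());
        tasks.pop();
      }
      task();
    }
  }
  std::mutex m;
  std::condition_variable cv;
  std::queue<std::function<void()>> tasks;
  bool stopping = false;
  std::vector<std::thread> workers;
};

// A function-local static unique_ptr is destroyed during normal exit, which
// runs ~TinyExecutor; a leaked raw pointer would skip the destructor.
TinyExecutor &defaultExecutor() {
  static std::unique_ptr<TinyExecutor> exec(new TinyExecutor(2));
  return *exec;
}

// Runs `numTasks` increments on a scoped executor and returns the count
// observed after the destructor has joined all workers.
int runAndCount(unsigned numWorkers, int numTasks) {
  std::atomic<int> count{0};
  {
    TinyExecutor exec(numWorkers);
    for (int i = 0; i < numTasks; ++i)
      exec.add([&count] { ++count; });
  }
  return count.load();
}
```

In this sketch the destructor drains the queue before joining, so `runAndCount(2, 8)` returns 8. Whether pending tasks are drained or dropped at shutdown is a design choice of the toy, not a claim about the real executor.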
In line with a std proposal to introduce std::clmul, and in preparation
to introduce a clmul intrinsic, implement carry-less multiply primitives
for APIntOps, clmul[rh].

Ref: https://isocpp.org/files/papers/P3642R3.html
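
As a reminder of what a carry-less multiply computes, here is a minimal sketch on fixed-width integers. It is not the APIntOps implementation: `clmulFull`, `clmul`, and `clmulh` are illustrative free functions on 32-bit operands, and the reversed variant is omitted.

```cpp
#include <cstdint>

// Full 64-bit carry-less product of two 32-bit values: schoolbook
// multiplication where partial products are combined with XOR, not ADD.
uint64_t clmulFull(uint32_t a, uint32_t b) {
  uint64_t result = 0;
  for (unsigned i = 0; i < 32; ++i)
    if ((b >> i) & 1u)
      result ^= static_cast<uint64_t>(a) << i;
  return result;
}

// Low 32 bits of the carry-less product.
uint32_t clmul(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(clmulFull(a, b));
}

// High 32 bits of the carry-less product.
uint32_t clmulh(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(clmulFull(a, b) >> 32);
}
```

For example, `clmul(3, 3)` is 5 rather than 9, because the two middle partial-product bits cancel under XOR instead of carrying.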
Identified with modernize-loop-convert.
https://alive2.llvm.org/ce/z/YGT5SN
https://alive2.llvm.org/ce/z/PVDxCw
https://alive2.llvm.org/ce/z/8buR2N

This is tricky because with positive numbers, we only go up, so we can
in fact always hit the signed_max boundary. This is important because
the intrinsic we use has the behavior of going the OTHER way, aka clamp
to INT_MIN if it goes in that direction.

And the range checking we do only works for positive numbers.

Because of this issue, we can only do this for constants as well.
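
The asymmetry described above can be seen in a scalar model of a signed saturating add. The message does not name the intrinsic, so treating it as a signed saturating add is an assumption, and `saddSat` is a hypothetical helper rather than LLVM code.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical model of a 32-bit signed saturating add: the exact sum is
// computed in 64 bits and then clamped to the int32_t range.
int32_t saddSat(int32_t a, int32_t b) {
  const int64_t sum = static_cast<int64_t>(a) + static_cast<int64_t>(b);
  if (sum > std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max(); // Clamp upward overflow.
  if (sum < std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min(); // Clamp downward overflow.
  return static_cast<int32_t>(sum);
}
```

With a non-negative second operand the sum can only move upward, so only the `INT32_MAX` clamp is reachable. A negative operand can reach the `INT32_MIN` clamp, which a pattern that only checks the upper bound does not account for.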
When building just the runtimes (e.g., a patch only touches compiler-rt),
we do not actually run any normal check targets. This ends up causing an
empty ninja invocation, which builds more targets than necessary. Gate
the ninja build for normal check-* targets under an if statement to fix
this.
The AArch64 backend converts trees formed by conjunctions/disjunctions
of comparisons into sequences of `CCMP` instructions. The implementation
before this change checks whether a sub-tree must be processed first. If
not, it processes the operations in the order they occur in the DAG.

This may not be optimal if there is a corresponding `SUB` node for one
of the comparisons. In this case, we should process this comparison
first because we can then use the same instruction for the `SUB` node
and the comparison.

To achieve this, this commit comprises the following changes:

- Extend `canEmitConjunction` with a new output parameter `PreferFirst`,
  which reports to the caller whether the sub-tree should preferably be
  processed first.
- Set `PreferFirst` to `true` if we can find a corresponding `SUB` node
  in the DAG.
- If we can process a sub-tree with `PreferFirst = true` first (i.e., we
  do not violate any `MustBeFirst` constraint by doing so), we swap the
  sub-trees.
- The already existing code for performing the common subexpression
  elimination takes care to use only a single instruction for the
  comparison and the `SUB` node if possible.

Closes #149685.
In general, "Flat instructions look at the per-workitem address and
determine for each work item if the target memory address is in global,
private or scratch memory." (RDNA2 ISA) That means that FLAT
instructions need to be considered for VMEM hazards even without
"specific segment". Also, LDS DMA should be considered for LDS hazard
detection.

See also #137148
…8549)

Move `GetInnermostExecPart` and `IsStrictlyStructuredBlock` from
Semantics/openmp-utils.* to Parser/openmp-utils.*. These two only depend
on the AST contents and properties.
This reverts commit b3d6264.

This broke the workflow because the sync-labels flag was set to a
zero-length string to work around an issue. The underlying issue has
been fixed and the value is now required to be a boolean. We can just
drop the value because we want the default behavior anyways. This should
be the last remaining breaking change from v5 that we need to migrate.
@pull pull bot locked and limited conversation to collaborators Nov 18, 2025
@pull pull bot added the ⤵️ pull label Nov 18, 2025
@pull pull bot merged commit bd8c941 into optimizecompile:main Nov 18, 2025