@pull pull bot commented Oct 16, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

bgergely0 and others added 30 commits October 16, 2025 08:07
PR #120064 added several MCPlusBuilder helpers for recognising
instructions which sign or authenticate the link register.

This patch adds MCPlusBuilder unittests for these helpers.
Lower v16i8 to v4i32 partial_smla to relaxed_dot_add. I'm still unsure
whether we could/should take advantage of the unknown signedness of the
rhs, and also lower the partial_sumla operation too.
…163208)

Depends on:
* #163348
* #162632

With this patch Clang will start emitting `DW_AT_language_{name,
version}` for C++/C/Objective-C/Objective-C++ when using `-gdwarf-6`. We
adjust the `DISourceLanguageName` (which we pass to `DICompileUnit`) to
hold a `DW_AT_language_name_` and version code when in DWARFv6.
Otherwise we continue using the `DW_LANG_` version of
`DISourceLanguageName`.

We didn't back-port emitting
`DW_AT_language_name`/`DW_AT_language_version` to DWARFv5 (unlike GCC,
which emits both the new and old language attributes in DWARFv5) because
there wasn't a compelling reason to do so (yet).
For this specific case, when catching a pointer data type, by reference,
Clang generates a special code pattern, which directly accesses the
exception data by skipping past the `_Unwind_Exception` manually (rather
than using the return value of `__cxa_begin_catch`).

On most platforms, `_Unwind_Exception` is 32 bytes, but in some
configurations it's different. (ARM EHABI is one preexisting case.) In
the case of SEH, it's also different - it is 48 bytes in 32 bit mode and
64 bytes in 64 bit mode. (See the SEH ifdef in `_Unwind_Exception` in
`clang/lib/Headers/unwind.h`.)

Handle this case in `TargetCodeGenInfo::getSizeOfUnwindException`,
fixing the code generation for catching pointers by reference.

This fixes mstorsjo/llvm-mingw#522.
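
The sizes quoted in this message can be summarised as a small lookup. This is only an illustrative sketch: `size_of_unwind_exception` and its parameters are hypothetical names, not Clang's `TargetCodeGenInfo::getSizeOfUnwindException`, and other special cases such as ARM EHABI are deliberately left out.

```python
def size_of_unwind_exception(uses_seh: bool, pointer_bits: int) -> int:
    """Size in bytes of `_Unwind_Exception`, per the commit message above.

    Hypothetical helper for illustration; it ignores other special
    configurations (for example ARM EHABI).
    """
    if pointer_bits not in (32, 64):
        raise ValueError("pointer_bits must be 32 or 64")
    if uses_seh:
        # SEH: 48 bytes in 32-bit mode, 64 bytes in 64-bit mode.
        return 48 if pointer_bits == 32 else 64
    # The common case on most platforms.
    return 32


# When catching a pointer by reference, the generated code skips this many
# bytes past the start of the exception header to reach the exception data.
for seh, bits in [(False, 64), (True, 32), (True, 64)]:
    print(seh, bits, size_of_unwind_exception(seh, bits))
```

If the skipped size is wrong for the target, the catch handler reads the pointer from the wrong offset, which is the miscompile the patch addresses.
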
When working on privatization, it is easier to work with fir.box
explicitly in memory; otherwise, there is no way to express that the
fir.box will end up being a descriptor address in FIR, which makes it
hard to deal with data management.

However, introducing `fir.ref<fir.box>` early can pessimize early HLFIR
optimization, because the extra memory indirection makes it harder to
reason about the aliasing of `fir.ref<fir.box>`.

This patch introduces a pass that turns acc `!fir.box<T>` recipes into
`!fir.ref<!fir.box<T>>` recipes and updates the related recipe usages to
use `!fir.ref<!fir.box<T>>` (creating new alloca+store+load).

It is added to flang and not OpenACC because it is specific to the
`fir.box` type, so it makes little sense to make it an OpenACC generic
pass and to create a new OpenACC dialect type interface for this use
case.
`__alignas_is_defined` and `__alignof_is_defined` are C++11 features
which we only recently added. I don't think it will break anybody if we
don't provide these macros in C++03, so this simply disables the test
instead.
…#162856)

The `|| __bc == 0` case will never be relevant, since we know that
`size() + 1` will always be exactly 1 if `__bc == 0` and `0 *
max_load_factor()` will be zero, so the branch will already be taken due
to the first condition.
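
The argument can be checked with a small model of the condition. This is a sketch, not libc++'s code: the function names are hypothetical, and it assumes a finite, non-negative `max_load_factor` and that an empty bucket array implies `size == 0`.

```python
def needs_rehash_old(size: int, bucket_count: int, max_load_factor: float) -> bool:
    # Model of the original condition, including the `|| __bc == 0` case.
    return size + 1 > bucket_count * max_load_factor or bucket_count == 0


def needs_rehash_new(size: int, bucket_count: int, max_load_factor: float) -> bool:
    # Model of the simplified condition without the extra case.
    return size + 1 > bucket_count * max_load_factor


def conditions_agree() -> bool:
    for max_load_factor in (0.0, 0.5, 1.0, 4.0):
        for bucket_count in range(0, 8):
            # With no buckets the table is empty, so size is 0 and
            # size + 1 == 1 > 0 * max_load_factor == 0.
            sizes = [0] if bucket_count == 0 else range(0, 40)
            for size in sizes:
                old = needs_rehash_old(size, bucket_count, max_load_factor)
                new = needs_rehash_new(size, bucket_count, max_load_factor)
                if old != new:
                    return False
    return True


print(conditions_agree())
```
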
…#163707)

SimpleRemoteMemoryMapper is a MemoryMapper implementation that manages
remote memory via EPC-calls to reserve, initialize, deinitialize, and
release operations. It is compatible with the
SimpleExecutorMemoryManager backend, and its introduction allows
MapperJITLinkMemoryManager to use this backend.

It is also intended to be compatible with the
orc_rt::SimpleNativeMemoryMap backend.
This PR adds a testcase where pipeliner bails out early because the
number of the store instructions exceeds the threshold set by
`pipeliner-max-num-stores`. The test should have been added in #154940,
but it was missed.
* Added high-level section labels in linalg-ops-with-patterns.mlir.
* Moved tests for `memref.copy` to the bottom, after all Linalg ops.
* Removed duplicate `@test_vectorize_padded_pack_no_vector_sizes` tests
  - they differed only in tensor dimensions (both static).
* Updated comments and test names for `linalg.pack` to improve clarity
  and align with https://mlir.llvm.org/getting_started/TestingGuide/.
  * Re-grouped tests for `linalg.pack`.

For a broader context, I plan to update the vectorization logic for
`linalg.pack`. This clean-up will make the following PRs easier to
review.
On powerpc long double may be ppc_fp128, so add corresponding
cases to the test.
…e improved (#151332)

Fix #65136 

|Benchmark | Baseline | Candidate | Difference | % Difference
|------------------------- | ---------- | ----------- | ------------ | --------------
|BM_CmpEqual_int_int | 0.46 | 0.46 | -0.00 | -0.62
|BM_CmpEqual_int_schar | 0.45 | 0.45 | -0.00 | -0.40
|BM_CmpEqual_int_short | 0.45 | 0.45 | 0.00 | 0.34
|BM_CmpEqual_int_uchar | 0.78 | 0.44 | -0.34 | -43.18
|BM_CmpEqual_int_uint | 0.90 | 0.66 | -0.24 | -26.84
|BM_CmpEqual_int_ushort | 0.78 | 0.45 | -0.33 | -42.20
|BM_CmpEqual_schar_int | 0.45 | 0.45 | -0.00 | -0.77
|BM_CmpEqual_schar_schar | 0.54 | 0.57 | 0.03 | 5.64
|BM_CmpEqual_schar_short | 0.92 | 0.88 | -0.04 | -4.80
|BM_CmpEqual_schar_uchar | 1.84 | 0.66 | -1.18 | -64.16
|BM_CmpEqual_schar_uint | 0.78 | 0.66 | -0.12 | -15.18
|BM_CmpEqual_schar_ushort | 1.01 | 0.66 | -0.35 | -34.53
|BM_CmpEqual_short_int | 0.45 | 0.45 | 0.00 | 0.03
|BM_CmpEqual_short_schar | 0.89 | 0.88 | -0.01 | -0.80
|BM_CmpEqual_short_short | 0.47 | 0.46 | -0.01 | -1.28
|BM_CmpEqual_short_uchar | 1.11 | 0.66 | -0.45 | -40.63
|BM_CmpEqual_short_uint | 0.77 | 0.66 | -0.12 | -14.88
|BM_CmpEqual_short_ushort | 1.76 | 0.66 | -1.10 | -62.64
|BM_CmpEqual_uchar_int | 0.79 | 0.44 | -0.35 | -44.06
|BM_CmpEqual_uchar_schar | 1.76 | 0.66 | -1.11 | -62.68
|BM_CmpEqual_uchar_short | 1.11 | 0.66 | -0.45 | -40.33
|BM_CmpEqual_uchar_uchar | 0.57 | 0.51 | -0.06 | -10.61
|BM_CmpEqual_uchar_uint | 0.45 | 0.44 | -0.01 | -1.74
|BM_CmpEqual_uchar_ushort | 0.77 | 0.77 | -0.00 | -0.64
|BM_CmpEqual_uint_int | 0.88 | 0.66 | -0.23 | -25.69
|BM_CmpEqual_uint_schar | 0.77 | 0.66 | -0.11 | -14.85
|BM_CmpEqual_uint_short | 0.77 | 0.66 | -0.11 | -14.56
|BM_CmpEqual_uint_uchar | 0.44 | 0.44 | -0.00 | -0.57
|BM_CmpEqual_uint_uint | 0.47 | 0.51 | 0.04 | 8.62
|BM_CmpEqual_uint_ushort | 0.45 | 0.44 | -0.00 | -0.47
|BM_CmpEqual_ushort_int | 0.77 | 0.45 | -0.33 | -42.02
|BM_CmpEqual_ushort_schar | 1.02 | 0.66 | -0.36 | -35.30
|BM_CmpEqual_ushort_short | 1.76 | 0.66 | -1.10 | -62.60
|BM_CmpEqual_ushort_uchar | 0.78 | 0.77 | -0.01 | -1.84
|BM_CmpEqual_ushort_uint | 0.45 | 0.45 | 0.00 | 0.24
|BM_CmpEqual_ushort_ushort | 0.46 | 0.51 | 0.05 | 11.00
|BM_CmpLess_int_int | 0.67 | 0.66 | -0.01 | -0.99
|BM_CmpLess_int_schar | 0.66 | 0.66 | -0.01 | -0.86
|BM_CmpLess_int_short | 0.66 | 0.66 | -0.00 | -0.57
|BM_CmpLess_int_uchar | 0.88 | 0.66 | -0.23 | -25.48
|BM_CmpLess_int_uint | 1.76 | 0.66 | -1.11 | -62.68
|BM_CmpLess_int_ushort | 0.89 | 0.66 | -0.23 | -25.50
|BM_CmpLess_schar_int | 0.66 | 0.66 | -0.00 | -0.44
|BM_CmpLess_schar_schar | 0.66 | 0.66 | -0.00 | -0.40
|BM_CmpLess_schar_short | 0.88 | 0.88 | -0.00 | -0.50
|BM_CmpLess_schar_uchar | 1.10 | 0.71 | -0.39 | -35.24
|BM_CmpLess_schar_uint | 0.89 | 0.66 | -0.23 | -25.66
|BM_CmpLess_schar_ushort | 0.99 | 0.77 | -0.22 | -22.49
|BM_CmpLess_short_int | 0.66 | 0.66 | -0.00 | -0.35
|BM_CmpLess_short_schar | 0.89 | 0.88 | -0.00 | -0.48
|BM_CmpLess_short_short | 0.66 | 0.66 | -0.00 | -0.34
|BM_CmpLess_short_uchar | 1.10 | 0.71 | -0.39 | -35.36
|BM_CmpLess_short_uint | 0.88 | 0.66 | -0.22 | -25.39
|BM_CmpLess_short_ushort | 1.77 | 0.77 | -1.00 | -56.42
|BM_CmpLess_uchar_int | 0.97 | 0.66 | -0.31 | -31.95
|BM_CmpLess_uchar_schar | 1.11 | 0.66 | -0.44 | -40.17
|BM_CmpLess_uchar_short | 1.19 | 0.66 | -0.53 | -44.59
|BM_CmpLess_uchar_uchar | 0.66 | 0.66 | -0.00 | -0.67
|BM_CmpLess_uchar_uint | 0.67 | 0.66 | -0.01 | -1.19
|BM_CmpLess_uchar_ushort | 0.77 | 0.77 | -0.00 | -0.40
|BM_CmpLess_uint_int | 1.76 | 0.66 | -1.10 | -62.59
|BM_CmpLess_uint_schar | 0.89 | 0.66 | -0.23 | -25.99
|BM_CmpLess_uint_short | 0.88 | 0.66 | -0.22 | -25.41
|BM_CmpLess_uint_uchar | 0.66 | 0.66 | -0.01 | -0.81
|BM_CmpLess_uint_uint | 0.66 | 0.66 | -0.00 | -0.71
|BM_CmpLess_uint_ushort | 0.66 | 0.66 | -0.00 | -0.29
|BM_CmpLess_ushort_int | 0.98 | 0.66 | -0.32 | -33.00
|BM_CmpLess_ushort_schar | 1.29 | 0.77 | -0.52 | -40.56
|BM_CmpLess_ushort_short | 1.77 | 0.77 | -1.00 | -56.55
|BM_CmpLess_ushort_uchar | 0.77 | 0.77 | -0.01 | -0.72
|BM_CmpLess_ushort_uint | 0.66 | 0.66 | -0.00 | -0.46
|BM_CmpLess_ushort_ushort | 0.66 | 0.66 | -0.00 | -0.71
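
For reading the last two columns, the following sketch assumes the conventional definitions `Difference = Candidate - Baseline` and `% Difference = Difference / Baseline * 100`; the message does not state the formula. Recomputing from the rounded timings in the table will not reproduce the listed percentages exactly, since those were presumably derived from unrounded measurements.

```python
def percent_difference(baseline: float, candidate: float) -> float:
    # Negative values mean the candidate is faster than the baseline.
    return (candidate - baseline) / baseline * 100.0


# Two rows copied from the table above (rounded timings).
rows = {
    "BM_CmpEqual_int_uchar": (0.78, 0.44),
    "BM_CmpLess_int_uint": (1.76, 0.66),
}

for name, (baseline, candidate) in rows.items():
    print(f"{name}: {percent_difference(baseline, candidate):.2f}%")
```
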
Split off from PR #163525, this standalone patch replaces `ret * undef`
returns with `ret void` in order to reduce the likelihood of
contributors hitting the `undef deprecator` warning on GitHub.
)

This commit adds a new "specification_version" field to the TOSA target
environment attribute. This allows a user to specify which version of
the TOSA specification they would like to target during lowering.

A leading example in the validation pass has also been added. It adds a
version to each profile compliance entry to track the version of the
specification in which the entry was added. This allows a
backwards compatibility check to be implemented between the target
version and the profile compliance entry version.

For now a default version of "1.0" is assumed. "1.1.draft" is added to
denote an in-development version of the specification targeting the next
release.
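
One way to picture such a backwards compatibility check is sketched below. The rule used here (an entry is usable when its version is not newer than the target version) and the names `parse_version` and `entry_is_available` are assumptions for illustration; the message does not describe the actual comparison in the validation pass.

```python
def parse_version(text: str) -> tuple:
    # "1.0" -> (1, 0); "1.1.draft" -> (1, 1). The ".draft" suffix is
    # ignored here, which is a simplification.
    major, minor, *_ = text.split(".")
    return int(major), int(minor)


def entry_is_available(target_version: str, entry_version: str) -> bool:
    # Assumed rule: a compliance entry added in version X can be used
    # when targeting any version >= X.
    return parse_version(entry_version) <= parse_version(target_version)


print(entry_is_available("1.0", "1.0"))
print(entry_is_available("1.0", "1.1.draft"))
print(entry_is_available("1.1.draft", "1.0"))
```
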
PolyhedralInfo is tied to the legacy pass manager. With the eventual
removal of the legacy pass manager it will not be useful anymore.

PolyhedralInfo was an experiment to make Polly's analysis available to
other passes. Its power is limited due to not being able to make
assumptions for which regular Polly would emit a runtime condition/code
versioning during optimization.

When eventually porting such an API to the new pass manager, we will
have to invent a new API.
InstCombine currently fails to call into InstSimplify for cast
instructions. I noticed this because the transform from
#98649 can be triggered via
`-passes=instsimplify` but not `-passes=instcombine`, which is not
supposed to happen.
While people look into it, xfail the tests.
2.x had ListType and StringTypes (https://docs.python.org/2.7/library/types.html),
3.x removed these (https://docs.python.org/3.0/library/types.html).

We can use "str" and "list" directly as in 3.x all strings are
just "str", and ListType was always an alias to "list".
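
A minimal sketch of the kind of replacement described, using a hypothetical `describe` function rather than code from the patch:

```python
def describe(value) -> str:
    # Python 2 code might have written:
    #   isinstance(value, types.StringTypes) / isinstance(value, types.ListType)
    # In Python 3 the built-in types are used directly.
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    return "other"


print(describe("llvm"), describe([1, 2]), describe(42))
```
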
Python3 removed "unichr" when string encoding was changed,
so this code tried to import that then defaulted to "chr"
if it couldn't.

Since LLVM requires >=3.8 we can use "chr" directly.
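
As an illustration (the helper name is hypothetical, and the fallback shown in the comment is one common spelling rather than the exact code removed by the patch):

```python
# A typical Python 2/3 compatibility shim looked like this:
#   try:
#       unichr
#   except NameError:
#       unichr = chr
# With Python >= 3.8 the shim is unnecessary and `chr` is used directly.

def code_point_to_text(code_point: int) -> str:
    # chr() accepts any Unicode code point in Python 3.
    return chr(code_point)


print(code_point_to_text(0x41), code_point_to_text(0x3BB))
```
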
… AVX512 conflict intrinsics to be used in constexpr (#163293)

Resolves #160524
paulwalker-arm and others added 17 commits October 16, 2025 11:03
We might be able to do better by using SVE2 and perhaps even NEON for
the final stages, but this version works everywhere, so it seems like a
good place to start.

Fixes #155468
…ents (#163590)

As noticed on #163567, if the constant pool data wasn't the expected element size for the instruction, we weren't adding the asm comment at all.
Fixes round-tripping where literals used to be reassembled into
inline constants.

Also fix the %extract-encodings substitution in lit tests to emit
each instruction code once and not twice.

Eliminate the Literal64 field.
…3324)

We have `noinline` and `alwaysinline` present as first class function
attributes. Add `inline_hint` to the list of function attributes as
well.

Update the module import and translation to support the new attribute.

The verifier does not need to be changed as `inlinehint` does not
conflict with `noinline` or `alwaysinline`.

`inline_hint` is needed to support the `inline` C/C++ keyword in CIR.
Update Python2 print statements to Python3 print function calls.
The test will fail if libc++ starts to use a lambda in `<array>`. This
will become the case because
- libc++'s `array::fill` uses `std::fill_n`, and
- `std::fill_n` is to be optimized for segment iterators, and
- the natural approach for such optimization uses lambdas.

Until ASTImport of `clang::LambdaExpr` nodes is properly fixed, this
will need to be skipped.
This updates Python2 print statements to Python3 print functions,
and makes lists out of some things that are iterators in Python3.

We need not have bothered with the latter, as some code is fine with
iterators, but it does keep the script behaving exactly as it was
in case anyone does try to use this.

(and it's clear it was purely 2to3 changes, no hand editing)
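
The two kinds of 2to3 change mentioned here look roughly like this. The data and variable names are made up for illustration and are not taken from the script:

```python
counts = {"add": 3, "mul": 1}

# Python 2:  print "ops:", counts.keys()
# Python 3:  print is a function, and keys() returns a view, not a list.
names = list(counts.keys())
print("ops:", names)

# Python 2:  doubled = map(lambda n: n * 2, counts.values())   (a list)
# Python 3:  map() returns an iterator, so 2to3 wraps it in list().
doubled = list(map(lambda n: n * 2, counts.values()))
print("doubled:", doubled)
```

Wrapping in `list()` only matters when the result is indexed, measured with `len()`, or iterated more than once; otherwise the iterator would do.
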
These imports were moved around in Python 3.0
(https://docs.python.org/3/whatsnew/3.0.html#library-changes).

LLVM requires Python >= 3.8 so we can expect the Python3 names
to exist.
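
For reference, a few of the standard library renames from the linked page are shown below. These are illustrative examples and not necessarily the imports touched by this commit, which the message does not list:

```python
from io import StringIO            # Python 2: from StringIO import StringIO
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
import queue                       # Python 2: import Queue

buffer = StringIO()
buffer.write("llvm")

host = urlparse("https://docs.python.org/3/whatsnew/3.0.html").netloc

work = queue.Queue()
work.put(buffer.getvalue())
print(host, work.qsize())
```
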
When building llvm from a subdirectory (like clspv does)
`CMAKE_BINARY_DIR` is at the top of the build directory.

When building runtimes (libclc for example), the build fails looking for
clang (through `find_package` looking at `LLVM_BINARY_DIR` with
`NO_DEFAULT_PATH` & `NO_CMAKE_FIND_ROOT_PATH`) because clang is not in
`LLVM_BINARY_DIR`.

Fix that issue by setting `clang_cmake_builddir` the same way we set
`llvm_cmake_builddir` from `LLVM_BINARY_DIR`.

For a default LLVM build (using llvm as the main cmake project), it should
not change anything.
For a standalone clang build, keep the actual value, as libclc cannot be
built that way.
X64 triples include SSE2 by default, which we already test, and it was causing check prefix clash warnings in update_llc_test_checks.py.
…163745)

Fix check prefix clash warnings in update_llc_test_checks.py by adding an additional prefix for AVX512F and AVX512BW capable targets
These imports got moved around in Python 3.0
(https://docs.python.org/3/whatsnew/3.0.html#library-changes).

LLVM requires Python >= 3.8 so we can assume the Python3 names
are available.
REQUIRES clauses apply to the compilation unit, which the OpenMP spec
defines as the program unit in Fortran.

Don't set REQUIRES flags on all containing scopes, only on the containing
program unit, where flags coming from different directives are gathered.
If we wanted to set the flags on subprograms, we would need to first
accumulate all of them, then propagate them down to all subprograms.
That is not done as it is not necessary (the containing program unit is
always available).
Recipes in licm are safe to hoist if the legality check passes, and the
recipe is guaranteed to execute; the single successor of the vector
preheader is the vector loop region. Clarify this in the code structure
and comments.
Only a couple of changes, including adding two empty comments to resolve
differences between different versions of clang-format.
@pull pull bot locked and limited conversation to collaborators Oct 16, 2025
@pull pull bot added the ⤵️ pull label Oct 16, 2025
@pull pull bot merged commit c8b8fa2 into optimizecompile:main Oct 16, 2025
13 checks passed