@ronlieb ronlieb commented Oct 23, 2025

No description provided.

kuhar and others added 30 commits October 22, 2025 12:47
…utions (llvm#164515)

Some operations like gpu.func have arguments that need to stay in
place while the signature is rewritten. This is the case for the workgroup
and private attributions.
Update the target rewrite pass to be aware of that when adding an argument
at the end of the function signature. If any trailing arguments are
present, the new argument will be inserted just before them.
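
The insertion rule described above can be sketched on a plain list of argument names. This is only an illustration, not the pass's actual code: `insertBeforeTrailing` and the `%`-prefixed names are hypothetical, and `numTrailing` stands for the number of workgroup/private attribution arguments that must stay at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of MLIR): insert `newArg` at the end of the
// regular arguments, i.e. just before the last `numTrailing` entries, which
// must keep their position at the tail of the signature.
std::vector<std::string> insertBeforeTrailing(std::vector<std::string> args,
                                              const std::string &newArg,
                                              std::size_t numTrailing) {
  assert(numTrailing <= args.size() && "more trailing args than args");
  args.insert(args.end() - static_cast<std::ptrdiff_t>(numTrailing), newArg);
  return args;
}
```

With `numTrailing == 0` this degenerates to a plain append, which matches the behavior for functions that have no attributions.
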
Since llvm#158381 the
`CompilerInstance` is aware of the VFS and co-owns it. To reduce the scope
of that PR, the VFS was being inherited from the `FileManager` during
`setFileManager()` if it wasn't configured before. However, the
implementation of that setter was buggy. This PR fixes the bug, and
moves us closer to the long-term goal of `CompilerInstance` requiring
the VFS to be configured explicitly and owned by the instance.
…get-tasks (llvm#155348)

This PR adds support for translation of the private clause on deferred
target tasks - that is `omp.target` operations with the `nowait` clause.

An offloading call for a deferred target-task is not blocking - the
offloading (target-generating) host task continues its execution after issuing the offloading
call. Therefore, the key problem we need to solve is to ensure that the
data needed for private variables to be initialized in the target task
persists even after the host task has completed.
We do this in a new pass called `PrepareForOMPOffloadPrivatizationPass`.
For a privatized variable that needs its host counterpart for
initialization (such as the shape of the data from the descriptor when
an allocatable is privatized or the value of the data when an
allocatable is firstprivatized),
- the pass allocates memory on the heap.
- it then initializes this memory by using the `init` and `copy` (for
firstprivate) regions of the corresponding `omp::PrivateClauseOp`.
- finally, the memory allocated on the heap is freed using the `dealloc`
region of the same `omp::PrivateClauseOp` instance. This step is not
straightforward, though, because we cannot simply free memory that is
going to be used by another thread without any synchronization. So, for
deallocation, we create an `omp.task` after the `omp.target` and
synchronize the two with a dummy dependency (using the `depend` clause).
In this newly created `omp.task` we do the deallocation.
The comment here pointed out that RAUW would fall over given a
constantexpr, but then proceeded to just do what RAUW does by hand,
which falls over in the same way. Instead, convert constantexprs
involving cbuffer globals to instructions before processing them.

The test update just modifies the existing cbuffer test, since it
implied it was trying to test this exact case anyways.
DXILResource was falling over trying to name a resource type that
contained an array, such as `StructuredBuffer<float[3][2]>`. Handle this
by walking through array types to gather the dimensions.
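
As a rough illustration of walking through array types to gather the dimensions, here is a sketch over a toy type model. `ToyType`, `scalar`, `arrayOf`, and `formatTypeName` are all hypothetical names made up for this example; they are not LLVM's type classes or the DXILResource code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy type model (an assumption for this sketch): a type is either a scalar
// with a name, or an array of `count` elements of `element` type.
struct ToyType {
  std::string scalarName;            // meaningful when element == nullptr
  std::size_t count = 0;             // meaningful when element != nullptr
  std::shared_ptr<ToyType> element;  // null for scalars
};

std::shared_ptr<ToyType> scalar(std::string name) {
  auto t = std::make_shared<ToyType>();
  t->scalarName = std::move(name);
  return t;
}

std::shared_ptr<ToyType> arrayOf(std::shared_ptr<ToyType> elem, std::size_t n) {
  auto t = std::make_shared<ToyType>();
  t->count = n;
  t->element = std::move(elem);
  return t;
}

// Peel array layers from the outside in, collecting one dimension per layer,
// then append the dimensions to the innermost scalar's name.
std::string formatTypeName(const ToyType &type) {
  std::vector<std::size_t> dims;
  const ToyType *cur = &type;
  while (cur->element) {
    dims.push_back(cur->count);
    cur = cur->element.get();
  }
  std::string name = cur->scalarName;
  for (std::size_t d : dims)
    name += "[" + std::to_string(d) + "]";
  return name;
}
```

In this model `float[3][2]` is an array of 3 elements, each an array of 2 floats, so it is built as `arrayOf(arrayOf(scalar("float"), 2), 3)`.
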
To make the CI happy again.
)

The function result in a device function is not a host array. Avoid
triggering the error `Host array 'res' cannot be present in device
context` for this.
Handling opcodes in embedding computation.

- Revamped MIR Vocabulary with four sections - `Opcodes`, `Common Operands`, `Physical Registers`, and `Virtual Registers`
- Operands broadly fall into 3 categories: the generic MO types that are common across architectures, physical register classes, and virtual register classes. We handle these categories separately in MIR2Vec. (Though we have the same classes for both physical and virtual registers, their embeddings vary.)
…y" (llvm#164670)

Reverts llvm#164048

This led to a regression in clang-format where a space gets added in
between the parameter type and `&`. For example, this

```
::test_anonymous::FunctionApplication& ::test_anonymous::FunctionApplication::operator=(const ::test_anonymous::FunctionApplication& other) noexcept {
```

becomes

```
::test_anonymous::FunctionApplication& ::test_anonymous::FunctionApplication::operator=(const ::test_anonymous::FunctionApplication & other) noexcept {
```
* Add the `FILE` type declaration, as it should be present in `<wchar.h>`
as well as in `<stdio.h>`.
* Fix the argument type in the `wcsrtombs` / `wcsnrtombs` functions: it should
be a restrict pointer to `mbstate_t`. Add the restrict qualifier to the internal
implementation as well.

This brings us closer to being able to build libcxx with wide-character
support against llvm-libc headers.
Relates to llvm#119281

Note:

1) As this PR enables `-Werror` for `libc` tests, some downstream CIs are
very likely to start failing, so this PR may need to be reverted and
re-applied.

P.S.

I do not have merge permissions, so I will need one of the reviewers to
merge it for me. Thank you!
Add a flag to the GlobalValueSummaryInfo indicating whether the
associated SummaryList (all summaries with the same GUID) contains any
summaries with local linkage. This flag is set when building the index,
so it is associated with the original linkage type before
internalization and promotion. Consumers should check the
withInternalizeAndPromote() flag on the index before using it.

In most cases we expect a 1-1 mapping between a GUID and a summary with
local linkage, because for locals the GUID is computed from the hash of
"modulepath;name". However, there can be multiple locals with the same
GUID if translation units are not compiled with enough path information. And in rare
but theoretically possible cases, there can be hash collisions on the
underlying MD5 computation. So to be safe when looking for local
summaries, analyses currently look through all summaries in the list.
These lists can be extremely long in the case of large binaries with
template function defs in widely used headers (i.e. linkonce_odr).
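
A minimal sketch of the idea, assuming a simplified model: the types below (`ToySummary`, `ToySummaryInfo`, `findLocal`) are hypothetical stand-ins, not the real `ModuleSummaryIndex` classes, and the sketch ignores the internalization and promotion caveat mentioned above. The flag is set once when a summary is added, so a lookup for local summaries can skip a long list that holds none.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only.
enum class Linkage { External, LinkOnceODR, Internal, Private };

// Internal and private linkage are the local linkage kinds.
inline bool isLocal(Linkage l) {
  return l == Linkage::Internal || l == Linkage::Private;
}

struct ToySummary {
  Linkage linkage;
};

// All summaries that share one GUID, plus a per-GUID flag.
struct ToySummaryInfo {
  std::vector<ToySummary> summaries;
  bool hasLocal = false;  // recorded at insertion time, from original linkage

  void add(ToySummary s) {
    hasLocal |= isLocal(s.linkage);
    summaries.push_back(s);
  }
};

// Return the first local summary, skipping the scan when the flag says the
// list cannot contain one.
const ToySummary *findLocal(const ToySummaryInfo &info) {
  if (!info.hasLocal)
    return nullptr;
  for (const ToySummary &s : info.summaries)
    if (isLocal(s.linkage))
      return &s;
  return nullptr;
}
```

The early return is where the saving would come from for long `linkonce_odr` lists; the full scan is still kept for lists that do contain a local, since more than one summary can share the GUID.
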

A follow-on change will use this flag to reduce ThinLTO analysis time in
WPD by 5-6% for a large target (details in PR164046 which will be
reworked to use this flag).

Note that in the past we have tried to keep bits related to the GUID in
the ValueInfo (which has a pointer to the associated
GlobalValueSummaryInfo), via its PointerIntPair. However, we are out of
bits there. This change does add a byte to every GlobalValueSummaryInfo
instance, which I measured as a little under 0.90% overhead in a large
target. However, it enables adding 7 bits of other per-GUID flags in the
future without adding more overhead. Note that it was lower overhead to
add this to the GlobalValueSummaryInfo than the ValueInfo, which tends
to be copied into other maps.
…m#164678)

The recently added ulimit_reset.txt section in shtest-ulimit.py was
failing on some builders if the default file descriptor limit started
with 50. This patch fixes that by explicitly checking that the file
descriptor limit is equal to the default value.
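
One way to picture this kind of failure, assuming the old check was a negative substring-style match on the temporary limit (an assumption; the test file itself is not shown here): a default limit such as 5000 contains "50", so a "must not contain 50" check fails even though the limit was reset correctly. Both helper names below are hypothetical.

```cpp
#include <cassert>
#include <string>

// Negative substring check: "the temporary limit must not appear in the
// output". Fragile when the default value happens to contain that text.
bool passesNegativeCheck(const std::string &output,
                         const std::string &tempLimit) {
  return output.find(tempLimit) == std::string::npos;
}

// Positive exact check: "the reported limit must equal the default".
bool passesExactCheck(const std::string &reported,
                      const std::string &defaultLimit) {
  return reported == defaultLimit;
}
```

With a default of `"5000"` and a temporary limit of `"50"`, the negative check rejects a correct reset while the exact check accepts it.
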
…lvm#164649)

These have been soft-deprecated since July:
https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339

Add a deprecation attribute to prevent new uses from creeping in.
The --source option was broken when using the --macho flag because
DisassembleMachO() only initialized debug info when UseDbg was true, and
would return early if no dSYM was found.
This patch provides an approximation of the memory locations touched by
`llvm.matrix.column.major.load` and `llvm.matrix.column.major.store`,
enabling dead store elimination and GVN to remove redundant loads and
dead stores.

PR: llvm#163368
…m#164676)

In 4368616 we accidentally moved uses of command-line args saved
into a bump pointer allocator during response file expansion out of
scope of the allocator. Also, the test that should have caught this (at
least with asan) was not working correctly because clang-scan-deps was
expanding response files itself during argument adjustment rather than
the underlying scanner library.

rdar://162720059
Reviewers: 

Pull Request: llvm#164693
rnk and others added 16 commits October 22, 2025 19:57
We already have a matching constructor from ArrayRef, so add support for
assigning from ArrayRef as well.
This testcase shows that adding a ubsan check and then removing it
during the LowerAllowCheck pass does not entirely undo the effects of
adding the check.
…izations. (llvm#149706)"

This reverts commit 8d29d09.

There have been reports of mis-compiles
in llvm#149706.

Revert while I investigate.
)

This adds guards on the ptrauth feature checks so that they are only
performed if __has_feature is actually available.
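
The usual shape of such a guard is sketched below. `TOY_HAS_PTRAUTH_CALLS` and `toyHasPtrauthCalls` are hypothetical names for this example, and `ptrauth_calls` is used only as a representative feature name; the actual checks touched by the commit are not shown in this message.

```cpp
#include <cassert>

// A compiler that lacks __has_feature cannot parse `__has_feature(x)` inside
// an #if expression, so test for the macro itself first and nest the feature
// query inside that guard.
#if defined(__has_feature)
#  if __has_feature(ptrauth_calls)
#    define TOY_HAS_PTRAUTH_CALLS 1
#  endif
#endif

// Fall back to "feature absent" when the query was unavailable or false.
#ifndef TOY_HAS_PTRAUTH_CALLS
#  define TOY_HAS_PTRAUTH_CALLS 0
#endif

constexpr bool toyHasPtrauthCalls() { return TOY_HAS_PTRAUTH_CALLS != 0; }
```

The nesting matters: writing `#if defined(__has_feature) && __has_feature(ptrauth_calls)` on one line would still be a parse error on compilers without the macro, because the whole expression has to be well-formed.
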
…lvm#164677)

The OpenACC data clause operation `acc.copyin` used for mapping
variables to device memory includes bookkeeping required by the OpenACC
spec for updating present counters. However, for firstprivate variables,
no counters should be updated since this clause creates a private copy
on the device initialized with the original value from the host (as
described in OpenACC 3.4 section 2.5.14: "the copy will be initialized
with the value of that item on the local thread").

This PR introduces the `acc.firstprivate_map` operation to capture these
mapping semantics without counter updates. A test is included
demonstrating how this operation can be used to initialize a
materialized private variable (represented by `memref.alloca` inside an
`acc.parallel` region).
Test update was missed in bfc322d due to a codegen test running
loop-vectorize directly. The loop does not get vectorized any longer.
This PR refactors `ASTUnit::LoadFromASTFile()` to be easier to follow.
Conceptually, it tries to read an AST file, adopt the serialized
options, and set up `Sema` and `ASTContext` to deserialize the AST file
contents on-demand.

The implementation of this used to be spread across an
`ASTReaderListener` and the function in question. Figuring out what
listener method gets called when and how it's supposed to interact with
the rest of the functionality was very unclear. The `FileManager`'s VFS
was being swapped-out during deserialization, the options were being
adopted by `Preprocessor` and others just-in-time to pass `ASTReader`'s
validation checks, and the target was being initialized somewhere in
between all of this. This led to very muddy semantics.

This PR splits `ASTUnit::LoadFromASTFile()` into three distinct steps:
1. Read out the options from the AST file.
2. Initialize objects from the VFS to the `ASTContext`.
3. Load the AST file and hook it up with the compiler objects.

This should be much easier to understand, and I've done my best to
clearly document the remaining gotchas.

(This was originally motivated by the desire to remove
`FileManager::setVirtualFileSystem()` and make it impossible to swap out
VFSs from underneath `FileManager` mid-compile.)
Make the call graph section have a dedicated type instead of the generic
progbits type.
@ronlieb ronlieb merged commit cfb6bfc into amd-staging Oct 23, 2025
13 checks passed
@ronlieb ronlieb deleted the amd/merge/upstream_merge_20251022171705 branch October 23, 2025 05:42