Conversation

Sa4dUs
Contributor

@Sa4dUs Sa4dUs commented Jun 15, 2025

This PR handles ABI changes for autodiff input arguments to improve Enzyme compatibility. Fundamentally, it adjusts activities when a function argument is lowered as a ScalarPair, so there is no mismatch between diff activities and args. It also removes activities corresponding to ZSTs.

fixes: #144025

r? @ZuseZ4

@rustbot rustbot added A-attributes Area: Attributes (`#[…]`, `#![…]`) F-autodiff `#![feature(autodiff)]` T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 15, 2025
Comment on lines 139 to 142
/// `#[rustc_autodiff_no_abi_opt]`: internal marker applied to `#[rustc_autodiff]` primal functions
/// whose argument layout may be sensitive to ABI-level optimizations. This marker prevents certain
/// optimizations that could otherwise break compatibility with Enzyme's expectations.
const RUSTC_AUTODIFF_NO_ABI_OPT = 1 << 16;
Member

Don't say "certain optimizations", or the next person who comes along is going to make it so that these functions are treated as -O0. Identify the actual problem: LLVM will modify the ABI of functions if it can identify them as fully internalized.

Member

A change that bad should be caught by a reviewer, but I review a lot of PRs and I enjoy it when the codebase informs people of what is actually going on so they are more likely to have made changes that are consistent with the existing situation.

Contributor Author

Got it, thanks for the feedback. I’ll make sure to keep comments as specific as possible to avoid any ambiguity going forward.

Comment on lines 924 to 958
fn is_abi_opt_sensitive<'tcx>(tcx: TyCtxt<'tcx>, ty: Ty<'tcx>) -> bool {
    match ty.kind() {
        ty::Ref(_, inner, _) | ty::RawPtr(inner, _) => {
            match inner.kind() {
                ty::Slice(_) => {
                    // Since we cannot guarantee that the slice length is large enough
                    // to avoid optimization, we assume it is ABI-opt sensitive.
                    return true;
                }
                ty::Array(elem_ty, len) => {
                    let Some(len_val) = len.try_to_target_usize(tcx) else {
                        return false;
                    };

                    let pci = PseudoCanonicalInput {
                        typing_env: TypingEnv::fully_monomorphized(),
                        value: *elem_ty,
                    };

                    if elem_ty.is_scalar() {
                        let elem_size =
                            tcx.layout_of(pci).ok().map(|layout| layout.size).unwrap_or(Size::ZERO);

                        if elem_size.bytes() * len_val <= tcx.data_layout.pointer_size.bytes() * 2 {
                            return true;
                        }
                    }
                }
                _ => {}
            }

            false
        }
        ty::FnPtr(_, _) => true,
        _ => false,
Member

@workingjubilee workingjubilee Jun 15, 2025

This should only matter when Enzyme is on, right?

Why are you ignoring ty::Array when it is not through ty::Ref? Does Enzyme not even deal in simple aggregates like that? I'm not even sure this is the correct layer to be examining things like this at, since it's well above the LLVM IR type layer. Multiple types in Rust source can wind up being lowered to the naive equivalent of this in the IR.

Anyway, please name this fn as specific to Enzyme, at least.

Contributor Author

Yes, this logic is only for Enzyme. I didn't add the ty::Array logic yet because I'm still not sure how to handle it: for example, [f32; 2] is lowered to i64. Since the number of args does not change, Enzyme may not have issues with that.
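
A minimal sketch of the size comparison under discussion, written against plain `std` rather than rustc internals. The helper name `fits_in_two_pointers` is hypothetical and only mirrors the "at most two pointer widths" threshold used in the diff above. It does not model what LLVM actually does with such an argument.

```rust
use std::mem::size_of;

/// Hypothetical helper (not part of rustc): true when `[T; N]` is no larger
/// than two pointer-sized words on the compilation target.
fn fits_in_two_pointers<T, const N: usize>() -> bool {
    size_of::<[T; N]>() <= 2 * size_of::<usize>()
}

/// Size in bytes of `[f32; 2]`, the example from the comment above.
/// It is 8 bytes, the same width as an `i64`.
fn f32_pair_size() -> usize {
    size_of::<[f32; 2]>()
}
```

Small arrays such as `[f32; 2]` fall under the threshold, while something like `[f64; 8]` does not, which is why only the small cases are candidates for being rewritten into a scalar.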

Member

@workingjubilee workingjubilee Jun 17, 2025

Okay, I have determined this is blocked on me to have a proper solution for it. I would like it if you opened an issue before this PR lands that points to https://github.com/rust-lang/rust/blame/86d0aef80403f095d8bbabf44d9fdecfcd45f076/compiler/rustc_target/src/callconv/mod.rs#L708 and your new code here, and says that a new variant of the adjust_for_rust_abi code that doesn't just mutate the arguments needs to exist so that this query can be answered without relying on mutable state that cannot be invoked idempotently.

self.flags.contains(CodegenFnAttrFlags::NO_MANGLE)
|| self.flags.contains(CodegenFnAttrFlags::RUSTC_STD_INTERNAL_SYMBOL)
|| self.export_name.is_some()
|| self.flags.contains(CodegenFnAttrFlags::RUSTC_AUTODIFF_NO_ABI_OPT)
Member

This will also suppress the dead_code lint. I think the reason the symbol is still getting marked as dso_local even with this change is because it has SymbolExportLevel::Rust. Only for SymbolExportLevel::C do we tell LTO to export the symbol.

Contributor Author

Okay, I'll look into that and try to ensure it has the minimal number of side effects possible. Thank you :)

@bors
Collaborator

bors commented Jun 20, 2025

☔ The latest upstream changes (presumably #142770) made this pull request unmergeable. Please resolve the merge conflicts.

@bors bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jun 20, 2025
@rust-cloud-vms rust-cloud-vms bot force-pushed the prevent-abi-changes branch from 1035486 to e243a3c Compare June 20, 2025 18:30
@ZuseZ4 ZuseZ4 marked this pull request as ready for review June 24, 2025 18:48
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 24, 2025
@rustbot
Collaborator

rustbot commented Jun 24, 2025

Some changes occurred in compiler/rustc_monomorphize/src/partitioning/autodiff.rs

cc @ZuseZ4

// debug-NEXT: %_2 = load float, ptr %0, align 4, !alias.scope !7, !noalias !4
// debug-NEXT: %"'ipg2" = getelementptr inbounds float, ptr %"x'", i64 1
// debug-NEXT: %1 = getelementptr inbounds nuw float, ptr %x, i64 1
// debug-NEXT: %"_5'ipl" = load float, ptr %"'ipg2", align 4, !alias.scope !4, !noalias !7
Member

remove , !alias.scope !4, !noalias !7 and similar scope and metadata annotations. align can stay.

They are fragile, numbers might change and we want to avoid test failures because of it.

//@ no-prefer-dynamic
//@ needs-enzyme

// This does only test the funtion attribute handling for autodiff.
Member

This isn't about function attributes, is it? It's more about verifying that Rust types are lowered to LLVM-IR types in a way that we expect and that Enzyme can handle. We also explicitly check release mode, to verify that LLVM's O3 pipeline does not rewrite function signatures into something that Enzyme cannot handle anymore.

Contributor Author

It was initially, and I forgot to remove that. I'll fix it and add a more detailed comment about what this test is for.

.non_enum_variant()
.fields
.iter()
.map(|f| count_scalar_fields(tcx, f.ty(tcx, substs)))
Member

Do you really want to recursively count and sum?

I think that anything behind a double indirection probably won't affect the size in the function ABI, will it?

Contributor Author

After testing a bit, I think the recursive summation (or any other way of counting the "non-splittable" fields) is necessary, because if the aggregate has more than 2 fields when flattened, it behaves slightly differently, even when under the pointer size. I'll adjust it to consider these cases.
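
To make the "flattened field count" idea concrete, here is a toy sketch. `ToyTy` is a hypothetical stand-in for a type tree and is far simpler than rustc's `Ty`. Treating a pointer as a single scalar regardless of its pointee is an assumption made for illustration, not necessarily what the PR ended up doing.

```rust
/// Toy stand-in for a type tree; rustc's real `Ty` is far richer.
enum ToyTy {
    /// A single scalar such as `f32` or `i64`.
    Scalar,
    /// A pointer or reference: counted as one scalar, whatever it points to
    /// (an assumption of this sketch).
    Pointer(Box<ToyTy>),
    /// A tuple or struct: the sum of its fields.
    Product(Vec<ToyTy>),
}

/// Counts the scalar leaves of a type once nested products are flattened.
fn count_scalar_fields(ty: &ToyTy) -> usize {
    match ty {
        ToyTy::Scalar => 1,
        ToyTy::Pointer(_) => 1,
        ToyTy::Product(fields) => fields.iter().map(count_scalar_fields).sum(),
    }
}
```

Under this model `(f32, (f32, f32))` flattens to three scalars, so it would not be treated like a two-field pair even though it is small.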


let is_product = |t: Ty<'tcx>| matches!(t.kind(), ty::Tuple(_) | ty::Adt(_, _));

if layout.size() <= pointer_size * 2 && is_product(*ty) {
Member

Can you add a note that this is the magic number based on which LLVM might optimize?

@ZuseZ4
Member

ZuseZ4 commented Jun 24, 2025

I am not 100% sure about the recursive summation here. More generally, @oli-obk, can you review this? I just don't know if PseudoCanonicalInput and fully_monomorphized are something we want here.

@oli-obk oli-obk self-assigned this Jun 25, 2025
@rust-cloud-vms rust-cloud-vms bot force-pushed the prevent-abi-changes branch from 07b10dd to 39d1efc Compare June 26, 2025 15:00
@Sa4dUs
Contributor Author

Sa4dUs commented Jun 26, 2025

Once the solution is decent, I can optimize minor things. I'm leaving that for the end so that I don't optimize something that is not correct.

@JohnCSimon
Member

@Sa4dUs
ping from triage -

I'm unsure if this is waiting on review but -
when a PR is ready for review, send a message containing
@rustbot ready to switch to S-waiting-on-review so the PR is in the reviewer's backlog.

@Dylan-DPC Dylan-DPC added -Zfixed-x18 Unstable option: -Zfixed-x18 and removed -Zfixed-x18 Unstable option: -Zfixed-x18 S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 13, 2025
@rust-cloud-vms rust-cloud-vms bot force-pushed the prevent-abi-changes branch from 7d7ad34 to d8c09d3 Compare July 14, 2025 17:08
Comment on lines 85 to 86
new_activities.push(da[i].clone());
new_positions.push(i + 1);
Contributor

Document why we're adding an entry here.

Is it intended that the scalar pair entries both have the same diff activity?

Contributor Author

yes, it's intended. if we changed the activity of an individual field relative to the original argument for some reason, this could potentially affect the resulting function signature

new_activities.push(da[i].clone());
new_positions.push(i + 1);
}
_ => {}
Contributor

use an exhaustive match and either span_bug! things, report an error, or explain why this one is ok

Contributor Author

i'll move it to an if let since we only need to apply corrections to ScalarPair args, that way i don't leave unmatched variants at the match

Contributor

My intention was to document on the other arms why they need no adjustment or are unreachable. Because it isn't clear to me how nonscalar(pair) layouts are handled

Contributor Author

@Sa4dUs Sa4dUs Jul 15, 2025

afaik, other cases don't change the number of args, so they don't cause a mismatch between the function args and the diff activities, and they don't need to be adjusted (and slices were already handled above)

Contributor Author

@Sa4dUs Sa4dUs Jul 15, 2025

da: &mut Vec<DiffActivity> already contains a diff activity per source code arg, so in adjust_activity_to_abi we are only adjusting activities to prevent errors on codegen
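
A rough sketch of the adjustment described in this thread, under simplifying assumptions. `Activity` and `Lowering` are toy types invented here (the compiler's `DiffActivity` has more variants), and `expand_activities` is a hypothetical name, not the PR's `adjust_activity_to_abi`. The only behaviour it models is that an argument lowered to a scalar pair gets its activity repeated, so the list ends up with one entry per lowered argument.

```rust
/// Toy activity enum; the compiler's `DiffActivity` has more variants.
#[derive(Clone, Debug, PartialEq)]
enum Activity {
    Const,
    Active,
    Duplicated,
}

/// Toy description of how one source-level argument is lowered.
enum Lowering {
    /// Lowered to a single LLVM argument.
    Single,
    /// Lowered to two LLVM arguments (a scalar pair).
    Pair,
}

/// Returns one activity per lowered argument: a pair repeats the activity
/// of the source argument it came from.
fn expand_activities(da: &[Activity], lowering: &[Lowering]) -> Vec<Activity> {
    let mut out = Vec::with_capacity(da.len());
    for (activity, how) in da.iter().zip(lowering) {
        out.push(activity.clone());
        if let Lowering::Pair = how {
            // Both halves of the pair share the source argument's activity.
            out.push(activity.clone());
        }
    }
    out
}
```

Arguments that lower to a single value pass through unchanged, which matches the claim above that only the cases changing the argument count need adjusting.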

Contributor Author

should i explicitly specify that in the code?

Contributor

I don't understand enough here to have an educated opinion. @ZuseZ4 is this literally something only scalar pairs can ever hit, or can there be situations with aggregates or SIMD vectors?

@rustbot
Collaborator

rustbot commented Aug 16, 2025

This PR was rebased onto a different master commit! Check out the changes with our range-diff.

@rust-cloud-vms rust-cloud-vms bot force-pushed the prevent-abi-changes branch from 4ab800b to 58d77c0 Compare August 16, 2025 12:28
@rust-cloud-vms rust-cloud-vms bot force-pushed the prevent-abi-changes branch from 58d77c0 to a7654e8 Compare August 16, 2025 14:10
@rustbot
Collaborator

rustbot commented Aug 16, 2025

This PR was rebased onto a different master commit! Check out the changes with our range-diff.

@rust-cloud-vms rust-cloud-vms bot force-pushed the prevent-abi-changes branch from a7654e8 to aca678e Compare August 19, 2025 18:51
// CHECK-NEXT: Function Attrs
// debug-NEXT: define internal float @_ZN12abi_handling2f117hd2edd01111f953c8E
// debug-SAME: (ptr align 4 %x)
// release-NEXT: define internal fastcc noundef float @_ZN12abi_handling2f117hd2edd01111f953c8E
Member

I am worried that the tests could fail in the future because of mangled-name updates. At the same time, I remember us discussing no_mangle having an effect, because it makes a function publicly visible and hence limits what LLVM will optimize. I would probably just not match the mangled name itself, unless you also see matches on mangled names in other tests?

Also, could you use CHECK-LABEL for matching the debug info name, like abi_handling::df2? I think that's suitable here, and e.g. the consts.rs file uses it.
https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-label-directive

Contributor Author

@Sa4dUs Sa4dUs Aug 20, 2025

done. kept mode-NEXT and mode-SAME since they're still needed in some functions to pass tidy checks, and it keeps things consistent. also, given the goal of the test, having the args check on a separate line sounds good to me.

@rust-cloud-vms rust-cloud-vms bot force-pushed the prevent-abi-changes branch 2 times, most recently from f8265e8 to ac8749d Compare August 20, 2025 08:06
fn f(_zst: (), _x: &mut f64) {}

#[unsafe(no_mangle)]
pub extern "C" fn fd(x: &mut f64, xd: &mut f64) {
Member

Is there a reason this is extern "C"? Together with the Rust references, that looks a bit weird.

Contributor Author

@Sa4dUs Sa4dUs Sep 17, 2025

just copied the code from this issue #144025, the one reporting the ZST args failure

i can just remove the extern clause to make it more "general" if u want

Member

@ZuseZ4 ZuseZ4 Sep 17, 2025

Ah that makes sense. Yes, removing the extern "C" sounds good, I don't think it's relevant or needed. Afterwards lgtm, besides bjorn's c API hint (if it's too much work feel free to postpone it to a follow-up pr though and leave a fixme).

Contributor Author

no worries, i have access to all the needed items when calling adjust_activity_to_abi so it shouldn't be a problem :)

// For ZST, just ignore and don't add its activity, as this arg won't be present
// in the LLVM passed to Enzyme.
// FIXME(Sa4dUs): Enforce ZST corresponding diff activity be `Const`
if layout.is_zst() {
Member

On some targets we do actually pass ZST arguments indirectly in the C ABI.

Contributor Author

i wasn't aware of that, is there any place i can look deeper at this? because i'm not entirely sure on how to distinguish those cases in code

Member

The FnAbi contains this information in the mode field of the ArgAbi for the respective argument. This is a PassMode enum describing exactly how the argument is passed. You can use cx.fn_abi_of_instance(instance, ty::List::empty()) to get the FnAbi for an Instance. (The ty::List::empty() argument represents all extra varargs passed to a variadic function when calling it. I assume autodiff doesn't allow varargs.)

Member

Enzyme generally (should) support variadics/varargs, it looks like on the MLIR side it's even somewhat tested. However, I didn't consider them when writing my frontend, and I think it's fine to leave them unsupported till someone actually asks for it. If you want you can add support for fun, but just throwing an error if you encounter extra vararg (or even a variadic function in general) is also more than enough.

Contributor Author

thanks, i'll try to upload the fix in the following hours

Contributor Author

done. i've removed the layout zst check as, afaik, it's not relevant anymore if we are already checking the arg's pass mode
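
As an illustration of why checking the pass mode subsumes the ZST check, here is a small sketch. `ToyPassMode` borrows three variant names from rustc's `PassMode` but is a simplified stand-in without payloads. `activities_for_passed_args`, its `is_variadic` flag, and the error strings are hypothetical and do not reproduce the compiler's diagnostics.

```rust
/// Toy subset of pass modes; the names mirror rustc's `PassMode`, but this
/// enum is a simplified stand-in without any payloads.
#[derive(Clone, Copy)]
enum ToyPassMode {
    /// The argument is not passed at all (the usual case for a ZST).
    Ignore,
    /// Passed as one value.
    Direct,
    /// Passed behind a pointer, which some targets do even for a ZST.
    Indirect,
}

/// Keeps the activity of every argument that is actually passed.
/// Variadic functions are rejected instead of being guessed at.
fn activities_for_passed_args<A: Clone>(
    activities: &[A],
    modes: &[ToyPassMode],
    is_variadic: bool,
) -> Result<Vec<A>, String> {
    if is_variadic {
        return Err("variadic functions are not supported in this sketch".to_string());
    }
    if activities.len() != modes.len() {
        return Err("expected one activity per argument".to_string());
    }
    Ok(activities
        .iter()
        .zip(modes)
        // Drop only the arguments the ABI really omits.
        .filter(|(_, mode)| !matches!(mode, ToyPassMode::Ignore))
        .map(|(activity, _)| activity.clone())
        .collect())
}
```

Keying the decision on the pass mode rather than on the layout means a ZST that a target passes indirectly keeps its activity, which is the case raised earlier in this thread.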

@rustbot
Collaborator

rustbot commented Sep 17, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@ZuseZ4
Member

ZuseZ4 commented Sep 18, 2025

Thanks! A few more cases handled, more tests are always good, and variadic users now get an error so they can open an issue about it.

@bors r+

@bors
Collaborator

bors commented Sep 18, 2025

📌 Commit e04567c has been approved by ZuseZ4

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 18, 2025
@bors
Collaborator

bors commented Sep 18, 2025

⌛ Testing commit e04567c with merge 97a987f...

@bors
Collaborator

bors commented Sep 18, 2025

☀️ Test successful - checks-actions
Approved by: ZuseZ4
Pushing 97a987f to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 18, 2025
@bors bors merged commit 97a987f into rust-lang:master Sep 18, 2025
11 checks passed
@rustbot rustbot added this to the 1.92.0 milestone Sep 18, 2025
Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 4793ef5 (parent) -> 97a987f (this PR)

Test differences

Show 9 test diffs

Stage 1

  • [codegen] tests/codegen-llvm/autodiff/abi_handling.rs#debug: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [codegen] tests/codegen-llvm/autodiff/abi_handling.rs#release: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)
  • [ui] tests/ui/autodiff/zst.rs: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J1)

Stage 2

  • [codegen] tests/codegen-llvm/autodiff/abi_handling.rs#debug: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [codegen] tests/codegen-llvm/autodiff/abi_handling.rs#release: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J0)
  • [ui] tests/ui/autodiff/zst.rs: [missing] -> ignore (ignored when LLVM Enzyme is disabled or LLVM is not the default codegen backend) (J2)

Additionally, 3 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 97a987f14c5bd948f7ee8dba75999f104a6f03a7 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-aarch64-linux: 6313.8s -> 8602.4s (36.2%)
  2. pr-check-1: 1691.5s -> 1402.0s (-17.1%)
  3. pr-check-2: 2538.1s -> 2153.9s (-15.1%)
  4. x86_64-rust-for-linux: 2983.6s -> 2572.8s (-13.8%)
  5. armhf-gnu: 5652.7s -> 4977.6s (-11.9%)
  6. dist-x86_64-netbsd: 4644.8s -> 5154.6s (11.0%)
  7. dist-powerpc64le-linux-musl: 5260.3s -> 5827.0s (10.8%)
  8. dist-aarch64-apple: 7486.2s -> 6692.9s (-10.6%)
  9. i686-gnu-nopt-1: 7891.3s -> 7058.6s (-10.6%)
  10. x86_64-gnu-tools: 3755.1s -> 3372.4s (-10.2%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (97a987f): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.2% | [0.0%, 0.3%] | 2 |
| Improvements ✅ (primary) | -0.2% | [-0.2%, -0.2%] | 1 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.1%] | 18 |
| All ❌✅ (primary) | -0.2% | [-0.2%, -0.2%] | 1 |

Max RSS (memory usage)

Results (primary 2.1%, secondary -3.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.1% | [2.1%, 2.1%] | 1 |
| Regressions ❌ (secondary) | 2.0% | [2.0%, 2.0%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.5% | [-4.8%, -4.1%] | 7 |
| All ❌✅ (primary) | 2.1% | [2.1%, 2.1%] | 1 |

Cycles

Results (primary -2.3%, secondary 2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.7% | [2.7%, 5.0%] | 6 |
| Improvements ✅ (primary) | -2.3% | [-2.3%, -2.3%] | 1 |
| Improvements ✅ (secondary) | -2.4% | [-2.7%, -2.1%] | 2 |
| All ❌✅ (primary) | -2.3% | [-2.3%, -2.3%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 470.288s -> 471.46s (0.25%)
Artifact size: 388.07 MiB -> 387.92 MiB (-0.04%)

Labels
A-attributes Area: Attributes (`#[…]`, `#![…]`) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. F-autodiff `#![feature(autodiff)]` merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Autodiff often breaks with ZST arguments