
Commit 70e1a3c

[AMDGPU] Check legality of both operands before swap (#148843)
When trying to fold an SGPR into the second operand of a DPP add, si-fold-operands correctly determines that this is not possible and attempts to swap the second and third operands. The swap succeeds even if the third operand is an SGPR, creating an illegal DPP add with two SGPR operands. We need to check that both operands are legal in their new positions.

This causes a compile-time crash for a test in Triton on gfx12:
https://github.com/triton-lang/triton/blob/345c633787e90a7f94864de3035346eb5de1781f/python/test/unit/language/test_core.py#L2718

Co-authored-by: Paul Trojahn <[email protected]>
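The reasoning above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `Kind`, `Operand`, `Slot`, `isLegalRegInSlot`, and both `isLegalToSwap*` helpers are hypothetical names invented for illustration and are not LLVM's API, and the per-slot rule (one slot rejects SGPRs, the other accepts them) is a simplification of the real operand constraints.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; none of these types exist in LLVM.
enum class Kind { VGPR, SGPR, Imm };

struct Operand {
  Kind kind;
};

// Hypothetical per-slot constraint: does this slot accept a scalar register?
struct Slot {
  bool acceptsSGPR;
};

bool isReg(const Operand &op) { return op.kind != Kind::Imm; }

// A register is legal in a slot if it is a VGPR, or an SGPR in a slot
// that accepts SGPRs (an assumption made for this sketch).
bool isLegalRegInSlot(const Slot &slot, const Operand &op) {
  if (op.kind == Kind::VGPR)
    return true;
  return op.kind == Kind::SGPR && slot.acceptsSGPR;
}

// One-sided check: only the operand moving from slot a to slot b is tested.
bool isLegalToSwapOneSided(const std::vector<Slot> &slots,
                           const std::vector<Operand> &ops,
                           std::size_t a, std::size_t b) {
  return isLegalRegInSlot(slots[b], ops[a]);
}

// Two-sided check: each register operand must be legal in its new slot.
// Non-register operands are skipped, mirroring the `!MO->isReg() ||` guard.
bool isLegalToSwapBothSides(const std::vector<Slot> &slots,
                            const std::vector<Operand> &ops,
                            std::size_t a, std::size_t b) {
  bool aOk = !isReg(ops[a]) || isLegalRegInSlot(slots[b], ops[a]);
  bool bOk = !isReg(ops[b]) || isLegalRegInSlot(slots[a], ops[b]);
  return aOk && bOk;
}
```

With a VGPR in a slot that rejects SGPRs and an SGPR in a slot that accepts them, the one-sided check approves the swap because the VGPR is legal in its new slot, while the two-sided check rejects it because the SGPR would land in the slot that cannot hold it.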
1 parent 52432b0 commit 70e1a3c


2 files changed, +28 -2 lines changed


llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Lines changed: 4 additions & 2 deletions
@@ -2813,12 +2813,14 @@ bool SIInstrInfo::isLegalToSwap(const MachineInstr &MI, unsigned OpIdx0,
   if ((int)OpIdx1 != Src0Idx && MO0->isReg()) {
     if (!DefinedRC1)
       return OpInfo1.OperandType == MCOI::OPERAND_UNKNOWN;
-    return isLegalRegOperand(MI, OpIdx1, *MO0);
+    return isLegalRegOperand(MI, OpIdx1, *MO0) &&
+           (!MO1->isReg() || isLegalRegOperand(MI, OpIdx0, *MO1));
   }
   if ((int)OpIdx0 != Src0Idx && MO1->isReg()) {
     if (!DefinedRC0)
       return OpInfo0.OperandType == MCOI::OPERAND_UNKNOWN;
-    return isLegalRegOperand(MI, OpIdx0, *MO1);
+    return (!MO0->isReg() || isLegalRegOperand(MI, OpIdx1, *MO0)) &&
+           isLegalRegOperand(MI, OpIdx0, *MO1);
   }
 
   // No need to check 64-bit literals since swapping does not bring new
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck %s
+
+---
+name: fold_commute_sgprs
+body: |
+  bb.0:
+    liveins: $sgpr0, $sgpr1
+    ; CHECK-LABEL: name: fold_commute_sgprs
+    ; CHECK: liveins: $sgpr0, $sgpr1
+    ; CHECK-NEXT: {{ $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY [[DEF]]
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr1
+    ; CHECK-NEXT: [[V_ADD_NC_U16_fake16_e64_dpp:%[0-9]+]]:vgpr_32 = V_ADD_NC_U16_fake16_e64_dpp [[COPY1]], 0, [[COPY2]], 0, [[COPY3]], 0, 0, 280, 15, 15, 1, implicit $exec
+    %0:sreg_32 = COPY $sgpr0
+    %1:sreg_32 = IMPLICIT_DEF
+    %2:vgpr_32 = COPY %1:sreg_32
+    %3:vgpr_32 = COPY %0:sreg_32
+    %4:sreg_32 = COPY $sgpr1
+    %5:vgpr_32 = V_ADD_NC_U16_fake16_e64_dpp %2:vgpr_32, 0, %3:vgpr_32, 0, %4:sreg_32, 0, 0, 280, 15, 15, 1, implicit $exec
+...
