Commit f4c952a

[AMDGPU] Check legality of both operands before commute
When trying to fold an SGPR into a DPP add, si-fold-operands correctly determines that the fold is not possible and then tries to commute the operands. The commute mistakenly succeeds, creating a DPP add with two SGPRs. When swapping, we need to check that both operands are legal in their new positions. Without this check, a test in Triton crashes on gfx12: https://github.com/triton-lang/triton/blob/345c633787e90a7f94864de3035346eb5de1781f/python/test/unit/language/test_core.py#L2718
1 parent 70bc7d1 commit f4c952a
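The reasoning in the commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not LLVM code: `RegKind`, `Operand`, `SlotConstraint`, `isLegalInSlot`, and both `isLegalToSwap*` functions are hypothetical names, and the "slot accepts SGPR or not" rule is a simplification of whatever `isLegalRegOperand` actually checks. It shows why validating only one of the two moved operands can approve a swap that leaves the other operand in a slot it is not allowed to occupy.

```cpp
#include <cassert>

// Toy model only: none of these types exist in LLVM.
enum class RegKind { VGPR, SGPR };

struct Operand {
  bool IsReg;   // false models an immediate or other non-register operand
  RegKind Kind; // meaningful only when IsReg is true
};

// Assumed simplification: a source slot either accepts SGPRs or it does not.
struct SlotConstraint {
  bool AllowsSGPR;
};

// Returns true if Op may occupy a slot with constraint C.
bool isLegalInSlot(const SlotConstraint &C, const Operand &Op) {
  if (!Op.IsReg)
    return true; // non-register operands are out of scope for this sketch
  return Op.Kind == RegKind::VGPR || C.AllowsSGPR;
}

// One-sided check: only the operand moving into Slot1 is validated.
bool isLegalToSwapOneSided(const SlotConstraint &Slot0,
                           const SlotConstraint &Slot1,
                           const Operand &Op0, const Operand &Op1) {
  (void)Slot0; // intentionally ignored, which is the flaw being modeled
  (void)Op1;
  return isLegalInSlot(Slot1, Op0);
}

// Two-sided check: each operand is validated against its new slot.
bool isLegalToSwapBothSided(const SlotConstraint &Slot0,
                            const SlotConstraint &Slot1,
                            const Operand &Op0, const Operand &Op1) {
  return isLegalInSlot(Slot1, Op0) && isLegalInSlot(Slot0, Op1);
}

// Example: slot 0 is VGPR-only, slot 1 accepts SGPRs; Op1 is an SGPR.
void selfCheck() {
  SlotConstraint VgprOnly{false}, AnyReg{true};
  Operand V{true, RegKind::VGPR}, S{true, RegKind::SGPR};
  assert(isLegalToSwapOneSided(VgprOnly, AnyReg, V, S));   // wrongly accepted
  assert(!isLegalToSwapBothSided(VgprOnly, AnyReg, V, S)); // rejected
}
```

In this model the one-sided check approves moving the SGPR into the VGPR-only slot because it never looks at that operand, while the two-sided check rejects the swap.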

2 files changed: +28 -2 lines changed
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Lines changed: 4 additions & 2 deletions
@@ -2807,12 +2807,14 @@ bool SIInstrInfo::isLegalToSwap(const MachineInstr &MI, unsigned OpIdx0,
   if ((int)OpIdx1 != Src0Idx && MO0->isReg()) {
     if (!DefinedRC1)
       return OpInfo1.OperandType == MCOI::OPERAND_UNKNOWN;
-    return isLegalRegOperand(MI, OpIdx1, *MO0);
+    return isLegalRegOperand(MI, OpIdx1, *MO0) &&
+           (!MO1->isReg() || isLegalRegOperand(MI, OpIdx0, *MO1));
   }
   if ((int)OpIdx0 != Src0Idx && MO1->isReg()) {
     if (!DefinedRC0)
       return OpInfo0.OperandType == MCOI::OPERAND_UNKNOWN;
-    return isLegalRegOperand(MI, OpIdx0, *MO1);
+    return (!MO0->isReg() || isLegalRegOperand(MI, OpIdx1, *MO0)) &&
+           isLegalRegOperand(MI, OpIdx0, *MO1);
   }
 
   // No need to check 64-bit literals since swapping does not bring new
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck %s
+
+---
+name: fold_commute_sgprs
+body: |
+  bb.0:
+    liveins: $sgpr0, $sgpr1
+    ; CHECK-LABEL: name: fold_commute_sgprs
+    ; CHECK: liveins: $sgpr0, $sgpr1
+    ; CHECK-NEXT: {{ $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY [[DEF]]
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr1
+    ; CHECK-NEXT: [[V_ADD_NC_U16_fake16_e64_dpp:%[0-9]+]]:vgpr_32 = V_ADD_NC_U16_fake16_e64_dpp [[COPY1]], 0, [[COPY2]], 0, [[COPY3]], 0, 0, 280, 15, 15, 1, implicit $exec
+    %0:sreg_32 = COPY $sgpr0
+    %1:sreg_32 = IMPLICIT_DEF
+    %2:vgpr_32 = COPY %1:sreg_32
+    %3:vgpr_32 = COPY %0:sreg_32
+    %4:sreg_32 = COPY $sgpr1
+    %5:vgpr_32 = V_ADD_NC_U16_fake16_e64_dpp %2:vgpr_32, 0, %3:vgpr_32, 0, %4:sreg_32, 0, 0, 280, 15, 15, 1, implicit $exec
+...
