[LLVM] Improve the DemandedBits Analysis #148853
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis
Author: Panagiotis K (karouzakisp)
Changes
This patch adds DemandedBits support for operators the analysis previously did not handle: SDiv, UDiv, URem, and SRem. It also extends Shl and AShr to handle non-constant shift amounts and improves the handling of multiplication. Compared with upstream LLVM under the Oz pipeline, the patch showed up to 10% code size reduction on the LLVM test suite.
Patch is 27.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148853.diff
6 Files Affected:
diff --git a/llvm/lib/Analysis/DemandedBits.cpp b/llvm/lib/Analysis/DemandedBits.cpp
index 6694d5cc06c8c..1fa94e95cbceb 100644
--- a/llvm/lib/Analysis/DemandedBits.cpp
+++ b/llvm/lib/Analysis/DemandedBits.cpp
@@ -36,6 +36,7 @@
#include "llvm/Support/Casting.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/KnownBits.h"
+#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"
#include <algorithm>
#include <cstdint>
@@ -164,10 +165,24 @@ void DemandedBits::determineLiveOperandBits(
}
break;
case Instruction::Mul:
- // Find the highest live output bit. We don't need any more input
- // bits than that (adds, and thus subtracts, ripple only to the
- // left).
- AB = APInt::getLowBitsSet(BitWidth, AOut.getActiveBits());
+ const APInt *C;
+ if (OperandNo == 0) {
+ // to have output bits 0...H-1 we need the input bits
+ // 0...(H - ceiling(log_2))
+ if (match(UserI->getOperand(1), m_APInt(C))) {
+ auto LogC = C->isOne() ? 0 : C->logBase2() + 1;
+ unsigned Need =
+ AOut.getActiveBits() > LogC ? AOut.getActiveBits() - LogC : 0;
+ AB = APInt::getLowBitsSet(BitWidth, Need);
+ } else { // TODO: we can possibly check for Op0 constant too
+ AB = APInt::getLowBitsSet(BitWidth, AOut.getActiveBits());
+ }
+ } else {
+ // Find the highest live output bit. We don't need any more input
+ // bits than that (adds, and thus subtracts, ripple only to the
+ // left).
+ AB = APInt::getLowBitsSet(BitWidth, AOut.getActiveBits());
+ }
break;
case Instruction::Shl:
if (OperandNo == 0) {
@@ -183,6 +198,17 @@ void DemandedBits::determineLiveOperandBits(
AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt+1);
else if (S->hasNoUnsignedWrap())
AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt);
+ } else {
+ ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+ unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+ unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+ // similar to Lshr case
+ AB = (AOut.lshr(Min) | AOut.lshr(Max));
+ const auto *S = cast<ShlOperator>(UserI);
+ if (S->hasNoSignedWrap())
+ AB |= APInt::getHighBitsSet(BitWidth, Max + 1);
+ else if (S->hasNoUnsignedWrap())
+ AB |= APInt::getHighBitsSet(BitWidth, Max);
}
}
break;
@@ -197,6 +223,19 @@ void DemandedBits::determineLiveOperandBits(
// (they must be zero).
if (cast<LShrOperator>(UserI)->isExact())
AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
+ } else {
+ ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+ unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+ unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+ // Suppose AOut == 0b0000 0011
+ // [min, max] = [1, 3]
+ // shift by 1 we get 0b0000 0110
+ // shift by 2 we get 0b0000 1100
+ // shift by 3 we get 0b0001 1000
+ // we take the or here because need to cover all the above possibilities
+ AB = (AOut.shl(Min) | AOut.shl(Max));
+ if (cast<LShrOperator>(UserI)->isExact())
+ AB |= APInt::getLowBitsSet(BitWidth, Max);
}
}
break;
@@ -217,6 +256,27 @@ void DemandedBits::determineLiveOperandBits(
// (they must be zero).
if (cast<AShrOperator>(UserI)->isExact())
AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
+ } else {
+ ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+ unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+ unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+ AB = (AOut.shl(Min) | AOut.shl(Max));
+
+ if (Max) {
+ // Suppose AOut = 0011 1100
+ // [min, max] = [1, 3]
+ // ShiftAmount = 1 : Mask is 1000 0000
+ // ShiftAmount = 2 : Mask is 1100 0000
+ // ShiftAmount = 3 : Mask is 1110 0000
+ // The Mask with Max covers every case in [min, max],
+ // so we are done
+ if ((AOut & APInt::getHighBitsSet(BitWidth, Max)).getBoolValue())
+ AB.setSignBit();
+ }
+ // If the shift is exact, then the low bits are not dead
+ // (they must be zero).
+ if (cast<AShrOperator>(UserI)->isExact())
+ AB |= APInt::getLowBitsSet(BitWidth, Max);
}
}
break;
@@ -246,6 +306,35 @@ void DemandedBits::determineLiveOperandBits(
else
AB &= ~(Known.One & ~Known2.One);
break;
+ case Instruction::UDiv:
+ case Instruction::URem:
+ case Instruction::SDiv:
+ case Instruction::SRem: {
+ auto Opc = UserI->getOpcode();
+ auto IsDiv = Opc == Instruction::UDiv || Opc == Instruction::SDiv;
+ bool IsSigned = Opc == Instruction::SDiv || Opc == Instruction::SRem;
+ if (OperandNo == 0) {
+ const APInt *DivAmnt;
+ if (match(UserI->getOperand(1), m_APInt(DivAmnt))) {
+ uint64_t D = DivAmnt->getZExtValue();
+ if (isPowerOf2_64(D)) {
+ unsigned Sh = Log2_64(D);
+ if (IsDiv) {
+ AB = AOut.shl(Sh);
+ } else {
+ AB = AOut & APInt::getLowBitsSet(BitWidth, Sh);
+ }
+ } else { // Non power of 2 constant div
+ unsigned LowQ = AOut.getActiveBits();
+ unsigned Need = LowQ + Log2_64_Ceil(D);
+ if (IsSigned)
+ Need++;
+ AB = APInt::getLowBitsSet(BitWidth, std::min(BitWidth, Need));
+ }
+ }
+ }
+ break;
+ }
case Instruction::Xor:
case Instruction::PHI:
AB = AOut;
diff --git a/llvm/test/Analysis/DemandedBits/basic.ll b/llvm/test/Analysis/DemandedBits/basic.ll
index 4dc59c5392935..62eba9eaa81c5 100644
--- a/llvm/test/Analysis/DemandedBits/basic.ll
+++ b/llvm/test/Analysis/DemandedBits/basic.ll
@@ -25,3 +25,28 @@ define i8 @test_mul(i32 %a, i32 %b) {
%6 = add nsw i8 %3, %5
ret i8 %6
}
+; CHECK-LABEL: Printing analysis 'Demanded Bits Analysis' for function 'test_mul_constant':
+; CHECK-DAG: DemandedBits: 0xff for %3 = trunc i32 %2 to i8
+; CHECK-DAG: DemandedBits: 0xff for %2 in %3 = trunc i32 %2 to i8
+; CHECK-DAG: DemandedBits: 0xff for %2 = mul nsw i32 %1, 6
+; CHECK-DAG: DemandedBits: 0x1f for %1 in %2 = mul nsw i32 %1, 6
+; CHECK-DAG: DemandedBits: 0xff for 6 in %2 = mul nsw i32 %1, 6
+; CHECK-DAG: DemandedBits: 0x1 for %4 = trunc i32 %2 to i1
+; CHECK-DAG: DemandedBits: 0x1 for %2 in %4 = trunc i32 %2 to i1
+; CHECK-DAG: DemandedBits: 0x1f for %1 = add nsw i32 %a, 12
+; CHECK-DAG: DemandedBits: 0x1f for %a in %1 = add nsw i32 %a, 12
+; CHECK-DAG: DemandedBits: 0x1f for 12 in %1 = add nsw i32 %a, 12
+; CHECK-DAG: DemandedBits: 0xff for %5 = zext i1 %4 to i8
+; CHECK-DAG: DemandedBits: 0x1 for %4 in %5 = zext i1 %4 to i8
+; CHECK-DAG: DemandedBits: 0xff for %6 = add nsw i8 %3, %5
+; CHECK-DAG: DemandedBits: 0xff for %3 in %6 = add nsw i8 %3, %5
+; CHECK-DAG: DemandedBits: 0xff for %5 in %6 = add nsw i8 %3, %5
+define i8 @test_mul_constant(i32 %a, i32 %b){
+ %1 = add nsw i32 %a, 12
+ %2 = mul nsw i32 %1, 6
+ %3 = trunc i32 %2 to i8
+ %4 = trunc i32 %2 to i1
+ %5 = zext i1 %4 to i8
+ %6 = add nsw i8 %3, %5
+ ret i8 %6
+}
diff --git a/llvm/test/Analysis/DemandedBits/div_rem.ll b/llvm/test/Analysis/DemandedBits/div_rem.ll
new file mode 100644
index 0000000000000..818cba17dc1a6
--- /dev/null
+++ b/llvm/test/Analysis/DemandedBits/div_rem.ll
@@ -0,0 +1,261 @@
+; RUN: opt -S -disable-output -passes="print<demanded-bits>" < %s 2>&1 | FileCheck %s
+
+define i8 @test_sdiv_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_4'
+; CHECK-DAG: DemandedBits: 0xff for %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div = sdiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0x3fc for %a in %div = sdiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xffffffff for 4 in %div = sdiv i32 %a, 4
+;
+ %div = sdiv i32 %a, 4
+ %div.t = trunc i32 %div to i8
+ ret i8 %div.t
+}
+
+define i8 @test_sdiv_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for %div = sdiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xfff for %a in %div = sdiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in %div = sdiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xff for %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in %div.t = trunc i32 %div to i8
+;
+ %div = sdiv i32 %a, 5
+ %div.t = trunc i32 %div to i8
+ ret i8 %div.t
+}
+
+define i8 @test_sdiv_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for %div = sdiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7f8 for %a in %div = sdiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in %div = sdiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xff for %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in %div.t = trunc i32 %div to i8
+;
+ %div = sdiv i32 %a, 8
+ %div.t = trunc i32 %div to i8
+ ret i8 %div.t
+}
+
+define i8 @test_sdiv_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xfff for %a in %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xff for %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in %div.t = trunc i32 %div to i8
+;
+ %div = udiv i32 %a, 9
+ %div.t = trunc i32 %div to i8
+ ret i8 %div.t
+}
+
+define i8 @test_sdiv(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_sdiv'
+; CHECK-DAG: DemandedBits: 0xff for %div = sdiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in %div = sdiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in %div = sdiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in %div.t = trunc i32 %div to i8
+;
+ %div = sdiv i32 %a, %b
+ %div.t = trunc i32 %div to i8
+ ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_4'
+; CHECK-DAG: DemandedBits: 0xff for %div = udiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0x3fc for %a in %div = udiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xffffffff for 4 in %div = udiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xff for %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in %div.t = trunc i32 %div to i8
+;
+ %div = udiv i32 %a, 4
+ %div.t = trunc i32 %div to i8
+ ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div = udiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0x7ff for %a in %div = udiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in %div = udiv i32 %a, 5
+;
+ %div = udiv i32 %a, 5
+ %div.t = trunc i32 %div to i8
+ ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for %div = udiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7f8 for %a in %div = udiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in %div = udiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xff for %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in %div.t = trunc i32 %div to i8
+;
+ %div = udiv i32 %a, 8
+ %div.t = trunc i32 %div to i8
+ ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xfff for %a in %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in %div = udiv i32 %a, 9
+;
+ %div = udiv i32 %a, 9
+ %div.t = trunc i32 %div to i8
+ ret i8 %div.t
+}
+
+define i8 @test_udiv(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_udiv'
+; CHECK-DAG: DemandedBits: 0xff for %div = udiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in %div = udiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in %div = udiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in %div.t = trunc i32 %div to i8
+;
+ %div = udiv i32 %a, %b
+ %div.t = trunc i32 %div to i8
+ ret i8 %div.t
+}
+
+define i8 @test_srem_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_4'
+; CHECK-DAG: DemandedBits: 0xff for %rem = srem i32 %a, 4
+; CHECK-DAG: DemandedBits: 0x3 for %a in %rem = srem i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xffffffff for 4 in %rem = srem i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xff for %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in %rem.t = trunc i32 %rem to i8
+;
+ %rem = srem i32 %a, 4
+ %rem.t = trunc i32 %rem to i8
+ ret i8 %rem.t
+}
+
+define i8 @test_srem_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem = srem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xfff for %a in %rem = srem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in %rem = srem i32 %a, 5
+;
+ %rem = srem i32 %a, 5
+ %rem.t = trunc i32 %rem to i8
+ ret i8 %rem.t
+}
+
+define i8 @test_srem_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for %rem = srem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7 for %a in %rem = srem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in %rem = srem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xff for %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in %rem.t = trunc i32 %rem to i8
+;
+ %rem = srem i32 %a, 8
+ %rem.t = trunc i32 %rem to i8
+ ret i8 %rem.t
+}
+
+define i8 @test_srem_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem = srem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0x1fff for %a in %rem = srem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in %rem = srem i32 %a, 9
+;
+ %rem = srem i32 %a, 9
+ %rem.t = trunc i32 %rem to i8
+ ret i8 %rem.t
+}
+
+define i8 @test_srem(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_srem'
+; CHECK-DAG: DemandedBits: 0xff for %rem = srem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in %rem = srem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in %rem = srem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in %rem.t = trunc i32 %rem to i8
+;
+ %rem = srem i32 %a, %b
+ %rem.t = trunc i32 %rem to i8
+ ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_4'
+; CHECK-DAG: DemandedBits: 0xff for %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem = urem i32 %a, 4
+; CHECK-DAG: DemandedBits: 0x3 for %a in %rem = urem i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xffffffff for 4 in %rem = urem i32 %a, 4
+;
+ %rem = urem i32 %a, 4
+ %rem.t = trunc i32 %rem to i8
+ ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem = urem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0x7ff for %a in %rem = urem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in %rem = urem i32 %a, 5
+;
+ %rem = urem i32 %a, 5
+ %rem.t = trunc i32 %rem to i8
+ ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem = urem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7 for %a in %rem = urem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in %rem = urem i32 %a, 8
+;
+ %rem = urem i32 %a, 8
+ %rem.t = trunc i32 %rem to i8
+ ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem = urem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xfff for %a in %rem = urem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in %rem = urem i32 %a, 9
+;
+ %rem = urem i32 %a, 9
+ %rem.t = trunc i32 %rem to i8
+ ret i8 %rem.t
+}
+
+define i8 @test_urem(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_urem'
+; CHECK-DAG: DemandedBits: 0xff for %rem = urem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in %rem = urem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in %rem = urem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in %rem.t = trunc i32 %rem to i8
+;
+ %rem = urem i32 %a, %b
+ %rem.t = trunc i32 %rem to i8
+ ret i8 %rem.t
+}
diff --git a/llvm/test/Analysis/DemandedBits/shl.ll b/llvm/test/Analysis/DemandedBits/shl.ll
index e41f5f4107735..c3313a93c1e85 100644
--- a/llvm/test/Analysis/DemandedBits/shl.ll
+++ b/llvm/test/Analysis/DemandedBits/shl.ll
@@ -57,10 +57,56 @@ define i8 @test_shl(i32 %a, i32 %b) {
; CHECK-DAG: DemandedBits: 0xff for %shl.t = trunc i32 %shl to i8
; CHECK-DAG: DemandedBits: 0xff for %shl in %shl.t = trunc i32 %shl to i8
; CHECK-DAG: DemandedBits: 0xff for %shl = shl i32 %a, %b
-; CHECK-DAG: DemandedBits: 0xffffffff for %a in %shl = shl i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %a in %shl = shl i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %shl = shl i32 %a, %b
;
%shl = shl i32 %a, %b
%shl.t = trunc i32 %shl to i8
ret i8 %shl.t
}
+
+define i8 @test_shl_var_amount(i32 %a, i32 %b){
+; CHECK-LABEL: 'test_shl_var_amount'
+; CHECK-DAG: DemandedBits: 0xff for %5 = trunc i32 %4 to i8
+; CHECK-DAG: DemandedBits: 0xff for %4 in %5 = trunc i32 %4 to i8
+; CHECK-DAG: DemandedBits: 0xff for %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xff for %1 in %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xffffffff for %3 in %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xff for %2 = trunc i32 %1 to i8
+; CHECK-DAG: DemandedBits: 0xff for %1 in %2 = trunc i32 %1 to i8
+; CHECK-DAG: DemandedBits: 0xffffffff for %3 = zext i8 %2 to i32
+; CHECK-DAG: DemandedBits: 0xff for %2 in %3 = zext i8 %2 to i32
+; CHECK-DAG: DemandedBits: 0xff for %1 = add nsw i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %a in %1 = add nsw i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %b in %1 = add nsw i32 %a, %b
+;
+ %1 = add nsw i32 %a, %b
+ %2 = trunc i32 %1 to i8
+ %3 = zext i8 %2 to i32
+ %4 = shl i32 %1, %3
+ %5 = trunc i32 %4 to i8
+ ret i8 %5
+}
+
+define i8 @test_shl_var_amount_nsw(i32 %a, i32 %b){
+ ; CHECK-LABEL: 'test_shl_var_amount_nsw'
+ ; CHECK-DAG: DemandedBits: 0xff for %5 = trunc i32 %4 to i8
+ ; CHECK-DAG: DemandedBits: 0xff for %4 in %5 = trunc i32 %4 to i8
+ ; CHECK-DAG: DemandedBits: 0xff for %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for %1 in %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for %3 in %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for %3 = zext i8 %2 to i32
+ ; CHECK-DAG: DemandedBits: 0xff for %2 in %3 = zext i8 %2 to i32
+ ; CHECK-DAG: DemandedBits: 0xff for %2 = ...
[truncated]
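The comment in the LShr hunk above says the mask has to cover every possible shift amount in [min, max]. As a minimal sketch of that idea in plain 8-bit integers (the helper name `demandedForLShrRange` is hypothetical and is not part of LLVM or of this patch), the union can be taken over each amount in the range. This differs from ORing only the two endpoint shifts: the endpoints cover the `0b0000 0011` example from the comment, but they can leave gaps when the demanded output mask has a single bit set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LLVM code: demanded input bits of x in the 8-bit
// logical shift `x >> s`, when s may be any value in [minShift, maxShift]
// and demandedOut is the mask of demanded result bits.
uint8_t demandedForLShrRange(uint8_t demandedOut, unsigned minShift,
                             unsigned maxShift) {
  assert(minShift <= maxShift && maxShift < 8);
  uint8_t demandedIn = 0;
  // Result bit i comes from input bit i + s, so each candidate shift amount
  // contributes demandedOut shifted left by s.
  for (unsigned s = minShift; s <= maxShift; ++s)
    demandedIn |= static_cast<uint8_t>(demandedOut << s);
  return demandedIn;
}
```

With a demanded output mask of 0x03 and a range of [1, 3] this yields 0x1e, matching the worked example in the hunk. With a demanded output mask of 0x01 it yields 0x0e, whereas ORing only the endpoint shifts would give 0x0a and miss input bit 2.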
Thanks for the patch! Can you split the improvements, one per operation, so we can ensure that each change has sufficient test coverage? You can group the common code into one of the patches (SDiv, UDiv, URem, SRem).
What do you mean by splitting the improvements?
You can split the patch into multiple independent patches and restrict this PR to introducing only the div/rem code.
5fdd1f0 to fdae5ad
llvm/lib/Analysis/DemandedBits.cpp
Outdated
AB = AOut & APInt::getLowBitsSet(BitWidth, Sh);
}
} else { // Non power of 2 constant div
/*
Use // comments.
llvm/lib/Analysis/DemandedBits.cpp
Outdated
k = LowQ - 1;
TopIndex = k + m-1 = 3 + 2 = 5;
The dividend bits b5...b0 are enough we don't care for b6 and b7.
The same applies to Urem/SRem
Surely not! The result of x % 7 is affected by arbitrarily high-order bits of x.
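A small sketch of why this holds, using plain 32-bit unsigned arithmetic (the helper names are hypothetical and not part of the patch): every power of two is congruent to 1, 2, or 4 modulo 7, never 0, so setting or clearing any single bit of x always changes x % 7.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if flipping bit `bit` of x changes x % 7.
bool bitAffectsRem7(uint32_t x, unsigned bit) {
  uint32_t flipped = x ^ (uint32_t{1} << bit);
  return (x % 7u) != (flipped % 7u);
}

// Hypothetical helper: true if each of the 32 bits of x affects x % 7.
bool allBitsAffectRem7(uint32_t x) {
  for (unsigned bit = 0; bit < 32; ++bit)
    if (!bitAffectsRem7(x, bit))
      return false;
  return true;
}
```

For example, 0 % 7 is 0 while (1u << 30) % 7 is 1, so even the lowest result bit depends on bit 30 of the dividend.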
Thanks for the catch; somehow I missed it.
fdae5ad to 78db179
Can you provide the alive2 proof? See also https://llvm.org/docs/InstCombineContributorGuide.html#proofs.
You can use an extra integer parameter as the source of garbage bits:
define i32 @src(i32 %x, i32 %y, i32 noundef %z) {
%div = udiv i32 %x, %y
ret i32 %div
}
define i32 @tgt(i32 %x, i32 %y, i32 noundef %z) {
%demanded_mask = ...
%demanded_mask_inv = xor i32 %demanded_mask, -1
%x_demanded = and i32 %x, %demanded_mask
%x_garbage = and i32 %z, %demanded_mask_inv
%x_new = or disjoint %x_demanded, %x_garbage
%div = udiv i32 %x_new, %y
ret i32 %div
}
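The same garbage-bits idea can be sketched outside Alive2 as an exhaustive check. This is only an illustration under stated assumptions: it is scaled down to 16-bit values so brute force stays cheap, it uses a constant divisor, and `maskIsSound`, `udivBy4`, and `sremBy2` are hypothetical names, not LLVM APIs. It does not replace an Alive2 proof.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper: returns true if, for every 16-bit input x, flipping
// any single bit outside demandedIn leaves the bits of op(x) selected by
// demandedOut unchanged. Because this is checked for every x, it also
// covers any combination of non-demanded bits (flip them one at a time).
bool maskIsSound(const std::function<uint16_t(uint16_t)> &op,
                 uint16_t demandedIn, uint16_t demandedOut) {
  for (uint32_t x = 0; x <= 0xFFFF; ++x) {
    uint16_t base =
        static_cast<uint16_t>(op(static_cast<uint16_t>(x)) & demandedOut);
    for (unsigned bit = 0; bit < 16; ++bit) {
      uint16_t flip = static_cast<uint16_t>(1u << bit);
      if (flip & demandedIn)
        continue; // demanded bits are allowed to influence the result
      uint16_t other = static_cast<uint16_t>(
          op(static_cast<uint16_t>(x ^ flip)) & demandedOut);
      if (other != base)
        return false;
    }
  }
  return true;
}

// 16-bit stand-ins for `udiv x, 4` and `srem x, 2`.
uint16_t udivBy4(uint16_t x) { return static_cast<uint16_t>(x / 4); }
uint16_t sremBy2(uint16_t x) {
  return static_cast<uint16_t>(static_cast<int16_t>(x) % 2);
}
```

With the low 8 result bits demanded, `maskIsSound(udivBy4, 0x03FC, 0x00FF)` holds, while `maskIsSound(sremBy2, 0x0001, 0x00FF)` does not, because the sign bit of the dividend decides whether an odd input produces 1 or -1.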
I can provide Alive2 proofs, but I am not sure which transformations I should focus on, since DemandedBits is an analysis. Do you mean the transformation you gave above?
@@ -36,6 +36,7 @@
#include "llvm/Support/Casting.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/KnownBits.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/MathExtras.h"
if (OperandNo == 0) {
const APInt *DivAmnt;
if (match(UserI->getOperand(1), m_APInt(DivAmnt))) {
uint64_t D = DivAmnt->getZExtValue();
uint64_t D = DivAmnt->getZExtValue();
Unused variable.
Miscompilation reproducer: https://alive2.llvm.org/ce/z/NiRcHk
; bin/opt -passes=bdce reduced.ll -S
define i8 @src(i8 %x) {
%ext = sext i8 %x to i32
%rem = srem i32 %ext, 2
%trunc = trunc i32 %rem to i8
ret i8 %trunc
}
Output:
define i8 @src(i8 %x) {
%ext1 = zext i8 %x to i32
%rem = srem i32 %ext1, 2
%trunc = trunc i32 %rem to i8
ret i8 %trunc
}
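The difference between the two functions can be reproduced with ordinary integer arithmetic. The sketch below models the reproducer in plain C++ (the function names are hypothetical): the first helper follows the original `sext`, the second follows the `zext` that appears in the output.

```cpp
#include <cassert>
#include <cstdint>

// Models the input IR: sext i8 -> i32, srem by 2, trunc to i8.
uint8_t sremViaSext(uint8_t x) {
  int32_t ext = static_cast<int8_t>(x); // sign extension
  return static_cast<uint8_t>(ext % 2);
}

// Models the output IR: the sext has become a zext.
uint8_t sremViaZext(uint8_t x) {
  int32_t ext = x; // zero extension
  return static_cast<uint8_t>(ext % 2);
}
```

For x = 0xff the sign-extended version computes -1 % 2 = -1 and returns 0xff, while the zero-extended version computes 255 % 2 = 1 and returns 0x01. The two agree for even inputs and for non-negative odd inputs, so the sign bit of the dividend has to stay demanded for `srem`.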
@dtcxzyw Can you share the non-reduced src?