[LLVM] Improve the DemandedBits Analysis #148853

karouzakisp · 2025-07-15T14:12:21Z

This patch adds DemandedBits support for operators that were previously unhandled: SDiv, UDiv, URem, and SRem. It also extends other operators such as Shl and AShr to handle non-constant shift amounts, and improves the handling of Mul. Compared with upstream LLVM under the Oz pipeline, it showed up to a 10% code size reduction on the LLVM test suite.

github-actions · 2025-07-15T14:12:49Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-07-15T14:13:20Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Panagiotis K (karouzakisp)

Changes

This patch adds DemandedBits support for operators that were previously unhandled: SDiv, UDiv, URem, and SRem. It also extends other operators such as Shl and AShr to handle non-constant shift amounts, and improves the handling of Mul. Compared with upstream LLVM under the Oz pipeline, it showed up to a 10% code size reduction on the LLVM test suite.

Patch is 27.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148853.diff

6 Files Affected:

(modified) llvm/lib/Analysis/DemandedBits.cpp (+93-4)
(modified) llvm/test/Analysis/DemandedBits/basic.ll (+25)
(added) llvm/test/Analysis/DemandedBits/div_rem.ll (+261)
(modified) llvm/test/Analysis/DemandedBits/shl.ll (+47-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll (+19-19)
(modified) llvm/test/Transforms/LoopVectorize/scalable-trunc-min-bitwidth.ll (+3-3)

diff --git a/llvm/lib/Analysis/DemandedBits.cpp b/llvm/lib/Analysis/DemandedBits.cpp
index 6694d5cc06c8c..1fa94e95cbceb 100644
--- a/llvm/lib/Analysis/DemandedBits.cpp
+++ b/llvm/lib/Analysis/DemandedBits.cpp
@@ -36,6 +36,7 @@
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/KnownBits.h"
+#include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include <algorithm>
 #include <cstdint>
@@ -164,10 +165,24 @@ void DemandedBits::determineLiveOperandBits(
     }
     break;
   case Instruction::Mul:
-    // Find the highest live output bit. We don't need any more input
-    // bits than that (adds, and thus subtracts, ripple only to the
-    // left).
-    AB = APInt::getLowBitsSet(BitWidth, AOut.getActiveBits());
+    const APInt *C;
+    if (OperandNo == 0) {
+      // to have output bits 0...H-1 we need the input bits
+      // 0...(H - ceiling(log_2))
+      if (match(UserI->getOperand(1), m_APInt(C))) {
+        auto LogC = C->isOne() ? 0 : C->logBase2() + 1;
+        unsigned Need =
+            AOut.getActiveBits() > LogC ? AOut.getActiveBits() - LogC : 0;
+        AB = APInt::getLowBitsSet(BitWidth, Need);
+      } else { // TODO: we can possibly check for Op0 constant too
+        AB = APInt::getLowBitsSet(BitWidth, AOut.getActiveBits());
+      }
+    } else {
+      // Find the highest live output bit. We don't need any more input
+      // bits than that (adds, and thus subtracts, ripple only to the
+      // left).
+      AB = APInt::getLowBitsSet(BitWidth, AOut.getActiveBits());
+    }
     break;
   case Instruction::Shl:
     if (OperandNo == 0) {
@@ -183,6 +198,17 @@ void DemandedBits::determineLiveOperandBits(
           AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt+1);
         else if (S->hasNoUnsignedWrap())
           AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt);
+      } else {
+        ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+        unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+        unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+        // similar to Lshr case
+        AB = (AOut.lshr(Min) | AOut.lshr(Max));
+        const auto *S = cast<ShlOperator>(UserI);
+        if (S->hasNoSignedWrap())
+          AB |= APInt::getHighBitsSet(BitWidth, Max + 1);
+        else if (S->hasNoUnsignedWrap())
+          AB |= APInt::getHighBitsSet(BitWidth, Max);
       }
     }
     break;
@@ -197,6 +223,19 @@ void DemandedBits::determineLiveOperandBits(
         // (they must be zero).
         if (cast<LShrOperator>(UserI)->isExact())
           AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
+      } else {
+        ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+        unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+        unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+        // Suppose AOut == 0b0000 0011
+        // [min, max] = [1, 3]
+        // shift by 1 we get 0b0000 0110
+        // shift by 2 we get 0b0000 1100
+        // shift by 3 we get 0b0001 1000
+        // we take the or here because need to cover all the above possibilities
+        AB = (AOut.shl(Min) | AOut.shl(Max));
+        if (cast<LShrOperator>(UserI)->isExact())
+          AB |= APInt::getLowBitsSet(BitWidth, Max);
       }
     }
     break;
@@ -217,6 +256,27 @@ void DemandedBits::determineLiveOperandBits(
         // (they must be zero).
         if (cast<AShrOperator>(UserI)->isExact())
           AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
+      } else {
+        ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+        unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+        unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+        AB = (AOut.shl(Min) | AOut.shl(Max));
+
+        if (Max) {
+          // Suppose AOut = 0011 1100
+          // [min, max] = [1, 3]
+          // ShiftAmount = 1 : Mask is 1000 0000
+          // ShiftAmount = 2 : Mask is 1100 0000
+          // ShiftAmount = 3 : Mask is 1110 0000
+          // The Mask with Max covers every case in [min, max],
+          // so we are done
+          if ((AOut & APInt::getHighBitsSet(BitWidth, Max)).getBoolValue())
+            AB.setSignBit();
+        }
+        // If the shift is exact, then the low bits are not dead
+        // (they must be zero).
+        if (cast<AShrOperator>(UserI)->isExact())
+          AB |= APInt::getLowBitsSet(BitWidth, Max);
       }
     }
     break;
@@ -246,6 +306,35 @@ void DemandedBits::determineLiveOperandBits(
     else
       AB &= ~(Known.One & ~Known2.One);
     break;
+  case Instruction::UDiv:
+  case Instruction::URem:
+  case Instruction::SDiv:
+  case Instruction::SRem: {
+    auto Opc = UserI->getOpcode();
+    auto IsDiv = Opc == Instruction::UDiv || Opc == Instruction::SDiv;
+    bool IsSigned = Opc == Instruction::SDiv || Opc == Instruction::SRem;
+    if (OperandNo == 0) {
+      const APInt *DivAmnt;
+      if (match(UserI->getOperand(1), m_APInt(DivAmnt))) {
+        uint64_t D = DivAmnt->getZExtValue();
+        if (isPowerOf2_64(D)) {
+          unsigned Sh = Log2_64(D);
+          if (IsDiv) {
+            AB = AOut.shl(Sh);
+          } else {
+            AB = AOut & APInt::getLowBitsSet(BitWidth, Sh);
+          }
+        } else { // Non power of 2 constant div
+          unsigned LowQ = AOut.getActiveBits();
+          unsigned Need = LowQ + Log2_64_Ceil(D);
+          if (IsSigned)
+            Need++;
+          AB = APInt::getLowBitsSet(BitWidth, std::min(BitWidth, Need));
+        }
+      }
+    }
+    break;
+  }
   case Instruction::Xor:
   case Instruction::PHI:
     AB = AOut;
diff --git a/llvm/test/Analysis/DemandedBits/basic.ll b/llvm/test/Analysis/DemandedBits/basic.ll
index 4dc59c5392935..62eba9eaa81c5 100644
--- a/llvm/test/Analysis/DemandedBits/basic.ll
+++ b/llvm/test/Analysis/DemandedBits/basic.ll
@@ -25,3 +25,28 @@ define i8 @test_mul(i32 %a, i32 %b) {
   %6 = add nsw i8 %3, %5
   ret i8 %6
 }
+; CHECK-LABEL: Printing analysis 'Demanded Bits Analysis' for function 'test_mul_constant':
+; CHECK-DAG: DemandedBits: 0xff for   %3 = trunc i32 %2 to i8
+; CHECK-DAG: DemandedBits: 0xff for %2 in   %3 = trunc i32 %2 to i8
+; CHECK-DAG: DemandedBits: 0xff for   %2 = mul nsw i32 %1, 6
+; CHECK-DAG: DemandedBits: 0x1f for %1 in   %2 = mul nsw i32 %1, 6
+; CHECK-DAG: DemandedBits: 0xff for 6 in   %2 = mul nsw i32 %1, 6
+; CHECK-DAG: DemandedBits: 0x1 for   %4 = trunc i32 %2 to i1
+; CHECK-DAG: DemandedBits: 0x1 for %2 in   %4 = trunc i32 %2 to i1
+; CHECK-DAG: DemandedBits: 0x1f for   %1 = add nsw i32 %a, 12
+; CHECK-DAG: DemandedBits: 0x1f for %a in   %1 = add nsw i32 %a, 12
+; CHECK-DAG: DemandedBits: 0x1f for 12 in   %1 = add nsw i32 %a, 12
+; CHECK-DAG: DemandedBits: 0xff for   %5 = zext i1 %4 to i8
+; CHECK-DAG: DemandedBits: 0x1 for %4 in   %5 = zext i1 %4 to i8
+; CHECK-DAG: DemandedBits: 0xff for   %6 = add nsw i8 %3, %5
+; CHECK-DAG: DemandedBits: 0xff for %3 in   %6 = add nsw i8 %3, %5
+; CHECK-DAG: DemandedBits: 0xff for %5 in   %6 = add nsw i8 %3, %5
+define i8 @test_mul_constant(i32 %a, i32 %b){
+  %1 = add nsw i32 %a, 12
+  %2 = mul nsw i32 %1, 6
+  %3 = trunc i32 %2 to i8
+  %4 = trunc i32 %2 to i1
+  %5 = zext i1 %4 to i8
+  %6 = add nsw i8 %3, %5
+  ret i8 %6
+}
diff --git a/llvm/test/Analysis/DemandedBits/div_rem.ll b/llvm/test/Analysis/DemandedBits/div_rem.ll
new file mode 100644
index 0000000000000..818cba17dc1a6
--- /dev/null
+++ b/llvm/test/Analysis/DemandedBits/div_rem.ll
@@ -0,0 +1,261 @@
+; RUN: opt -S -disable-output -passes="print<demanded-bits>" < %s 2>&1 | FileCheck %s
+
+define i8 @test_sdiv_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_4'
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for   %div = sdiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0x3fc for %a in   %div = sdiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xffffffff for 4 in   %div = sdiv i32 %a, 4
+;
+  %div = sdiv i32 %a, 4
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_sdiv_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for   %div = sdiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xfff for %a in   %div = sdiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in   %div = sdiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = sdiv i32 %a, 5
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_sdiv_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for   %div = sdiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7f8 for %a in   %div = sdiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in   %div = sdiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = sdiv i32 %a, 8
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_sdiv_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xfff for %a in   %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in   %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = udiv i32 %a, 9
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_sdiv(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_sdiv'
+; CHECK-DAG: DemandedBits: 0xff for   %div = sdiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in   %div = sdiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in   %div = sdiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = sdiv i32 %a, %b
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_4'
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0x3fc for %a in   %div = udiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xffffffff for 4 in   %div = udiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = udiv i32 %a, 4
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0x7ff for %a in   %div = udiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in   %div = udiv i32 %a, 5
+;
+  %div = udiv i32 %a, 5
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7f8 for %a in   %div = udiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in   %div = udiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = udiv i32 %a, 8
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xfff for %a in   %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in   %div = udiv i32 %a, 9
+;
+  %div = udiv i32 %a, 9
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_udiv(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_udiv'
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in   %div = udiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in   %div = udiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = udiv i32 %a, %b
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_srem_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_4'
+; CHECK-DAG:  DemandedBits: 0xff for   %rem = srem i32 %a, 4
+; CHECK-DAG:  DemandedBits: 0x3 for %a in   %rem = srem i32 %a, 4
+; CHECK-DAG:  DemandedBits: 0xffffffff for 4 in   %rem = srem i32 %a, 4
+; CHECK-DAG:  DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG:  DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+;
+  %rem = srem i32 %a, 4
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_srem_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = srem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xfff for %a in   %rem = srem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in   %rem = srem i32 %a, 5
+;
+  %rem = srem i32 %a, 5
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_srem_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for   %rem = srem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7 for %a in   %rem = srem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in   %rem = srem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+;
+  %rem = srem i32 %a, 8
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_srem_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = srem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0x1fff for %a in   %rem = srem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in   %rem = srem i32 %a, 9
+;
+  %rem = srem i32 %a, 9
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_srem(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_srem'
+; CHECK-DAG: DemandedBits: 0xff for   %rem = srem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in   %rem = srem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in   %rem = srem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+;
+  %rem = srem i32 %a, %b
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_4'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = urem i32 %a, 4
+; CHECK-DAG: DemandedBits: 0x3 for %a in   %rem = urem i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xffffffff for 4 in   %rem = urem i32 %a, 4
+;
+  %rem = urem i32 %a, 4
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = urem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0x7ff for %a in   %rem = urem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in   %rem = urem i32 %a, 5
+;
+  %rem = urem i32 %a, 5
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = urem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7 for %a in   %rem = urem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in   %rem = urem i32 %a, 8
+;
+  %rem = urem i32 %a, 8
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = urem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xfff for %a in   %rem = urem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in   %rem = urem i32 %a, 9
+;
+  %rem = urem i32 %a, 9
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_urem(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_urem'
+; CHECK-DAG: DemandedBits: 0xff for   %rem = urem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in   %rem = urem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in   %rem = urem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+;
+  %rem = urem i32 %a, %b
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
diff --git a/llvm/test/Analysis/DemandedBits/shl.ll b/llvm/test/Analysis/DemandedBits/shl.ll
index e41f5f4107735..c3313a93c1e85 100644
--- a/llvm/test/Analysis/DemandedBits/shl.ll
+++ b/llvm/test/Analysis/DemandedBits/shl.ll
@@ -57,10 +57,56 @@ define i8 @test_shl(i32 %a, i32 %b) {
 ; CHECK-DAG:  DemandedBits: 0xff for %shl.t = trunc i32 %shl to i8
 ; CHECK-DAG:  DemandedBits: 0xff for %shl in %shl.t = trunc i32 %shl to i8
 ; CHECK-DAG:  DemandedBits: 0xff for %shl = shl i32 %a, %b
-; CHECK-DAG:  DemandedBits: 0xffffffff for %a in %shl = shl i32 %a, %b
+; CHECK-DAG:  DemandedBits: 0xff for %a in %shl = shl i32 %a, %b
 ; CHECK-DAG:  DemandedBits: 0xffffffff for %b in %shl = shl i32 %a, %b
 ;
   %shl = shl i32 %a, %b
   %shl.t = trunc i32 %shl to i8
   ret i8 %shl.t
 }
+
+define i8 @test_shl_var_amount(i32 %a, i32 %b){
+; CHECK-LABEL: 'test_shl_var_amount'
+; CHECK-DAG: DemandedBits: 0xff for   %5 = trunc i32 %4 to i8
+; CHECK-DAG: DemandedBits: 0xff for %4 in   %5 = trunc i32 %4 to i8
+; CHECK-DAG: DemandedBits: 0xff for   %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xff for %1 in   %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xffffffff for %3 in   %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xff for   %2 = trunc i32 %1 to i8
+; CHECK-DAG: DemandedBits: 0xff for %1 in   %2 = trunc i32 %1 to i8
+; CHECK-DAG: DemandedBits: 0xffffffff for   %3 = zext i8 %2 to i32
+; CHECK-DAG: DemandedBits: 0xff for %2 in   %3 = zext i8 %2 to i32
+; CHECK-DAG: DemandedBits: 0xff for   %1 = add nsw i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %a in   %1 = add nsw i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %b in   %1 = add nsw i32 %a, %b
+;
+  %1 = add nsw i32 %a, %b
+  %2 = trunc i32 %1 to i8
+  %3 = zext i8 %2 to i32
+  %4 = shl i32 %1, %3
+  %5 = trunc i32 %4 to i8
+  ret i8 %5
+}
+
+define i8 @test_shl_var_amount_nsw(i32 %a, i32 %b){
+ ; CHECK-LABEL 'test_shl_var_amount_nsw'
+ ; CHECK-DAG: DemandedBits: 0xff for   %5 = trunc i32 %4 to i8
+ ; CHECK-DAG: DemandedBits: 0xff for %4 in   %5 = trunc i32 %4 to i8
+ ; CHECK-DAG: DemandedBits: 0xff for   %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for %1 in   %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for %3 in   %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for   %3 = zext i8 %2 to i32
+ ; CHECK-DAG: DemandedBits: 0xff for %2 in   %3 = zext i8 %2 to i32
+ ; CHECK-DAG: DemandedBits: 0xff for   %2 = ...
[truncated]
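
The worked example in the LShr hunk of the diff above (AOut = 0b0000 0011, shift amounts in [1, 3]) can be replayed on plain 8-bit integers. The following is a minimal sketch, not the patch's code: it uses `uint8_t` instead of `APInt`, and the helper names are made up for illustration. It compares a union over every shift amount in the range with the endpoint-only form `AOut.shl(Min) | AOut.shl(Max)` used in the hunk.

```cpp
#include <cassert>
#include <cstdint>

// Demanded input bits of `lshr i8 %x, Amt` for one known shift amount:
// output bit i is taken from input bit i + Amt.
inline uint8_t demandedForLShr(uint8_t AOut, unsigned Amt) {
  assert(Amt < 8 && "shift amount must be smaller than the bit width");
  return static_cast<uint8_t>(AOut << Amt);
}

// Union of the demanded bits over every shift amount in [Min, Max].
inline uint8_t demandedOverRange(uint8_t AOut, unsigned Min, unsigned Max) {
  assert(Min <= Max && "empty range");
  uint8_t AB = 0;
  for (unsigned Amt = Min; Amt <= Max; ++Amt)
    AB |= demandedForLShr(AOut, Amt);
  return AB;
}

// Only the two endpoints of the range, mirroring the expression in the hunk.
inline uint8_t demandedEndpointsOnly(uint8_t AOut, unsigned Min, unsigned Max) {
  assert(Min <= Max && "empty range");
  return static_cast<uint8_t>(demandedForLShr(AOut, Min) |
                              demandedForLShr(AOut, Max));
}
```

For the example in the comment (AOut = 0x03, range [1, 3]) both forms give 0x1e. For a sparser mask such as AOut = 0x01 over the same range, the union gives 0x0e while the endpoint-only form gives 0x0a, so the middle shift amount is not covered in this small model. Whether that matters for the patch is a question for review.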

karouzakisp · 2025-07-15T14:14:00Z

@nikic @topperc @artagnon Could you please review? Thanks

artagnon

Thanks for the patch! Can you split the improvements, one per operation, so we can ensure that each change has sufficient test coverage? You can club common code in one of the patches (SDiv, UDiv, URem, SRem).

karouzakisp · 2025-07-15T14:21:15Z

> Thanks for the patch! Can you split the improvements, one per operation, so we can ensure that each change has sufficient test coverage?

What do you mean split the improvements?

artagnon · 2025-07-15T14:23:12Z

> > Thanks for the patch! Can you split the improvements, one per operation, so we can ensure that each change has sufficient test coverage?
>
> What do you mean split the improvements?

You can split the patch into multiple independent patches, restricting this PR to just introduce the div/rem code.

jayfoad · 2025-07-15T15:24:38Z

llvm/lib/Analysis/DemandedBits.cpp

+            AB = AOut & APInt::getLowBitsSet(BitWidth, Sh);
+          }
+        } else { // Non power of 2 constant div
+          /*


Use // comments

jayfoad · 2025-07-15T15:26:03Z

llvm/lib/Analysis/DemandedBits.cpp

+            k = LowQ - 1;
+            TopIndex = k + m-1 = 3 + 2 = 5;
+            The dividend bits b5...b0 are enough we don't care for b6 and b7.
+            The same applies to Urem/SRem


Surely not! The result of x % 7 is affected by arbitrarily high-order bits of x.

karouzakisp

Thanks for the catch; somehow I missed it.
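
The objection above is easy to check on 8-bit values: two dividends that agree on bits b5...b0 and differ only in b6 already give different remainders modulo 7. A minimal sketch, with helper names that are purely illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Unsigned remainder by 7, mirroring `urem i8 %x, 7`.
constexpr uint8_t remBy7(uint8_t X) { return static_cast<uint8_t>(X % 7); }

// True if flipping bit b6 of X changes X % 7.
constexpr bool bit6AffectsRem7(uint8_t X) {
  return remBy7(X) != remBy7(static_cast<uint8_t>(X ^ 0x40));
}

// 1 and 65 agree on bits b5..b0 and differ only in b6.
static_assert(remBy7(1) == 1, "1 % 7 == 1");
static_assert(remBy7(65) == 2, "65 % 7 == 2");

// Flipping b6 adds or subtracts 64, and 64 % 7 == 1, so the remainder
// changes for every 8-bit dividend.
inline void checkAllDividends() {
  for (unsigned X = 0; X < 256; ++X)
    assert(bit6AffectsRem7(static_cast<uint8_t>(X)));
}
```

Since no power of two is a multiple of 7, the same argument applies to any higher bit, which is why a fixed low-bit mask cannot be sound for a remainder by a non-power-of-two constant.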

dtcxzyw

Can you provide the alive2 proof? See also https://llvm.org/docs/InstCombineContributorGuide.html#proofs.
You can use an extra integer parameter as the source of garbage bits:

define i32 @src(i32 %x, i32 %y, i32 noundef %z) {
  %div = udiv i32 %x, %y
  ret i32 %div
}
define i32 @tgt(i32 %x, i32 %y, i32 noundef %z) {
  %demanded_mask = ...
  %demanded_mask_inv = xor %demanded_mask, -1
  %x_demanded = and i32 %x, %demanded_mask
  %x_garbage = and i32 %z, %demanded_mask_inv
  %x_new = or disjoint %x_demanded, %x_garbage
  %div = udiv i32 %x_new, %y
  ret i32 %div
}
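
The same idea can be tried by brute force before writing the Alive2 proof. The sketch below is not Alive2 and not part of the patch: it assumes a scaled-down 16-bit dividend, a constant divisor, and a user that only reads the low 8 bits of the quotient (the `trunc ... to i8` pattern in the div_rem.ll tests). The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the low 8 bits of X / D depend only on the bits of X
// selected by DemandedMask, checked exhaustively over a 16-bit dividend.
// Comparing X against X & DemandedMask is enough: if the two always agree,
// any two dividends that match on the mask also agree with each other.
inline bool udivMaskIsSound(uint16_t D, uint16_t DemandedMask) {
  assert(D != 0 && "division by zero is undefined");
  for (uint32_t X = 0; X <= 0xffff; ++X) {
    // Clear every bit of the dividend that the mask claims is dead.
    uint32_t Kept = X & DemandedMask;
    if (((X / D) & 0xff) != ((Kept / D) & 0xff))
      return false;
  }
  return true;
}
```

In this model `udivMaskIsSound(4, 0x3fc)` holds, matching the power-of-two case. `udivMaskIsSound(5, 0x7ff)` does not: X = 2048 has no bits inside the mask, yet 2048 / 5 = 409 has low byte 0x99, whereas 0 / 5 = 0.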


karouzakisp · 2025-07-15T17:45:42Z

> Can you provide the alive2 proof? See also https://llvm.org/docs/InstCombineContributorGuide.html#proofs. You can use an extra integer parameter as the source of garbage bits:
>
> define i32 @src(i32 %x, i32 %y, i32 noundef %z) {
>   %div = udiv i32 %x, %y
>   ret i32 %div
> }
> define i32 @tgt(i32 %x, i32 %y, i32 noundef %z) {
>   %demanded_mask = ...
>   %demanded_mask_inv = xor %demanded_mask, -1
>   %x_demanded = and i32 %x, %demanded_mask
>   %x_garbage = and i32 %z, %demanded_mask_inv
>   %x_new = or disjoint %x_demanded, %x_garbage
>   %div = udiv i32 %x_new, %y
>   ret i32 %div
> }

I can provide Alive2 proofs, but I am not sure which transformations I should focus on, since DemandedBits is an analysis. Do you mean the transformation you gave above?


dtcxzyw · 2025-07-16T00:41:38Z

llvm/lib/Analysis/DemandedBits.cpp

@@ -36,6 +36,7 @@
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/KnownBits.h"
+#include "llvm/Support/MathExtras.h"


Suggested change

#include "llvm/Support/MathExtras.h"

dtcxzyw · 2025-07-16T00:41:51Z

llvm/lib/Analysis/DemandedBits.cpp

+    if (OperandNo == 0) {
+      const APInt *DivAmnt;
+      if (match(UserI->getOperand(1), m_APInt(DivAmnt))) {
+        uint64_t D = DivAmnt->getZExtValue();


Suggested change

uint64_t D = DivAmnt->getZExtValue();

Unused variable.

dtcxzyw

Miscompilation reproducer: https://alive2.llvm.org/ce/z/NiRcHk

; bin/opt -passes=bdce reduced.ll -S
define i8 @src(i8 %x) {
  %ext = sext i8 %x to i32
  %rem = srem i32 %ext, 2
  %trunc = trunc i32 %rem to i8
  ret i8 %trunc
}

Output:

define i8 @src(i8 %x) {
  %ext1 = zext i8 %x to i32
  %rem = srem i32 %ext1, 2
  %trunc = trunc i32 %rem to i8
  ret i8 %trunc
}
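
Why the sext-to-zext rewrite changes the result can be seen with a negative input. The sketch below models both functions on plain integers; it relies on C++ `%` truncating toward zero, which matches `srem`. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Models `trunc(srem(sext i8 %x to i32, 2))` from the original function.
inline uint8_t sremAfterSext(int8_t X) {
  int32_t Ext = X; // sign extension
  return static_cast<uint8_t>(Ext % 2);
}

// Models `trunc(srem(zext i8 %x to i32, 2))` from the BDCE output.
inline uint8_t sremAfterZext(int8_t X) {
  int32_t Ext = static_cast<uint8_t>(X); // zero extension
  return static_cast<uint8_t>(Ext % 2);
}

// The two forms agree whenever the sign bit of the input is clear.
inline void checkNonNegativeInputsAgree() {
  for (int X = 0; X <= 127; ++X)
    assert(sremAfterSext(static_cast<int8_t>(X)) ==
           sremAfterZext(static_cast<int8_t>(X)));
}
```

For x = -1 the sign-extended form yields 0xff (the remainder is -1) and the zero-extended form yields 0x01 (255 % 2), so the sign bit of the dividend is live for `srem` even when the divisor is a power of two.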

karouzakisp · 2025-07-16T03:14:05Z

> Miscompilation reproducer: https://alive2.llvm.org/ce/z/NiRcHk
>
> ; bin/opt -passes=bdce reduced.ll -S
> define i8 @src(i8 %x) {
>   %ext = sext i8 %x to i32
>   %rem = srem i32 %ext, 2
>   %trunc = trunc i32 %rem to i8
>   ret i8 %trunc
> }
>
> Output:
>
> define i8 @src(i8 %x) {
>   %ext1 = zext i8 %x to i32
>   %rem = srem i32 %ext1, 2
>   %trunc = trunc i32 %rem to i8
>   ret i8 %trunc
> }

@dtcxzyw Can you share the non-reduced source?
I ran the transformation locally and it does not fire, i.e., the sext is not converted to a zext, so I don't observe any errors.

llvmbot added the llvm:analysis and llvm:transforms labels Jul 15, 2025

artagnon requested review from artagnon, nikic and dtcxzyw July 15, 2025 14:17

artagnon reviewed Jul 15, 2025

artagnon requested a review from jayfoad July 15, 2025 14:21

karouzakisp force-pushed the demanded-bits branch from 5fdd1f0 to fdae5ad Compare July 15, 2025 15:04

jayfoad reviewed Jul 15, 2025

[LLVM] DemandedBits: Propagate demanded bits through div/rem ops

78db179

karouzakisp force-pushed the demanded-bits branch from fdae5ad to 78db179 Compare July 15, 2025 16:01

karouzakisp mentioned this pull request Jul 15, 2025

[LLVM] Improve the shift operators of the DemandedBits Analysis #148880

Open

dtcxzyw mentioned this pull request Jul 15, 2025

Fuzz PR148853 dtcxzyw/llvm-fuzz-service#102

Open

dtcxzyw reviewed Jul 15, 2025

[LLVM] used APInt's API to be safe with from larger than 64 bits div/rems

6d9b271

dtcxzyw reviewed Jul 16, 2025

dtcxzyw requested changes Jul 16, 2025
