@karouzakisp
Contributor

@karouzakisp karouzakisp commented Jul 15, 2025

This patch adds support for the 'srem' operator in the DemandedBits analysis. Other operators are considerably more complex to handle.

srem Alive proof --> https://alive2.llvm.org/ce/z/aPAoYs

@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jul 15, 2025
@llvmbot
Member

llvmbot commented Jul 15, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Panagiotis K (karouzakisp)

Changes

This patch adds support for operators that the DemandedBits analysis does not yet handle: SDiv, UDiv, URem, and SRem. It also improves Shl and AShr so that they handle non-constant shift amounts, and improves the handling of multiplication. Compared with upstream LLVM under the Oz pipeline, it showed up to a 10% code size reduction on the LLVM test suite.


Patch is 27.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148853.diff

6 Files Affected:

  • (modified) llvm/lib/Analysis/DemandedBits.cpp (+93-4)
  • (modified) llvm/test/Analysis/DemandedBits/basic.ll (+25)
  • (added) llvm/test/Analysis/DemandedBits/div_rem.ll (+261)
  • (modified) llvm/test/Analysis/DemandedBits/shl.ll (+47-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll (+19-19)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-trunc-min-bitwidth.ll (+3-3)
diff --git a/llvm/lib/Analysis/DemandedBits.cpp b/llvm/lib/Analysis/DemandedBits.cpp
index 6694d5cc06c8c..1fa94e95cbceb 100644
--- a/llvm/lib/Analysis/DemandedBits.cpp
+++ b/llvm/lib/Analysis/DemandedBits.cpp
@@ -36,6 +36,7 @@
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/KnownBits.h"
+#include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include <algorithm>
 #include <cstdint>
@@ -164,10 +165,24 @@ void DemandedBits::determineLiveOperandBits(
     }
     break;
   case Instruction::Mul:
-    // Find the highest live output bit. We don't need any more input
-    // bits than that (adds, and thus subtracts, ripple only to the
-    // left).
-    AB = APInt::getLowBitsSet(BitWidth, AOut.getActiveBits());
+    const APInt *C;
+    if (OperandNo == 0) {
+      // to have output bits 0...H-1 we need the input bits
+      // 0...(H - ceiling(log_2))
+      if (match(UserI->getOperand(1), m_APInt(C))) {
+        auto LogC = C->isOne() ? 0 : C->logBase2() + 1;
+        unsigned Need =
+            AOut.getActiveBits() > LogC ? AOut.getActiveBits() - LogC : 0;
+        AB = APInt::getLowBitsSet(BitWidth, Need);
+      } else { // TODO: we can possibly check for Op0 constant too
+        AB = APInt::getLowBitsSet(BitWidth, AOut.getActiveBits());
+      }
+    } else {
+      // Find the highest live output bit. We don't need any more input
+      // bits than that (adds, and thus subtracts, ripple only to the
+      // left).
+      AB = APInt::getLowBitsSet(BitWidth, AOut.getActiveBits());
+    }
     break;
   case Instruction::Shl:
     if (OperandNo == 0) {
@@ -183,6 +198,17 @@ void DemandedBits::determineLiveOperandBits(
           AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt+1);
         else if (S->hasNoUnsignedWrap())
           AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt);
+      } else {
+        ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+        unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+        unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+        // similar to Lshr case
+        AB = (AOut.lshr(Min) | AOut.lshr(Max));
+        const auto *S = cast<ShlOperator>(UserI);
+        if (S->hasNoSignedWrap())
+          AB |= APInt::getHighBitsSet(BitWidth, Max + 1);
+        else if (S->hasNoUnsignedWrap())
+          AB |= APInt::getHighBitsSet(BitWidth, Max);
       }
     }
     break;
@@ -197,6 +223,19 @@ void DemandedBits::determineLiveOperandBits(
         // (they must be zero).
         if (cast<LShrOperator>(UserI)->isExact())
           AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
+      } else {
+        ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+        unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+        unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+        // Suppose AOut == 0b0000 0011
+        // [min, max] = [1, 3]
+        // shift by 1 we get 0b0000 0110
+        // shift by 2 we get 0b0000 1100
+        // shift by 3 we get 0b0001 1000
+        // we take the or here because need to cover all the above possibilities
+        AB = (AOut.shl(Min) | AOut.shl(Max));
+        if (cast<LShrOperator>(UserI)->isExact())
+          AB |= APInt::getLowBitsSet(BitWidth, Max);
       }
     }
     break;
@@ -217,6 +256,27 @@ void DemandedBits::determineLiveOperandBits(
         // (they must be zero).
         if (cast<AShrOperator>(UserI)->isExact())
           AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
+      } else {
+        ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+        unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+        unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+        AB = (AOut.shl(Min) | AOut.shl(Max));
+
+        if (Max) {
+          // Suppose AOut = 0011 1100
+          // [min, max] = [1, 3]
+          // ShiftAmount = 1 : Mask is 1000 0000
+          // ShiftAmount = 2 : Mask is 1100 0000
+          // ShiftAmount = 3 : Mask is 1110 0000
+          // The Mask with Max covers every case in [min, max],
+          // so we are done
+          if ((AOut & APInt::getHighBitsSet(BitWidth, Max)).getBoolValue())
+            AB.setSignBit();
+        }
+        // If the shift is exact, then the low bits are not dead
+        // (they must be zero).
+        if (cast<AShrOperator>(UserI)->isExact())
+          AB |= APInt::getLowBitsSet(BitWidth, Max);
       }
     }
     break;
@@ -246,6 +306,35 @@ void DemandedBits::determineLiveOperandBits(
     else
       AB &= ~(Known.One & ~Known2.One);
     break;
+  case Instruction::UDiv:
+  case Instruction::URem:
+  case Instruction::SDiv:
+  case Instruction::SRem: {
+    auto Opc = UserI->getOpcode();
+    auto IsDiv = Opc == Instruction::UDiv || Opc == Instruction::SDiv;
+    bool IsSigned = Opc == Instruction::SDiv || Opc == Instruction::SRem;
+    if (OperandNo == 0) {
+      const APInt *DivAmnt;
+      if (match(UserI->getOperand(1), m_APInt(DivAmnt))) {
+        uint64_t D = DivAmnt->getZExtValue();
+        if (isPowerOf2_64(D)) {
+          unsigned Sh = Log2_64(D);
+          if (IsDiv) {
+            AB = AOut.shl(Sh);
+          } else {
+            AB = AOut & APInt::getLowBitsSet(BitWidth, Sh);
+          }
+        } else { // Non power of 2 constant div
+          unsigned LowQ = AOut.getActiveBits();
+          unsigned Need = LowQ + Log2_64_Ceil(D);
+          if (IsSigned)
+            Need++;
+          AB = APInt::getLowBitsSet(BitWidth, std::min(BitWidth, Need));
+        }
+      }
+    }
+    break;
+  }
   case Instruction::Xor:
   case Instruction::PHI:
     AB = AOut;
diff --git a/llvm/test/Analysis/DemandedBits/basic.ll b/llvm/test/Analysis/DemandedBits/basic.ll
index 4dc59c5392935..62eba9eaa81c5 100644
--- a/llvm/test/Analysis/DemandedBits/basic.ll
+++ b/llvm/test/Analysis/DemandedBits/basic.ll
@@ -25,3 +25,28 @@ define i8 @test_mul(i32 %a, i32 %b) {
   %6 = add nsw i8 %3, %5
   ret i8 %6
 }
+; CHECK-LABEL: Printing analysis 'Demanded Bits Analysis' for function 'test_mul_constant':
+; CHECK-DAG: DemandedBits: 0xff for   %3 = trunc i32 %2 to i8
+; CHECK-DAG: DemandedBits: 0xff for %2 in   %3 = trunc i32 %2 to i8
+; CHECK-DAG: DemandedBits: 0xff for   %2 = mul nsw i32 %1, 6
+; CHECK-DAG: DemandedBits: 0x1f for %1 in   %2 = mul nsw i32 %1, 6
+; CHECK-DAG: DemandedBits: 0xff for 6 in   %2 = mul nsw i32 %1, 6
+; CHECK-DAG: DemandedBits: 0x1 for   %4 = trunc i32 %2 to i1
+; CHECK-DAG: DemandedBits: 0x1 for %2 in   %4 = trunc i32 %2 to i1
+; CHECK-DAG: DemandedBits: 0x1f for   %1 = add nsw i32 %a, 12
+; CHECK-DAG: DemandedBits: 0x1f for %a in   %1 = add nsw i32 %a, 12
+; CHECK-DAG: DemandedBits: 0x1f for 12 in   %1 = add nsw i32 %a, 12
+; CHECK-DAG: DemandedBits: 0xff for   %5 = zext i1 %4 to i8
+; CHECK-DAG: DemandedBits: 0x1 for %4 in   %5 = zext i1 %4 to i8
+; CHECK-DAG: DemandedBits: 0xff for   %6 = add nsw i8 %3, %5
+; CHECK-DAG: DemandedBits: 0xff for %3 in   %6 = add nsw i8 %3, %5
+; CHECK-DAG: DemandedBits: 0xff for %5 in   %6 = add nsw i8 %3, %5
+define i8 @test_mul_constant(i32 %a, i32 %b){
+  %1 = add nsw i32 %a, 12
+  %2 = mul nsw i32 %1, 6
+  %3 = trunc i32 %2 to i8
+  %4 = trunc i32 %2 to i1
+  %5 = zext i1 %4 to i8
+  %6 = add nsw i8 %3, %5
+  ret i8 %6
+}
diff --git a/llvm/test/Analysis/DemandedBits/div_rem.ll b/llvm/test/Analysis/DemandedBits/div_rem.ll
new file mode 100644
index 0000000000000..818cba17dc1a6
--- /dev/null
+++ b/llvm/test/Analysis/DemandedBits/div_rem.ll
@@ -0,0 +1,261 @@
+; RUN: opt -S -disable-output -passes="print<demanded-bits>" < %s 2>&1 | FileCheck %s
+
+define i8 @test_sdiv_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_4'
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for   %div = sdiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0x3fc for %a in   %div = sdiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xffffffff for 4 in   %div = sdiv i32 %a, 4
+;
+  %div = sdiv i32 %a, 4
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_sdiv_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for   %div = sdiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xfff for %a in   %div = sdiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in   %div = sdiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = sdiv i32 %a, 5
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_sdiv_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for   %div = sdiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7f8 for %a in   %div = sdiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in   %div = sdiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = sdiv i32 %a, 8
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_sdiv_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_sdiv_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xfff for %a in   %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in   %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = udiv i32 %a, 9
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_sdiv(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_sdiv'
+; CHECK-DAG: DemandedBits: 0xff for   %div = sdiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in   %div = sdiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in   %div = sdiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = sdiv i32 %a, %b
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_4'
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0x3fc for %a in   %div = udiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xffffffff for 4 in   %div = udiv i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = udiv i32 %a, 4
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0x7ff for %a in   %div = udiv i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in   %div = udiv i32 %a, 5
+;
+  %div = udiv i32 %a, 5
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7f8 for %a in   %div = udiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in   %div = udiv i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = udiv i32 %a, 8
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_udiv_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_udiv_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xfff for %a in   %div = udiv i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in   %div = udiv i32 %a, 9
+;
+  %div = udiv i32 %a, 9
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_udiv(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_udiv'
+; CHECK-DAG: DemandedBits: 0xff for   %div = udiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in   %div = udiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in   %div = udiv i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for   %div.t = trunc i32 %div to i8
+; CHECK-DAG: DemandedBits: 0xff for %div in   %div.t = trunc i32 %div to i8
+;
+  %div = udiv i32 %a, %b
+  %div.t = trunc i32 %div to i8
+  ret i8 %div.t
+}
+
+define i8 @test_srem_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_4'
+; CHECK-DAG:  DemandedBits: 0xff for   %rem = srem i32 %a, 4
+; CHECK-DAG:  DemandedBits: 0x3 for %a in   %rem = srem i32 %a, 4
+; CHECK-DAG:  DemandedBits: 0xffffffff for 4 in   %rem = srem i32 %a, 4
+; CHECK-DAG:  DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG:  DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+;
+  %rem = srem i32 %a, 4
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_srem_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = srem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xfff for %a in   %rem = srem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in   %rem = srem i32 %a, 5
+;
+  %rem = srem i32 %a, 5
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_srem_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for   %rem = srem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7 for %a in   %rem = srem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in   %rem = srem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+;
+  %rem = srem i32 %a, 8
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_srem_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_srem_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = srem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0x1fff for %a in   %rem = srem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in   %rem = srem i32 %a, 9
+;
+  %rem = srem i32 %a, 9
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_srem(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_srem'
+; CHECK-DAG: DemandedBits: 0xff for   %rem = srem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in   %rem = srem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in   %rem = srem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+;
+  %rem = srem i32 %a, %b
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_4(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_4'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = urem i32 %a, 4
+; CHECK-DAG: DemandedBits: 0x3 for %a in   %rem = urem i32 %a, 4
+; CHECK-DAG: DemandedBits: 0xffffffff for 4 in   %rem = urem i32 %a, 4
+;
+  %rem = urem i32 %a, 4
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_5(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_5'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = urem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0x7ff for %a in   %rem = urem i32 %a, 5
+; CHECK-DAG: DemandedBits: 0xffffffff for 5 in   %rem = urem i32 %a, 5
+;
+  %rem = urem i32 %a, 5
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_8(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_8'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = urem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0x7 for %a in   %rem = urem i32 %a, 8
+; CHECK-DAG: DemandedBits: 0xffffffff for 8 in   %rem = urem i32 %a, 8
+;
+  %rem = urem i32 %a, 8
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_urem_const_amount_9(i32 %a) {
+; CHECK-LABEL: 'test_urem_const_amount_9'
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for   %rem = urem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xfff for %a in   %rem = urem i32 %a, 9
+; CHECK-DAG: DemandedBits: 0xffffffff for 9 in   %rem = urem i32 %a, 9
+;
+  %rem = urem i32 %a, 9
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
+
+define i8 @test_urem(i32 %a, i32 %b) {
+; CHECK-LABEL: 'test_urem'
+; CHECK-DAG: DemandedBits: 0xff for   %rem = urem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %a in   %rem = urem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xffffffff for %b in   %rem = urem i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for   %rem.t = trunc i32 %rem to i8
+; CHECK-DAG: DemandedBits: 0xff for %rem in   %rem.t = trunc i32 %rem to i8
+;
+  %rem = urem i32 %a, %b
+  %rem.t = trunc i32 %rem to i8
+  ret i8 %rem.t
+}
diff --git a/llvm/test/Analysis/DemandedBits/shl.ll b/llvm/test/Analysis/DemandedBits/shl.ll
index e41f5f4107735..c3313a93c1e85 100644
--- a/llvm/test/Analysis/DemandedBits/shl.ll
+++ b/llvm/test/Analysis/DemandedBits/shl.ll
@@ -57,10 +57,56 @@ define i8 @test_shl(i32 %a, i32 %b) {
 ; CHECK-DAG:  DemandedBits: 0xff for %shl.t = trunc i32 %shl to i8
 ; CHECK-DAG:  DemandedBits: 0xff for %shl in %shl.t = trunc i32 %shl to i8
 ; CHECK-DAG:  DemandedBits: 0xff for %shl = shl i32 %a, %b
-; CHECK-DAG:  DemandedBits: 0xffffffff for %a in %shl = shl i32 %a, %b
+; CHECK-DAG:  DemandedBits: 0xff for %a in %shl = shl i32 %a, %b
 ; CHECK-DAG:  DemandedBits: 0xffffffff for %b in %shl = shl i32 %a, %b
 ;
   %shl = shl i32 %a, %b
   %shl.t = trunc i32 %shl to i8
   ret i8 %shl.t
 }
+
+define i8 @test_shl_var_amount(i32 %a, i32 %b){
+; CHECK-LABEL: 'test_shl_var_amount'
+; CHECK-DAG: DemandedBits: 0xff for   %5 = trunc i32 %4 to i8
+; CHECK-DAG: DemandedBits: 0xff for %4 in   %5 = trunc i32 %4 to i8
+; CHECK-DAG: DemandedBits: 0xff for   %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xff for %1 in   %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xffffffff for %3 in   %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xff for   %2 = trunc i32 %1 to i8
+; CHECK-DAG: DemandedBits: 0xff for %1 in   %2 = trunc i32 %1 to i8
+; CHECK-DAG: DemandedBits: 0xffffffff for   %3 = zext i8 %2 to i32
+; CHECK-DAG: DemandedBits: 0xff for %2 in   %3 = zext i8 %2 to i32
+; CHECK-DAG: DemandedBits: 0xff for   %1 = add nsw i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %a in   %1 = add nsw i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %b in   %1 = add nsw i32 %a, %b
+;
+  %1 = add nsw i32 %a, %b
+  %2 = trunc i32 %1 to i8
+  %3 = zext i8 %2 to i32
+  %4 = shl i32 %1, %3
+  %5 = trunc i32 %4 to i8
+  ret i8 %5
+}
+
+define i8 @test_shl_var_amount_nsw(i32 %a, i32 %b){
+ ; CHECK-LABEL: 'test_shl_var_amount_nsw'
+ ; CHECK-DAG: DemandedBits: 0xff for   %5 = trunc i32 %4 to i8
+ ; CHECK-DAG: DemandedBits: 0xff for %4 in   %5 = trunc i32 %4 to i8
+ ; CHECK-DAG: DemandedBits: 0xff for   %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for %1 in   %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for %3 in   %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for   %3 = zext i8 %2 to i32
+ ; CHECK-DAG: DemandedBits: 0xff for %2 in   %3 = zext i8 %2 to i32
+ ; CHECK-DAG: DemandedBits: 0xff for   %2 = ...
[truncated]

@karouzakisp
Contributor Author

@nikic @topperc @artagnon Could you please review? Thanks

@artagnon artagnon requested review from artagnon, dtcxzyw and nikic July 15, 2025 14:17
Contributor

@artagnon artagnon left a comment

Thanks for the patch! Can you split the improvements, one per operation, so we can ensure that each change has sufficient test coverage? You can group the common code (SDiv, UDiv, URem, SRem) into one of the patches.

@karouzakisp
Contributor Author

> Thanks for the patch! Can you split the improvements, one per operation, so we can ensure that each change has sufficient test coverage?

What do you mean split the improvements?

@artagnon artagnon requested a review from jayfoad July 15, 2025 14:21
@artagnon
Contributor

> > Thanks for the patch! Can you split the improvements, one per operation, so we can ensure that each change has sufficient test coverage?

> What do you mean split the improvements?

You can split the patch into multiple independent patches, restricting this PR to just introduce the div/rem code.

Member

@dtcxzyw dtcxzyw left a comment

Can you provide the alive2 proof? See also https://llvm.org/docs/InstCombineContributorGuide.html#proofs.
You can use an extra integer parameter as the source of garbage bits:

define i32 @src(i32 %x, i32 %y, i32 noundef %z) {
  %div = udiv i32 %x, %y
  ret i32 %div
}
define i32 @tgt(i32 %x, i32 %y, i32 noundef %z) {
  %demanded_mask = ...
  %demanded_mask_inv = xor i32 %demanded_mask, -1
  %x_demanded = and i32 %x, %demanded_mask
  %x_garbage = and i32 %z, %demanded_mask_inv
  %x_new = or disjoint i32 %x_demanded, %x_garbage
  %div = udiv i32 %x_new, %y
  ret i32 %div
}
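
The same idea can be tried outside Alive2 as an exhaustive check on 8-bit values. The following is a minimal sketch, not part of the patch: `udivMaskIsSound` is a hypothetical helper that replaces every non-demanded bit of the dividend with garbage from `z` and verifies that the demanded bits of the quotient do not change.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns true if, for every 8-bit
// dividend x and every garbage value z, overwriting the non-demanded bits of
// x with bits of z leaves the demanded bits of `x udiv y` unchanged.
bool udivMaskIsSound(uint8_t y, uint8_t demandedIn, uint8_t demandedOut) {
  if (y == 0)
    return true; // division by zero is undefined; nothing to compare
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned z = 0; z < 256; ++z) {
      // Mirrors %x_new = or disjoint (x & mask), (z & ~mask).
      uint8_t xNew = static_cast<uint8_t>((x & demandedIn) |
                                          (z & uint8_t(~demandedIn)));
      uint8_t ref = static_cast<uint8_t>(x / y);
      uint8_t got = static_cast<uint8_t>(xNew / y);
      if ((ref & demandedOut) != (got & demandedOut))
        return false;
    }
  }
  return true;
}
```

For example, `udivMaskIsSound(4, 0x3c, 0x0f)` holds, because `x udiv 4` is a right shift by two, while `udivMaskIsSound(5, 0x0f, 0x01)` fails. An exhaustive 8-bit check like this is only a sanity test and does not replace the Alive2 proof requested above.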

@karouzakisp
Contributor Author

I can provide Alive2 proofs, but I am not sure which transformations I should focus on, since DemandedBits is an analysis. Do you mean the transformation you gave above?

Member

@dtcxzyw dtcxzyw left a comment

Miscompilation reproducer: https://alive2.llvm.org/ce/z/NiRcHk

; bin/opt -passes=bdce reduced.ll -S
define i8 @src(i8 %x) {
  %ext = sext i8 %x to i32
  %rem = srem i32 %ext, 2
  %trunc = trunc i32 %rem to i8
  ret i8 %trunc
}

Output:

define i8 @src(i8 %x) {
  %ext1 = zext i8 %x to i32
  %rem = srem i32 %ext1, 2
  %trunc = trunc i32 %rem to i8
  ret i8 %trunc
}
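
To see why replacing `sext` with `zext` changes the result, the two functions can be modeled in plain C++. This is an illustrative sketch with hypothetical function names. It relies on C++ `%` truncating toward zero, which matches `srem`.

```cpp
#include <cassert>
#include <cstdint>

// Models the original function: trunc(srem(sext i8 x to i32, 2)).
uint8_t sremOfSext(int8_t x) {
  int32_t ext = x; // sign extension
  return static_cast<uint8_t>(ext % 2);
}

// Models the miscompiled output, where the sext became a zext.
uint8_t sremOfZext(int8_t x) {
  int32_t ext = static_cast<uint8_t>(x); // zero extension
  return static_cast<uint8_t>(ext % 2);
}
```

For `x = -1` the original yields `0xff` (the remainder is -1), while the rewritten version yields `0x01` (255 rem 2 is 1). The result of `srem` therefore depends on the sign of the dividend and not only on its low bit.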

@karouzakisp
Contributor Author

karouzakisp commented Jul 16, 2025

Fixed. We need to keep the sign bit demanded; otherwise, we risk losing it.

; CHECK-DAG: DemandedBits: 0xff for %div.t = trunc i32 %div to i8
; CHECK-DAG: DemandedBits: 0xff for %div in %div.t = trunc i32 %div to i8
; CHECK-DAG: DemandedBits: 0xff for %div = udiv i32 %a, 5
; CHECK-DAG: DemandedBits: 0x7ff for %a in %div = udiv i32 %a, 5
Collaborator

Let's imagine %a is 0xffffffff. 0xffffffff/0x5 is 0x33333333.

This is saying that we're allowed to treat %a as 0x7ff because the upper bits don't matter. 0x7ff/0x5 is 0x199. That's a different value in bits 7:0 than the 0x33333333.

What am I missing?
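
The arithmetic in this counterexample can be reproduced directly. The sketch below uses hypothetical helper names and models `trunc i32 (udiv i32 %a, 5) to i8` with and without masking `%a` to the claimed demanded bits `0x7ff`.

```cpp
#include <cassert>
#include <cstdint>

// Low 8 bits of `a udiv 5`, i.e. what `trunc i32 %div to i8` observes.
uint8_t lowByteOfUDiv5(uint32_t a) { return static_cast<uint8_t>(a / 5u); }

// The same computation after clearing every bit outside the mask 0x7ff.
uint8_t lowByteOfMaskedUDiv5(uint32_t a) {
  return static_cast<uint8_t>((a & 0x7ffu) / 5u);
}
```

With `a = 0xffffffff` the first function returns `0x33` and the second returns `0x99`, so the bits above bit 10 do affect the low byte of the quotient.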

Contributor Author

That's correct. When developing the algorithm, I assumed a Knuth-like division with a bit-by-bit recurrence, which isn't exactly accurate. Real division doesn't behave exactly like this.

@nikic nikic changed the title [LLVM] Improve the DemandedBits Analysis [DemandedBits] Add div/rem support Jul 16, 2025
if (DivAmnt->isPowerOf2()) {
  unsigned Sh = DivAmnt->countr_zero();
  if (IsDiv) {
    AB = AOut.shl(Sh);
Contributor

This is not correct for signed division. Simple example:

(-1 s/ 2) & 1 is 0.
((-1 & 2) s/ 2) & 1 is 1.
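
This example can be checked in plain C++, since integer division there truncates toward zero just like `sdiv`. In the sketch below, `bit0OfSDiv2` is a hypothetical helper. The mask `2` is what `AOut.shl(Sh)` gives for `AOut = 1` and `Sh = 1`.

```cpp
#include <cassert>
#include <cstdint>

// Bit 0 of `x sdiv 2` on i32. C++ integer division truncates toward zero,
// which matches the rounding of LLVM's sdiv.
unsigned bit0OfSDiv2(int32_t x) { return static_cast<unsigned>(x / 2) & 1u; }
```

`bit0OfSDiv2(-1)` is 0 because -1 / 2 rounds to 0, whereas `bit0OfSDiv2(-1 & 2)` is 1 because 2 / 2 is 1. The plain shifted mask drops both the sign and the low bit, and both influence the rounding.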

@nikic
Contributor

nikic commented Jul 16, 2025

@karouzakisp Here are the proofs: https://alive2.llvm.org/ce/z/hYWDwZ

This shows that urem, udiv and srem handling is correct and sdiv handling is incorrect. Please try to provide this form of proof for all demanded bits transforms.

Note that handling urem and udiv for power-of-two divisors is actually useless because these will be converted to and / lshr. Only handling srem and sdiv is useful.
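
The equivalence behind this canonicalization, and the reason it does not carry over to the signed case, can be illustrated with an exhaustive 8-bit check. This is a sketch with hypothetical helper names and is not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// For every 8-bit unsigned x: dividing by 2^k matches a logical right shift,
// and the remainder matches a mask with 2^k - 1.
bool unsignedDivRemMatchShiftAndMask(unsigned k) {
  const unsigned d = 1u << k;
  for (unsigned x = 0; x < 256; ++x) {
    if (x / d != (x >> k))
      return false;
    if (x % d != (x & (d - 1u)))
      return false;
  }
  return true;
}

// The signed remainder does not reduce to a plain mask: C++ `%` truncates
// toward zero like srem, so negative dividends give negative remainders.
bool signedRemMatchesMask(unsigned k) {
  const int d = 1 << k;
  for (int x = -128; x < 128; ++x)
    if (x % d != (x & (d - 1)))
      return false;
  return true;
}
```

`unsignedDivRemMatchShiftAndMask(k)` holds for every `k` from 0 to 7, while `signedRemMatchesMask(1)` already fails at `x = -1`.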

@karouzakisp
Contributor Author

karouzakisp commented Jul 16, 2025

@nikic
Thanks for the proofs. I will try to keep kind of the same format on the other PRs.

Regarding the udiv/urem being useless, I thought that some transformations might happen to them. But, are we sure that the conversion will happen before any pass cares about DemandedBits?

@nikic
Contributor

nikic commented Jul 16, 2025

> @nikic Thanks for the proofs. I will try to keep kind of the same format on the other PRs.

How about keeping it in mind for this PR as well? The new sdiv code is still incorrect. Including the sign bit is not enough: https://alive2.llvm.org/ce/z/J7yBrc

> Regarding the udiv/urem being useless, I thought that some transformations might happen to them. But, are we sure that the conversion will happen before any pass cares about DemandedBits?

It's theoretically possible, but as a rule, we don't handle non-canonical instruction forms unless there is specific proof that handling them is necessary.
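
As a rough illustration of why the sign bit alone does not rescue the sdiv rule, consider the sketch below. The revised diff is not shown in this thread, so it assumes the rule was approximately `AOut.shl(Sh)` with the sign bit added. `sdivMaskIsSound` is a hypothetical 8-bit brute-force checker, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical checker: returns true if, for every 8-bit signed x and every
// garbage value z, overwriting the non-demanded bits of x with bits of z
// leaves the demanded bits of `x sdiv 2^sh` unchanged.
bool sdivMaskIsSound(unsigned sh, uint8_t demandedIn, uint8_t demandedOut) {
  const int d = 1 << sh;
  for (int x = -128; x < 128; ++x) {
    for (unsigned z = 0; z < 256; ++z) {
      uint8_t xBits = static_cast<uint8_t>(x);
      uint8_t mixed = static_cast<uint8_t>((xBits & demandedIn) |
                                           (z & uint8_t(~demandedIn)));
      int xNew = static_cast<int8_t>(mixed); // reinterpret as signed i8
      uint8_t ref = static_cast<uint8_t>(x / d);
      uint8_t got = static_cast<uint8_t>(xNew / d);
      if ((ref & demandedOut) != (got & demandedOut))
        return false;
    }
  }
  return true;
}
```

For `sdiv` by 2 with only output bit 0 demanded, the assumed mask is `0x02 | 0x80 = 0x82`, and `sdivMaskIsSound(1, 0x82, 0x01)` fails: with `x = -1` the quotient is 0, but clearing bit 0 of the dividend changes the rounding. In this 8-bit instance, also demanding bit 0 (mask `0x83`) makes the check pass. That is an observation about this brute-force model only, not a statement about the final patch.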

@karouzakisp
Contributor Author

> > @nikic Thanks for the proofs. I will try to keep kind of the same format on the other PRs.

> How about keeping it in mind for this PR as well? The new sdiv code is still incorrect. Including the sign bit is not enough: https://alive2.llvm.org/ce/z/J7yBrc

I know, I am working to fix it. I will create/update all the proofs afterwards.

> > Regarding the udiv/urem being useless, I thought that some transformations might happen to them. But, are we sure that the conversion will happen before any pass cares about DemandedBits?

> It's theoretically possible, but as a rule, we don't handle non-canonical instruction forms unless there is specific proof that handling them is necessary.

In my understanding, enabling it can produce new knowledge on some chains that have srem before canonicalization. You know the IR better, so I leave it to you.

Contributor

@nikic nikic left a comment

This looks reasonable now.

// multiple times and early on. So, we don't
// need to calculate demanded-bits for those.
const APInt *DivAmnt;
if (match(UserI->getOperand(1), m_APInt(DivAmnt))) {
Contributor

Suggested change
-if (match(UserI->getOperand(1), m_APInt(DivAmnt))) {
+if (match(UserI->getOperand(1), m_Power2(DivAmnt))) {

Then you don't need the separate check.

if (match(UserI->getOperand(1), m_APInt(DivAmnt))) {
  if (DivAmnt->isPowerOf2()) {
    unsigned Sh = DivAmnt->countr_zero();
    AB = AOut & APInt::getLowBitsSet(BitWidth, Sh);
Contributor

Suggested change
-AB = AOut & APInt::getLowBitsSet(BitWidth, Sh);
+AB = AOut & (*DivAmnt - 1);

A slightly simpler way to express this (matching InstCombineSimplifyDemanded).

Contributor

I think none of the tests currently show the influence of the AOut demanded bits on AB. If you're using trunc i8, you need to test a pow2 >= 512 for that.
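
A small model makes the threshold concrete. The sketch below assumes the rule under review is `AB = AOut & (D - 1)` for a power-of-two divisor `D` and ignores any additional sign-bit handling. `sremOperandMask` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit model of the rule for `srem x, D` with D a power of
// two: AB = AOut & (D - 1). Extra sign-bit handling is deliberately omitted.
uint32_t sremOperandMask(uint32_t aOut, uint32_t divisor) {
  return aOut & (divisor - 1u);
}
```

With `aOut = 0xff`, as produced by a `trunc` to `i8`, divisors 8 and 256 give `0x7` and `0xff`, which are simply `D - 1`. Only from `D = 512` onward does the result (`0xff`) differ from `D - 1` (`0x1ff`), so smaller divisors cannot show that `AOut` participates in the computation.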

case Instruction::SRem: {
// urem and udiv will be converted to and/lshr
// multiple times and early on. So, we don't
// need to calculate demanded-bits for those.
Contributor

This comment looks oddly narrow.
