Skip to content

Conversation

jserv
Copy link
Collaborator

@jserv jserv commented Aug 26, 2025

This pull request transforms shecc's peephole optimizer from basic instruction fusion to a comprehensive post-register-allocation optimization framework, providing performance improvements while maintaining educational clarity and bootstrap capability.

It creates lean and effective optimizer cooperation by eliminating redundant work between optimization passes.

Key Optimizations

  • Algebraic: x-x→0, x^x→0, x|x→x, x&x→x
  • Strength Reduction: x/8→x>>3, x%16→x&15, x*4→x<<2
  • Comparisons: x==x→1, x!=x→0, x<x→0
  • Bitwise: x&-1→x, x|0→x, x^0→x, x&0→0
  • Triple Patterns: 3-instruction sequences
  • Enhanced: Load/store elimination, dead code elimination, move elimination

jserv added 9 commits August 26, 2025 20:06
This commit implements redundant move elimination to optimize away
unnecessary move operations that are immediately overwritten, targetting
common inefficiencies in compiler-generated code.

Added 5 optimization patterns:
- Consecutive assignments to same destination:
  {mov rd,rs1; mov rd,rs2} → {mov rd,rs2}
- Load immediately overwritten:
  {load rd,offset; mov rd,rs} → {mov rd,rs}
- Constant load immediately overwritten:
  {li rd,imm; mov rd,rs} → {mov rd,rs}
- Consecutive loads to same register:
  {load rd,off1; load rd,off2} → {load rd,off2}
- Consecutive constant loads:
  {li rd,imm1; li rd,imm2} → {li rd,imm2}
This commit implements dead code elimination that works in conjunction
with SCCP to remove unreachable code after constant propagation and
branch folding.

These optimizations target code that becomes dead after constant
propagation, such as:
- Branches with constant conditions (if(1), if(0))
- Instructions that are immediately overwritten
- Unreachable code blocks after branch folding
This extends load/store elimination with more aggressive patterns,
reducing memory traffic by eliminating redundant memory operations.

Local memory optimizations:
- Dead store elimination: Consecutive stores to same location
- Redundant load elimination: Consecutive loads from same location
- Store-to-load forwarding: Replace load with stored value
- Load-store redundancy: Remove store of just-loaded value

Global memory optimizations:
- Global dead store elimination
- Global redundant load elimination
This implements mathematical identity patterns on register operands:
- Self-subtraction: x - x → 0
- Self-XOR: x ^ x → 0
- Self-OR: x | x → x (identity)
- Self-AND: x & x → x (identity)

These patterns emerge after register allocation when different
variables are assigned to the same register. SSA handles constant
folding, peephole handles register-based patterns.
This implements power-of-2 strength reduction patterns:
- Division by 2^n → right shift by n
- Modulo by 2^n → bitwise AND with (2^n - 1)
- Multiplication by 2^n → left shift by n

This optimization is unique to peephole optimizer since SSA
works on virtual registers before actual constants are loaded.
This implements self-comparison optimizations:
- x != x → 0 (always false)
- x == x → 1 (always true)
- x < x → 0 (always false)
- x > x → 0 (always false)
- x <= x → 1 (always true)
- x >= x → 1 (always true)

These register-based patterns appear after register allocation
when different variables are assigned to the same register.
Complements SSA's SCCP constant comparison folding.
This implements bitwise identity and absorption patterns:
- Double complement: ~(~x) → x
- AND with all-ones: x & -1 → x
- OR with zero: x | 0 → x
- XOR with zero: x ^ 0 → x
- AND with zero: x & 0 → 0 (absorption)
- OR with all-ones: x | -1 → -1 (absorption)
- Shift by zero: x << 0 → x, x >> 0 → x

These patterns are not handled by SSA optimizer and provide
significant optimization opportunities for bitwise operations.
This implements 3-instruction sequence optimizations:
- Store-load-store elimination: removes unused intermediate loads
- Consecutive stores: only last store to same location matters
This adds optimizer division of labor documentation:
- SSA: handles constant folding, CSE, self-assignments, DCE
- Peephole: handles register patterns, bitwise ops, strength reduction

Integrate all optimization functions into peephole driver:
- Triple pattern optimization (3-instruction sequences)
- Instruction fusion (2-instruction sequences)
- Comparison optimization (self-comparisons)
- Strength reduction (power-of-2 optimizations)
- Algebraic simplification (register self-operations)
- Bitwise optimization (identity/absorption patterns)
- Move elimination and load/store patterns
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant