Implement post-register-allocation optimization #267

jserv · 2025-08-26T15:57:59Z

This pull request transforms shecc's peephole optimizer from basic instruction fusion to a comprehensive post-register-allocation optimization framework, providing performance improvements while maintaining educational clarity and bootstrap capability.

It also removes redundant work between optimization passes, so that the SSA and peephole optimizers cooperate without duplicating each other's transformations.

Key Optimizations

Algebraic: x-x→0, x^x→0, x|x→x, x&x→x
Strength Reduction: x/8→x>>3, x%16→x&15, x*4→x<<2
Comparisons: x==x→1, x!=x→0, x<x→0
Bitwise: x&-1→x, x|0→x, x^0→x, x&0→0
Triple Patterns: 3-instruction sequences
Enhanced: Load/store elimination, dead code elimination, move elimination

This commit implements redundant move elimination, which removes move operations whose results are immediately overwritten, targeting common inefficiencies in compiler-generated code. Added 5 optimization patterns:
- Consecutive assignments to same destination: {mov rd,rs1; mov rd,rs2} → {mov rd,rs2}
- Load immediately overwritten: {load rd,offset; mov rd,rs} → {mov rd,rs}
- Constant load immediately overwritten: {li rd,imm; mov rd,rs} → {mov rd,rs}
- Consecutive loads to same register: {load rd,off1; load rd,off2} → {load rd,off2}
- Consecutive constant loads: {li rd,imm1; li rd,imm2} → {li rd,imm2}
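As a minimal sketch of how these five patterns could be matched, the pass below drops an instruction when the next one overwrites its destination. The `insn_t` record, the opcode names, and `eliminate_redundant_moves` are hypothetical and are not taken from shecc's source. The sketch assumes that `load` reads only a stack slot, and it checks that the second instruction does not read `rd`, because {mov rd,rs; mov rd,rd} must not be collapsed. It is slightly broader than the five listed patterns (it also accepts mixed pairs such as {mov; li}).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes and instruction record; not shecc's actual IR. */
typedef enum { OP_MOV, OP_LOAD, OP_LI, OP_ADD } opcode_t;

typedef struct insn {
    opcode_t op;
    int rd;  /* destination register */
    int rs;  /* source register for OP_MOV; unused otherwise */
    int imm; /* constant for OP_LI, stack offset for OP_LOAD */
    struct insn *next;
} insn_t;

/* The first instruction may be dropped only if writing rd is its sole effect. */
static int is_pure_def(const insn_t *i)
{
    return i->op == OP_MOV || i->op == OP_LOAD || i->op == OP_LI;
}

/* The second instruction must overwrite rd without reading it first. */
static int overwrites_without_reading(const insn_t *i, int rd)
{
    if (i->rd != rd)
        return 0;
    if (i->op == OP_MOV)
        return i->rs != rd;
    return i->op == OP_LOAD || i->op == OP_LI;
}

/* Single pass over the list: unlink an instruction when its successor
 * overwrites the same destination. Returns the number of removals. */
int eliminate_redundant_moves(insn_t **head)
{
    assert(head != NULL);
    int removed = 0;
    insn_t **link = head;
    while (*link && (*link)->next) {
        insn_t *first = *link, *second = first->next;
        if (is_pure_def(first) &&
            overwrites_without_reading(second, first->rd)) {
            *link = second; /* unlink first, then re-examine second */
            removed++;
        } else {
            link = &first->next;
        }
    }
    return removed;
}
```

Taking a pointer to the head pointer lets the same code unlink the first instruction of the list without a special case.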

This commit implements dead code elimination that works in conjunction with SCCP (sparse conditional constant propagation) to remove unreachable code after constant propagation and branch folding. These optimizations target code that becomes dead after constant propagation, such as:
- Branches with constant conditions (if(1), if(0))
- Instructions that are immediately overwritten
- Unreachable code blocks after branch folding
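The two steps named above, branch folding followed by removal of unreachable code, can be sketched roughly as follows. Everything here is an assumption made for illustration: the `insn_t` layout, the `OP_BRANCH_NZ` opcode (jump to a label when a register is non-zero), and both function names are hypothetical, and the sketch assumes that labels are the only possible jump targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IR for illustration; not shecc's actual data structures. */
typedef enum { OP_LI, OP_ADD, OP_BRANCH_NZ, OP_JUMP, OP_LABEL, OP_NOP } opcode_t;

typedef struct insn {
    opcode_t op;
    int reg; /* OP_LI/OP_ADD: destination; OP_BRANCH_NZ: tested register */
    int imm; /* OP_LI: constant; branch, jump and label: label id */
    struct insn *next;
} insn_t;

/* Fold {li r,c; branch_nz r,L}: a taken branch becomes a jump, an untaken
 * one becomes a nop. The li is kept because r may still be read later. */
int fold_constant_branches(insn_t *head)
{
    assert(head != NULL);
    int folded = 0;
    for (insn_t *i = head; i->next; i = i->next) {
        insn_t *br = i->next;
        if (i->op == OP_LI && br->op == OP_BRANCH_NZ && br->reg == i->reg) {
            br->op = i->imm ? OP_JUMP : OP_NOP;
            folded++;
        }
    }
    return folded;
}

/* Unlink everything between an unconditional jump and the next label. */
int remove_unreachable(insn_t *head)
{
    int removed = 0;
    for (insn_t *i = head; i; i = i->next) {
        if (i->op != OP_JUMP)
            continue;
        while (i->next && i->next->op != OP_LABEL) {
            i->next = i->next->next;
            removed++;
        }
    }
    return removed;
}
```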

This extends load/store elimination with more aggressive patterns, reducing memory traffic by eliminating redundant memory operations.
Local memory optimizations:
- Dead store elimination: Consecutive stores to same location
- Redundant load elimination: Consecutive loads from same location
- Store-to-load forwarding: Replace load with stored value
- Load-store redundancy: Remove store of just-loaded value
Global memory optimizations:
- Global dead store elimination
- Global redundant load elimination
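The four local patterns can be illustrated on a pair of adjacent instructions. This is only a sketch under stated assumptions: `insn_t`, its fields, and `optimize_mem_pair` are hypothetical names, offsets are assumed to identify same-sized, non-overlapping stack slots, and no other instruction sits between the two.

```c
#include <assert.h>

/* Hypothetical instruction record; not shecc's actual IR. */
typedef enum { OP_LOAD, OP_STORE, OP_MOV, OP_NOP } opcode_t;

typedef struct {
    opcode_t op;
    int reg; /* OP_LOAD/OP_MOV: destination; OP_STORE: register being stored */
    int src; /* OP_MOV: source register; unused otherwise */
    int off; /* OP_LOAD/OP_STORE: stack-slot offset */
} insn_t;

/* Turn i into "mov rd, rs", or into a nop when rd and rs coincide. */
static void make_mov(insn_t *i, int rd, int rs)
{
    if (rd == rs) {
        i->op = OP_NOP;
        return;
    }
    i->op = OP_MOV;
    i->reg = rd;
    i->src = rs;
}

/* Rewrite the adjacent pair (a, b) in place; returns 1 if a pattern fired. */
int optimize_mem_pair(insn_t *a, insn_t *b)
{
    assert(a && b);
    int a_mem = a->op == OP_LOAD || a->op == OP_STORE;
    int b_mem = b->op == OP_LOAD || b->op == OP_STORE;
    if (!a_mem || !b_mem || a->off != b->off)
        return 0;

    if (a->op == OP_STORE && b->op == OP_STORE) {
        a->op = OP_NOP;              /* dead store: the later store wins */
    } else if (a->op == OP_LOAD && b->op == OP_LOAD) {
        make_mov(b, b->reg, a->reg); /* redundant load: reuse loaded value */
    } else if (a->op == OP_STORE && b->op == OP_LOAD) {
        make_mov(b, b->reg, a->reg); /* store-to-load forwarding */
    } else {                         /* load followed by store */
        if (a->reg != b->reg)
            return 0;                /* a different value is stored: keep */
        b->op = OP_NOP;              /* storing back the value just loaded */
    }
    return 1;
}
```

Marking instructions as `OP_NOP` instead of unlinking them keeps the sketch short; a later cleanup pass would be expected to drop the nops.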

This implements mathematical identity patterns on register operands:
- Self-subtraction: x - x → 0
- Self-XOR: x ^ x → 0
- Self-OR: x | x → x (identity)
- Self-AND: x & x → x (identity)

These patterns emerge after register allocation when different variables are assigned to the same register. SSA handles constant folding; the peephole optimizer handles register-based patterns.
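A rough sketch of the register self-operation rewrite, assuming a hypothetical three-address record (`insn_t`, `simplify_self_op`, and the opcode names are illustrative only): when both source operands name the same register, subtraction and XOR become a constant load of zero, while OR and AND become a plain move.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-address instruction; not shecc's actual IR. */
typedef enum { OP_SUB, OP_XOR, OP_OR, OP_AND, OP_ADD, OP_LI, OP_MOV } opcode_t;

typedef struct {
    opcode_t op;
    int rd, rs1, rs2; /* register numbers */
    int imm;          /* constant for OP_LI */
} insn_t;

/* Rewrite "rd = r op r" in place; returns 1 if the instruction changed. */
int simplify_self_op(insn_t *i)
{
    assert(i != NULL);
    if (i->rs1 != i->rs2)
        return 0;
    switch (i->op) {
    case OP_SUB:
    case OP_XOR:
        i->op = OP_LI; /* x - x and x ^ x are always 0 */
        i->imm = 0;
        return 1;
    case OP_OR:
    case OP_AND:
        i->op = OP_MOV; /* x | x and x & x are x: rd = rs1 */
        return 1;
    default:
        return 0;
    }
}
```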

This implements power-of-2 strength reduction patterns:
- Division by 2^n → right shift by n
- Modulo by 2^n → bitwise AND with (2^n - 1)
- Multiplication by 2^n → left shift by n

This optimization is unique to the peephole optimizer, since SSA works on virtual registers before actual constants are loaded.
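The following sketch shows one way the power-of-2 check and rewrite could look. The `binop_imm_t` record and `reduce_strength` are hypothetical names, not shecc's. The sketch deliberately uses unsigned operands: for negative signed values, C's truncating division and remainder do not agree with an arithmetic right shift or a bit mask (for example, -1 / 8 is 0 and -1 % 16 is -1, while masking -1 with 15 gives 15), so a signed variant would need extra care.

```c
#include <assert.h>

/* Hypothetical "x op constant" record; not shecc's actual IR. */
typedef enum { OP_DIV, OP_MOD, OP_MUL, OP_SHR, OP_AND, OP_SHL } opcode_t;

typedef struct {
    opcode_t op;
    unsigned imm; /* constant right-hand operand */
} binop_imm_t;

static int is_pow2(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Exponent n such that v == 2^n; v must be a power of two. */
static unsigned log2_exact(unsigned v)
{
    assert(is_pow2(v));
    unsigned n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Rewrite division, modulo or multiplication by 2^n in place.
 * Returns 1 if the operation was replaced by a cheaper one. */
int reduce_strength(binop_imm_t *b)
{
    assert(b != 0);
    if (!is_pow2(b->imm))
        return 0;
    switch (b->op) {
    case OP_DIV:
        b->op = OP_SHR;
        b->imm = log2_exact(b->imm);
        return 1;
    case OP_MOD:
        b->op = OP_AND;
        b->imm = b->imm - 1;
        return 1;
    case OP_MUL:
        b->op = OP_SHL;
        b->imm = log2_exact(b->imm);
        return 1;
    default:
        return 0;
    }
}
```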

This implements self-comparison optimizations:
- x != x → 0 (always false)
- x == x → 1 (always true)
- x < x → 0 (always false)
- x > x → 0 (always false)
- x <= x → 1 (always true)
- x >= x → 1 (always true)

These register-based patterns appear after register allocation when different variables are assigned to the same register. They complement the constant-comparison folding performed by SCCP in the SSA optimizer.
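The table above maps directly onto a small rewrite. In this sketch (hypothetical `insn_t`, opcode names, and `fold_self_compare`; none of them come from shecc), a comparison whose two source registers are the same is replaced by a constant load of its known result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-address instruction; not shecc's actual IR. */
typedef enum { OP_EQ, OP_NE, OP_LT, OP_GT, OP_LE, OP_GE, OP_ADD, OP_LI } opcode_t;

typedef struct {
    opcode_t op;
    int rd, rs1, rs2; /* register numbers */
    int imm;          /* constant for OP_LI */
} insn_t;

/* Replace "rd = r cmp r" with "li rd, 0/1"; returns 1 if rewritten. */
int fold_self_compare(insn_t *i)
{
    assert(i != NULL);
    if (i->rs1 != i->rs2)
        return 0;
    int result;
    switch (i->op) {
    case OP_EQ:
    case OP_LE:
    case OP_GE:
        result = 1; /* reflexive comparisons are always true */
        break;
    case OP_NE:
    case OP_LT:
    case OP_GT:
        result = 0; /* strict comparisons are always false */
        break;
    default:
        return 0;   /* not a comparison */
    }
    i->op = OP_LI;
    i->imm = result;
    return 1;
}
```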

This implements bitwise identity and absorption patterns:
- Double complement: ~(~x) → x
- AND with all-ones: x & -1 → x
- OR with zero: x | 0 → x
- XOR with zero: x ^ 0 → x
- AND with zero: x & 0 → 0 (absorption)
- OR with all-ones: x | -1 → -1 (absorption)
- Shift by zero: x << 0 → x, x >> 0 → x

These patterns are not handled by the SSA optimizer and provide significant optimization opportunities for bitwise operations.
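A sketch of the single-instruction cases, under the assumption of a hypothetical "rd = rs op imm" record (`insn_t`, the opcode names, and `simplify_bitwise_imm` are illustrative, not shecc's API). Identity patterns become a move and absorption patterns become a constant load. The double-complement pattern is left out because it spans two instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "rd = rs op imm" instruction; not shecc's actual IR. */
typedef enum { OP_AND, OP_OR, OP_XOR, OP_SHL, OP_SHR, OP_ADD, OP_MOV, OP_LI } opcode_t;

typedef struct {
    opcode_t op;
    int rd, rs; /* register numbers */
    int imm;    /* constant operand; value loaded for OP_LI */
} insn_t;

/* Rewrite identity/absorption cases in place; returns 1 if rewritten. */
int simplify_bitwise_imm(insn_t *i)
{
    assert(i != NULL);
    switch (i->op) {
    case OP_AND:
        if (i->imm == -1) { i->op = OP_MOV; return 1; } /* x & -1 is x */
        if (i->imm == 0)  { i->op = OP_LI;  return 1; } /* x & 0 is 0 */
        return 0;
    case OP_OR:
        if (i->imm == 0)  { i->op = OP_MOV; return 1; } /* x | 0 is x */
        if (i->imm == -1) { i->op = OP_LI;  return 1; } /* x | -1 is -1 */
        return 0;
    case OP_XOR:
    case OP_SHL:
    case OP_SHR:
        if (i->imm == 0)  { i->op = OP_MOV; return 1; } /* x op 0 is x */
        return 0;
    default:
        return 0;
    }
}
```

In both absorption cases the immediate already equals the result (0 or -1), so only the opcode changes.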

This implements 3-instruction sequence optimizations:
- Store-load-store elimination: removes unused intermediate loads
- Consecutive stores: only last store to same location matters
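The description does not say how an intermediate load is judged unused, so the sketch below takes one conservative reading that needs no liveness information: in {store; load; store} on the same slot, the load is replaced by a move from the stored register and the first store is dropped, since the third store overwrites it. All names (`insn_t`, `optimize_triple`, the opcodes) are hypothetical, and same-sized, non-overlapping stack slots are assumed.

```c
#include <assert.h>

/* Hypothetical instruction record; not shecc's actual IR. */
typedef enum { OP_LOAD, OP_STORE, OP_MOV, OP_NOP } opcode_t;

typedef struct {
    opcode_t op;
    int reg; /* OP_LOAD/OP_MOV: destination; OP_STORE: register being stored */
    int src; /* OP_MOV: source register; unused otherwise */
    int off; /* OP_LOAD/OP_STORE: stack-slot offset */
} insn_t;

/* Examine three adjacent instructions; returns 1 if a pattern fired. */
int optimize_triple(insn_t *a, insn_t *b, insn_t *c)
{
    assert(a && b && c);
    if (a->op != OP_STORE || c->op != OP_STORE || a->off != c->off)
        return 0;
    if (b->off != a->off)
        return 0;

    if (b->op == OP_STORE) {
        /* Three stores to one slot: only the last one is observable. */
        a->op = OP_NOP;
        b->op = OP_NOP;
        return 1;
    }
    if (b->op == OP_LOAD) {
        /* store rs; load rd; store rx  becomes  mov rd,rs; store rx */
        int rd = b->reg, rs = a->reg;
        a->op = OP_NOP;
        if (rd == rs) {
            b->op = OP_NOP;
        } else {
            b->op = OP_MOV;
            b->reg = rd;
            b->src = rs;
        }
        return 1;
    }
    return 0;
}
```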

This adds optimizer division of labor documentation:
- SSA: handles constant folding, CSE, self-assignments, DCE
- Peephole: handles register patterns, bitwise ops, strength reduction

Integrate all optimization functions into peephole driver:
- Triple pattern optimization (3-instruction sequences)
- Instruction fusion (2-instruction sequences)
- Comparison optimization (self-comparisons)
- Strength reduction (power-of-2 optimizations)
- Algebraic simplification (register self-operations)
- Bitwise optimization (identity/absorption patterns)
- Move elimination and load/store patterns
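One plausible shape for such a driver is a table of pattern functions tried in a fixed order at each instruction, repeated until nothing changes. The sketch below is an assumption about structure only: `run_peephole`, `pass_fn`, and the two toy passes are hypothetical stand-ins, and the table order simply plays the role of the ordering listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction record; not shecc's actual IR. */
typedef enum { OP_SUB, OP_MOV, OP_LI, OP_NOP } opcode_t;

typedef struct insn {
    opcode_t op;
    int rd, rs1, rs2;
    int imm;
    struct insn *next;
} insn_t;

/* A pass inspects the instruction at i and returns 1 if it rewrote it. */
typedef int (*pass_fn)(insn_t *i);

/* Toy pass: "sub rd, r, r" becomes "li rd, 0". */
int pass_self_sub(insn_t *i)
{
    if (i->op != OP_SUB || i->rs1 != i->rs2)
        return 0;
    i->op = OP_LI;
    i->imm = 0;
    return 1;
}

/* Toy pass: "mov rd, rd" becomes a nop. */
int pass_self_mov(insn_t *i)
{
    if (i->op != OP_MOV || i->rd != i->rs1)
        return 0;
    i->op = OP_NOP;
    return 1;
}

/* Try the passes in table order at every instruction (first match wins)
 * and sweep again until a full sweep changes nothing. Termination relies
 * on every pass strictly simplifying what it touches. */
int run_peephole(insn_t *head, const pass_fn *passes, size_t n_passes)
{
    assert(passes != NULL || n_passes == 0);
    int total = 0, changed;
    do {
        changed = 0;
        for (insn_t *i = head; i; i = i->next) {
            for (size_t p = 0; p < n_passes; p++) {
                if (passes[p](i)) {
                    changed = 1;
                    total++;
                    break;
                }
            }
        }
    } while (changed);
    return total;
}
```

Placing the longer patterns first in the table gives them a chance to match before a shorter pattern rewrites part of the same sequence.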

jserv added 9 commits August 26, 2025 20:06

Add triple pattern optimization

f2f5cb6

This implements 3-instruction sequence optimizations:
- Store-load-store elimination: removes unused intermediate loads
- Consecutive stores: only last store to same location matters

jserv requested review from ChAoSUnItY, DrXiao, fennecJ, nosba0957 and vacantron August 26, 2025 15:57
