Implement post-register-allocation optimization #267
Open: jserv wants to merge 9 commits into master from improve-peephole
This commit implements redundant move elimination, which removes move operations whose results are immediately overwritten, targeting common inefficiencies in compiler-generated code.

Added 5 optimization patterns:
- Consecutive assignments to same destination: {mov rd,rs1; mov rd,rs2} → {mov rd,rs2}
- Load immediately overwritten: {load rd,offset; mov rd,rs} → {mov rd,rs}
- Constant load immediately overwritten: {li rd,imm; mov rd,rs} → {mov rd,rs}
- Consecutive loads to same register: {load rd,off1; load rd,off2} → {load rd,off2}
- Consecutive constant loads: {li rd,imm1; li rd,imm2} → {li rd,imm2}
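The five patterns above can be sketched as a single pass over an instruction array. This is a minimal illustration, not the code from this commit: `insn_t`, the `OP_*` opcodes, and `eliminate_redundant_moves` are hypothetical names, and loads are assumed to have no side effects. The sketch also assumes the second instruction must not read the register it overwrites, since otherwise the first write is still live.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes and instruction record; not shecc's actual IR. */
typedef enum { OP_MOV, OP_LOAD, OP_LI } opcode_t;

typedef struct {
    opcode_t op;
    int rd;  /* destination register */
    int rs;  /* source register (OP_MOV only) */
    int imm; /* immediate (OP_LI) or stack offset (OP_LOAD) */
} insn_t;

/* True if `a` writes a register that `b` overwrites without reading it,
 * restricted to the five pairs listed in the commit message. */
static bool first_is_dead(const insn_t *a, const insn_t *b)
{
    if (a->rd != b->rd)
        return false;
    /* `mov rd, rd` reads rd, so the earlier write is still needed. */
    if (b->op == OP_MOV && b->rs == b->rd)
        return false;
    switch (b->op) {
    case OP_MOV:
        return a->op == OP_MOV || a->op == OP_LOAD || a->op == OP_LI;
    case OP_LOAD:
        return a->op == OP_LOAD;
    case OP_LI:
        return a->op == OP_LI;
    }
    return false;
}

/* Compacts code[0..n) in place and returns the new length. */
size_t eliminate_redundant_moves(insn_t *code, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        /* Compare against the last kept instruction so chains collapse. */
        if (out > 0 && first_is_dead(&code[out - 1], &code[i]))
            out--; /* drop the overwritten instruction */
        code[out++] = code[i];
    }
    return out;
}
```

Comparing against the last kept instruction, rather than the previous input instruction, lets a run such as three constant loads into the same register shrink to one in a single pass.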
This commit implements dead code elimination that works in conjunction with SCCP to remove unreachable code after constant propagation and branch folding.

These optimizations target code that becomes dead after constant propagation, such as:
- Branches with constant conditions (if(1), if(0))
- Instructions that are immediately overwritten
- Unreachable code blocks after branch folding
This extends load/store elimination with more aggressive patterns, reducing memory traffic by eliminating redundant memory operations.

Local memory optimizations:
- Dead store elimination: Consecutive stores to same location
- Redundant load elimination: Consecutive loads from same location
- Store-to-load forwarding: Replace load with stored value
- Load-store redundancy: Remove store of just-loaded value

Global memory optimizations:
- Global dead store elimination
- Global redundant load elimination
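The four local patterns can be illustrated on a pair of adjacent instructions. The sketch below is an assumption-laden example, not the commit's code: `mem_insn_t`, the `M_*` opcodes, and `optimize_mem_pair` are hypothetical, all accesses are assumed to be the same width, and two accesses are treated as the same location only when their stack offsets match.

```c
#include <stdbool.h>

/* Hypothetical instruction record; not shecc's actual IR. */
typedef enum { M_NOP, M_MOV, M_LOAD, M_STORE } mem_op_t;

typedef struct {
    mem_op_t op;
    int reg; /* M_LOAD/M_MOV: destination; M_STORE: register being stored */
    int src; /* M_MOV only: source register */
    int off; /* M_LOAD/M_STORE: stack offset of the slot */
} mem_insn_t;

/* Turns `i` into `mov rd, rs`, or a no-op when rd == rs. */
static void make_mov(mem_insn_t *i, int rd, int rs)
{
    i->op = (rd == rs) ? M_NOP : M_MOV;
    i->reg = rd;
    i->src = rs;
    i->off = 0;
}

/* Rewrites the adjacent pair (a, b) in place; returns true if changed. */
bool optimize_mem_pair(mem_insn_t *a, mem_insn_t *b)
{
    bool a_mem = a->op == M_LOAD || a->op == M_STORE;
    bool b_mem = b->op == M_LOAD || b->op == M_STORE;
    if (!a_mem || !b_mem || a->off != b->off)
        return false;

    if (a->op == M_STORE && b->op == M_STORE) {
        a->op = M_NOP; /* dead store: only the last store matters */
    } else if (a->op == M_STORE && b->op == M_LOAD) {
        make_mov(b, b->reg, a->reg); /* store-to-load forwarding */
    } else if (a->op == M_LOAD && b->op == M_LOAD) {
        make_mov(b, b->reg, a->reg); /* redundant load: reuse the register */
    } else {
        /* load then store: redundant only if the same register is written back */
        if (a->reg != b->reg)
            return false;
        b->op = M_NOP;
    }
    return true;
}
```

Marking removed instructions as no-ops keeps the example short; a real pass would more likely unlink them from its instruction list.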
This implements mathematical identity patterns on register operands:
- Self-subtraction: x - x → 0
- Self-XOR: x ^ x → 0
- Self-OR: x | x → x (identity)
- Self-AND: x & x → x (identity)

These patterns emerge after register allocation when different variables are assigned to the same register. SSA handles constant folding; the peephole optimizer handles register-based patterns.
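As a rough illustration of these four rules, the check reduces to comparing the two source register numbers. The names `alg_op_t`, `alg_fold_t`, and `fold_self_operation` are hypothetical and chosen for this sketch only.

```c
/* Hypothetical opcode and result types; not shecc's actual IR. */
typedef enum { ALG_SUB, ALG_XOR, ALG_OR, ALG_AND } alg_op_t;

typedef enum {
    FOLD_NONE, /* operands differ: leave the instruction alone */
    FOLD_ZERO, /* replace with a load of constant 0 */
    FOLD_COPY  /* replace with a move from the source register */
} alg_fold_t;

/* Decides how `rd = rs1 op rs2` simplifies when both sources are the
 * same physical register. */
alg_fold_t fold_self_operation(alg_op_t op, int rs1, int rs2)
{
    if (rs1 != rs2)
        return FOLD_NONE;
    switch (op) {
    case ALG_SUB:
    case ALG_XOR:
        return FOLD_ZERO; /* x - x and x ^ x are always 0 */
    case ALG_OR:
    case ALG_AND:
        return FOLD_COPY; /* x | x and x & x are always x */
    }
    return FOLD_NONE;
}
```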
This implements power-of-2 strength reduction patterns:
- Division by 2^n → right shift by n
- Modulo by 2^n → bitwise AND with (2^n - 1)
- Multiplication by 2^n → left shift by n

This optimization is unique to the peephole optimizer, since SSA works on virtual registers before actual constants are loaded.
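The equivalences behind these three rewrites can be checked with a small evaluator. This is a sketch under stated assumptions rather than the commit's implementation: `sr_op_t` and `eval_reduced` are hypothetical names, and the operands are unsigned. For negative signed values the division and modulo rewrites do not hold as written, because C division truncates toward zero while an arithmetic right shift rounds toward negative infinity.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is 2^n for some n >= 0. */
static bool is_power_of_two(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Returns n such that v == 2^n; the caller guarantees v is a power of two. */
static int log2_exact(uint32_t v)
{
    int n = 0;
    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical operation tags for this example. */
typedef enum { SR_MUL, SR_DIV, SR_MOD } sr_op_t;

/* Computes `x op c`, using a shift or mask when c is a power of two and
 * the original operation otherwise. The caller must not pass c == 0 for
 * SR_DIV or SR_MOD. */
uint32_t eval_reduced(sr_op_t op, uint32_t x, uint32_t c)
{
    if (is_power_of_two(c)) {
        int n = log2_exact(c);
        switch (op) {
        case SR_MUL:
            return x << n; /* x * 2^n */
        case SR_DIV:
            return x >> n; /* x / 2^n, unsigned */
        case SR_MOD:
            return x & (c - 1); /* x % 2^n, unsigned */
        }
    }
    switch (op) {
    case SR_MUL:
        return x * c;
    case SR_DIV:
        return x / c;
    default:
        return x % c;
    }
}
```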
This implements self-comparison optimizations:
- x != x → 0 (always false)
- x == x → 1 (always true)
- x < x → 0 (always false)
- x > x → 0 (always false)
- x <= x → 1 (always true)
- x >= x → 1 (always true)

These register-based patterns appear after register allocation when different variables are assigned to the same register. Complements SSA's SCCP constant comparison folding.
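The table above maps directly onto a switch over the comparison opcode. The sketch assumes integer operands (so there is no NaN-style case where x == x is false); `cmp_op_t` and `fold_self_comparison` are hypothetical names for illustration.

```c
/* Hypothetical comparison opcodes; not shecc's actual IR. */
typedef enum { CMP_EQ, CMP_NE, CMP_LT, CMP_GT, CMP_LE, CMP_GE } cmp_op_t;

/* Returns the constant result (0 or 1) of `rs1 op rs2` when both operands
 * are the same register, or -1 when the comparison cannot be folded. */
int fold_self_comparison(cmp_op_t op, int rs1, int rs2)
{
    if (rs1 != rs2)
        return -1;
    switch (op) {
    case CMP_EQ:
    case CMP_LE:
    case CMP_GE:
        return 1; /* reflexive comparisons are always true */
    case CMP_NE:
    case CMP_LT:
    case CMP_GT:
        return 0; /* strict comparisons are always false */
    }
    return -1;
}
```

A caller would replace the comparison with a constant load of the returned value whenever the result is not -1.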
This implements bitwise identity and absorption patterns:
- Double complement: ~(~x) → x
- AND with all-ones: x & -1 → x
- OR with zero: x | 0 → x
- XOR with zero: x ^ 0 → x
- AND with zero: x & 0 → 0 (absorption)
- OR with all-ones: x | -1 → -1 (absorption)
- Shift by zero: x << 0 → x, x >> 0 → x

These patterns are not handled by SSA optimizer and provide significant optimization opportunities for bitwise operations.
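The constant-operand rules in this list can be sketched as a classifier for `x op c`. The names `bw_op_t`, `bw_action_t`, and `simplify_bitwise` are hypothetical, and the double-complement rule is left out because it spans two instructions rather than one.

```c
#include <stdint.h>

/* Hypothetical opcodes and actions; not shecc's actual IR. */
typedef enum { BW_AND, BW_OR, BW_XOR, BW_SHL, BW_SHR } bw_op_t;

typedef enum {
    BW_KEEP,   /* no simplification applies */
    BW_COPY_X, /* result is x: replace with a move */
    BW_CONST   /* result is a constant, written to *value */
} bw_action_t;

/* Classifies `x op c` for a known constant c. */
bw_action_t simplify_bitwise(bw_op_t op, int32_t c, int32_t *value)
{
    switch (op) {
    case BW_AND:
        if (c == -1)
            return BW_COPY_X; /* x & -1 -> x */
        if (c == 0) {
            *value = 0; /* x & 0 -> 0 */
            return BW_CONST;
        }
        break;
    case BW_OR:
        if (c == 0)
            return BW_COPY_X; /* x | 0 -> x */
        if (c == -1) {
            *value = -1; /* x | -1 -> -1 */
            return BW_CONST;
        }
        break;
    case BW_XOR:
    case BW_SHL:
    case BW_SHR:
        if (c == 0)
            return BW_COPY_X; /* x ^ 0, x << 0, x >> 0 -> x */
        break;
    }
    return BW_KEEP;
}
```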
This implements 3-instruction sequence optimizations:
- Store-load-store elimination: removes unused intermediate loads
- Consecutive stores: only last store to same location matters
This documents the division of labor between the optimizers:
- SSA: handles constant folding, CSE, self-assignments, DCE
- Peephole: handles register patterns, bitwise ops, strength reduction

Integrate all optimization functions into peephole driver:
- Triple pattern optimization (3-instruction sequences)
- Instruction fusion (2-instruction sequences)
- Comparison optimization (self-comparisons)
- Strength reduction (power-of-2 optimizations)
- Algebraic simplification (register self-operations)
- Bitwise optimization (identity/absorption patterns)
- Move elimination and load/store patterns
This pull request transforms shecc's peephole optimizer from basic instruction fusion to a comprehensive post-register-allocation optimization framework, providing performance improvements while maintaining educational clarity and bootstrap capability.
It also removes redundant work between optimization passes, so that the SSA and peephole optimizers cooperate without duplicating each other's patterns.
Key Optimizations