⚡️ Speed up function linear_equation_solver by 26% #78

Conversation

codeflash-ai[bot]
@codeflash-ai codeflash-ai bot commented Jul 30, 2025

📄 26% (0.26x) speedup for linear_equation_solver in src/numpy_pandas/numerical_methods.py

⏱️ Runtime : 125 milliseconds → 99.2 milliseconds (best of 60 runs)

📝 Explanation and details

The optimized code achieves a 26% speedup through several key algorithmic and memory access optimizations:

1. Reduced Memory Access Overhead
The most significant optimization is caching row references and intermediate values:

  • ai = augmented[i] and rowj = augmented[j] cache row references, reducing repeated list lookups
  • inv_aii = 1.0 / ai[i] pre-computes the reciprocal once instead of performing division in every iteration
  • These changes eliminate millions of redundant memory accesses in the innermost loops
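As a hedged sketch (the helper name `eliminate_below` is illustrative, not from the PR's diff), the cached-reference pattern in the elimination loop looks roughly like this:

```python
def eliminate_below(augmented, i, n):
    """Sketch: forward elimination below pivot row i, caching row
    references (ai, rowj) and precomputing the pivot reciprocal."""
    ai = augmented[i]          # cache the pivot row once
    inv_aii = 1.0 / ai[i]      # one division instead of one per row
    for j in range(i + 1, n):
        rowj = augmented[j]    # cache the target row
        factor = rowj[i] * inv_aii
        for k in range(i, n + 1):
            rowj[k] -= factor * ai[k]
```

Without the cached `ai` and `rowj`, every access in the inner loop would go through `augmented[i][k]` and `augmented[j][k]`, paying an extra list indexing operation per element.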

2. Improved Pivoting Logic
The original code performs redundant abs() calls on the same pivot element:

```python
# Original: calls abs(augmented[max_idx][i]) twice per comparison
if abs(augmented[j][i]) > abs(augmented[max_idx][i]):
```

The optimized version stores max_value and only computes abs() once per element, reducing function call overhead.
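A minimal sketch of that pivot search (`find_pivot` is a hypothetical helper name; the PR inlines this logic in the main loop):

```python
def find_pivot(augmented, i, n):
    """Sketch: partial-pivot search over column i that calls abs()
    once per candidate, tracking the running maximum in max_value."""
    max_idx = i
    max_value = abs(augmented[i][i])
    for j in range(i + 1, n):
        v = abs(augmented[j][i])   # single abs() per element
        if v > max_value:
            max_value = v
            max_idx = j
    return max_idx
```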

3. Conditional Row Swapping
Adding if max_idx != i: before swapping eliminates unnecessary operations when no pivot change is needed, which is common in well-conditioned matrices.

4. Optimized Back Substitution
The back substitution phase accumulates the sum separately (sum_ax) before the final division, reducing the number of operations on x[i] and improving numerical stability through better operation ordering.
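Sketched under the same assumptions (`back_substitute` is an illustrative name; `augmented` is the n×(n+1) matrix left after elimination):

```python
def back_substitute(augmented, n):
    """Sketch: back substitution that accumulates the row sum in
    sum_ax, then does a single subtraction and division per unknown."""
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        ai = augmented[i]          # cache the row reference
        sum_ax = 0.0
        for j in range(i + 1, n):
            sum_ax += ai[j] * x[j]
        x[i] = (ai[n] - sum_ax) / ai[i]
    return x
```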

Performance Impact by Test Case Type:

  • Large matrices (50x50 to 200x200): Show the highest speedups (25-27%) because the optimizations compound across the O(n³) operations
  • Small matrices (2x2, 3x3): Show modest improvements (1-9%) as the overhead reduction is less significant
  • Edge cases: Variable performance depending on pivoting frequency and numerical stability requirements

The optimizations particularly excel on larger, well-conditioned systems where the reduced memory access patterns and cached computations provide substantial cumulative benefits across the nested loops.
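Putting the four optimizations together, a self-contained sketch of the optimized solver might look like the following. This is a reconstruction from the description above, not the PR's actual diff, and the function name is illustrative:

```python
def linear_equation_solver_sketch(A, b):
    """Sketch: Gaussian elimination with partial pivoting, combining the
    cached rows, single-abs pivot search, conditional swap, and separate
    back-substitution accumulator described above."""
    n = len(A)
    # Build the augmented matrix [A | b] without mutating the inputs.
    augmented = [list(A[i]) + [b[i]] for i in range(n)]
    for i in range(n):
        # Partial pivoting: one abs() per candidate element.
        max_idx, max_value = i, abs(augmented[i][i])
        for j in range(i + 1, n):
            v = abs(augmented[j][i])
            if v > max_value:
                max_idx, max_value = j, v
        if max_idx != i:  # swap only when the pivot actually changes
            augmented[i], augmented[max_idx] = augmented[max_idx], augmented[i]
        ai = augmented[i]
        inv_aii = 1.0 / ai[i]  # raises ZeroDivisionError on singular systems
        for j in range(i + 1, n):
            rowj = augmented[j]
            factor = rowj[i] * inv_aii
            for k in range(i, n + 1):
                rowj[k] -= factor * ai[k]
    # Back substitution with a separately accumulated sum.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        ai = augmented[i]
        sum_ax = 0.0
        for j in range(i + 1, n):
            sum_ax += ai[j] * x[j]
        x[i] = (ai[n] - sum_ax) / ai[i]
    return x
```

The `ZeroDivisionError` on a zero pivot matches the behavior exercised by the singular-matrix regression tests below.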

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 31 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 3 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import math
import random
from typing import List

# imports
import pytest  # used for our unit tests
from src.numpy_pandas.numerical_methods import linear_equation_solver


# Helper function for comparing floats
def floats_close(a, b, eps=1e-8):
    if isinstance(a, list) and isinstance(b, list):
        return all(floats_close(x, y, eps) for x, y in zip(a, b))
    return abs(a - b) < eps

# Helper function to check if Ax == b approximately
def check_solution(A, x, b, eps=1e-8):
    n = len(A)
    for i in range(n):
        s = sum(A[i][j]*x[j] for j in range(n))
        if not math.isclose(s, b[i], abs_tol=eps):
            return False
    return True

# ---------------------------
# Basic Test Cases
# ---------------------------

def test_single_equation_single_variable():
    # 2x = 8  => x = 4
    A = [[2.0]]
    b = [8.0]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 1.29μs -> 1.33μs (3.15% slower)

def test_two_by_two_unique_solution():
    # x + y = 3, x - y = 1 => x=2, y=1
    A = [[1, 1], [1, -1]]
    b = [3, 1]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 2.50μs -> 2.46μs (1.67% faster)

def test_three_by_three_unique_solution():
    # x + y + z = 6, 2y + 5z = -4, 2x + 5y - z = 27
    A = [
        [1, 1, 1],
        [0, 2, 5],
        [2, 5, -1]
    ]
    b = [6, -4, 27]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 3.83μs -> 3.58μs (6.98% faster)

def test_negative_and_zero_coefficients():
    # 0x + 2y = 8, -3x + 0y = -9 => x=3, y=4
    A = [[0, 2], [-3, 0]]
    b = [8, -9]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 2.50μs -> 2.42μs (3.43% faster)

def test_fractional_coefficients():
    # 0.5x + 0.25y = 1, 0.25x + 0.5y = 1
    A = [[0.5, 0.25], [0.25, 0.5]]
    b = [1, 1]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 2.33μs -> 2.38μs (1.77% slower)

# ---------------------------
# Edge Test Cases
# ---------------------------

def test_singular_matrix_raises_zero_division():
    # x + y = 2, 2x + 2y = 4 (dependent, infinite solutions)
    A = [[1, 1], [2, 2]]
    b = [2, 4]
    with pytest.raises(ZeroDivisionError):
        linear_equation_solver(A, b) # 2.17μs -> 1.88μs (15.5% faster)

def test_inconsistent_system_raises_zero_division():
    # x + y = 2, x + y = 3 (no solution)
    A = [[1, 1], [1, 1]]
    b = [2, 3]
    with pytest.raises(ZeroDivisionError):
        linear_equation_solver(A, b) # 2.00μs -> 1.62μs (23.1% faster)


def test_ill_conditioned_matrix():
    # Very small differences in coefficients
    A = [[1, 1], [1, 1.0000001]]
    b = [2, 2.0000001]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 2.54μs -> 2.50μs (1.68% faster)


def test_zero_right_hand_side():
    # Homogeneous system, nontrivial solution only if singular
    A = [[2, -1], [1, 2]]
    b = [0, 0]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 2.38μs -> 2.50μs (5.00% slower)

def test_identity_matrix():
    # Should always return b
    n = 5
    A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    b = [float(i) for i in range(n)]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 6.58μs -> 6.46μs (1.95% faster)

def test_permuted_rows():
    # Test with rows in different order
    A = [[0, 1], [1, 0]]
    b = [2, 1]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 2.25μs -> 2.38μs (5.26% slower)

def test_large_negative_coefficients():
    # All coefficients negative
    A = [[-2, -3], [-1, -1]]
    b = [-8, -3]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 2.42μs -> 2.42μs (0.000% faster)

def test_float_precision():
    # Test with numbers that could suffer float rounding
    A = [[1e-16, 1], [1, 1]]
    b = [1, 2]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 2.46μs -> 2.54μs (3.27% slower)


def test_mismatched_b_length_raises_index_error():
    # b is wrong length
    A = [[1, 2], [3, 4]]
    b = [5]
    with pytest.raises(IndexError):
        linear_equation_solver(A, b) # 875ns -> 875ns (0.000% faster)

# ---------------------------
# Large Scale Test Cases
# ---------------------------

def test_large_identity_matrix():
    # n=100, identity matrix, should return b
    n = 100
    A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    b = [float(i) for i in range(n)]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 9.05ms -> 7.23ms (25.2% faster)

def test_large_diagonal_dominant_matrix():
    # n=100, diagonally dominant matrix, should be stable
    n = 100
    A = [[10 if i == j else 1 for j in range(n)] for i in range(n)]
    b = [float(i) for i in range(n)]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 9.04ms -> 7.22ms (25.3% faster)

def test_large_random_matrix():
    # n=50, random invertible matrix
    n = 50
    random.seed(42)
    # Make a random invertible matrix by starting with identity and adding small random values
    A = [[float(1 if i == j else 0) + random.uniform(-0.01, 0.01) for j in range(n)] for i in range(n)]
    b = [random.uniform(-100, 100) for _ in range(n)]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 1.19ms -> 934μs (27.5% faster)

def test_large_sparse_matrix():
    # n=100, mostly zeros except diagonal and one off-diagonal
    n = 100
    A = [[0.0 for _ in range(n)] for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0
        if i > 0:
            A[i][i-1] = -1.0
    b = [float(i) for i in range(n)]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 8.98ms -> 7.09ms (26.6% faster)

def test_large_system_with_known_solution():
    # n=50, random matrix, construct b from known x
    n = 50
    random.seed(123)
    A = [[random.uniform(-10, 10) for _ in range(n)] for _ in range(n)]
    x_true = [random.uniform(-5, 5) for _ in range(n)]
    b = [sum(A[i][j]*x_true[j] for j in range(n)) for i in range(n)]
    codeflash_output = linear_equation_solver(A, b); x = codeflash_output # 1.20ms -> 951μs (25.8% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import math
import random
# function to test
from typing import List

# imports
import pytest  # used for our unit tests
from src.numpy_pandas.numerical_methods import linear_equation_solver


# Helper function for checking solution accuracy
def is_close_list(a, b, tol=1e-8):
    return all(math.isclose(x, y, abs_tol=tol, rel_tol=tol) for x, y in zip(a, b))

# ========== BASIC TEST CASES ==========

def test_single_variable():
    # 1x = 5
    A = [[1.0]]
    b = [5.0]
    expected = [5.0]
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 1.25μs -> 1.33μs (6.23% slower)

def test_two_by_two_unique_solution():
    # 2x + y = 5
    # x + 2y = 6
    A = [[2, 1], [1, 2]]
    b = [5, 6]
    expected = [2, 2]
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 2.42μs -> 2.38μs (1.73% faster)

def test_three_by_three_unique_solution():
    # x + y + z = 6
    # 2y + 5z = -4
    # 2x + 5y - z = 27
    A = [
        [1, 1, 1],
        [0, 2, 5],
        [2, 5, -1]
    ]
    b = [6, -4, 27]
    expected = [5, 3, -2]
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 3.75μs -> 3.67μs (2.29% faster)


def test_float_precision():
    # x + 1e-10y = 1
    # y = 1
    A = [[1, 1e-10], [0, 1]]
    b = [1, 1]
    expected = [1, 1]
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 2.42μs -> 2.21μs (9.42% faster)

# ========== EDGE TEST CASES ==========






def test_empty_system():
    # No equations
    A = []
    b = []
    expected = []
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 875ns -> 750ns (16.7% faster)

def test_large_numbers():
    # Test with very large coefficients
    A = [[1e100, 2e100], [3e100, 4e100]]
    b = [5e100, 11e100]
    # The system is:
    # x + 2y = 5
    # 3x + 4y = 11
    # Solution: x = 1, y = 2
    expected = [1, 2]
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 2.62μs -> 2.71μs (3.06% slower)

def test_small_numbers():
    # Test with very small coefficients
    A = [[1e-100, 2e-100], [3e-100, 4e-100]]
    b = [5e-100, 11e-100]
    expected = [1, 2]
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 2.08μs -> 2.12μs (1.98% slower)



def test_large_identity_matrix():
    # System: Ix = b, where I is identity matrix
    n = 100
    A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    b = [float(i) for i in range(n)]
    expected = b[:]
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 9.08ms -> 7.22ms (25.9% faster)

def test_large_random_diagonal_matrix():
    # Diagonal matrix with random nonzero values
    n = 100
    random.seed(42)
    diag = [random.uniform(1, 100) for _ in range(n)]
    A = [[diag[i] if i == j else 0 for j in range(n)] for i in range(n)]
    b = [random.uniform(-100, 100) for _ in range(n)]
    expected = [b[i] / diag[i] for i in range(n)]
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 9.14ms -> 7.22ms (26.6% faster)

def test_large_dense_matrix():
    # Random dense matrix with unique solution
    n = 50
    random.seed(123)
    # Generate a random invertible matrix by starting with identity and adding small random values
    A = [[(1 if i == j else 0) + random.uniform(-0.1, 0.1) for j in range(n)] for i in range(n)]
    x_true = [random.uniform(-100, 100) for _ in range(n)]
    b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 1.19ms -> 941μs (26.3% faster)

def test_large_sparse_matrix():
    # Sparse matrix: mostly zeros, diagonal dominant
    n = 100
    random.seed(321)
    A = [[0 for _ in range(n)] for _ in range(n)]
    for i in range(n):
        A[i][i] = random.uniform(10, 20)
        if i < n - 1:
            A[i][i+1] = random.uniform(0, 1)
    x_true = [random.uniform(-10, 10) for _ in range(n)]
    b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 9.11ms -> 7.27ms (25.2% faster)

def test_large_system_performance():
    # Performance test: 200x200 system
    n = 200
    random.seed(456)
    A = [[(1 if i == j else 0) + random.uniform(-0.01, 0.01) for j in range(n)] for i in range(n)]
    x_true = [random.uniform(-100, 100) for _ in range(n)]
    b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]
    codeflash_output = linear_equation_solver(A, b); result = codeflash_output # 67.3ms -> 53.1ms (26.7% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from src.numpy_pandas.numerical_methods import linear_equation_solver
import pytest

def test_linear_equation_solver():
    linear_equation_solver([[1.0, 0.0], [-0.5, 2.0]], [0.0, 0.0])

def test_linear_equation_solver_2():
    with pytest.raises(IndexError):
        linear_equation_solver([[], [], []], [0.0, 0.0])

def test_linear_equation_solver_3():
    with pytest.raises(IndexError, match='list\\ index\\ out\\ of\\ range'):
        linear_equation_solver([[], [], [], []], [0.0, 0.0, 0.0, 0.5])

To edit these changes, run `git checkout codeflash/optimize-linear_equation_solver-mdpjkx18` and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jul 30, 2025
@codeflash-ai codeflash-ai bot requested a review from aseembits93 July 30, 2025 05:45