Commit 6b72059

⚡️ Speed up function matrix_decomposition_LU by 1,015%
The optimized code achieves a **15.9x speedup** by replacing explicit nested loops with vectorized NumPy operations, specifically `np.dot()` for the inner dot products.

**Key Optimizations Applied:**

1. **Vectorized dot products for the U matrix computation**: Instead of the nested loop `for j in range(i): sum_val += L[i, j] * U[j, k]`, the optimized version uses `np.dot(Li, U[:i, k])`, where `Li = L[i, :i]`.
2. **Pre-computed slices for the L matrix computation**: The optimized version extracts `Ui = U[:i, i]` once per iteration and reuses it in `np.dot(L[k, :i], Ui)` instead of recalculating the sum in a loop.

**Why This Creates Significant Speedup:**

The original implementation performs O(n³) scalar operations in Python loops. The line profiler shows that the innermost loop statements (`sum_val += L[i, j] * U[j, k]` and `sum_val += L[k, j] * U[j, i]`) account for **60.9%** of total runtime (30.7% + 30.2%). The optimized version performs the same arithmetic but hands each inner product to NumPy's highly optimized BLAS (Basic Linear Algebra Subprograms) routines, which:

- Execute in compiled C code rather than interpreted Python
- Use vectorized CPU instructions (SIMD)
- Have better memory access patterns and cache locality

**Performance Characteristics by Test Case:**

- **Small matrices (≤10x10)**: **38-47% slower**, because NumPy function call overhead dominates the small computation cost
- **Medium matrices (50x50)**: **3-6x speedup**, where vectorization benefits start to outweigh the overhead
- **Large matrices (≥100x100)**: **7-15x speedup**, where vectorized operations provide the most benefit

The crossover point appears to lie between 20x20 and 30x30 matrices, so the optimization pays off mainly for the larger matrix decompositions commonly encountered in scientific computing and machine learning applications.
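The loop-to-`np.dot` replacement described in this message can be checked in isolation. The following is a minimal sketch and not part of the commit: `loop_dot`, `vectorized_dot`, and `time_both` are hypothetical helper names introduced here for illustration, and any timings it prints depend on the machine and NumPy build, so they may not reproduce the figures quoted above.

```python
import timeit
from typing import Tuple

import numpy as np


def loop_dot(row: np.ndarray, col: np.ndarray) -> float:
    """Accumulate the inner product one scalar at a time (original style)."""
    total = 0.0
    for j in range(len(row)):
        total += row[j] * col[j]
    return total


def vectorized_dot(row: np.ndarray, col: np.ndarray) -> float:
    """Compute the same inner product with a single np.dot call."""
    return float(np.dot(row, col))


def time_both(size: int, repeats: int = 200) -> Tuple[float, float]:
    """Return (loop_seconds, dot_seconds) for random vectors of length `size`."""
    rng = np.random.default_rng(0)
    row, col = rng.random(size), rng.random(size)
    t_loop = timeit.timeit(lambda: loop_dot(row, col), number=repeats)
    t_dot = timeit.timeit(lambda: vectorized_dot(row, col), number=repeats)
    return t_loop, t_dot


if __name__ == "__main__":
    # Short vectors tend to favor the loop; long vectors tend to favor np.dot.
    for size in (4, 32, 256):
        t_loop, t_dot = time_both(size)
        print(f"n={size:4d}  loop={t_loop:.6f}s  np.dot={t_dot:.6f}s")
```

Both helpers return `0.0` for empty inputs, which matters because the first iteration (`i = 0`) of the decomposition takes dot products of zero-length slices.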
1 parent 9b951ff commit 6b72059

1 file changed: +5 -7 lines changed
src/numpy_pandas/matrix_operations.py

Lines changed: 5 additions & 7 deletions
@@ -60,16 +60,14 @@ def matrix_decomposition_LU(A: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
     L = np.zeros((n, n))
     U = np.zeros((n, n))
     for i in range(n):
+        # Compute the U[i, k] entries using vectorized dot product
+        Li = L[i, :i]
         for k in range(i, n):
-            sum_val = 0
-            for j in range(i):
-                sum_val += L[i, j] * U[j, k]
-            U[i, k] = A[i, k] - sum_val
+            U[i, k] = A[i, k] - np.dot(Li, U[:i, k])
         L[i, i] = 1
+        Ui = U[:i, i]
         for k in range(i + 1, n):
-            sum_val = 0
-            for j in range(i):
-                sum_val += L[k, j] * U[j, i]
+            sum_val = np.dot(L[k, :i], Ui)
             if U[i, i] == 0:
                 raise ValueError("Cannot perform LU decomposition")
             L[k, i] = (A[k, i] - sum_val) / U[i, i]
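
The hunk shows only the loop body, so the sketch below assembles it into a standalone function to make the result easy to verify. Two parts are assumptions because they fall outside the diff: that `n` is taken from `A.shape[0]`, and that the function returns `(L, U)` as its signature suggests. The name `lu_sketch` is hypothetical and is used here to avoid implying this is the repository's exact code.

```python
from typing import Tuple

import numpy as np


def lu_sketch(A: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Doolittle LU decomposition without pivoting, following the optimized hunk."""
    n = A.shape[0]  # assumption: not visible in the diff
    L = np.zeros((n, n))
    U = np.zeros((n, n))
    for i in range(n):
        # Row i of L to the left of the diagonal is already final at this point.
        Li = L[i, :i]
        for k in range(i, n):
            U[i, k] = A[i, k] - np.dot(Li, U[:i, k])
        L[i, i] = 1
        # Column i of U above the diagonal is already final at this point.
        Ui = U[:i, i]
        for k in range(i + 1, n):
            sum_val = np.dot(L[k, :i], Ui)
            if U[i, i] == 0:
                raise ValueError("Cannot perform LU decomposition")
            L[k, i] = (A[k, i] - sum_val) / U[i, i]
    return L, U  # assumption: return value inferred from the signature


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Adding a large diagonal keeps the pivots away from zero.
    A = rng.random((6, 6)) + 6 * np.eye(6)
    L, U = lu_sketch(A)
    print(np.allclose(L @ U, A))
```

Because there is no pivoting, a zero on the diagonal of `U` raises `ValueError` even for invertible inputs such as `[[0, 1], [1, 0]]`.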
