⚡️ Speed up function graph_traversal by 1,193% #61

Open · wants to merge 1 commit into main from codeflash/optimize-graph_traversal-mdpca1f5

Conversation


@codeflash-ai codeflash-ai bot commented Jul 30, 2025

📄 1,193% (11.93x) speedup for graph_traversal in src/dsa/various.py

⏱️ Runtime : 5.37 milliseconds → 416 microseconds (best of 15 runs)

📝 Explanation and details

The optimized code achieves a ~12x speedup by replacing a list-based visited tracking mechanism with a set-based approach, addressing the core performance bottleneck in graph traversal.

Key Optimization Applied:

  • Separated concerns: Uses a set() for O(1) membership checking (visited) and a separate list for maintaining traversal order (result)
  • Fixed graph.get() default: Changed from graph.get(n, []) to graph.get(n, {}) to match the expected dict type

Why This Creates Massive Speedup:
The original code's if n in visited operation on a list has O(n) time complexity - it must scan through the entire list linearly. As the graph grows, each membership check becomes progressively slower. The optimized version uses if n in visited on a set, which is O(1) average case due to hash table lookups.
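
To make the difference concrete, here is a minimal sketch of the two patterns described above, written as a recursive DFS (the generated tests note keeping graphs small "to avoid recursion limit"). The function names and exact structure are illustrative assumptions; the actual code in src/dsa/various.py is not shown in this PR and may differ.

def graph_traversal_list_based(graph, start):
    # Original pattern (sketch): the visited list doubles as the result,
    # so every `n in visited` check is a linear scan over the list.
    visited = []

    def dfs(n):
        if n in visited:
            return
        visited.append(n)
        for neighbor in graph.get(n, []):
            dfs(neighbor)

    dfs(start)
    return visited

def graph_traversal_set_based(graph, start):
    # Optimized pattern (sketch): a set gives O(1) average-case membership
    # checks, while a separate list preserves the traversal order.
    visited = set()
    result = []

    def dfs(n):
        if n in visited:
            return
        visited.add(n)
        result.append(n)
        for neighbor in graph.get(n, {}):
            dfs(neighbor)

    dfs(start)
    return result

Under these assumptions, graph_traversal_set_based({1: {2: None}, 2: {}}, 1) returns [1, 2], the same behaviour the generated tests below exercise.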

Performance Impact by Graph Size:

  • Small graphs (1-10 nodes): Minimal improvement or slight regression (~5-20% slower) due to set overhead
  • Medium graphs (30-200 nodes): Significant gains (155-331% faster) as O(n) vs O(1) difference becomes apparent
  • Large graphs (500-1000 nodes): Dramatic speedups (844-2362% faster) where the quadratic behavior of list membership checking becomes the dominant cost
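
The widening gap comes directly from the cost of the membership check itself. The standalone timing snippet below (not part of this PR) makes the list-versus-set difference visible as the number of visited nodes grows:

import timeit

for n in (10, 100, 1000):
    visited_list = list(range(n))
    visited_set = set(visited_list)
    probe = n - 1  # worst case for the list: the element sits at the end
    t_list = timeit.timeit(lambda: probe in visited_list, number=10_000)
    t_set = timeit.timeit(lambda: probe in visited_set, number=10_000)
    print(f"n={n:4d}  list: {t_list:.4f}s  set: {t_set:.4f}s")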

Best Use Cases:
The optimization excels for:

  • Large star graphs where many nodes are visited quickly
  • Complete or dense graphs with high connectivity
  • Long traversal paths where membership checks accumulate
  • Any scenario where the visited set grows beyond ~20-30 nodes

The annotated test results clearly show this pattern - small test cases are slightly slower due to set initialization overhead, while large-scale tests show dramatically larger gains as the visited collection grows.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 62 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 1 Passed
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from src.dsa.various import graph_traversal

# unit tests

# ------------------- Basic Test Cases -------------------

def test_single_node_graph():
    # Graph with a single node and no edges
    graph = {1: {}}
    codeflash_output = graph_traversal(graph, 1) # 500ns -> 583ns (14.2% slower)

def test_two_nodes_one_edge():
    # Graph with two nodes, one edge
    graph = {1: {2: None}, 2: {}}
    codeflash_output = graph_traversal(graph, 1) # 625ns -> 750ns (16.7% slower)
    # Start from the other node
    codeflash_output = graph_traversal(graph, 2) # 250ns -> 250ns (0.000% faster)

def test_simple_chain():
    # Linear chain: 1 -> 2 -> 3 -> 4
    graph = {1: {2: None}, 2: {3: None}, 3: {4: None}, 4: {}}
    codeflash_output = graph_traversal(graph, 1) # 833ns -> 959ns (13.1% slower)
    codeflash_output = graph_traversal(graph, 2) # 375ns -> 459ns (18.3% slower)
    codeflash_output = graph_traversal(graph, 3) # 291ns -> 292ns (0.342% slower)
    codeflash_output = graph_traversal(graph, 4) # 208ns -> 250ns (16.8% slower)

def test_simple_cycle():
    # Cycle: 1 -> 2 -> 3 -> 1
    graph = {1: {2: None}, 2: {3: None}, 3: {1: None}}
    codeflash_output = graph_traversal(graph, 1) # 750ns -> 916ns (18.1% slower)
    codeflash_output = graph_traversal(graph, 2) # 417ns -> 500ns (16.6% slower)
    codeflash_output = graph_traversal(graph, 3) # 375ns -> 416ns (9.86% slower)

def test_branching_graph():
    # 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4
    graph = {1: {2: None, 3: None}, 2: {4: None}, 3: {4: None}, 4: {}}
    codeflash_output = graph_traversal(graph, 1); result = codeflash_output # 917ns -> 1.00μs (8.30% slower)

def test_disconnected_graph():
    # 1 -> 2, 3 (disconnected)
    graph = {1: {2: None}, 2: {}, 3: {}}
    codeflash_output = graph_traversal(graph, 1) # 625ns -> 750ns (16.7% slower)
    codeflash_output = graph_traversal(graph, 3) # 209ns -> 292ns (28.4% slower)

# ------------------- Edge Test Cases -------------------

def test_empty_graph():
    # Empty graph: no nodes
    graph = {}
    codeflash_output = graph_traversal(graph, 1) # 500ns -> 583ns (14.2% slower)

def test_start_node_not_in_graph():
    # Start node not present in graph keys
    graph = {2: {3: None}, 3: {}}
    codeflash_output = graph_traversal(graph, 1) # 500ns -> 583ns (14.2% slower)

def test_self_loop():
    # Node with self-loop
    graph = {1: {1: None}}
    codeflash_output = graph_traversal(graph, 1) # 583ns -> 667ns (12.6% slower)

def test_multiple_self_loops_and_edges():
    # Multiple nodes with self-loops and edges
    graph = {1: {1: None, 2: None}, 2: {2: None, 3: None}, 3: {3: None}}
    codeflash_output = graph_traversal(graph, 1) # 917ns -> 1.00μs (8.30% slower)

def test_graph_with_isolated_nodes():
    # Some nodes are isolated (no edges in or out)
    graph = {1: {2: None}, 2: {}, 3: {}, 4: {}}
    codeflash_output = graph_traversal(graph, 1) # 625ns -> 750ns (16.7% slower)
    codeflash_output = graph_traversal(graph, 3) # 250ns -> 292ns (14.4% slower)
    codeflash_output = graph_traversal(graph, 4) # 208ns -> 250ns (16.8% slower)

def test_graph_with_non_integer_nodes():
    # Node ids are still integers, but include zero and a negative value
    graph = {0: {1: None}, 1: {-1: None}, -1: {}}
    codeflash_output = graph_traversal(graph, 0) # 708ns -> 792ns (10.6% slower)
    codeflash_output = graph_traversal(graph, -1) # 208ns -> 292ns (28.8% slower)

def test_graph_with_missing_edges():
    # Node present, but no outgoing edges (should be treated as empty dict)
    graph = {1: {}, 2: None}
    # Node 2's stored value is None, so iterating its neighbors raises TypeError.
    # Let's check that such malformed input raises.
    with pytest.raises(TypeError):
        graph_traversal(graph, 2) # 916ns -> 1.00μs (8.40% slower)

# ------------------- Large Scale Test Cases -------------------


def test_large_star_graph():
    # Star: node 0 connects to 1..999
    N = 1000
    graph = {0: {i: None for i in range(1, N)}}
    for i in range(1, N):
        graph[i] = {}
    codeflash_output = graph_traversal(graph, 0); result = codeflash_output # 1.92ms -> 78.0μs (2362% faster)

def test_large_complete_graph():
    # Complete graph: every node connects to every other node
    N = 30  # Keep small to avoid recursion limit
    graph = {i: {j: None for j in range(N) if j != i} for i in range(N)}
    codeflash_output = graph_traversal(graph, 0); result = codeflash_output # 74.3μs -> 26.5μs (180% faster)

def test_large_disconnected_graph():
    # 500 node chain, 500 isolated nodes
    N = 500
    graph = {i: {i+1: None} for i in range(N-1)}
    graph[N-1] = {}
    for i in range(N, 2*N):
        graph[i] = {}
    codeflash_output = graph_traversal(graph, 0); result = codeflash_output # 506μs -> 53.6μs (844% faster)
    # Start from an isolated node
    codeflash_output = graph_traversal(graph, N) # 500ns -> 541ns (7.58% slower)

def test_large_graph_with_cycles():
    # 100 node cycle
    N = 100
    graph = {i: {(i+1)%N: None} for i in range(N)}
    codeflash_output = graph_traversal(graph, 0); result = codeflash_output # 29.2μs -> 11.5μs (155% faster)

# ------------------- Determinism and Robustness -------------------

def test_determinism():
    # The traversal order should be deterministic for a given input
    graph = {1: {2: None, 3: None}, 2: {4: None}, 3: {4: None}, 4: {}}
    codeflash_output = graph_traversal(graph, 1); result1 = codeflash_output # 958ns -> 1.04μs (8.06% slower)
    codeflash_output = graph_traversal(graph, 1); result2 = codeflash_output # 500ns -> 583ns (14.2% slower)

def test_mutation_resistance():
    # Ensure that the function does not modify the input graph
    graph = {1: {2: None}, 2: {}}
    import copy
    graph_copy = copy.deepcopy(graph)
    codeflash_output = graph_traversal(graph, 1); _ = codeflash_output # 666ns -> 667ns (0.150% slower)

def test_no_duplicate_visits():
    # Even if there are multiple paths, each node should be visited once
    graph = {1: {2: None, 3: None}, 2: {4: None}, 3: {4: None}, 4: {}}
    codeflash_output = graph_traversal(graph, 1); result = codeflash_output # 875ns -> 1.00μs (12.5% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest  # used for our unit tests
from src.dsa.various import graph_traversal

# unit tests

# ---------------------------
# BASIC TEST CASES
# ---------------------------

def test_single_node_graph():
    # Graph with a single node and no edges
    graph = {1: {}}
    codeflash_output = graph_traversal(graph, 1) # 500ns -> 583ns (14.2% slower)

def test_two_connected_nodes():
    # Graph: 1 -> 2
    graph = {1: {2: None}, 2: {}}
    codeflash_output = graph_traversal(graph, 1) # 666ns -> 750ns (11.2% slower)
    codeflash_output = graph_traversal(graph, 2) # 250ns -> 333ns (24.9% slower)

def test_simple_chain():
    # Graph: 1 -> 2 -> 3
    graph = {1: {2: None}, 2: {3: None}, 3: {}}
    codeflash_output = graph_traversal(graph, 1) # 708ns -> 875ns (19.1% slower)
    codeflash_output = graph_traversal(graph, 2) # 291ns -> 375ns (22.4% slower)
    codeflash_output = graph_traversal(graph, 3) # 208ns -> 250ns (16.8% slower)

def test_simple_cycle():
    # Graph: 1 -> 2 -> 3 -> 1 (cycle)
    graph = {1: {2: None}, 2: {3: None}, 3: {1: None}}
    codeflash_output = graph_traversal(graph, 1); result = codeflash_output # 750ns -> 834ns (10.1% slower)

def test_branching_graph():
    # Graph: 1 -> 2, 1 -> 3
    graph = {1: {2: None, 3: None}, 2: {}, 3: {}}
    codeflash_output = graph_traversal(graph, 1); result = codeflash_output # 791ns -> 833ns (5.04% slower)

def test_disconnected_graph():
    # Graph: 1 -> 2, 3 (isolated)
    graph = {1: {2: None}, 2: {}, 3: {}}
    codeflash_output = graph_traversal(graph, 1) # 625ns -> 750ns (16.7% slower)
    codeflash_output = graph_traversal(graph, 3) # 250ns -> 292ns (14.4% slower)

# ---------------------------
# EDGE TEST CASES
# ---------------------------

def test_empty_graph():
    # Empty graph, starting node not present
    graph = {}
    codeflash_output = graph_traversal(graph, 1) # 542ns -> 625ns (13.3% slower)

def test_start_node_not_in_graph():
    # Node not in graph, but should still return [node]
    graph = {2: {3: None}, 3: {}}
    codeflash_output = graph_traversal(graph, 1) # 500ns -> 583ns (14.2% slower)

def test_graph_with_self_loop():
    # Node with a self-loop: 1 -> 1
    graph = {1: {1: None}}
    codeflash_output = graph_traversal(graph, 1) # 542ns -> 667ns (18.7% slower)

def test_graph_with_multiple_cycles():
    # 1 -> 2 -> 3 -> 1 (cycle), 2 -> 4 -> 2 (cycle)
    graph = {
        1: {2: None},
        2: {3: None, 4: None},
        3: {1: None},
        4: {2: None}
    }
    codeflash_output = graph_traversal(graph, 1); result = codeflash_output # 916ns -> 1.00μs (8.40% slower)

def test_graph_with_unreachable_nodes():
    # 1 -> 2, 3 -> 4 (disconnected components)
    graph = {1: {2: None}, 2: {}, 3: {4: None}, 4: {}}
    codeflash_output = graph_traversal(graph, 1) # 625ns -> 709ns (11.8% slower)
    codeflash_output = graph_traversal(graph, 3) # 291ns -> 375ns (22.4% slower)

def test_graph_with_empty_neighbors():
    # Node with no outgoing edges (empty dict)
    graph = {1: {}, 2: {}}
    codeflash_output = graph_traversal(graph, 1) # 500ns -> 583ns (14.2% slower)
    codeflash_output = graph_traversal(graph, 2) # 208ns -> 333ns (37.5% slower)

def test_graph_with_non_sequential_node_ids():
    # Node IDs are not sequential
    graph = {10: {20: None}, 20: {30: None}, 30: {}}
    codeflash_output = graph_traversal(graph, 10) # 750ns -> 833ns (9.96% slower)

# ---------------------------
# LARGE SCALE TEST CASES
# ---------------------------


def test_large_star_graph():
    # Star graph: 0 -> 1, 0 -> 2, ..., 0 -> 999
    N = 999
    graph = {0: {i: None for i in range(1, N+1)}}
    for i in range(1, N+1):
        graph[i] = {}
    codeflash_output = graph_traversal(graph, 0); result = codeflash_output # 1.92ms -> 78.9μs (2340% faster)

def test_large_complete_graph():
    # Complete graph: every node connects to every other node
    N = 50  # keep N small to avoid recursion limit
    graph = {i: {j: None for j in range(N) if j != i} for i in range(N)}
    codeflash_output = graph_traversal(graph, 0); result = codeflash_output # 297μs -> 69.2μs (331% faster)

def test_large_sparse_graph():
    # Sparse graph: only a few edges
    N = 1000
    graph = {i: {} for i in range(N)}
    # Add a single chain
    for i in range(0, N-1, 100):
        graph[i][i+100] = None
    codeflash_output = graph_traversal(graph, 0); result = codeflash_output # 2.12μs -> 2.21μs (3.80% slower)
    expected = list(range(0, N, 100))

def test_large_graph_with_cycles():
    # Large graph with cycles
    N = 200
    graph = {i: {(i+1)%N: None} for i in range(N)}
    codeflash_output = graph_traversal(graph, 0); result = codeflash_output # 94.0μs -> 22.1μs (325% faster)

# ---------------------------
# ADDITIONAL EDGE CASES
# ---------------------------

def test_graph_with_duplicate_edges():
    # Graph with multiple edges (should not matter in dict)
    graph = {1: {2: None, 2: None}, 2: {}}
    codeflash_output = graph_traversal(graph, 1) # 666ns -> 750ns (11.2% slower)

def test_graph_with_integer_and_negative_nodes():
    # Graph with negative and zero node ids
    graph = {0: {-1: None}, -1: {-2: None}, -2: {}}
    codeflash_output = graph_traversal(graph, 0) # 875ns -> 1.29μs (32.2% slower)

def test_graph_with_isolated_nodes():
    # Graph with isolated nodes
    graph = {1: {}, 2: {}, 3: {}}
    codeflash_output = graph_traversal(graph, 1) # 500ns -> 625ns (20.0% slower)
    codeflash_output = graph_traversal(graph, 2) # 291ns -> 333ns (12.6% slower)
    codeflash_output = graph_traversal(graph, 3) # 208ns -> 209ns (0.478% slower)

def test_graph_with_large_branching_factor():
    # Node with many outgoing edges
    N = 500
    graph = {0: {i: None for i in range(1, N+1)}}
    for i in range(1, N+1):
        graph[i] = {}
    codeflash_output = graph_traversal(graph, 0); result = codeflash_output # 495μs -> 40.0μs (1140% faster)

def test_graph_with_nonexistent_start_node_and_empty_graph():
    # Empty graph, start node not present
    graph = {}
    codeflash_output = graph_traversal(graph, 100) # 542ns -> 625ns (13.3% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from src.dsa.various import graph_traversal

def test_graph_traversal():
    graph_traversal({2: {}}, 2)

To edit these changes, run git checkout codeflash/optimize-graph_traversal-mdpca1f5 and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jul 30, 2025
@codeflash-ai codeflash-ai bot requested a review from aseembits93 July 30, 2025 02:21