@codeflash-ai (bot) commented on Nov 7, 2025

📄 41% speedup (1.41x) for `cartesian_product` in `pandas/core/indexes/multi.py`

⏱️ Runtime: 4.81 milliseconds → 3.40 milliseconds (best of 141 runs)

📝 Explanation and details

The optimized version achieves a **41% speedup** through four targeted micro-optimizations that cut fixed per-call overhead in NumPy array operations:

**Key Optimizations:**

1. **Replaced `np.fromiter()` with a list comprehension**: Changed `np.fromiter((len(x) for x in X), dtype=np.intp)` to `np.array([len(x) for x in X], dtype=np.intp)`. The number of input factors is usually small, so building a short Python list of `len()` results is cheaper than driving `np.fromiter` through a generator.

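2. **Eliminated the `np.roll()` call**: Replaced `np.roll(cumprodX, 1)` with an explicit allocation (`np.empty_like()`) followed by a slice assignment that shifts `cumprodX` right by one position. Both versions copy essentially the same data (`cumprodX` holds one entry per input factor), so the saving comes from skipping `np.roll`'s general-purpose Python implementation (axis normalization and index construction), not from moving less memory.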

3. **Integer division instead of float division**: Changed `b = cumprodX[-1] / cumprodX` to `b = prod_total // cumprodX`, where `prod_total` is the total product `cumprodX[-1]`. Every `cumprodX[i]` divides that total exactly, so floor division loses nothing. It skips the float conversion, keeps `b` exact even for products larger than 2^53, and gives `np.repeat` the integer repeat counts it expects.

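4. **Early array conversion**: Added `np.asarray(xi)` so each input is converted to an ndarray once per iteration, before `np.repeat` and `np.tile` run. One side effect: `Index` or ExtensionArray inputs are now coerced to plain ndarrays instead of going through their own `.repeat` method.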
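The four changes can be combined as in the sketch below. It is a reconstruction from the description above, not the diff in this PR. The names `cartesian_product_sketch` and `_is_list_like` are hypothetical, the list-like check is a simplified stand-in for pandas' `is_list_like`, and plain `np.tile` replaces the internal `tile_compat` helper used by the original pandas code.

```python
import numpy as np


def _is_list_like(obj) -> bool:
    # Hypothetical, simplified stand-in for pandas' is_list_like:
    # iterable and not a string or bytes object.
    return hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes))


def cartesian_product_sketch(X) -> list[np.ndarray]:
    """Sketch of the optimized cartesian_product (not the PR's exact code).

    Returns one array per input factor; the first factor varies slowest.
    """
    msg = "Input must be a list-like of list-likes"
    if not _is_list_like(X) or not all(_is_list_like(x) for x in X):
        raise TypeError(msg)
    if len(X) == 0:
        return []

    # (1) Lengths via a short Python list instead of np.fromiter + generator.
    lenX = np.array([len(x) for x in X], dtype=np.intp)
    cumprodX = np.cumprod(lenX)
    if np.any(cumprodX < 0):
        # Overflow in the intp cumprod can wrap around to negative values.
        raise ValueError("Product space too large to allocate arrays!")

    # (2) Shift cumprodX right by one without np.roll: a = [1, c0, ..., c_{n-2}].
    a = np.empty_like(cumprodX)
    a[0] = 1
    a[1:] = cumprodX[:-1]

    # (3) Exact integer division: every cumprodX[i] divides the total.
    prod_total = cumprodX[-1]
    if prod_total != 0:
        b = prod_total // cumprodX
    else:
        # Any empty factor makes the whole product empty (and avoids 0 // 0).
        b = np.zeros_like(cumprodX)

    # (4) Convert each factor once, repeat each element b[i] times,
    #     then tile the repeated block a[i] times.
    result = []
    for i, x in enumerate(X):
        xi = np.asarray(x)
        result.append(np.tile(np.repeat(xi, b[i]), a[i]))
    return result
```

For example, `cartesian_product_sketch([[1, 2], ["a", "b"]])` yields the columns `[1, 1, 2, 2]` and `['a', 'b', 'a', 'b']`. Using `np.tile` directly keeps the sketch dependency-free; it makes no attempt to preserve `Index` inputs, which is what `tile_compat` handles in pandas.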

**Performance Impact:**
The line profiler shows that the largest gain comes from removing the `np.roll()` call (17.5% of the original runtime), followed by cheaper array creation. Because these are fixed per-call costs, the changes matter most for the common case in the tests: small-to-medium cartesian products with 2-3 factors, where that overhead is a large share of the total runtime.
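To check the `np.roll` claim locally, one can compare the two shifts directly. The sketch below assumes the slice-based version looks like `a[0] = 1; a[1:] = cumprodX[:-1]` (the PR's exact code is not shown here), and the function names are hypothetical. Timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np


def shift_with_roll(cumprod: np.ndarray) -> np.ndarray:
    # Original approach: rotate right by one, then overwrite the wrapped-around value.
    a = np.roll(cumprod, 1)
    a[0] = 1
    return a


def shift_with_slice(cumprod: np.ndarray) -> np.ndarray:
    # Slice-based shift as described above (exact PR code assumed):
    # allocate once, then copy n - 1 values.
    a = np.empty_like(cumprod)
    a[0] = 1
    a[1:] = cumprod[:-1]
    return a


# Three factors of lengths 2, 3 and 4, as cartesian_product would see them.
cumprod = np.cumprod(np.array([2, 3, 4], dtype=np.intp))
assert np.array_equal(shift_with_roll(cumprod), shift_with_slice(cumprod))

if __name__ == "__main__":
    n = 100_000
    for fn in (shift_with_roll, shift_with_slice):
        seconds = timeit.timeit(lambda: fn(cumprod), number=n)
        print(f"{fn.__name__}: {seconds / n * 1e6:.2f} µs per call")
```

Because the array has only one entry per factor, any difference measured here reflects call overhead rather than memory traffic.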

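**Test Case Performance:**
Most generated test cases show 50-80% speedups, the largest on basic inputs of 2-3 short lists (up to 81.8% when every list has length one). The gains shrink where fixed overhead matters less or is not reached. The 1000 × 1000 case improves only 3.0% (1.17 ms → 1.13 ms), presumably because `np.repeat`/`np.tile` dominate its runtime. The input-validation and empty-input paths are mostly slower, by up to about 18%, although these calls take only 0.5-2 µs, so the absolute differences are tiny.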

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 16 Passed |
| 🌀 Generated Regression Tests | 44 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| `indexes/multi/test_util.py::TestCartesianProduct.test_datetimeindex` | 88.7μs | 85.9μs | 3.25%✅ |
| `indexes/multi/test_util.py::TestCartesianProduct.test_empty` | 256μs | 151μs | 68.6%✅ |
| `indexes/multi/test_util.py::TestCartesianProduct.test_empty_input` | 1.08μs | 1.17μs | -8.02%⚠️ |
| `indexes/multi/test_util.py::TestCartesianProduct.test_exceed_product_space` | 31.1μs | 31.3μs | -0.371%⚠️ |
| `indexes/multi/test_util.py::TestCartesianProduct.test_invalid_input` | 15.2μs | 17.3μs | -12.2%⚠️ |
| `indexes/multi/test_util.py::TestCartesianProduct.test_simple` | 97.6μs | 58.3μs | 67.5%✅ |
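The Speedup figures appear to be computed as original / optimized - 1, expressed as a percentage. That reading reproduces the headline (4.81 ms / 3.40 ms ≈ 1.415, about 41%) and, up to rounding of the displayed timings, the per-test values. A minimal helper, assuming that definition (`speedup_pct` is a hypothetical name):

```python
def speedup_pct(original: float, optimized: float) -> float:
    """Relative speedup in percent, assuming Speedup = original / optimized - 1."""
    return (original / optimized - 1.0) * 100.0


# Headline runtime: 4.81 ms -> 3.40 ms.
print(f"{speedup_pct(4.81, 3.40):.1f}%")  # 41.5%
# test_simple: 97.6 µs -> 58.3 µs; the table's 67.5% comes from unrounded timings.
print(f"{speedup_pct(97.6, 58.3):.1f}%")
# A negative value means the optimized code was slower (test_invalid_input).
print(f"{speedup_pct(15.2, 17.3):.1f}%")
```

Under the same definition, "(0.41x)" in Codeflash's headline is this fraction rather than a runtime ratio; the runtime ratio is 1.41x.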
🌀 Generated Regression Tests and Runtime
```python
from __future__ import annotations

import time

import numpy as np
import pytest  # used for our unit tests
from pandas.core.indexes.multi import cartesian_product

# unit tests

# ----------- BASIC TEST CASES -----------

def test_basic_two_lists_ints():
    # Basic: two lists of ints
    codeflash_output = cartesian_product([[1, 2], [3, 4]]); result = codeflash_output # 100μs -> 62.3μs (61.9% faster)

def test_basic_two_lists_strs():
    # Basic: two lists of strings
    codeflash_output = cartesian_product([['A', 'B'], ['x', 'y']]); result = codeflash_output # 92.1μs -> 56.1μs (64.3% faster)

def test_basic_three_lists():
    # Basic: three lists
    codeflash_output = cartesian_product([[1, 2], ['a', 'b'], [True, False]]); result = codeflash_output # 101μs -> 59.2μs (71.4% faster)

def test_basic_single_list():
    # Basic: single list
    codeflash_output = cartesian_product([[10, 20, 30]]); result = codeflash_output # 68.9μs -> 41.1μs (67.6% faster)

def test_basic_lists_of_length_one():
    # Basic: all lists of length 1
    codeflash_output = cartesian_product([[7], ['x'], [True]]); result = codeflash_output # 91.4μs -> 50.3μs (81.8% faster)

def test_basic_mixed_types():
    # Basic: mixed types in lists
    codeflash_output = cartesian_product([[1, 2], ['a', 3]]); result = codeflash_output # 89.6μs -> 55.8μs (60.5% faster)

# ----------- EDGE TEST CASES -----------

def test_edge_empty_outer_list():
    # Edge: empty outer list
    codeflash_output = cartesian_product([]); result = codeflash_output # 1.11μs -> 1.27μs (12.4% slower)

def test_edge_one_empty_inner_list():
    # Edge: one inner list empty
    codeflash_output = cartesian_product([[1, 2], [], [3, 4]]); result = codeflash_output # 102μs -> 61.6μs (65.8% faster)

def test_edge_all_inner_lists_empty():
    # Edge: all inner lists empty
    codeflash_output = cartesian_product([[], []]); result = codeflash_output # 85.2μs -> 51.7μs (64.8% faster)

def test_edge_inner_list_length_zero_and_one():
    # Edge: some lists empty, some of length one
    codeflash_output = cartesian_product([[1], [], [2]]); result = codeflash_output # 90.2μs -> 54.9μs (64.3% faster)

def test_edge_non_listlike_outer():
    # Edge: non-list-like outer input
    with pytest.raises(TypeError):
        cartesian_product(123) # 1.85μs -> 2.03μs (9.15% slower)

def test_edge_non_listlike_inner():
    # Edge: non-list-like inner input
    with pytest.raises(TypeError):
        cartesian_product([[1, 2], 3]) # 1.80μs -> 2.03μs (11.2% slower)


def test_edge_inner_lists_with_different_types():
    # Edge: inner lists with different types
    codeflash_output = cartesian_product([[1, 2], ['a', None]]); result = codeflash_output # 111μs -> 70.3μs (59.2% faster)

def test_edge_inner_lists_with_duplicates():
    # Edge: inner lists with duplicate values
    codeflash_output = cartesian_product([[1, 1], [2, 2]]); result = codeflash_output # 91.0μs -> 54.0μs (68.5% faster)

def test_edge_inner_lists_with_numpy_arrays():
    # Edge: inner lists are numpy arrays
    codeflash_output = cartesian_product([np.array([1, 2]), np.array([3, 4])]); result = codeflash_output # 82.2μs -> 48.7μs (68.9% faster)

def test_edge_inner_lists_are_tuples():
    # Edge: inner lists are tuples
    codeflash_output = cartesian_product([(1, 2), ('a', 'b')]); result = codeflash_output # 90.9μs -> 55.2μs (64.6% faster)

# ----------- LARGE SCALE TEST CASES -----------

def test_large_scale_two_lists_100_elements():
    # Large scale: two lists of 100 elements each
    arr1 = np.arange(100)
    arr2 = np.arange(100, 200)
    codeflash_output = cartesian_product([arr1, arr2]); result = codeflash_output # 109μs -> 72.5μs (50.7% faster)

def test_large_scale_three_lists_10_elements():
    # Large scale: three lists of 10 elements each
    arr1 = np.arange(10)
    arr2 = np.arange(10, 20)
    arr3 = np.arange(20, 30)
    codeflash_output = cartesian_product([arr1, arr2, arr3]); result = codeflash_output # 99.2μs -> 62.0μs (60.0% faster)

def test_large_scale_single_list_1000_elements():
    # Large scale: single list of 1000 elements
    arr = np.arange(1000)
    codeflash_output = cartesian_product([arr]); result = codeflash_output # 70.0μs -> 42.5μs (64.6% faster)

def test_large_scale_with_empty_inner_list():
    # Large scale: one large, one empty
    arr = np.arange(1000)
    codeflash_output = cartesian_product([arr, []]); result = codeflash_output # 84.6μs -> 53.5μs (58.3% faster)

def test_large_scale_performance():
    # Large scale: check that a 100x10x1 product takes less than 1 second
    arr1 = np.arange(100)
    arr2 = np.arange(10)
    arr3 = np.arange(1)
    start = time.perf_counter()
    codeflash_output = cartesian_product([arr1, arr2, arr3]); result = codeflash_output # 95.5μs -> 58.5μs (63.2% faster)
    duration = time.perf_counter() - start
    assert duration < 1.0

def test_large_scale_inner_lists_are_tuples():
    # Large scale: inner lists are tuples, 50 elements each
    arr1 = tuple(range(50))
    arr2 = tuple(range(50, 100))
    codeflash_output = cartesian_product([arr1, arr2]); result = codeflash_output # 95.1μs -> 59.1μs (61.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
```python
from __future__ import annotations

import numpy as np
import pytest
from pandas.core.indexes.multi import cartesian_product

# unit tests

# Basic Test Cases

def test_cartesian_product_basic_two_lists():
    # Test with two small lists of strings and ints
    codeflash_output = cartesian_product([list("AB"), [1, 2]]); result = codeflash_output # 111μs -> 70.4μs (58.3% faster)
    expected0 = np.array(['A', 'A', 'B', 'B'])
    expected1 = np.array([1, 2, 1, 2])
    assert len(result) == 2
    for arr, expected in zip(result, (expected0, expected1)):
        np.testing.assert_array_equal(arr, expected)

def test_cartesian_product_basic_three_lists():
    # Test with three lists
    codeflash_output = cartesian_product([[1, 2], ['a', 'b'], [True, False]]); result = codeflash_output # 104μs -> 62.5μs (67.2% faster)
    expected0 = np.array([1, 1, 1, 1, 2, 2, 2, 2])
    expected1 = np.array(['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'])
    expected2 = np.array([True, False, True, False, True, False, True, False])
    assert len(result) == 3
    for arr, expected in zip(result, (expected0, expected1, expected2)):
        np.testing.assert_array_equal(arr, expected)

def test_cartesian_product_single_list():
    # Test with a single list
    codeflash_output = cartesian_product([[1, 2, 3]]); result = codeflash_output # 69.7μs -> 40.9μs (70.4% faster)
    expected = np.array([1, 2, 3])
    assert len(result) == 1
    np.testing.assert_array_equal(result[0], expected)

def test_cartesian_product_lists_of_length_one():
    # Test with lists of length one
    codeflash_output = cartesian_product([[42], ['x'], [True]]); result = codeflash_output # 88.7μs -> 50.9μs (74.4% faster)
    expected0 = np.array([42])
    expected1 = np.array(['x'])
    expected2 = np.array([True])
    assert len(result) == 3
    for arr, expected in zip(result, (expected0, expected1, expected2)):
        np.testing.assert_array_equal(arr, expected)

# Edge Test Cases

def test_cartesian_product_empty_outer_list():
    # Test with empty outer list
    codeflash_output = cartesian_product([]); result = codeflash_output # 1.08μs -> 1.32μs (18.2% slower)
    assert result == []

def test_cartesian_product_inner_empty_list():
    # Test with one empty inner list (should produce arrays of length 0)
    codeflash_output = cartesian_product([[1, 2], [], [3, 4]]); result = codeflash_output # 102μs -> 62.0μs (65.9% faster)
    for arr in result:
        assert len(arr) == 0

def test_cartesian_product_all_inner_empty_lists():
    # Test with all inner lists empty
    codeflash_output = cartesian_product([[], []]); result = codeflash_output # 84.8μs -> 51.7μs (64.0% faster)
    for arr in result:
        assert len(arr) == 0

def test_cartesian_product_mixed_types():
    # Test with mixed types (int, float, str)
    codeflash_output = cartesian_product([[1, 2], [3.5, 4.5], ['a', 'b']]); result = codeflash_output # 102μs -> 61.0μs (67.2% faster)
    expected0 = np.array([1, 1, 1, 1, 2, 2, 2, 2])
    expected1 = np.array([3.5, 3.5, 4.5, 4.5, 3.5, 3.5, 4.5, 4.5])
    expected2 = np.array(['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b'])
    assert len(result) == 3
    for arr, expected in zip(result, (expected0, expected1, expected2)):
        np.testing.assert_array_equal(arr, expected)

def test_cartesian_product_input_not_list_like():
    # Test with input not list-like (should raise TypeError)
    with pytest.raises(TypeError):
        cartesian_product(42) # 1.84μs -> 2.03μs (9.27% slower)
    with pytest.raises(TypeError):
        cartesian_product('not a list') # 975ns -> 935ns (4.28% faster)
    with pytest.raises(TypeError):
        cartesian_product(None) # 524ns -> 525ns (0.190% slower)

def test_cartesian_product_inner_not_list_like():
    # Test with one inner element not list-like (should raise TypeError)
    with pytest.raises(TypeError):
        cartesian_product([[1, 2], 42]) # 1.62μs -> 1.75μs (7.65% slower)
    with pytest.raises(TypeError):
        cartesian_product([[1, 2], None]) # 685ns -> 777ns (11.8% slower)


def test_cartesian_product_zero_length_inner_list():
    # Test with one inner list of zero length, others non-empty
    codeflash_output = cartesian_product([[1, 2, 3], [], [4, 5]]); result = codeflash_output # 114μs -> 70.6μs (62.1% faster)
    for arr in result:
        assert len(arr) == 0

def test_cartesian_product_inner_lists_are_tuples():
    # Test with tuples instead of lists
    codeflash_output = cartesian_product([(1, 2), ('a', 'b')]); result = codeflash_output # 95.7μs -> 58.1μs (64.8% faster)
    expected0 = np.array([1, 1, 2, 2])
    expected1 = np.array(['a', 'b', 'a', 'b'])
    assert len(result) == 2
    for arr, expected in zip(result, (expected0, expected1)):
        np.testing.assert_array_equal(arr, expected)

def test_cartesian_product_inner_lists_are_arrays():
    # Test with numpy arrays as inner lists
    codeflash_output = cartesian_product([np.array([1, 2]), np.array([3, 4])]); result = codeflash_output # 81.3μs -> 49.9μs (63.0% faster)
    expected0 = np.array([1, 1, 2, 2])
    expected1 = np.array([3, 4, 3, 4])
    assert len(result) == 2
    for arr, expected in zip(result, (expected0, expected1)):
        np.testing.assert_array_equal(arr, expected)

def test_cartesian_product_inner_lists_are_generators():
    # Test with range objects (lazy sequences that support len(), not generators) as inner lists
    codeflash_output = cartesian_product([range(2), range(3)]); result = codeflash_output # 92.8μs -> 53.8μs (72.6% faster)
    expected0 = np.array([0, 0, 0, 1, 1, 1])
    expected1 = np.array([0, 1, 2, 0, 1, 2])
    assert len(result) == 2
    for arr, expected in zip(result, (expected0, expected1)):
        np.testing.assert_array_equal(arr, expected)

# Large Scale Test Cases

def test_cartesian_product_large_lists():
    # Test with two large lists (1000 elements each)
    a = np.arange(1000)
    b = np.arange(1000)
    codeflash_output = cartesian_product([a, b]); result = codeflash_output # 1.17ms -> 1.13ms (3.00% faster)
    assert all(len(arr) == 1_000_000 for arr in result)

def test_cartesian_product_large_single_list():
    # Test with a single large list
    a = np.arange(1000)
    codeflash_output = cartesian_product([a]); result = codeflash_output # 77.9μs -> 49.8μs (56.4% faster)
    np.testing.assert_array_equal(result[0], a)

def test_cartesian_product_large_three_lists():
    # Test with three lists, each of size 10 (total 1000 combinations)
    a = np.arange(10)
    b = np.arange(10)
    c = np.arange(10)
    codeflash_output = cartesian_product([a, b, c]); result = codeflash_output # 93.7μs -> 58.7μs (59.8% faster)
    # Each output array should have 1000 elements
    for arr in result:
        assert len(arr) == 1000

def test_cartesian_product_large_with_empty_inner():
    # Test with one large and one empty list
    a = np.arange(1000)
    b = []
    codeflash_output = cartesian_product([a, b]); result = codeflash_output # 84.5μs -> 51.9μs (62.7% faster)
    for arr in result:
        assert len(arr) == 0

def test_cartesian_product_large_mixed_types():
    # Test with large lists of mixed types
    a = np.arange(100)
    b = np.array(list('abcdefghij'))
    codeflash_output = cartesian_product([a, b]); result = codeflash_output # 83.1μs -> 51.5μs (61.2% faster)
    assert all(len(arr) == 1000 for arr in result)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
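Most of the generated tests rely on Codeflash comparing the original and optimized outputs rather than on explicit expectations. An independent oracle built on `itertools.product` could back explicit assertions instead. The helper below (`reference_product`, a hypothetical name) is one way to write it, assuming element-wise equality is the comparison wanted.

```python
import itertools

import numpy as np


def reference_product(X):
    """Brute-force cartesian product returning one array per input factor.

    itertools.product varies its first factor slowest, which matches the
    ordering cartesian_product produces.
    """
    rows = list(itertools.product(*X))
    if not rows:
        # Some factor is empty, so every output array is empty.
        return [np.asarray(x)[:0] for x in X]
    # For X == [], rows == [()] and this comprehension returns [].
    return [np.asarray([row[i] for row in rows]) for i in range(len(X))]


if __name__ == "__main__":
    for column in reference_product([[1, 2], ["a", "b"]]):
        print(column)
```

In a test, one could then check `for got, want in zip(cartesian_product(X), reference_product(X)): np.testing.assert_array_equal(got, want)`. Since the oracle materializes every row in Python, it suits the small cases, not the 1000 × 1000 one.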

To edit these changes, run `git checkout codeflash/optimize-cartesian_product-mholdb8r` and push.


@codeflash-ai (bot) requested a review from mashraf-222 on November 7, 2025, 08:26
@codeflash-ai (bot) added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: Medium (Optimization Quality according to Codeflash) labels on Nov 7, 2025