Conversation

codeflash-ai bot commented Nov 7, 2025

📄 43% (0.43x) speedup for _calculate_griddata in optuna/visualization/matplotlib/_contour.py

⏱️ Runtime : 582 milliseconds → 407 milliseconds (best of 18 runs)

📝 Explanation and details

The optimized code achieves a **42% speedup** through several key algorithmic and data structure improvements:

**1. Vectorized Array Operations**

- Replaced manual loops with NumPy array operations in `_calculate_griddata()`. Instead of iterating through `zip(xaxis.values, yaxis.values)` and checking each pair individually, the code creates NumPy arrays and uses vectorized masking: `mask_valid = (xaxis_vals != None) & (yaxis_vals != None)`. This eliminates Python-level iteration overhead.
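  A minimal standalone sketch of this masking pattern (the sample data here is invented; only the `mask_valid` expression mirrors the quoted code):

  ```python
  import numpy as np

  # Object-dtype arrays so that None entries survive; `!= None` then
  # broadcasts elementwise instead of testing array identity.
  xaxis_vals = np.array([1.0, None, 3.0], dtype=object)
  yaxis_vals = np.array([10.0, 20.0, None], dtype=object)

  # Keep only the (x, y) pairs where both coordinates are present.
  mask_valid = (xaxis_vals != None) & (yaxis_vals != None)  # noqa: E711

  x_valid = xaxis_vals[mask_valid]  # -> [1.0]
  y_valid = yaxis_vals[mask_valid]  # -> [10.0]
  ```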

**2. Hash Map Lookups Instead of Linear Search**

- Replaced expensive `list.index()` calls with O(1) dictionary lookups. The original code used `xaxis.indices.index(x_value)`, which is O(n); the optimized version precomputes `xindex_lookup = {val: idx for idx, val in enumerate(xaxis.indices)}` for O(1) access.
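  In sketch form (hypothetical values; the dictionary expression is the one quoted above):

  ```python
  indices = [0.5, 1.5, 2.5, 3.5]

  # Original pattern: list.index scans from the front on every call -> O(n).
  i_slow = indices.index(2.5)

  # Optimized pattern: build the mapping once, then each lookup is O(1).
  xindex_lookup = {val: idx for idx, val in enumerate(indices)}
  i_fast = xindex_lookup[2.5]

  assert i_slow == i_fast == 2
  ```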

**3. Vectorized Distance Calculations in `_create_zmap()`**

- Transformed the core distance calculation from individual `np.argmin(np.abs(xi - x))` calls per point to a fully vectorized operation using broadcasting: `x_indices = np.abs(xi_arr[np.newaxis, :] - x_array[:, np.newaxis])`. This computes all distances at once, dramatically reducing function-call overhead.
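  A self-contained sketch of the broadcast version (grid and points are invented; `np.argmin` along the grid axis then recovers the nearest-cell indices from the distance matrix):

  ```python
  import numpy as np

  xi = np.linspace(0.0, 1.0, 5)        # grid coordinates
  x_array = np.array([0.1, 0.6, 0.9])  # observed trial values

  # Per-point version: one argmin call per observation.
  slow = [int(np.argmin(np.abs(xi - v))) for v in x_array]

  # Broadcast version: a (len(x_array), len(xi)) distance matrix in one
  # shot; argmin along axis 1 then picks the nearest grid cell per point.
  dists = np.abs(xi[np.newaxis, :] - x_array[:, np.newaxis])
  fast = np.argmin(dists, axis=1)

  assert slow == fast.tolist()  # [0, 2, 4]
  ```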

**4. Sparse Matrix Construction Optimizations**

- In `_interpolate_zmap()`, precomputed `zmap_keys = set(zmap.keys())` to avoid repeated dictionary key-existence checks, and added the diagonal coefficient (4) explicitly rather than accumulating it through neighbor iterations.
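  A toy version of this assembly pattern (names and boundary handling are simplified: the real routine runs on a 100x100 grid and pins the diagonal at 4, while this sketch uses the in-bounds neighbor count to keep the tiny system well posed):

  ```python
  import numpy as np
  import scipy.sparse as sp
  import scipy.sparse.linalg as spla

  n = 4                                  # tiny grid for illustration
  zmap = {(0, 0): 1.0, (3, 3): 2.0}      # known values at a few grid cells
  zmap_keys = set(zmap.keys())           # precomputed once: O(1) membership

  rows, cols, data = [], [], []
  b = np.zeros(n * n)
  for x in range(n):
      for y in range(n):
          k = y * n + x
          if (x, y) in zmap_keys:
              # Known cell: an identity row pins the value.
              rows.append(k); cols.append(k); data.append(1.0)
              b[k] = zmap[(x, y)]
              continue
          neighbors = [
              (x + dx, y + dy)
              for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1))
              if 0 <= x + dx < n and 0 <= y + dy < n
          ]
          # Diagonal coefficient written once, up front, instead of being
          # accumulated across the neighbor loop.
          rows.append(k); cols.append(k); data.append(float(len(neighbors)))
          for nx, ny in neighbors:
              rows.append(k); cols.append(ny * n + nx); data.append(-1.0)

  a = sp.csc_matrix((data, (rows, cols)), shape=(n * n, n * n))
  z = spla.spsolve(a, b)  # smooth surface through the known cells
  ```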

**5. List Comprehension and Memory Layout Improvements**

- Replaced `map()` calls with list comprehensions (`[float(x) for x in values]` vs `list(map(lambda x: float(x), values))`) for better performance in modern Python.
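  For example (trivial but representative; note that `list(map(float, values))` without the lambda would also avoid the per-element Python call):

  ```python
  values = ["1.5", "2.5", "3.5"]

  # Original pattern: the lambda adds a Python-level function call per element.
  floats_slow = list(map(lambda x: float(x), values))

  # Optimized pattern: the comprehension applies float() directly.
  floats_fast = [float(x) for x in values]

  assert floats_slow == floats_fast == [1.5, 2.5, 3.5]
  ```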

**Performance Impact by Test Case:**
The optimizations show consistent 42-44% improvements across all numerical and mixed-type test cases, with the largest gains in scenarios involving many data points where the vectorized operations and hash lookups provide maximum benefit. The two edge cases showing slower performance (empty values, same axis names) represent degenerate cases where the overhead of creating NumPy arrays outweighs the benefits, but these are uncommon in real usage.

**Workload Suitability:**
These optimizations are particularly effective for medium to large-scale contour plotting workloads typical in hyperparameter optimization visualization, where the function processes hundreds of trial points across multiple parameter dimensions.

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 15 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import numpy as np
# imports
import pytest  # used for our unit tests
from optuna.visualization.matplotlib._contour import _calculate_griddata


# Minimal stubs for required classes
class _AxisInfo:
    def __init__(self, name, values, indices, is_cat=False, is_log=False, range_=None):
        self.name = name
        self.values = values
        self.indices = indices
        self.is_cat = is_cat
        self.is_log = is_log
        self.range = range_ if range_ is not None else (min(values), max(values))

class _PlotValues:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class _SubContourInfo:
    def __init__(self, xaxis, yaxis, z_values, constraints):
        self.xaxis = xaxis
        self.yaxis = yaxis
        self.z_values = z_values
        self.constraints = constraints

# -------------------- UNIT TESTS --------------------

# 1. Basic Test Cases

#------------------------------------------------
from __future__ import annotations

from collections.abc import Sequence

import numpy as np
# imports
import pytest  # used for our unit tests
import scipy
from optuna.visualization.matplotlib._contour import _calculate_griddata

CONTOUR_POINT_NUM = 100

class _AxisInfo:
    def __init__(self, name, values, indices, is_cat=False, is_log=False, range=None):
        self.name = name
        self.values = values
        self.indices = indices
        self.is_cat = is_cat
        self.is_log = is_log
        self.range = range if range is not None else (min(values), max(values))

class _PlotValues:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class _SubContourInfo:
    def __init__(self, xaxis, yaxis, z_values, constraints):
        self.xaxis = xaxis
        self.yaxis = yaxis
        self.z_values = z_values
        self.constraints = constraints

# unit tests

# ---- Basic Test Cases ----

def test_basic_numerical_2d():
    # Simple 2D grid with numerical axes
    xaxis = _AxisInfo('x', [1.0, 2.0, 3.0], [1.0, 2.0, 3.0], is_cat=False, is_log=False, range=(1.0, 3.0))
    yaxis = _AxisInfo('y', [10.0, 20.0, 30.0], [10.0, 20.0, 30.0], is_cat=False, is_log=False, range=(10.0, 30.0))
    z_values = {(0,0): 1.0, (1,1): 2.0, (2,2): 3.0}
    constraints = [True, False, True]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 44.8ms -> 31.3ms (43.2% faster)

def test_basic_categorical_2d():
    # Categorical axes
    xaxis = _AxisInfo('x', ['a', 'b', 'c'], ['a', 'b', 'c'], is_cat=True, is_log=False, range=(0,2))
    yaxis = _AxisInfo('y', ['x', 'y', 'z'], ['x', 'y', 'z'], is_cat=True, is_log=False, range=(0,2))
    z_values = {(0,0): 5.0, (1,1): 6.0, (2,2): 7.0}
    constraints = [True, True, False]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 44.8ms -> 31.2ms (43.5% faster)

def test_basic_mixed_cat_num():
    # Mixed categorical and numerical axes
    xaxis = _AxisInfo('x', ['a', 'b', 'c'], ['a', 'b', 'c'], is_cat=True, is_log=False, range=(0,2))
    yaxis = _AxisInfo('y', [1.0, 2.0, 3.0], [1.0, 2.0, 3.0], is_cat=False, is_log=False, range=(1.0,3.0))
    z_values = {(0,0): 10.0, (1,1): 20.0, (2,2): 30.0}
    constraints = [False, True, True]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 44.5ms -> 31.3ms (42.1% faster)

def test_basic_logscale_axis():
    # Logarithmic axis
    xaxis = _AxisInfo('x', [1.0, 10.0, 100.0], [1.0, 10.0, 100.0], is_cat=False, is_log=True, range=(1.0,100.0))
    yaxis = _AxisInfo('y', [2.0, 20.0, 200.0], [2.0, 20.0, 200.0], is_cat=False, is_log=True, range=(2.0,200.0))
    z_values = {(0,0): 0.1, (1,1): 0.2, (2,2): 0.3}
    constraints = [True, False, True]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 44.5ms -> 31.3ms (42.4% faster)

# ---- Edge Test Cases ----

def test_edge_empty_values():
    # No valid x or y values
    xaxis = _AxisInfo('x', [], [], is_cat=False, is_log=False, range=(0,1))
    yaxis = _AxisInfo('y', [], [], is_cat=False, is_log=False, range=(0,1))
    z_values = {}
    constraints = []
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 5.64μs -> 17.1μs (66.9% slower)

def test_edge_none_values():
    # Some values are None
    xaxis = _AxisInfo('x', [None, 2.0, None], [None, 2.0, None], is_cat=False, is_log=False, range=(2.0,2.0))
    yaxis = _AxisInfo('y', [None, 20.0, None], [None, 20.0, None], is_cat=False, is_log=False, range=(20.0,20.0))
    z_values = {(1,1): 42.0}
    constraints = [False]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 44.2ms -> 30.7ms (44.2% faster)

def test_edge_duplicate_values():
    # Duplicate values in axes
    xaxis = _AxisInfo('x', [1.0, 1.0, 2.0], [1.0, 1.0, 2.0], is_cat=False, is_log=False, range=(1.0,2.0))
    yaxis = _AxisInfo('y', [10.0, 10.0, 20.0], [10.0, 10.0, 20.0], is_cat=False, is_log=False, range=(10.0,20.0))
    z_values = {(0,0): 3.0, (1,1): 4.0, (2,2): 5.0}
    constraints = [True, False, True]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 43.9ms -> 30.7ms (43.0% faster)

def test_edge_same_axis_names():
    # xaxis and yaxis have same name
    xaxis = _AxisInfo('x', [1.0], [1.0], is_cat=False, is_log=False, range=(1.0,1.0))
    yaxis = _AxisInfo('x', [2.0], [2.0], is_cat=False, is_log=False, range=(2.0,2.0))
    z_values = {(0,0): 99.0}
    constraints = [True]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 49.1μs -> 61.9μs (20.7% slower)

def test_edge_all_infeasible():
    # All points infeasible
    xaxis = _AxisInfo('x', [1.0, 2.0, 3.0], [1.0, 2.0, 3.0], is_cat=False, is_log=False, range=(1.0,3.0))
    yaxis = _AxisInfo('y', [10.0, 20.0, 30.0], [10.0, 20.0, 30.0], is_cat=False, is_log=False, range=(10.0,30.0))
    z_values = {(0,0): 1.0, (1,1): 2.0, (2,2): 3.0}
    constraints = [False, False, False]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 44.6ms -> 31.1ms (43.1% faster)

def test_edge_all_feasible():
    # All points feasible
    xaxis = _AxisInfo('x', [1.0, 2.0, 3.0], [1.0, 2.0, 3.0], is_cat=False, is_log=False, range=(1.0,3.0))
    yaxis = _AxisInfo('y', [10.0, 20.0, 30.0], [10.0, 20.0, 30.0], is_cat=False, is_log=False, range=(10.0,30.0))
    z_values = {(0,0): 1.0, (1,1): 2.0, (2,2): 3.0}
    constraints = [True, True, True]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 44.4ms -> 31.3ms (41.8% faster)

# ---- Large Scale Test Cases ----

def test_large_scale_numerical():
    # Large grid, numerical axes
    n = 100  # under 1000 elements as per instructions
    xvals = list(range(n))
    yvals = list(range(n,2*n))
    xaxis = _AxisInfo('x', xvals, xvals, is_cat=False, is_log=False, range=(0,n-1))
    yaxis = _AxisInfo('y', yvals, yvals, is_cat=False, is_log=False, range=(n,2*n-1))
    z_values = {(i,i): float(i) for i in range(n)}
    constraints = [True if i%2==0 else False for i in range(n)]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 44.2ms -> 31.1ms (42.0% faster)


def test_large_scale_mixed():
    # Large grid, mixed axes
    n = 80
    xvals = [f'cat{i}' for i in range(n)]
    yvals = list(range(n))
    xaxis = _AxisInfo('x', xvals, xvals, is_cat=True, is_log=False, range=(0,n-1))
    yaxis = _AxisInfo('y', yvals, yvals, is_cat=False, is_log=False, range=(0,n-1))
    z_values = {(i,i): float(i) for i in range(n)}
    constraints = [True if i%4==0 else False for i in range(n)]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 46.5ms -> 32.6ms (42.4% faster)

def test_large_scale_all_infeasible():
    # Large grid, all infeasible
    n = 90
    xvals = list(range(n))
    yvals = list(range(n,2*n))
    xaxis = _AxisInfo('x', xvals, xvals, is_cat=False, is_log=False, range=(0,n-1))
    yaxis = _AxisInfo('y', yvals, yvals, is_cat=False, is_log=False, range=(n,2*n-1))
    z_values = {(i,i): float(i) for i in range(n)}
    constraints = [False for _ in range(n)]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 44.6ms -> 30.9ms (44.4% faster)

def test_large_scale_all_feasible():
    # Large grid, all feasible
    n = 70
    xvals = list(range(n))
    yvals = list(range(n,2*n))
    xaxis = _AxisInfo('x', xvals, xvals, is_cat=False, is_log=False, range=(0,n-1))
    yaxis = _AxisInfo('y', yvals, yvals, is_cat=False, is_log=False, range=(n,2*n-1))
    z_values = {(i,i): float(i) for i in range(n)}
    constraints = [True for _ in range(n)]
    info = _SubContourInfo(xaxis, yaxis, z_values, constraints)
    zi, feasible, infeasible = _calculate_griddata(info) # 45.1ms -> 31.6ms (42.7% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-_calculate_griddata-mhobjf2z` and push.

codeflash-ai bot requested a review from mashraf-222 on Nov 7, 2025 at 03:51
codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels on Nov 7, 2025