@codeflash-ai codeflash-ai bot commented Nov 4, 2025

📄 56% (0.56x) speedup for _validate_constraints in optuna/samplers/nsgaii/_constraints_evaluation.py

⏱️ Runtime : 17.5 milliseconds → 11.2 milliseconds (best of 250 runs)

📝 Explanation and details

The optimization replaces `np.any(np.isnan(np.array(_constraints)))` with `np.isnan(np.asarray(_constraints)).any()`, which provides a 55% speedup by reducing unnecessary array operations.

Key changes:

  • `np.asarray()` instead of `np.array()`: `np.asarray()` avoids copying data when the input is already a NumPy array, while `np.array()` always creates a new copy
  • `.any()` method instead of `np.any()` function: calling the array method directly avoids the overhead of the NumPy function wrapper
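Both claims are easy to check directly; the array contents below are illustrative:

```python
import numpy as np

constraints = np.array([1.0, float("nan"), 3.0])

# The old and new expressions agree on the result.
old = np.any(np.isnan(np.array(constraints)))
new = np.isnan(np.asarray(constraints)).any()
assert bool(old) and bool(new)

# np.asarray() hands back the same object when the input is already an
# ndarray of a suitable dtype; np.array() copies unconditionally.
assert np.asarray(constraints) is constraints
assert np.array(constraints) is not constraints
```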

Why this optimization works:
The line profiler shows the NaN check (`np.any(np.isnan(np.array(_constraints)))`) consumes 78.1% of the original runtime. Eliminating the unnecessary copy with `np.asarray()` and using the more direct `.any()` method reduces this bottleneck from 31.2 ms to 16.8 ms - a 46% improvement on the most expensive operation.
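For a rough local reproduction (these are not the profiler's numbers), a `timeit` micro-benchmark along the following lines compares the two forms; the input size and repeat count here are arbitrary:

```python
import timeit

import numpy as np

# A plain Python list, as constraint values typically arrive.
data = [float(i) for i in range(1000)]

t_old = timeit.timeit(lambda: np.any(np.isnan(np.array(data))), number=2000)
t_new = timeit.timeit(lambda: np.isnan(np.asarray(data)).any(), number=2000)

print(f"np.any(np.isnan(np.array(data))) : {t_old:.4f} s")
print(f"np.isnan(np.asarray(data)).any() : {t_new:.4f} s")
```

Note that for list input both forms must still build an array, so the gap measured here mostly reflects the cheaper `.any()` call path; the copy is skipped only when the input is already an ndarray.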

Test case performance:
The optimization is particularly effective for:

  • Large populations: 55-64% faster on 1000+ trial datasets where the NaN check dominates runtime
  • Valid constraint scenarios: 20-37% faster when all trials have proper constraints and the function primarily performs NaN validation
  • Mixed scenarios: 17-29% faster even when some trials are missing constraints or have validation errors

The optimization maintains identical behavior and error handling while significantly reducing computational overhead in the constraint validation bottleneck.

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 45 Passed |
| 🌀 Generated Regression Tests | 35 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| samplers_tests/test_nsgaii.py::test_rank_population_missing_constraint_values | 127μs | 102μs | 24.8% ✅ |
| samplers_tests/test_nsgaii.py::test_validate_constraints | 28.0μs | 22.8μs | 22.9% ✅ |
🌀 Generated Regression Tests and Runtime
import warnings

# imports
import pytest
from optuna.samplers.nsgaii._constraints_evaluation import _validate_constraints


# Mock FrozenTrial class for testing
class FrozenTrial:
    def __init__(self, number, system_attrs=None):
        self.number = number
        self.system_attrs = system_attrs or {}

# Constants (as used in the original function)
_CONSTRAINTS_KEY = "constraints"

# unit tests

# --- Basic Test Cases ---

def test_no_constraints_flag_returns():
    # If is_constrained is False, function should return immediately, even if population is anything
    population = [FrozenTrial(0, {_CONSTRAINTS_KEY: [1.0, 2.0]})]
    _validate_constraints(population, is_constrained=False) # 586ns -> 535ns (9.53% faster)

def test_all_trials_have_equal_length_constraints():
    # All trials have constraints of equal length, should not raise
    population = [
        FrozenTrial(0, {_CONSTRAINTS_KEY: [1.0, 2.0]}),
        FrozenTrial(1, {_CONSTRAINTS_KEY: [0.0, -1.0]}),
        FrozenTrial(2, {_CONSTRAINTS_KEY: [3.5, 4.2]}),
    ]
    _validate_constraints(population, is_constrained=True) # 24.1μs -> 18.5μs (30.2% faster)

def test_empty_population():
    # Empty population should not raise, regardless of is_constrained
    _validate_constraints([], is_constrained=False) # 470ns -> 538ns (12.6% slower)
    _validate_constraints([], is_constrained=True) # 1.52μs -> 1.55μs (1.68% slower)

def test_single_trial_with_constraints():
    # Single trial with constraints should not raise
    population = [FrozenTrial(0, {_CONSTRAINTS_KEY: [1.0, 2.0, 3.0]})]
    _validate_constraints(population, is_constrained=True) # 14.3μs -> 11.9μs (20.6% faster)

def test_single_trial_without_constraints():
    # Single trial without constraints should warn, but not raise
    population = [FrozenTrial(0, {})]
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        _validate_constraints(population, is_constrained=True) # 7.15μs -> 7.41μs (3.43% slower)
        assert len(w) == 1  # exactly one warning for the trial missing constraints

# --- Edge Test Cases ---

def test_trial_with_nan_constraint_raises():
    # Any NaN in constraints should raise ValueError
    population = [
        FrozenTrial(0, {_CONSTRAINTS_KEY: [1.0, float('nan')]}),
        FrozenTrial(1, {_CONSTRAINTS_KEY: [2.0, 3.0]}),
    ]
    with pytest.raises(ValueError, match="NaN is not acceptable as constraint value."):
        _validate_constraints(population, is_constrained=True) # 15.1μs -> 13.0μs (16.4% faster)

def test_trial_with_different_constraint_lengths_raises():
    # Trials with different number of constraints should raise ValueError
    population = [
        FrozenTrial(0, {_CONSTRAINTS_KEY: [1.0, 2.0]}),
        FrozenTrial(1, {_CONSTRAINTS_KEY: [3.0]}),
    ]
    with pytest.raises(ValueError, match="different numbers of constraints"):
        _validate_constraints(population, is_constrained=True) # 18.6μs -> 14.4μs (29.3% faster)

def test_some_trials_missing_constraints_warn():
    # Some trials missing constraints should warn for each missing, but not raise
    population = [
        FrozenTrial(0, {_CONSTRAINTS_KEY: [1.0, 2.0]}),
        FrozenTrial(1, {}),  # missing constraints
        FrozenTrial(2, {_CONSTRAINTS_KEY: [3.0, 4.0]}),
        FrozenTrial(3, {}),  # missing constraints
    ]
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        _validate_constraints(population, is_constrained=True) # 24.3μs -> 20.5μs (18.7% faster)
        warn_msgs = [str(warn.message) for warn in w]
        assert len(warn_msgs) == 2  # one warning per trial missing constraints

def test_all_trials_missing_constraints_warn():
    # All trials missing constraints should warn for each, but not raise
    population = [FrozenTrial(i, {}) for i in range(3)]
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        _validate_constraints(population, is_constrained=True) # 8.47μs -> 8.61μs (1.58% slower)
        warn_msgs = [str(warn.message) for warn in w]
        assert len(warn_msgs) == 3  # one warning per trial

def test_constraints_empty_lists():
    # All constraints are empty lists, should not raise
    population = [
        FrozenTrial(0, {_CONSTRAINTS_KEY: []}),
        FrozenTrial(1, {_CONSTRAINTS_KEY: []}),
    ]
    _validate_constraints(population, is_constrained=True) # 17.7μs -> 12.9μs (37.1% faster)

def test_mixture_empty_and_nonempty_constraints_raises():
    # Mixture of empty and non-empty constraints should raise ValueError
    population = [
        FrozenTrial(0, {_CONSTRAINTS_KEY: []}),
        FrozenTrial(1, {_CONSTRAINTS_KEY: [1.0]}),
    ]
    with pytest.raises(ValueError, match="different numbers of constraints"):
        _validate_constraints(population, is_constrained=True) # 12.1μs -> 9.76μs (23.7% faster)

def test_constraints_with_integers_and_floats():
    # Constraints can be ints or floats, as long as lengths match and no NaN
    population = [
        FrozenTrial(0, {_CONSTRAINTS_KEY: [1, 2.0]}),
        FrozenTrial(1, {_CONSTRAINTS_KEY: [3, 4.0]}),
    ]
    _validate_constraints(population, is_constrained=True) # 18.2μs -> 14.1μs (29.0% faster)

def test_constraints_with_negative_and_zero_values():
    # Negative and zero values are valid
    population = [
        FrozenTrial(0, {_CONSTRAINTS_KEY: [0, -1.5, 2.3]}),
        FrozenTrial(1, {_CONSTRAINTS_KEY: [0, -2.0, 3.1]}),
    ]
    _validate_constraints(population, is_constrained=True) # 16.0μs -> 12.2μs (31.5% faster)

# --- Large Scale Test Cases ---

def test_large_population_all_valid():
    # Large population, all constraints valid and same length
    population = [FrozenTrial(i, {_CONSTRAINTS_KEY: [float(i), float(i+1)]}) for i in range(1000)]
    _validate_constraints(population, is_constrained=True) # 2.30ms -> 1.40ms (63.7% faster)

def test_large_population_some_missing_constraints_warn():
    # Large population, some trials missing constraints
    population = [
        FrozenTrial(i, {_CONSTRAINTS_KEY: [float(i), float(i+1)]}) if i % 10 != 0 else FrozenTrial(i, {})
        for i in range(1000)
    ]
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        _validate_constraints(population, is_constrained=True) # 2.27ms -> 1.44ms (57.5% faster)
        warn_msgs = [str(warn.message) for warn in w]
        assert len(warn_msgs) == 100  # trials with i % 10 == 0 are missing constraints

def test_large_population_with_nan_raises():
    # Large population, one trial has NaN constraint
    population = [FrozenTrial(i, {_CONSTRAINTS_KEY: [float(i), float(i+1)]}) for i in range(999)]
    population.append(FrozenTrial(999, {_CONSTRAINTS_KEY: [0.0, float('nan')]}))
    with pytest.raises(ValueError, match="NaN is not acceptable as constraint value."):
        _validate_constraints(population, is_constrained=True) # 2.29ms -> 1.40ms (63.4% faster)

def test_large_population_with_length_mismatch_raises():
    # Large population, one trial has different length constraints
    population = [FrozenTrial(i, {_CONSTRAINTS_KEY: [float(i), float(i+1)]}) for i in range(999)]
    population.append(FrozenTrial(999, {_CONSTRAINTS_KEY: [0.0, 1.0, 2.0]}))
    with pytest.raises(ValueError, match="different numbers of constraints"):
        _validate_constraints(population, is_constrained=True) # 85.2μs -> 81.8μs (4.11% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import warnings

# imports
import pytest
from optuna.samplers.nsgaii._constraints_evaluation import _validate_constraints


# Mock FrozenTrial for testing purposes
class FrozenTrial:
    def __init__(self, number, system_attrs):
        self.number = number
        self.system_attrs = system_attrs

# Constants used in the function
_CONSTRAINTS_KEY = "constraints"

# --- Unit Tests ---

# Basic Test Cases

def test_validate_constraints_not_constrained():
    """Should do nothing if is_constrained is False."""
    population = [
        FrozenTrial(0, { _CONSTRAINTS_KEY: [1.0, 2.0] }),
        FrozenTrial(1, { _CONSTRAINTS_KEY: [3.0, 4.0] }),
    ]
    # Should not raise
    _validate_constraints(population, is_constrained=False) # 560ns -> 563ns (0.533% slower)

def test_validate_constraints_all_have_constraints():
    """Should pass when all trials have the same number of constraints and no NaNs."""
    population = [
        FrozenTrial(0, { _CONSTRAINTS_KEY: [1.0, 2.0] }),
        FrozenTrial(1, { _CONSTRAINTS_KEY: [3.0, 4.0] }),
        FrozenTrial(2, { _CONSTRAINTS_KEY: [5.0, 6.0] }),
    ]
    _validate_constraints(population, is_constrained=True) # 25.8μs -> 20.9μs (23.4% faster)

def test_validate_constraints_single_trial():
    """Should pass with a single trial."""
    population = [FrozenTrial(0, { _CONSTRAINTS_KEY: [1.0, 2.0, 3.0] })]
    _validate_constraints(population, is_constrained=True) # 12.6μs -> 10.6μs (18.9% faster)

def test_validate_constraints_empty_population():
    """Should pass with empty population."""
    population = []
    _validate_constraints(population, is_constrained=True) # 1.74μs -> 1.78μs (2.24% slower)

def test_validate_constraints_some_missing_constraints_warns():
    """Should warn if some trials are missing constraints, but not raise."""
    population = [
        FrozenTrial(0, { _CONSTRAINTS_KEY: [1.0, 2.0] }),
        FrozenTrial(1, {}),  # Missing constraints
        FrozenTrial(2, { _CONSTRAINTS_KEY: [3.0, 4.0] }),
    ]
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        _validate_constraints(population, is_constrained=True) # 24.8μs -> 21.1μs (17.6% faster)
        assert len(w) == 1  # one warning for the trial missing constraints

# Edge Test Cases

def test_validate_constraints_nan_in_constraints_raises():
    """Should raise ValueError if any constraint value is NaN."""
    population = [
        FrozenTrial(0, { _CONSTRAINTS_KEY: [1.0, float('nan')] }),
        FrozenTrial(1, { _CONSTRAINTS_KEY: [3.0, 4.0] }),
    ]
    with pytest.raises(ValueError, match="NaN is not acceptable as constraint value."):
        _validate_constraints(population, is_constrained=True) # 12.9μs -> 10.7μs (20.0% faster)

def test_validate_constraints_mismatched_constraint_lengths_raises():
    """Should raise ValueError if trials have different numbers of constraints."""
    population = [
        FrozenTrial(0, { _CONSTRAINTS_KEY: [1.0, 2.0] }),
        FrozenTrial(1, { _CONSTRAINTS_KEY: [3.0] }),  # Only one constraint
        FrozenTrial(2, { _CONSTRAINTS_KEY: [5.0, 6.0] }),
    ]
    with pytest.raises(ValueError, match="Trials with different numbers of constraints cannot be compared."):
        _validate_constraints(population, is_constrained=True) # 18.0μs -> 14.2μs (26.9% faster)

def test_validate_constraints_all_missing_constraints_warns():
    """Should warn for every trial if all are missing constraints."""
    population = [
        FrozenTrial(0, {}),
        FrozenTrial(1, {}),
    ]
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        _validate_constraints(population, is_constrained=True) # 9.00μs -> 9.33μs (3.52% slower)
        assert len(w) == 2  # one warning per trial

def test_validate_constraints_empty_constraints_list():
    """Should pass if all trials have empty constraints lists."""
    population = [
        FrozenTrial(0, { _CONSTRAINTS_KEY: [] }),
        FrozenTrial(1, { _CONSTRAINTS_KEY: [] }),
    ]
    _validate_constraints(population, is_constrained=True) # 18.5μs -> 14.5μs (27.9% faster)

def test_validate_constraints_mixed_empty_and_nonempty_constraints_raises():
    """Should raise if some trials have empty constraints and others do not."""
    population = [
        FrozenTrial(0, { _CONSTRAINTS_KEY: [] }),
        FrozenTrial(1, { _CONSTRAINTS_KEY: [1.0] }),
    ]
    with pytest.raises(ValueError, match="Trials with different numbers of constraints cannot be compared."):
        _validate_constraints(population, is_constrained=True) # 12.2μs -> 9.73μs (25.7% faster)

def test_validate_constraints_none_in_constraints_raises():
    """Should raise if constraint list contains None (not NaN, but not a float)."""
    population = [
        FrozenTrial(0, { _CONSTRAINTS_KEY: [1.0, None] }),
        FrozenTrial(1, { _CONSTRAINTS_KEY: [3.0, 4.0] }),
    ]
    # np.isnan(None) raises TypeError, so function should propagate that error
    with pytest.raises(TypeError):
        _validate_constraints(population, is_constrained=True) # 9.70μs -> 10.1μs (4.27% slower)

def test_validate_constraints_non_numeric_constraint_raises():
    """Should raise if constraint list contains non-numeric values."""
    population = [
        FrozenTrial(0, { _CONSTRAINTS_KEY: [1.0, "foo"] }),
        FrozenTrial(1, { _CONSTRAINTS_KEY: [3.0, 4.0] }),
    ]
    with pytest.raises(TypeError):
        _validate_constraints(population, is_constrained=True) # 11.4μs -> 11.0μs (3.02% faster)

def test_validate_constraints_constraints_are_tuple():
    """Should work if constraints are tuples instead of lists."""
    population = [
        FrozenTrial(0, { _CONSTRAINTS_KEY: (1.0, 2.0) }),
        FrozenTrial(1, { _CONSTRAINTS_KEY: (3.0, 4.0) }),
    ]
    _validate_constraints(population, is_constrained=True) # 20.6μs -> 16.2μs (27.3% faster)

# Large Scale Test Cases

def test_validate_constraints_large_population():
    """Should pass with a large population with identical constraints."""
    N = 1000
    constraints = [float(i) for i in range(10)]
    population = [FrozenTrial(i, { _CONSTRAINTS_KEY: constraints[:] }) for i in range(N)]
    _validate_constraints(population, is_constrained=True) # 2.51ms -> 1.62ms (55.1% faster)

def test_validate_constraints_large_population_with_one_missing_warns():
    """Should warn if one trial in a large population is missing constraints."""
    N = 999
    constraints = [float(i) for i in range(10)]
    population = [FrozenTrial(i, { _CONSTRAINTS_KEY: constraints[:] }) for i in range(N)]
    population.append(FrozenTrial(N, {}))  # One missing
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        _validate_constraints(population, is_constrained=True) # 2.51ms -> 1.63ms (54.3% faster)
        assert len(w) == 1  # only the last trial is missing constraints

def test_validate_constraints_large_population_with_one_nan_raises():
    """Should raise ValueError if one trial in a large population has NaN."""
    N = 999
    constraints = [float(i) for i in range(10)]
    population = [FrozenTrial(i, { _CONSTRAINTS_KEY: constraints[:] }) for i in range(N)]
    # Insert NaN in the last trial
    bad_constraints = constraints[:]
    bad_constraints[5] = float('nan')
    population.append(FrozenTrial(N, { _CONSTRAINTS_KEY: bad_constraints }))
    with pytest.raises(ValueError, match="NaN is not acceptable as constraint value."):
        _validate_constraints(population, is_constrained=True) # 2.50ms -> 1.62ms (54.7% faster)

def test_validate_constraints_large_population_mismatched_lengths_raises():
    """Should raise ValueError if one trial in a large population has a different constraints length."""
    N = 999
    constraints = [float(i) for i in range(10)]
    population = [FrozenTrial(i, { _CONSTRAINTS_KEY: constraints[:] }) for i in range(N)]
    # One trial with a shorter constraints list
    population.append(FrozenTrial(N, { _CONSTRAINTS_KEY: constraints[:5] }))
    with pytest.raises(ValueError, match="Trials with different numbers of constraints cannot be compared."):
        _validate_constraints(population, is_constrained=True) # 2.51ms -> 1.62ms (55.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-_validate_constraints-mhl2hjob` and push the branch.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 4, 2025 21:14
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 4, 2025