Support algorithms #40

yannrichet-asnr · 2025-10-18T19:25:33Z

This commit adds support for using pandas DataFrames as input_variables,
enabling non-factorial parametric study designs alongside the existing
dict-based factorial (Cartesian product) approach.

Implementation (fz/helpers.py):

Updated generate_variable_combinations() to detect DataFrame input
- DataFrame: each row represents one case (non-factorial design)
- Dict: existing Cartesian product behavior (factorial design)
Added optional pandas import with HAS_PANDAS flag
Enhanced type hints and docstring with examples
Added informative logging when DataFrame detected
Raises TypeError for invalid input types

Key features:

Factorial design (dict): Creates ALL combinations (Cartesian product)
- Example: {"x": [1,2], "y": [3,4]} → 4 cases
Non-factorial design (DataFrame): Only specified combinations
- Example: pd.DataFrame({"x":[1,2], "y":[3,4]}) → 2 cases (rows)
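
To make the two case counts concrete, here is a minimal standard-library sketch. The helper names `factorial_cases` and `rowwise_cases` are hypothetical and not part of fz; `rowwise_cases` only imitates what a DataFrame built from the same columns would yield, one case per row.

```python
from itertools import product


def factorial_cases(variables):
    """Cartesian product of all value lists (dict-style, factorial design)."""
    names = list(variables)
    return [
        dict(zip(names, combo))
        for combo in product(*(variables[name] for name in names))
    ]


def rowwise_cases(columns):
    """One case per row: the i-th values of every column go together."""
    names = list(columns)
    return [
        dict(zip(names, row))
        for row in zip(*(columns[name] for name in names))
    ]


variables = {"x": [1, 2], "y": [3, 4]}
print(len(factorial_cases(variables)))  # 4 cases
print(len(rowwise_cases(variables)))    # 2 cases
```

The same input therefore produces four cases under the factorial reading and two under the row-wise reading, matching the examples in the description.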

Use cases for DataFrames:

Variables with constraints or dependencies
Latin Hypercube Sampling, Sobol sequences
Imported designs from DOE software
Optimization algorithm sample points
Sensitivity analysis (one-at-a-time)
Sparse or adaptive sampling
Any irregular design pattern
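
As an illustration of the first use case (variables with constraints), the sketch below filters a full grid down to the feasible points. The constraint `temp * pressure <= 300` and the variable values are invented for the example; the resulting rows are what one would wrap in a DataFrame before passing them as input_variables.

```python
from itertools import product

temps = [100, 200, 300]
pressures = [1.0, 2.0]


def is_feasible(temp, pressure):
    # Hypothetical constraint, chosen only for illustration.
    return temp * pressure <= 300


# Keep only the grid points that satisfy the constraint.
rows = [
    {"temp": t, "pressure": p}
    for t, p in product(temps, pressures)
    if is_feasible(t, p)
]

print(len(rows))  # 4 of the 6 grid points remain
# pd.DataFrame(rows) could then be passed to fzr() as input_variables.
```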

Tests (tests/test_dataframe_input.py):

12 comprehensive tests covering all scenarios
Unit tests for generate_variable_combinations()
Integration tests with fzr()
Tests for DataFrame vs dict behavior comparison
Tests for mixed types, constraints, repeated values
Input validation tests
All 12 tests pass successfully

Documentation:

README.md: New "Input Variables: Factorial vs Non-Factorial Designs" section
- Comparison of dict (factorial) vs DataFrame (non-factorial)
- When to use each approach
- Examples with LHS, constraint-based designs
examples/dataframe_input.md: Comprehensive guide with:
- 7 practical examples (constraints, LHS, Sobol, DOE import, etc.)
- Comparison table
- Tips and best practices
- Common patterns and workflows
Updated Features section to mention both design types
Updated DataFrame I/O description

Backward compatibility:

Existing dict-based code continues to work unchanged
DataFrame support requires pandas (optional dependency)
Graceful handling when pandas not installed
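
The optional-dependency handling described here usually follows the pattern below. This is a sketch of the general idiom, not the actual code in fz/helpers.py; only the `HAS_PANDAS` flag name comes from the description, and `is_dataframe` is a hypothetical helper. Note that later commits in this pull request make pandas a required dependency and remove the flag.

```python
# Optional import: the module still loads when pandas is missing.
try:
    import pandas as pd
    HAS_PANDAS = True
except ImportError:
    pd = None
    HAS_PANDAS = False


def is_dataframe(obj):
    """Return True only if pandas is installed and obj is a DataFrame."""
    return bool(HAS_PANDAS and isinstance(obj, pd.DataFrame))


print(is_dataframe({"x": [1, 2]}))  # False: a dict is never a DataFrame
```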

Example usage:

import pandas as pd
from fz import fzr

# Non-factorial: specific combinations only
input_variables = pd.DataFrame({
    "temp": [100, 200, 100, 300],
    "pressure": [1.0, 1.0, 2.0, 1.5]
})
# Creates 4 cases: (100,1.0), (200,1.0), (100,2.0), (300,1.5)

results = fzr(input_file, input_variables, model, calculators)

Copilot

Pull Request Overview

Adds support for pandas DataFrame input to enable non-factorial (row-wise) parametric designs alongside existing dict-based factorial Cartesian product generation.

Extends generate_variable_combinations to accept DataFrames and return one case per row.
Adds comprehensive tests and documentation/examples differentiating factorial dict vs non-factorial DataFrame usage.
Updates README and adds a detailed example guide for DataFrame-driven designs.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| fz/helpers.py | Implements DataFrame handling in generate_variable_combinations with optional pandas import and logging. |
| tests/test_dataframe_input.py | Adds unit and integration tests for DataFrame vs dict behavior and input validation. |
| examples/dataframe_input.md | New extensive guide on using DataFrames for non-factorial designs with multiple sampling patterns. |
| README.md | Updates feature list and documents factorial vs non-factorial input variable formats with examples. |

Copilot · 2025-10-18T19:26:26Z

fz/helpers.py



-def generate_variable_combinations(input_variables: Dict) -> List[Dict]:
+def generate_variable_combinations(input_variables: Union[Dict, Any]) -> List[Dict]:


The type hint Union[Dict, Any] effectively collapses to Any and advertises acceptance of all types, while the function raises TypeError for non-dict/non-DataFrame inputs. Narrow the annotation to accepted types only, e.g. Union[Dict[str, Any], 'pd.DataFrame'] guarded by a TYPE_CHECKING block or a Protocol to improve static analysis.
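
One way to follow this suggestion is sketched below. The alias `InputVariables` and the function `describe_input` are hypothetical names used only to show the annotation pattern; the duck-typed `iterrows` check stands in for a real isinstance test so the sketch runs without pandas installed.

```python
from typing import TYPE_CHECKING, Any, Dict, Union

if TYPE_CHECKING:
    # Imported for static analysis only; never executed at runtime.
    import pandas as pd

# The string forward reference keeps the annotation valid without pandas.
InputVariables = Union[Dict[str, Any], "pd.DataFrame"]


def describe_input(input_variables: InputVariables) -> str:
    """Classify the design type, rejecting unsupported inputs."""
    if isinstance(input_variables, dict):
        return "factorial"
    if hasattr(input_variables, "iterrows"):  # duck-typed DataFrame check
        return "non-factorial"
    raise TypeError(
        f"expected dict or DataFrame, got {type(input_variables).__name__}"
    )


print(describe_input({"x": [1, 2]}))  # factorial
```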

Copilot · 2025-10-18T19:26:26Z

fz/helpers.py

+        var_combinations = []
+        for _, row in input_variables.iterrows():
+            var_combinations.append(row.to_dict())


Using iterrows is relatively slow and may coerce dtypes; you can replace this block with var_combinations = input_variables.to_dict(orient='records') for a vectorized, faster conversion that preserves dtypes.

Suggested change

-        var_combinations = []
-        for _, row in input_variables.iterrows():
-            var_combinations.append(row.to_dict())
+        var_combinations = input_variables.to_dict(orient='records')
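
The dtype difference behind this suggestion can be seen on a small frame with one integer and one float column. This is an illustrative sketch (the column names are made up), assuming pandas is installed.

```python
import pandas as pd

design = pd.DataFrame({"n": [1, 2], "ratio": [0.5, 1.5]})

# Row-wise iteration: each row becomes a Series with one common dtype,
# so values from the integer column are upcast to float.
via_iterrows = [row.to_dict() for _, row in design.iterrows()]

# Column-aware conversion: each value keeps the type of its own column.
via_records = design.to_dict(orient="records")

print(type(via_iterrows[0]["n"]).__name__)
print(type(via_records[0]["n"]).__name__)
```

With mixed numeric columns, the `iterrows` version hands a float to the case dict where the user supplied an integer, which can matter when the value is substituted into an input file.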

Copilot · 2025-10-18T19:26:26Z

examples/dataframe_input.md

+| **Example** | `{"x": [1,2], "y": [3,4]}` → 4 cases | `pd.DataFrame({"x":[1,2], "y":[3,4]})` → 2 cases |
+| **Constraints** | Cannot handle constraints | Can handle constraints |
+| **Sampling** | Grid-based | Any sampling method |
+


Each line has a double leading pipe '||', which will render an extra empty column or break the table. Remove one leading pipe per line so the table starts with a single | (e.g. | Aspect | Dict (Factorial) | DataFrame (Non-Factorial) |).

Copilot · 2025-10-18T19:26:26Z

tests/test_dataframe_input.py

+        model = {
+            "formulaprefix": "@",
+            "delim": "{}",
+            "commentline": "#",
+            "output": {
+                "result": "grep 'result:' output.txt | awk '{print $2}'"
+            }
+        }


[nitpick] The same model dict is duplicated across multiple tests (e.g., lines 171–178, 201–208, 257–263). Consider extracting it into a fixture or a class attribute to reduce repetition and ease future changes.
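
A minimal sketch of the class-attribute option follows. The class name `TestDataFrameInput` and the `make_model` helper are assumptions for illustration; the dict contents are copied from the snippet quoted above.

```python
import copy


class TestDataFrameInput:
    # Single shared definition instead of one copy per test.
    MODEL = {
        "formulaprefix": "@",
        "delim": "{}",
        "commentline": "#",
        "output": {
            "result": "grep 'result:' output.txt | awk '{print $2}'"
        },
    }

    def make_model(self):
        """Return a private copy so a test cannot mutate the shared dict."""
        return copy.deepcopy(self.MODEL)


model = TestDataFrameInput().make_model()
print(model["delim"])  # {}
```

Returning a deep copy keeps tests isolated even when one of them edits the nested "output" dict.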

The integration tests were failing on Windows due to use of bash features that don't work reliably across platforms:
- 'source' command (inconsistent behavior on Windows Git Bash)
- bash arithmetic with $((x + y))
- 'awk' command (not always available on Windows)

Changes:
- Simplified calculator script to extract pre-computed sum from input.txt instead of re-computing it with bash arithmetic
- The formula @{$x + $y} is already evaluated by fz during compilation, so the script just extracts the result
- Replaced 'awk' with 'cut' and 'tr' for output parsing (more portable)
- Uses only basic shell commands available on all platforms: grep, cut, echo, tr

The test approach now mirrors test_fzo_fzr_coherence.py which uses simple, portable bash scripts that work across Linux, macOS, and Windows.

All 12 tests pass on Linux after this change.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Changed the DataFrame integration tests to use exactly the same script pattern as test_fzo_fzr_coherence.py, which is known to work on Windows.

Changes:
- Back to using 'source input.txt' to read variables
- Back to using bash arithmetic: result=$((x + y))
- Output format: 'echo "result = $result" > output.txt' (spaces around =)
- Parsing: 'grep "result = " output.txt | cut -d "=" -f2' (no tr command)

This exactly mirrors the working pattern from test_fzo_fzr_coherence.py lines 117-120 which successfully runs on Windows, macOS, and Linux in CI.

The previous attempt avoided 'source' and bash arithmetic, but that created a different issue. The test_fzo_fzr_coherence.py tests prove that these commands DO work in the CI environment on all platforms.

All 12 tests pass on Linux.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add Windows bash availability check with helpful error message; ensure subprocess uses bash on Windows; add related tests and documentation.
* ensure awk & cut are available
* use msys2 in CI
* do not check for cat in msys2... (try)
* fast error if bash unavailable on windows
* check windows bash consistently with core/runners
* factorize windows bash get function
* select tests by OS
* try centralize system exec
* fix bash support on win (from win & claude)
* add bc alongside bash for win
* for now do not support win batch commands (like timeout)

During the rebase, the bool(input_variables) checks were not properly handling DataFrame inputs, causing "The truth value of a DataFrame is ambiguous" errors.

Changes:
- Updated fzr() type hint to accept Union[Dict, "pandas.DataFrame"]
- Enhanced docstring to document DataFrame support for non-factorial designs
- Fixed bool(input_variables) checks in core.py and helpers.py to handle both dict and DataFrame types properly
  - DataFrame empty check: not input_variables.empty
  - Dict empty check: bool(input_variables)

All 12 DataFrame input tests now pass successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
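
The emptiness check described in this commit message can be sketched as follows. The helper name `has_input_variables` is hypothetical; the two branches mirror the checks listed above, and the final lines show the error that a plain bool() call raises on a DataFrame.

```python
import pandas as pd


def has_input_variables(input_variables):
    """True if there is at least one variable/case to run."""
    if isinstance(input_variables, pd.DataFrame):
        # bool(DataFrame) raises ValueError, so test emptiness explicitly.
        return not input_variables.empty
    return bool(input_variables)


print(has_input_variables({"x": [1, 2]}))                # True
print(has_input_variables(pd.DataFrame()))               # False
print(has_input_variables(pd.DataFrame({"x": [1, 2]})))  # True

try:
    bool(pd.DataFrame({"x": [1, 2]}))
except ValueError as exc:
    print("bool(DataFrame) fails:", exc)
```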

The _raw field that preserved original algorithm output is now removed from the returned dict. Raw content is no longer needed in the return structure since:
1. HTML/Markdown content is saved to files and referenced by file path
2. JSON content is parsed and saved as both json_data and json_file
3. Key=value content is parsed and saved as keyvalue_data and txt_file
4. Plain text content is kept in 'text' field if no format detected
5. Data dict is always available in 'data' field

Changes:
- Removed processed['_raw'] = display_dict from _get_and_process_analysis()
- Added logging of text content before processing in _get_and_process_analysis()
- Removed all checks for '_raw' field in fzd() function
- Updated HTML results generation to use processed content with file links
- The iteration summary HTML now links to analysis files instead of embedding raw content

All 9 content detection tests still pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Implemented the same directory handling for fzd as fzr:
- Existing analysis_dir is renamed with timestamp (e.g., analysis_dir_2025-10-24_19-30-34)
- Renamed directory iterations are included in cache paths for result reuse
- Cache now checks both current and previous run iterations

Changes:
- Added ensure_unique_directory() call for fzd's analysis_dir
- Build cache paths dynamically to include:
  * Current run iterations (iter001, iter002, etc.)
  * Previous run iterations from renamed directory (up to 99 iterations)
- Cache paths are checked before actual calculators for efficiency

This prevents data loss from overwriting and enables result reuse across runs.

Also removed accidentally committed test files (fzd_analysis directory).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
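
The renaming behavior described in this commit message might look like the sketch below. It is not the actual ensure_unique_directory() from fz, whose signature is not shown in this pull request; `rename_existing_dir` is a hypothetical stand-in, and the timestamp format is inferred from the example name analysis_dir_2025-10-24_19-30-34.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def rename_existing_dir(path, now=None):
    """Move an existing directory aside by appending a timestamp suffix.

    Returns the new path, or None if there was nothing to rename.
    """
    path = Path(path)
    if not path.exists():
        return None
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    target = path.with_name(f"{path.name}_{stamp}")
    path.rename(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    analysis_dir = Path(tmp) / "analysis_dir"
    analysis_dir.mkdir()
    renamed = rename_existing_dir(analysis_dir, datetime(2025, 10, 24, 19, 30, 34))
    print(renamed.name)  # analysis_dir_2025-10-24_19-30-34
```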

Updated documentation to remove references to _raw field which is no longer included in fzd results. The _raw field was removed to keep the return structure clean - raw content is either:
- Saved to files (HTML, markdown) with file references
- Parsed into Python objects (JSON, key=value)
- Kept as plain text if no format detected
- Logged to console before processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Cache is now only used when explicitly requested in calculators list, matching fzr's behavior.

Changes:
- "cache://_" expands to all iterations from renamed previous run (e.g., results_fzd_2025-10-24_19-30-34/iter001 through iter099)
- If ANY cache calculator is in the list, previous iterations from current run are automatically added for efficiency
- If no cache is requested, no cache paths are added at all

Examples:
- calculators=["sh://calc.sh"] → No cache used at all
- calculators=["cache://_", "sh://calc.sh"] → Uses renamed dir iterations + current run previous iterations
- calculators=["cache://some/path", "sh://calc.sh"] → Uses specified path + current run previous iterations

This gives users explicit control over caching while still providing sensible automatic behavior for current run iterations when cache is enabled.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
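
The three rules in this commit message can be expressed as a small pure function. This is a sketch under stated assumptions: the function name, its parameters, and the ordering of the resulting list (current-run cache entries first) are all hypothetical and are not taken from the fz source.

```python
def expand_cache_calculators(calculators, previous_run_dir, current_run_dir, iteration):
    """Expand cache URIs following the three rules described above."""
    # Rule 3: no cache requested means no cache paths are added.
    if not any(calc.startswith("cache://") for calc in calculators):
        return list(calculators)

    expanded = []
    for calc in calculators:
        if calc == "cache://_":
            # Rule 1: expand to iter001..iter099 of the renamed previous run.
            expanded.extend(
                f"cache://{previous_run_dir}/iter{i:03d}" for i in range(1, 100)
            )
        else:
            expanded.append(calc)

    # Rule 2: add earlier iterations of the current run (ordering assumed).
    earlier = [
        f"cache://{current_run_dir}/iter{i:03d}" for i in range(1, iteration)
    ]
    return earlier + expanded


print(expand_cache_calculators(["sh://calc.sh"], "prev", "cur", 3))
print(len(expand_cache_calculators(["cache://_", "sh://calc.sh"], "prev", "cur", 3)))
```

In the second call the list holds 2 current-run entries, 99 previous-run entries, and the shell calculator, which stays last so that cache lookups are tried first.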

Removed unused imports and variables across the codebase:
- Removed unused stdlib imports (re, subprocess, tempfile, etc.)
- Removed unused type imports (Tuple, Union, List, Optional where unused)
- Removed unused helper function imports
- Removed unused local variables (pid, used_calculator_uri, case_elapsed, interpreter)
- Removed commented debug code

All 269 tests pass successfully after cleanup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

…stall, install_algo, ...

…es regexp in .fz/*

- Fixed indentation errors in interpreter.py and runners.py
- Added missing _validate_calculator_uri function in runners.py
- Fixed DataFrame validation in fzr() to accept both dict and DataFrame
- Added comprehensive validation for fzd arguments with helpful error messages
- Added test_no_algorithms.py with 11 algorithm validation tests
- Added informative logging when loading algorithms in fzd

All 448 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Add threading and defaultdict imports to core.py
- Add load_aliases import from io module
- Fix DataFrame validation in fzr() to accept both dict and DataFrame

All 455 tests passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Move pandas from optional to required dependencies in setup.py and pyproject.toml
- Remove PANDAS_AVAILABLE checks and conditional handling throughout codebase
- Always return DataFrame from fzo() and fzr()
- Simplify code by assuming pandas is always available

pandas is now essential for fzd() and provides better data handling in fzo() and fzr().

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Resolves PermissionError on Windows during temporary directory cleanup by restoring the original working directory before the TemporaryDirectory context manager exits.

On Windows, you cannot delete a directory that is the current working directory. The tests were calling os.chdir(tmpdir) and then attempting to clean up the directory when the context exited, causing:
- PermissionError: [WinError 32] The process cannot access the file because it is being used by another process
- PermissionError: [WinError 5] Access is denied

Solution: Wrap test logic in try/finally blocks that save and restore the original working directory, allowing Windows to successfully delete temporary directories during cleanup.

Fixes #40 (Windows CI failure in test_dict_flattening.py)
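
The try/finally pattern described in this commit message can be sketched as below. The helper `run_in_tempdir` is a hypothetical wrapper, not code from the test suite; it shows the key ordering, where the working directory is restored inside the `with` block so cleanup happens after the process has left the directory.

```python
import os
import tempfile


def run_in_tempdir(body):
    """Run body() with a temporary directory as the working directory."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmpdir:
        try:
            os.chdir(tmpdir)
            return body()
        finally:
            # Leave tmpdir before the context manager tries to delete it.
            os.chdir(original_cwd)


before = os.getcwd()
inside = run_in_tempdir(os.getcwd)
print(inside != before)        # True: body ran inside the temp directory
print(os.getcwd() == before)   # True: working directory was restored
```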

- Remove PANDAS_AVAILABLE variable definitions from test files
- Remove HAS_PANDAS variable definitions from source files
- Remove test_fzd_without_pandas (no longer relevant)
- Remove skipif decorators that checked for pandas availability
- Simplify conditional checks that tested pandas availability

pandas is now always available as a required dependency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Remove orphaned try/except blocks for pandas imports in fz/helpers.py and fz/io.py
- Fix broken pandas imports in test files (test_dict_flattening.py, test_fzo_fzr_coherence.py, test_no_algorithms.py)
- Add missing pandas import to test_dataframe_input.py
- Fix indentation issues in test_dataframe_input.py after removing conditional code

All 455 tests now passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Created context/ directory with 10 markdown files (4369 lines total) designed for LLM consumption to help understand and suggest fz usage:
- overview.md: Framework introduction and key concepts
- syntax-guide.md: Variable substitution and formula syntax
- core-functions.md: API reference for fzi, fzc, fzo, fzr
- model-definition.md: Model configuration guide
- calculators.md: Execution backend types (sh, ssh, cache)
- formulas-and-interpreters.md: Python and R formula evaluation
- parallel-and-caching.md: Parallel execution and caching strategies
- quick-examples.md: Common patterns and use case examples
- README.md: Documentation guide and usage instructions
- INDEX.md: Quick reference index for finding topics

Each file includes:
- Clear syntax examples with code snippets
- Complete working examples
- Common patterns and best practices
- Cross-references to related topics

This documentation will help LLMs provide better assistance with fz by understanding its syntax, features, and typical usage patterns.

Co-authored-by: Claude <[email protected]>

Resolved conflicts:
- .gitignore: Added results*/ and output/ ignore patterns from main
- fz/core.py: Integrated callback support from main while keeping DataFrame-only return logic

Copilot AI review requested due to automatic review settings October 18, 2025 19:25

Copilot AI reviewed Oct 18, 2025

yannrichet changed the title ~~Add DataFrame input support for non-factorial parametric designs~~ Support algorithms Oct 23, 2025

yannrichet-asnr force-pushed the implement-algorithms branch from 5077247 to 6b43c56 Compare October 24, 2025 21:28

yannrichet-asnr force-pushed the implement-algorithms branch from f35a769 to 3b8c6c5 Compare November 22, 2025 11:54

yannrichet and others added 25 commits November 22, 2025 13:46

try force unix eol

8162c48

fix path separator for bash on windows

1c1e871

avoid issues for EOL chars on windows

b2c84bb

keep visible when finished, and add total time

72c096c

.

c753b55

parsing melted string in md, html, kv, json, ...

23b2bd9

spec of design algorithms support

d21508d

impl algorithm

fe5c304

impl. fzd

ff1730e

claude.ai spec

647e62a

mv dev doc

6481aac

fix path separator for bash on windows

d44b46f

try setup unified shell for win/lin/macos

ca00364

.

5435735

mv dev doc

2c01b58

yannrichet and others added 20 commits November 22, 2025 13:46

cleanup

d4c6cd9

refactor display -> analysis

5f6225d

rm _raw from fzd output

7c7a60f

up doc

8075708

impl. R algorithms

97dca70

impl algorithm plugin

1787489

fix algo loading on win

4fa347b

more consistent args

c67055a

use install_model, install_algorithm instead of unclear names like in…

ba2cb57

…stall, install_algo, ...

auto flatten output dicts if any

1339b34

some refactoring

4a4c9c0

fix api

f131a5e

working notebook with modelica, fzr, fzd, ...

bd6008e

format md

64e8172

fix input_path/input_file

dbb2d67

enhance args interpretation: raw json, thant json file, then json fil…

c0baaac

…es regexp in .fz/*

yannrichet-asnr force-pushed the implement-algorithms branch from 3b8c6c5 to 28d619f Compare November 22, 2025 12:58

yannrichet and others added 2 commits November 22, 2025 16:11

yannrichet mentioned this pull request Nov 22, 2025

Fix Windows file deletion issue in test_dict_flattening.py #46

Merged

yannrichet and others added 4 commits November 22, 2025 17:24

Merge main into implement-algorithms

59f8a76

.

877028c

3 participants


