@yannrichet-asnr
Member

This commit adds support for using pandas DataFrames as input_variables,
enabling non-factorial parametric study designs alongside the existing
dict-based factorial (Cartesian product) approach.

Implementation (fz/helpers.py):

  • Updated generate_variable_combinations() to detect DataFrame input
  • DataFrame: each row represents one case (non-factorial design)
  • Dict: existing Cartesian product behavior (factorial design)
  • Added optional pandas import with HAS_PANDAS flag
  • Enhanced type hints and docstring with examples
  • Added informative logging when DataFrame detected
  • Raises TypeError for invalid input types

Key features:

  • Factorial design (dict): Creates ALL combinations (Cartesian product)
    Example: {"x": [1,2], "y": [3,4]} → 4 cases
  • Non-factorial design (DataFrame): Only specified combinations
    Example: pd.DataFrame({"x":[1,2], "y":[3,4]}) → 2 cases (rows)
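
The difference between the two designs can be illustrated with a minimal sketch. This is not the code in fz/helpers.py: the function name `combinations_sketch` is hypothetical, and treating a scalar dict value as a single level is an assumption made for illustration.

```python
import itertools
from typing import Any, Dict, List

import pandas as pd


def combinations_sketch(input_variables: Any) -> List[Dict[str, Any]]:
    """Illustrative only: dict -> Cartesian product, DataFrame -> one case per row."""
    if isinstance(input_variables, pd.DataFrame):
        # Non-factorial: each row is already one complete case.
        return input_variables.to_dict(orient="records")
    if isinstance(input_variables, dict):
        names = list(input_variables)
        # Assumption: a scalar value means "one level" for that variable.
        levels = [v if isinstance(v, (list, tuple)) else [v]
                  for v in input_variables.values()]
        return [dict(zip(names, combo)) for combo in itertools.product(*levels)]
    raise TypeError(f"expected dict or DataFrame, got {type(input_variables).__name__}")


factorial = combinations_sketch({"x": [1, 2], "y": [3, 4]})
row_wise = combinations_sketch(pd.DataFrame({"x": [1, 2], "y": [3, 4]}))
print(len(factorial), len(row_wise))  # 4 2
```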

Use cases for DataFrames:

  • Variables with constraints or dependencies
  • Latin Hypercube Sampling, Sobol sequences
  • Imported designs from DOE software
  • Optimization algorithm sample points
  • Sensitivity analysis (one-at-a-time)
  • Sparse or adaptive sampling
  • Any irregular design pattern
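
As an illustration of the first use case, a constrained design can be built by filtering a full grid down to the feasible rows. The variable levels and the constraint below are hypothetical and chosen only for the example.

```python
import itertools

import pandas as pd

temps = [100, 200, 300]
pressures = [1.0, 2.0, 3.0]

# Start from the full grid, then keep only the rows that satisfy the constraint.
grid = pd.DataFrame(list(itertools.product(temps, pressures)),
                    columns=["temp", "pressure"])

# Hypothetical constraint, used only for illustration.
design = grid[grid["temp"] * grid["pressure"] <= 400].reset_index(drop=True)

print(len(grid), len(design))  # 9 6
```

The resulting `design` would then be passed as `input_variables`, one case per remaining row.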

Tests (tests/test_dataframe_input.py):

  • 12 comprehensive tests covering all scenarios
  • Unit tests for generate_variable_combinations()
  • Integration tests with fzr()
  • Tests for DataFrame vs dict behavior comparison
  • Tests for mixed types, constraints, repeated values
  • Input validation tests
  • All 12 tests pass successfully

Documentation:

  • README.md: New "Input Variables: Factorial vs Non-Factorial Designs" section
    • Comparison of dict (factorial) vs DataFrame (non-factorial)
    • When to use each approach
    • Examples with LHS, constraint-based designs
  • examples/dataframe_input.md: Comprehensive guide with:
    • 7 practical examples (constraints, LHS, Sobol, DOE import, etc.)
    • Comparison table
    • Tips and best practices
    • Common patterns and workflows
  • Updated Features section to mention both design types
  • Updated DataFrame I/O description

Backward compatibility:

  • Existing dict-based code continues to work unchanged
  • DataFrame support requires pandas (optional dependency)
  • Graceful handling when pandas not installed

Example usage:

import pandas as pd
from fz import fzr

# Non-factorial: specific combinations only
input_variables = pd.DataFrame({
    "temp": [100, 200, 100, 300],
    "pressure": [1.0, 1.0, 2.0, 1.5]
})
# Creates 4 cases: (100,1.0), (200,1.0), (100,2.0), (300,1.5)

results = fzr(input_file, input_variables, model, calculators)

Copilot AI review requested due to automatic review settings October 18, 2025 19:25
Contributor

Copilot AI left a comment

Pull Request Overview

Adds support for pandas DataFrame input to enable non-factorial (row-wise) parametric designs alongside existing dict-based factorial Cartesian product generation.

  • Extends generate_variable_combinations to accept DataFrames and return one case per row.
  • Adds comprehensive tests and documentation/examples differentiating factorial dict vs non-factorial DataFrame usage.
  • Updates README and adds a detailed example guide for DataFrame-driven designs.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| fz/helpers.py | Implements DataFrame handling in generate_variable_combinations with optional pandas import and logging. |
| tests/test_dataframe_input.py | Adds unit and integration tests for DataFrame vs dict behavior and input validation. |
| examples/dataframe_input.md | New extensive guide on using DataFrames for non-factorial designs with multiple sampling patterns. |
| README.md | Updates feature list and documents factorial vs non-factorial input variable formats with examples. |

- def generate_variable_combinations(input_variables: Dict) -> List[Dict]:
+ def generate_variable_combinations(input_variables: Union[Dict, Any]) -> List[Dict]:

Copilot AI Oct 18, 2025

The type hint Union[Dict, Any] effectively collapses to Any and advertises acceptance of all types, while the function raises TypeError for non-dict/non-DataFrame inputs. Narrow the annotation to accepted types only, e.g. Union[Dict[str, Any], 'pd.DataFrame'] guarded by a TYPE_CHECKING block or a Protocol to improve static analysis.
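
A minimal sketch of the narrower annotation suggested here, assuming a TYPE_CHECKING guard. The alias name `InputVariables` is hypothetical and the function body is only a placeholder, not the actual implementation.

```python
from typing import TYPE_CHECKING, Any, Dict, List, Union

if TYPE_CHECKING:
    # Seen by type checkers only; pandas is not imported at runtime here.
    import pandas as pd

# The quoted name is a forward reference that the type checker resolves.
InputVariables = Union[Dict[str, Any], "pd.DataFrame"]


def generate_variable_combinations(
    input_variables: InputVariables,
) -> List[Dict[str, Any]]:
    """Signature sketch only; the body is a placeholder."""
    raise NotImplementedError
```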

Comment on lines +158 to +268
var_combinations = []
for _, row in input_variables.iterrows():
    var_combinations.append(row.to_dict())

Copilot AI Oct 18, 2025

Using iterrows is relatively slow and may coerce dtypes; you can replace this block with var_combinations = input_variables.to_dict(orient='records') for a vectorized, faster conversion that preserves dtypes.

Suggested change
- var_combinations = []
- for _, row in input_variables.iterrows():
-     var_combinations.append(row.to_dict())
+ var_combinations = input_variables.to_dict(orient='records')
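
The dtype point can be checked with a small standalone example. The DataFrame below is hypothetical; it only needs one integer column and one float column.

```python
import pandas as pd

df = pd.DataFrame({"n": [1, 2], "x": [0.5, 1.5]})  # int64 and float64 columns

# iterrows() yields each row as a Series with a single dtype,
# so the integer column "n" is upcast to float alongside "x".
via_iterrows = [row.to_dict() for _, row in df.iterrows()]

# to_dict(orient="records") keeps each column's own dtype,
# so "n" stays an integer.
via_records = df.to_dict(orient="records")

print(type(via_iterrows[0]["n"]).__name__, type(via_records[0]["n"]).__name__)
```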

| **Example** | `{"x": [1,2], "y": [3,4]}` → 4 cases | `pd.DataFrame({"x":[1,2], "y":[3,4]})` → 2 cases |
| **Constraints** | Cannot handle constraints | Can handle constraints |
| **Sampling** | Grid-based | Any sampling method |


Copilot AI Oct 18, 2025

Each line has a double leading pipe '||', which will render an extra empty column or break the table. Remove one leading pipe per line so the table starts with a single | (e.g. | Aspect | Dict (Factorial) | DataFrame (Non-Factorial) |).

Comment on lines 171 to 181
model = {
    "formulaprefix": "@",
    "delim": "{}",
    "commentline": "#",
    "output": {
        "result": "grep 'result:' output.txt | awk '{print $2}'"
    }
}

Copilot AI Oct 18, 2025

[nitpick] The same model dict is duplicated across multiple tests (e.g., lines 171–178, 201–208, 257–263). Consider extracting it into a fixture or a class attribute to reduce repetition and ease future changes.
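
One way to remove the duplication is a single factory shared by all tests, sketched below. The names `_BASE_MODEL` and `make_model` are hypothetical; the dict content is the one shown in the quoted test code.

```python
import copy

# Same content as the model dict repeated across the tests.
_BASE_MODEL = {
    "formulaprefix": "@",
    "delim": "{}",
    "commentline": "#",
    "output": {
        "result": "grep 'result:' output.txt | awk '{print $2}'"
    },
}


def make_model():
    """Return a fresh deep copy so one test cannot mutate another test's model."""
    return copy.deepcopy(_BASE_MODEL)
```

In a pytest suite the same factory could be exposed through a `@pytest.fixture` function; that wiring is left out here to keep the sketch free of dependencies.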

@yannrichet changed the title from "Add DataFrame input support for non-factorial parametric designs" to "Support algorithms" on Oct 23, 2025
yannrichet and others added 25 commits November 22, 2025 13:46
This commit adds support for using pandas DataFrames as input_variables,
enabling non-factorial parametric study designs alongside the existing
dict-based factorial (Cartesian product) approach.

Implementation (fz/helpers.py):
- Updated generate_variable_combinations() to detect DataFrame input
- DataFrame: each row represents one case (non-factorial design)
- Dict: existing Cartesian product behavior (factorial design)
- Added optional pandas import with HAS_PANDAS flag
- Enhanced type hints and docstring with examples
- Added informative logging when DataFrame detected
- Raises TypeError for invalid input types

Key features:
- Factorial design (dict): Creates ALL combinations (Cartesian product)
  Example: {"x": [1,2], "y": [3,4]} → 4 cases
- Non-factorial design (DataFrame): Only specified combinations
  Example: pd.DataFrame({"x":[1,2], "y":[3,4]}) → 2 cases (rows)

Use cases for DataFrames:
- Variables with constraints or dependencies
- Latin Hypercube Sampling, Sobol sequences
- Imported designs from DOE software
- Optimization algorithm sample points
- Sensitivity analysis (one-at-a-time)
- Sparse or adaptive sampling
- Any irregular design pattern

Tests (tests/test_dataframe_input.py):
- 12 comprehensive tests covering all scenarios
- Unit tests for generate_variable_combinations()
- Integration tests with fzr()
- Tests for DataFrame vs dict behavior comparison
- Tests for mixed types, constraints, repeated values
- Input validation tests
- All 12 tests pass successfully

Documentation:
- README.md: New "Input Variables: Factorial vs Non-Factorial Designs" section
  * Comparison of dict (factorial) vs DataFrame (non-factorial)
  * When to use each approach
  * Examples with LHS, constraint-based designs
- examples/dataframe_input.md: Comprehensive guide with:
  * 7 practical examples (constraints, LHS, Sobol, DOE import, etc.)
  * Comparison table
  * Tips and best practices
  * Common patterns and workflows
- Updated Features section to mention both design types
- Updated DataFrame I/O description

Backward compatibility:
- Existing dict-based code continues to work unchanged
- DataFrame support requires pandas (optional dependency)
- Graceful handling when pandas not installed

Example usage:
```python
import pandas as pd
from fz import fzr

# Non-factorial: specific combinations only
input_variables = pd.DataFrame({
    "temp": [100, 200, 100, 300],
    "pressure": [1.0, 1.0, 2.0, 1.5]
})
# Creates 4 cases: (100,1.0), (200,1.0), (100,2.0), (300,1.5)

results = fzr(input_file, input_variables, model, calculators)
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The integration tests were failing on Windows due to use of bash features
that don't work reliably across platforms:
- 'source' command (inconsistent behavior on Windows Git Bash)
- bash arithmetic with $((x + y))
- 'awk' command (not always available on Windows)

Changes:
- Simplified calculator script to extract pre-computed sum from input.txt
  instead of re-computing it with bash arithmetic
- The formula @{$x + $y} is already evaluated by fz during compilation,
  so the script just extracts the result
- Replaced 'awk' with 'cut' and 'tr' for output parsing (more portable)
- Uses only basic shell commands available on all platforms:
  grep, cut, echo, tr

The test approach now mirrors test_fzo_fzr_coherence.py which uses
simple, portable bash scripts that work across Linux, macOS, and Windows.

All 12 tests pass on Linux after this change.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Changed the DataFrame integration tests to use exactly the same script
pattern as test_fzo_fzr_coherence.py, which is known to work on Windows.

Changes:
- Back to using 'source input.txt' to read variables
- Back to using bash arithmetic: result=$((x + y))
- Output format: 'echo "result = $result" > output.txt' (spaces around =)
- Parsing: 'grep "result = " output.txt | cut -d "=" -f2' (no tr command)

This exactly mirrors the working pattern from test_fzo_fzr_coherence.py
lines 117-120 which successfully runs on Windows, macOS, and Linux in CI.

The previous attempt avoided 'source' and bash arithmetic, but that
created a different issue. The test_fzo_fzr_coherence.py tests prove
that these commands DO work in the CI environment on all platforms.

All 12 tests pass on Linux.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
* Add Windows bash availability check with helpful error message; ensure subprocess uses bash on Windows; add related tests and documentation.
* ensure awk & cut are available
* use msys2 in CI
* do not check for cat in msys2... (try)
* fast error if bash unavailable on windows
* check windows bash consistently with core/runners
* factorize windows bash get function
* select tests by OS
* try centralize system exec
* fix bash support on win (from win & claude)
* add bc alongside bash for win
* for now do not support win batch commands (like timeout)
During the rebase, the bool(input_variables) checks were not properly
handling DataFrame inputs, causing "The truth value of a DataFrame is
ambiguous" errors.

Changes:
- Updated fzr() type hint to accept Union[Dict, "pandas.DataFrame"]
- Enhanced docstring to document DataFrame support for non-factorial designs
- Fixed bool(input_variables) checks in core.py and helpers.py to handle
  both dict and DataFrame types properly
- DataFrame empty check: not input_variables.empty
- Dict empty check: bool(input_variables)
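
A minimal sketch of the two emptiness checks combined in one place. The helper name `has_input_variables` is hypothetical and is not taken from core.py or helpers.py.

```python
import pandas as pd


def has_input_variables(input_variables) -> bool:
    """Hypothetical helper: True when at least one variable or case is supplied."""
    if isinstance(input_variables, pd.DataFrame):
        # bool(DataFrame) raises ValueError, so use .empty instead.
        return not input_variables.empty
    return bool(input_variables)


# Reproduce the original failure mode for comparison.
try:
    bool(pd.DataFrame({"x": [1]}))
except ValueError as exc:
    ambiguous_message = str(exc)
```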

All 12 DataFrame input tests now pass successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The _raw field that preserved original algorithm output is now removed
from the returned dict. Raw content is no longer needed in the return
structure since:

1. HTML/Markdown content is saved to files and referenced by file path
2. JSON content is parsed and saved as both json_data and json_file
3. Key=value content is parsed and saved as keyvalue_data and txt_file
4. Plain text content is kept in 'text' field if no format detected
5. Data dict is always available in 'data' field

Changes:
- Removed processed['_raw'] = display_dict from _get_and_process_analysis()
- Added logging of text content before processing in _get_and_process_analysis()
- Removed all checks for '_raw' field in fzd() function
- Updated HTML results generation to use processed content with file links
- The iteration summary HTML now links to analysis files instead of embedding raw content

All 9 content detection tests still pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Implemented the same directory handling for fzd as fzr:
- Existing analysis_dir is renamed with timestamp (e.g., analysis_dir_2025-10-24_19-30-34)
- Renamed directory iterations are included in cache paths for result reuse
- Cache now checks both current and previous run iterations

Changes:
- Added ensure_unique_directory() call for fzd's analysis_dir
- Build cache paths dynamically to include:
  * Current run iterations (iter001, iter002, etc.)
  * Previous run iterations from renamed directory (up to 99 iterations)
- Cache paths are checked before actual calculators for efficiency

This prevents data loss from overwriting and enables result reuse across runs.
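
The rename-with-timestamp step can be sketched as follows. This is an illustration under assumptions, not the body of ensure_unique_directory(): the helper name `rotate_existing_dir` is hypothetical, and the timestamp format is inferred from the example directory name above.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def rotate_existing_dir(path: Path) -> Optional[Path]:
    """If `path` exists, rename it with a timestamp suffix and return the new path."""
    if not path.exists():
        return None
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    renamed = path.with_name(f"{path.name}_{stamp}")
    path.rename(renamed)
    return renamed


with tempfile.TemporaryDirectory() as root:
    analysis_dir = Path(root) / "analysis_dir"
    analysis_dir.mkdir()
    (analysis_dir / "iter001").mkdir()

    previous = rotate_existing_dir(analysis_dir)  # old results are moved aside
    analysis_dir.mkdir()                          # fresh directory for the new run

    kept = (previous / "iter001").is_dir()
    fresh_is_empty = not any(analysis_dir.iterdir())
```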

Also removed accidentally committed test files (fzd_analysis directory).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Updated documentation to remove references to _raw field which is no
longer included in fzd results. The _raw field was removed to keep the
return structure clean - raw content is either:
- Saved to files (HTML, markdown) with file references
- Parsed into Python objects (JSON, key=value)
- Kept as plain text if no format detected
- Logged to console before processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Cache is now only used when explicitly requested in calculators list,
matching fzr's behavior.

Changes:
- "cache://_" expands to all iterations from renamed previous run
  (e.g., results_fzd_2025-10-24_19-30-34/iter001 through iter099)
- If ANY cache calculator is in the list, previous iterations from
  current run are automatically added for efficiency
- If no cache is requested, no cache paths are added at all

Examples:
  calculators=["sh://calc.sh"]
  → No cache used at all

  calculators=["cache://_", "sh://calc.sh"]
  → Uses renamed dir iterations + current run previous iterations

  calculators=["cache://some/path", "sh://calc.sh"]
  → Uses specified path + current run previous iterations

This gives users explicit control over caching while still providing
sensible automatic behavior for current run iterations when cache is
enabled.
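
The expansion rules above can be sketched as a small function. Everything here is hypothetical (function name, parameters, and the choice to place current-run cache paths first); it only mirrors the three examples in this message and is not the fzd implementation.

```python
from typing import List, Optional


def expand_calculators(calculators: List[str],
                       previous_run_dir: Optional[str],
                       current_run_dir: str,
                       current_iteration: int,
                       max_iterations: int = 99) -> List[str]:
    """Hypothetical sketch of the cache expansion rules described above."""
    if not any(c.startswith("cache://") for c in calculators):
        return list(calculators)  # no cache requested: nothing is added

    expanded: List[str] = []
    for calc in calculators:
        if calc == "cache://_":
            # Expand to every iteration directory of the renamed previous run.
            if previous_run_dir is not None:
                expanded.extend(
                    f"cache://{previous_run_dir}/iter{i:03d}"
                    for i in range(1, max_iterations + 1)
                )
        else:
            expanded.append(calc)

    # Earlier iterations of the current run; placing them first is an assumption.
    current = [f"cache://{current_run_dir}/iter{i:03d}"
               for i in range(1, current_iteration)]
    return current + expanded
```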

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
yannrichet and others added 20 commits November 22, 2025 13:46
Removed unused imports and variables across the codebase:
- Removed unused stdlib imports (re, subprocess, tempfile, etc.)
- Removed unused type imports (Tuple, Union, List, Optional where unused)
- Removed unused helper function imports
- Removed unused local variables (pid, used_calculator_uri, case_elapsed, interpreter)
- Removed commented debug code

All 269 tests pass successfully after cleanup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Fixed indentation errors in interpreter.py and runners.py
- Added missing _validate_calculator_uri function in runners.py
- Fixed DataFrame validation in fzr() to accept both dict and DataFrame
- Added comprehensive validation for fzd arguments with helpful error messages
- Added test_no_algorithms.py with 11 algorithm validation tests
- Added informative logging when loading algorithms in fzd

All 448 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Add threading and defaultdict imports to core.py
- Add load_aliases import from io module
- Fix DataFrame validation in fzr() to accept both dict and DataFrame

All 455 tests passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Move pandas from optional to required dependencies in setup.py and pyproject.toml
- Remove PANDAS_AVAILABLE checks and conditional handling throughout codebase
- Always return DataFrame from fzo() and fzr()
- Simplify code by assuming pandas is always available

pandas is now essential for fzd() and provides better data handling in fzo() and fzr().

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
yannrichet-asnr pushed a commit that referenced this pull request Nov 22, 2025
Resolves PermissionError on Windows during temporary directory cleanup
by restoring the original working directory before the TemporaryDirectory
context manager exits.

On Windows, you cannot delete a directory that is the current working
directory. The tests were calling os.chdir(tmpdir) and then attempting
to clean up the directory when the context exited, causing:
- PermissionError: [WinError 32] The process cannot access the file
  because it is being used by another process
- PermissionError: [WinError 5] Access is denied

Solution: Wrap test logic in try/finally blocks that save and restore
the original working directory, allowing Windows to successfully delete
temporary directories during cleanup.
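
The try/finally pattern described above can be sketched as follows. The helper name `run_in_temporary_directory` is hypothetical; the actual tests may inline the same logic.

```python
import os
import tempfile


def run_in_temporary_directory(body):
    """Run `body()` with a temporary directory as cwd, then restore the old cwd."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmpdir:
        os.chdir(tmpdir)
        try:
            return body()
        finally:
            # Leave tmpdir before the context manager tries to delete it.
            os.chdir(original_cwd)


start = os.getcwd()
inside = run_in_temporary_directory(os.getcwd)
```

Because the `finally` block runs before the `with` block exits, the process is no longer inside the directory when cleanup happens.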

Fixes #40 (Windows CI failure in test_dict_flattening.py)
yannrichet and others added 2 commits November 22, 2025 16:11
- Remove PANDAS_AVAILABLE variable definitions from test files
- Remove HAS_PANDAS variable definitions from source files
- Remove test_fzd_without_pandas (no longer relevant)
- Remove skipif decorators that checked for pandas availability
- Simplify conditional checks that tested pandas availability

pandas is now always available as a required dependency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Remove orphaned try/except blocks for pandas imports in fz/helpers.py and fz/io.py
- Fix broken pandas imports in test files (test_dict_flattening.py, test_fzo_fzr_coherence.py, test_no_algorithms.py)
- Add missing pandas import to test_dataframe_input.py
- Fix indentation issues in test_dataframe_input.py after removing conditional code

All 455 tests now passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
yannrichet and others added 4 commits November 22, 2025 17:24
Created context/ directory with 10 markdown files (4369 lines total) designed
for LLM consumption to help understand and suggest fz usage:

- overview.md: Framework introduction and key concepts
- syntax-guide.md: Variable substitution and formula syntax
- core-functions.md: API reference for fzi, fzc, fzo, fzr
- model-definition.md: Model configuration guide
- calculators.md: Execution backend types (sh, ssh, cache)
- formulas-and-interpreters.md: Python and R formula evaluation
- parallel-and-caching.md: Parallel execution and caching strategies
- quick-examples.md: Common patterns and use case examples
- README.md: Documentation guide and usage instructions
- INDEX.md: Quick reference index for finding topics

Each file includes:
- Clear syntax examples with code snippets
- Complete working examples
- Common patterns and best practices
- Cross-references to related topics

This documentation will help LLMs provide better assistance with fz by
understanding its syntax, features, and typical usage patterns.

Co-authored-by: Claude <[email protected]>
Resolves PermissionError on Windows during temporary directory cleanup
by restoring the original working directory before the TemporaryDirectory
context manager exits.

On Windows, you cannot delete a directory that is the current working
directory. The tests were calling os.chdir(tmpdir) and then attempting
to clean up the directory when the context exited, causing:
- PermissionError: [WinError 32] The process cannot access the file
  because it is being used by another process
- PermissionError: [WinError 5] Access is denied

Solution: Wrap test logic in try/finally blocks that save and restore
the original working directory, allowing Windows to successfully delete
temporary directories during cleanup.

Fixes #40 (Windows CI failure in test_dict_flattening.py)

Co-authored-by: Claude <[email protected]>
Resolved conflicts:
- .gitignore: Added results*/ and output/ ignore patterns from main
- fz/core.py: Integrated callback support from main while keeping DataFrame-only return logic