Support algorithms #40
Pull Request Overview
Adds support for pandas DataFrame input to enable non-factorial (row-wise) parametric designs alongside existing dict-based factorial Cartesian product generation.
- Extends generate_variable_combinations to accept DataFrames and return one case per row.
- Adds comprehensive tests and documentation/examples differentiating factorial dict vs non-factorial DataFrame usage.
- Updates README and adds a detailed example guide for DataFrame-driven designs.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| fz/helpers.py | Implements DataFrame handling in generate_variable_combinations with optional pandas import and logging. |
| tests/test_dataframe_input.py | Adds unit and integration tests for DataFrame vs dict behavior and input validation. |
| examples/dataframe_input.md | New extensive guide on using DataFrames for non-factorial designs with multiple sampling patterns. |
| README.md | Updates feature list and documents factorial vs non-factorial input variable formats with examples. |
| def generate_variable_combinations(input_variables: Dict) -> List[Dict]: | ||
| def generate_variable_combinations(input_variables: Union[Dict, Any]) -> List[Dict]: |
Copilot
AI
Oct 18, 2025
The type hint Union[Dict, Any] effectively collapses to Any and advertises acceptance of all types, while the function raises TypeError for non-dict/non-DataFrame inputs. Narrow the annotation to accepted types only, e.g. Union[Dict[str, Any], 'pd.DataFrame'] guarded by a TYPE_CHECKING block or a Protocol to improve static analysis.
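One way to narrow the annotation could look like the sketch below. The alias `InputVariables` and the function name `combinations_sketch` are hypothetical, and the dict branch is a plain Cartesian product written for illustration, not fz's actual implementation. Importing pandas only under `TYPE_CHECKING` keeps the hint precise for static checkers without a module-level pandas import.

```python
from __future__ import annotations

import itertools
from typing import TYPE_CHECKING, Any, Dict, List, Union

if TYPE_CHECKING:  # evaluated by type checkers only, never at runtime
    import pandas as pd

# Hypothetical alias: only the two accepted input types are advertised.
InputVariables = Union[Dict[str, Any], "pd.DataFrame"]


def _is_dataframe(obj: object) -> bool:
    """Return True if obj is a pandas DataFrame, without requiring pandas."""
    try:
        import pandas
    except ImportError:
        return False
    return isinstance(obj, pandas.DataFrame)


def combinations_sketch(input_variables: InputVariables) -> List[Dict[str, Any]]:
    """Illustrative stand-in for generate_variable_combinations (not fz's code)."""
    if _is_dataframe(input_variables):
        return input_variables.to_dict(orient="records")  # one case per row
    if isinstance(input_variables, dict):
        # Assumption: scalar values are treated as single-element lists.
        names = list(input_variables)
        values = [v if isinstance(v, list) else [v] for v in input_variables.values()]
        return [dict(zip(names, combo)) for combo in itertools.product(*values)]
    raise TypeError(
        f"input_variables must be a dict or DataFrame, got {type(input_variables).__name__}"
    )
```

With this shape, a type checker flags a call such as `combinations_sketch(42)` before it reaches the runtime `TypeError`.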
| var_combinations = [] | ||
| for _, row in input_variables.iterrows(): | ||
| var_combinations.append(row.to_dict()) |
Copilot
AI
Oct 18, 2025
Using iterrows is relatively slow and may coerce dtypes; you can replace this block with var_combinations = input_variables.to_dict(orient='records') for a vectorized, faster conversion that preserves dtypes.
| var_combinations = [] | |
| for _, row in input_variables.iterrows(): | |
| var_combinations.append(row.to_dict()) | |
| var_combinations = input_variables.to_dict(orient='records') |
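The dtype point can be checked with a small standalone sketch. The column names and values are made up for illustration; only `iterrows()` and `to_dict(orient="records")` come from the review comment.

```python
import pandas as pd

# Illustrative data: one integer column and one float column.
df = pd.DataFrame({"n": [1, 2], "x": [0.5, 1.5]})

# iterrows() yields each row as a Series with a single common dtype,
# so the integer column is upcast to float alongside "x".
via_iterrows = [row.to_dict() for _, row in df.iterrows()]

# to_dict(orient="records") keeps each column's own dtype, so "n" stays an integer.
via_records = df.to_dict(orient="records")
```

Here `via_iterrows[0]["n"]` is a float, while `via_records[0]["n"]` remains an integer.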
| | **Example** | `{"x": [1,2], "y": [3,4]}` → 4 cases | `pd.DataFrame({"x":[1,2], "y":[3,4]})` → 2 cases | | ||
| | **Constraints** | Cannot handle constraints | Can handle constraints | | ||
| | **Sampling** | Grid-based | Any sampling method | | ||
Copilot
AI
Oct 18, 2025
Each line has a double leading pipe '||', which will render an extra empty column or break the table. Remove one leading pipe per line so the table starts with a single | (e.g. | Aspect | Dict (Factorial) | DataFrame (Non-Factorial) |).
| model = { | ||
| "formulaprefix": "@", | ||
| "delim": "{}", | ||
| "commentline": "#", | ||
| "output": { | ||
| "result": "grep 'result:' output.txt | awk '{print $2}'" | ||
| } | ||
| } |
Copilot
AI
Oct 18, 2025
[nitpick] The same model dict is duplicated across multiple tests (e.g., lines 171–178, 201–208, 257–263). Consider extracting it into a fixture or a class attribute to reduce repetition and ease future changes.
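A minimal sketch of the class-attribute variant, assuming a hypothetical test class name and helper method (neither is taken from the PR). The dict is defined once and each test works on a deep copy:

```python
import copy


class TestDataFrameIntegration:
    """Hypothetical test class; the names here are illustrative only."""

    # Shared model definition, written once instead of in every test.
    MODEL = {
        "formulaprefix": "@",
        "delim": "{}",
        "commentline": "#",
        "output": {
            "result": "grep 'result:' output.txt | awk '{print $2}'"
        },
    }

    def make_model(self):
        # Deep copy so one test cannot mutate the dict seen by another.
        return copy.deepcopy(self.MODEL)

    def test_model_is_isolated(self):
        model = self.make_model()
        model["output"]["result"] = "changed"
        assert self.MODEL["output"]["result"] != "changed"
```

The deep copy matters because the model contains a nested `output` dict; a shallow copy would still share it between tests.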
5077247 to 6b43c56
f35a769 to 3b8c6c5
This commit adds support for using pandas DataFrames as input_variables,
enabling non-factorial parametric study designs alongside the existing
dict-based factorial (Cartesian product) approach.
Implementation (fz/helpers.py):
- Updated generate_variable_combinations() to detect DataFrame input
- DataFrame: each row represents one case (non-factorial design)
- Dict: existing Cartesian product behavior (factorial design)
- Added optional pandas import with HAS_PANDAS flag
- Enhanced type hints and docstring with examples
- Added informative logging when DataFrame detected
- Raises TypeError for invalid input types
Key features:
- Factorial design (dict): Creates ALL combinations (Cartesian product)
  Example: {"x": [1,2], "y": [3,4]} → 4 cases
- Non-factorial design (DataFrame): Only specified combinations
  Example: pd.DataFrame({"x":[1,2], "y":[3,4]}) → 2 cases (rows)
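The two case counts above can be reproduced without fz. This is only an illustration of the counting, using `itertools.product` for the factorial side and one record per DataFrame row for the non-factorial side; it is not the library's code.

```python
import itertools

import pandas as pd

variables = {"x": [1, 2], "y": [3, 4]}

# Factorial (dict): every combination of the listed values.
factorial_cases = [
    dict(zip(variables, combo))
    for combo in itertools.product(*variables.values())
]

# Non-factorial (DataFrame): each row is taken as one case.
row_cases = pd.DataFrame(variables).to_dict(orient="records")

print(len(factorial_cases))  # 4
print(len(row_cases))        # 2
```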
Use cases for DataFrames:
- Variables with constraints or dependencies
- Latin Hypercube Sampling, Sobol sequences
- Imported designs from DOE software
- Optimization algorithm sample points
- Sensitivity analysis (one-at-a-time)
- Sparse or adaptive sampling
- Any irregular design pattern
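As an illustration of the constraint use case, the sketch below builds a full grid and keeps only the rows that satisfy a condition. The variable values and the constraint (`temp * pressure <= 400`) are invented for the example.

```python
import itertools

import pandas as pd

temps = [100, 200, 300]
pressures = [1.0, 2.0, 3.0]

# Start from the full grid, then keep only rows satisfying the constraint.
grid = pd.DataFrame(
    list(itertools.product(temps, pressures)),
    columns=["temp", "pressure"],
)

# Hypothetical constraint: temp * pressure must stay at or below 400.
design = grid[grid["temp"] * grid["pressure"] <= 400].reset_index(drop=True)

print(len(grid))    # 9 grid points
print(len(design))  # 6 remain after filtering
```

The filtered `design` DataFrame is the kind of object that would then be passed as `input_variables`, one case per remaining row.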
Tests (tests/test_dataframe_input.py):
- 12 comprehensive tests covering all scenarios
- Unit tests for generate_variable_combinations()
- Integration tests with fzr()
- Tests for DataFrame vs dict behavior comparison
- Tests for mixed types, constraints, repeated values
- Input validation tests
- All 12 tests pass successfully
Documentation:
- README.md: New "Input Variables: Factorial vs Non-Factorial Designs" section
* Comparison of dict (factorial) vs DataFrame (non-factorial)
* When to use each approach
* Examples with LHS, constraint-based designs
- examples/dataframe_input.md: Comprehensive guide with:
* 7 practical examples (constraints, LHS, Sobol, DOE import, etc.)
* Comparison table
* Tips and best practices
* Common patterns and workflows
- Updated Features section to mention both design types
- Updated DataFrame I/O description
Backward compatibility:
- Existing dict-based code continues to work unchanged
- DataFrame support requires pandas (optional dependency)
- Graceful handling when pandas not installed
Example usage:
```python
import pandas as pd
from fz import fzr

# Non-factorial: specific combinations only
input_variables = pd.DataFrame({
    "temp": [100, 200, 100, 300],
    "pressure": [1.0, 1.0, 2.0, 1.5]
})

# input_file, model and calculators are assumed to be defined elsewhere.
# Creates 4 cases: (100,1.0), (200,1.0), (100,2.0), (300,1.5)
results = fzr(input_file, input_variables, model, calculators)
```
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
The integration tests were failing on Windows due to use of bash features
that don't work reliably across platforms:
- 'source' command (inconsistent behavior on Windows Git Bash)
- bash arithmetic with $((x + y))
- 'awk' command (not always available on Windows)
Changes:
- Simplified calculator script to extract pre-computed sum from input.txt
instead of re-computing it with bash arithmetic
- The formula @{$x + $y} is already evaluated by fz during compilation,
so the script just extracts the result
- Replaced 'awk' with 'cut' and 'tr' for output parsing (more portable)
- Uses only basic shell commands available on all platforms:
grep, cut, echo, tr
The test approach now mirrors test_fzo_fzr_coherence.py which uses
simple, portable bash scripts that work across Linux, macOS, and Windows.
All 12 tests pass on Linux after this change.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Changed the DataFrame integration tests to use exactly the same script pattern
as test_fzo_fzr_coherence.py, which is known to work on Windows.
Changes:
- Back to using 'source input.txt' to read variables
- Back to using bash arithmetic: result=$((x + y))
- Output format: 'echo "result = $result" > output.txt' (spaces around =)
- Parsing: 'grep "result = " output.txt | cut -d "=" -f2' (no tr command)
This exactly mirrors the working pattern from test_fzo_fzr_coherence.py
lines 117-120, which successfully runs on Windows, macOS, and Linux in CI.
The previous attempt avoided 'source' and bash arithmetic, but that created
a different issue. The test_fzo_fzr_coherence.py tests prove that these
commands DO work in the CI environment on all platforms.
All 12 tests pass on Linux.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
* Add Windows bash availability check with helpful error message; ensure subprocess uses bash on Windows; add related tests and documentation.
* ensure awk & cut are available
* use msys2 in CI
* do not check for cat in msys2... (try)
* fast error if bash unavailable on windows
* check windows bash consistently with core/runners
* factorize windows bash get function
* select tests by OS
* try centralize system exec
* fix bash support on win (from win & claude)
* add bc alongside bash for win
* for now do not support win batch commands (like timeout)
During the rebase, the bool(input_variables) checks were not properly handling
DataFrame inputs, causing "The truth value of a DataFrame is ambiguous" errors.
Changes:
- Updated fzr() type hint to accept Union[Dict, "pandas.DataFrame"]
- Enhanced docstring to document DataFrame support for non-factorial designs
- Fixed bool(input_variables) checks in core.py and helpers.py to handle both
  dict and DataFrame types properly
- DataFrame empty check: not input_variables.empty
- Dict empty check: bool(input_variables)
All 12 DataFrame input tests now pass successfully.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
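The two emptiness checks named in this commit can be combined in a single helper. The sketch below is an assumption about how that might look; the name `has_input_variables` is hypothetical and does not come from fz.

```python
import pandas as pd


def has_input_variables(input_variables) -> bool:
    """Hypothetical helper: True if the dict or DataFrame holds any variables."""
    if isinstance(input_variables, pd.DataFrame):
        # bool(df) raises ValueError ("truth value ... is ambiguous"),
        # so emptiness is tested explicitly.
        return not input_variables.empty
    return bool(input_variables)
```

Checking the DataFrame branch first is the important part: calling `bool()` on a DataFrame raises before any dict logic could run.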
The _raw field that preserved original algorithm output is now removed from
the returned dict. Raw content is no longer needed in the return structure since:
1. HTML/Markdown content is saved to files and referenced by file path
2. JSON content is parsed and saved as both json_data and json_file
3. Key=value content is parsed and saved as keyvalue_data and txt_file
4. Plain text content is kept in 'text' field if no format detected
5. Data dict is always available in 'data' field
Changes:
- Removed processed['_raw'] = display_dict from _get_and_process_analysis()
- Added logging of text content before processing in _get_and_process_analysis()
- Removed all checks for '_raw' field in fzd() function
- Updated HTML results generation to use processed content with file links
- The iteration summary HTML now links to analysis files instead of embedding raw content
All 9 content detection tests still pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Implemented the same directory handling for fzd as fzr:
- Existing analysis_dir is renamed with timestamp (e.g., analysis_dir_2025-10-24_19-30-34)
- Renamed directory iterations are included in cache paths for result reuse
- Cache now checks both current and previous run iterations
Changes:
- Added ensure_unique_directory() call for fzd's analysis_dir
- Build cache paths dynamically to include:
  * Current run iterations (iter001, iter002, etc.)
  * Previous run iterations from renamed directory (up to 99 iterations)
- Cache paths are checked before actual calculators for efficiency
This prevents data loss from overwriting and enables result reuse across runs.
Also removed accidentally committed test files (fzd_analysis directory).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
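The rename-with-timestamp step might be sketched as follows. This is not the body of `ensure_unique_directory()`, which is not shown in the PR; the function name `rename_existing_directory` and its signature are hypothetical, and only the timestamp format is taken from the example name above.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def rename_existing_directory(path: Path, now: Optional[datetime] = None) -> Optional[Path]:
    """Hypothetical sketch: move an existing directory aside under a timestamped name.

    Returns the new location, or None if there was nothing to rename.
    """
    if not path.exists():
        return None
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    renamed = path.with_name(f"{path.name}_{stamp}")
    path.rename(renamed)
    return renamed


# Usage with a fixed timestamp so the resulting name is predictable.
with tempfile.TemporaryDirectory() as tmp:
    analysis_dir = Path(tmp) / "analysis_dir"
    analysis_dir.mkdir()
    moved = rename_existing_directory(analysis_dir, datetime(2025, 10, 24, 19, 30, 34))
    moved_name = moved.name  # "analysis_dir_2025-10-24_19-30-34"
    analysis_dir.mkdir()     # fresh, empty directory for the new run
```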
Updated documentation to remove references to _raw field which is no longer
included in fzd results.
The _raw field was removed to keep the return structure clean - raw content is either:
- Saved to files (HTML, markdown) with file references
- Parsed into Python objects (JSON, key=value)
- Kept as plain text if no format detected
- Logged to console before processing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Cache is now only used when explicitly requested in the calculators list,
matching fzr's behavior.
Changes:
- "cache://_" expands to all iterations from renamed previous run
  (e.g., results_fzd_2025-10-24_19-30-34/iter001 through iter099)
- If ANY cache calculator is in the list, previous iterations from current run
  are automatically added for efficiency
- If no cache is requested, no cache paths are added at all
Examples:
calculators=["sh://calc.sh"] → No cache used at all
calculators=["cache://_", "sh://calc.sh"] → Uses renamed dir iterations + current run previous iterations
calculators=["cache://some/path", "sh://calc.sh"] → Uses specified path + current run previous iterations
This gives users explicit control over caching while still providing sensible
automatic behavior for current run iterations when cache is enabled.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
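The opt-in rules in this commit message can be sketched as a pure function. Everything about the signature is an assumption made for illustration: the function name, its parameters, and the ordering of the returned list (cache entries first, as an earlier commit states that cache paths are checked before actual calculators).

```python
from typing import List, Optional


def expand_cache_calculators(
    calculators: List[str],
    previous_run_dir: Optional[str],
    current_run_dir: str,
    current_iteration: int,
    max_iterations: int = 99,
) -> List[str]:
    """Hypothetical sketch of the opt-in cache expansion described above."""
    cache_entries = [c for c in calculators if c.startswith("cache://")]
    if not cache_entries:
        # No cache requested: the list is returned unchanged.
        return list(calculators)

    expanded: List[str] = []
    for entry in cache_entries:
        if entry == "cache://_":
            # Placeholder expands to the iterations of the renamed previous run.
            if previous_run_dir is not None:
                expanded.extend(
                    f"cache://{previous_run_dir}/iter{i:03d}"
                    for i in range(1, max_iterations + 1)
                )
        else:
            expanded.append(entry)

    # Any cache request also adds the earlier iterations of the current run.
    expanded.extend(
        f"cache://{current_run_dir}/iter{i:03d}" for i in range(1, current_iteration)
    )
    others = [c for c in calculators if not c.startswith("cache://")]
    return expanded + others
```

For instance, `expand_cache_calculators(["sh://calc.sh"], "old", "new", 3)` returns the list unchanged, since no cache entry was requested.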
Removed unused imports and variables across the codebase:
- Removed unused stdlib imports (re, subprocess, tempfile, etc.)
- Removed unused type imports (Tuple, Union, List, Optional where unused)
- Removed unused helper function imports
- Removed unused local variables (pid, used_calculator_uri, case_elapsed, interpreter)
- Removed commented debug code
All 269 tests pass successfully after cleanup.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
…stall, install_algo, ...
…es regexp in .fz/*
- Fixed indentation errors in interpreter.py and runners.py
- Added missing _validate_calculator_uri function in runners.py
- Fixed DataFrame validation in fzr() to accept both dict and DataFrame
- Added comprehensive validation for fzd arguments with helpful error messages
- Added test_no_algorithms.py with 11 algorithm validation tests
- Added informative logging when loading algorithms in fzd
All 448 tests pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Add threading and defaultdict imports to core.py
- Add load_aliases import from io module
- Fix DataFrame validation in fzr() to accept both dict and DataFrame
All 455 tests passing.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
3b8c6c5 to 28d619f
- Move pandas from optional to required dependencies in setup.py and pyproject.toml
- Remove PANDAS_AVAILABLE checks and conditional handling throughout codebase
- Always return DataFrame from fzo() and fzr()
- Simplify code by assuming pandas is always available
pandas is now essential for fzd() and provides better data handling in fzo() and fzr().
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Resolves PermissionError on Windows during temporary directory cleanup by
restoring the original working directory before the TemporaryDirectory
context manager exits.
On Windows, you cannot delete a directory that is the current working
directory. The tests were calling os.chdir(tmpdir) and then attempting to
clean up the directory when the context exited, causing:
- PermissionError: [WinError 32] The process cannot access the file because
  it is being used by another process
- PermissionError: [WinError 5] Access is denied
Solution: Wrap test logic in try/finally blocks that save and restore the
original working directory, allowing Windows to successfully delete
temporary directories during cleanup.
Fixes #40 (Windows CI failure in test_dict_flattening.py)
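The try/finally pattern described in this commit might look like the sketch below. The helper name `run_in_temporary_directory` is hypothetical; the actual tests may inline the same logic rather than use a helper.

```python
import os
import tempfile


def run_in_temporary_directory(action):
    """Hypothetical helper: run action() inside a temp dir, then restore the cwd."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmpdir:
        try:
            os.chdir(tmpdir)
            return action()
        finally:
            # Leave the directory before the context manager deletes it;
            # Windows refuses to remove the current working directory.
            os.chdir(original_cwd)
```

The `finally` block sits inside the `with` block on purpose, so the working directory is restored before `TemporaryDirectory` attempts its cleanup, even when `action()` raises.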
- Remove PANDAS_AVAILABLE variable definitions from test files
- Remove HAS_PANDAS variable definitions from source files
- Remove test_fzd_without_pandas (no longer relevant)
- Remove skipif decorators that checked for pandas availability
- Simplify conditional checks that tested pandas availability
pandas is now always available as a required dependency.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Remove orphaned try/except blocks for pandas imports in fz/helpers.py and fz/io.py
- Fix broken pandas imports in test files (test_dict_flattening.py,
  test_fzo_fzr_coherence.py, test_no_algorithms.py)
- Add missing pandas import to test_dataframe_input.py
- Fix indentation issues in test_dataframe_input.py after removing conditional code
All 455 tests now passing.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Created context/ directory with 10 markdown files (4369 lines total) designed
for LLM consumption to help understand and suggest fz usage:
- overview.md: Framework introduction and key concepts
- syntax-guide.md: Variable substitution and formula syntax
- core-functions.md: API reference for fzi, fzc, fzo, fzr
- model-definition.md: Model configuration guide
- calculators.md: Execution backend types (sh, ssh, cache)
- formulas-and-interpreters.md: Python and R formula evaluation
- parallel-and-caching.md: Parallel execution and caching strategies
- quick-examples.md: Common patterns and use case examples
- README.md: Documentation guide and usage instructions
- INDEX.md: Quick reference index for finding topics
Each file includes:
- Clear syntax examples with code snippets
- Complete working examples
- Common patterns and best practices
- Cross-references to related topics
This documentation will help LLMs provide better assistance with fz by
understanding its syntax, features, and typical usage patterns.
Co-authored-by: Claude <[email protected]>
Resolved conflicts:
- .gitignore: Added results*/ and output/ ignore patterns from main
- fz/core.py: Integrated callback support from main while keeping DataFrame-only return logic