crkrenn (Collaborator) commented Mar 7, 2023

@antimatterhorn & @daub1,

The main change here is a working "smart sampler" based on an algorithm that Cody and I developed. The `best_candidate` sampler works as it did before, but if you add `previous_samples`, `cost_variable`, `downselect_ratio`, and `voxel_overlap`, it will read a CSV file containing parameters and a cost column. The code selects the best `downselect_ratio * len(previous_samples)` points, as ranked by the cost column, and then generates random samples in voxels defined by each selected previous point and its nearest neighbor.
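For anyone who wants the idea in code, here is a minimal Python sketch of the downselect-and-voxel step described above. It is not the scisample implementation. The names `smart_samples` and the `downselect` signature shown here, the use of Manhattan distance to find the nearest neighbor, the search over all previous points for that neighbor, and the reading of `voxel_overlap` as a multiplier on the per-axis distance to the neighbor are all assumptions made for illustration. "Best" is taken to mean lowest cost.

```python
import math
import random


def manhattan_distance(point_1, point_2):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(point_1, point_2))


def downselect(rows, cost_variable, downselect_ratio):
    """Keep the lowest-cost fraction of rows (always at least one row)."""
    keep = max(1, math.floor(downselect_ratio * len(rows)))
    return sorted(rows, key=lambda row: row[cost_variable])[:keep]


def smart_samples(rows, names, cost_variable, downselect_ratio,
                  voxel_overlap, num_samples, rng=None):
    """Draw samples in boxes around the best previous points (illustrative)."""
    rng = rng or random.Random()
    points = [tuple(row[name] for name in names) for row in rows]
    best = downselect(rows, cost_variable, downselect_ratio)
    samples = []
    for i in range(num_samples):
        # Assumption: cycle through the selected points until enough samples exist.
        row = best[i % len(best)]
        center = tuple(row[name] for name in names)
        others = [p for p in points if p != center]
        if not others:
            raise ValueError("need at least two distinct previous points")
        # Assumption: nearest neighbor is searched among all previous points.
        neighbor = min(others, key=lambda p: manhattan_distance(center, p))
        sample = {}
        for name, c, nb in zip(names, center, neighbor):
            # Assumption: the voxel is an axis-aligned box centered on the
            # selected point, with half-width voxel_overlap * |c - nb| per axis.
            half_width = voxel_overlap * abs(c - nb)
            sample[name] = rng.uniform(c - half_width, c + half_width)
        samples.append(sample)
    return samples
```

Each row is a plain dict such as `{"X2": 6.0, "X3": 5.5, "cost": 1.0}`, which is roughly what one line of the CSV would look like after parsing.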

There is a working example in tests. As expected, the smart sampler zooms in on minimal values of the Rosenbrock function.
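For reference, the textbook two-dimensional Rosenbrock function looks like the sketch below. The variant and parameter values used in the test may differ; `a=1` and `b=100` are the conventional defaults, not something taken from the test file.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    """Standard 2-D Rosenbrock function; global minimum of 0 at (a, a**2)."""
    return (a - x) ** 2 + b * (y - x ** 2) ** 2
```

With the default parameters the minimum sits at `(1, 1)` inside a long, curved valley, which is why it is a common check that a sampler or optimizer really does concentrate near low-cost points.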

If variables are listed in constants or parameters and are not in the previous samples file, they will be sampled normally. This will let you add another parameter to a set of "good" points.

The only other significant change was some refactoring and commenting to remove flake8 and pylint errors.

-Chris

        sampler:
            type: best_candidate
            num_samples: 30
            previous_samples: samples.csv # optional
            cost_variable: cost   # required if previous_samples is provided
            downselect_ratio: 0.3 # required if previous_samples is provided
            voxel_overlap: 1.0    # required if previous_samples is provided
            constants:
                X1: 20
            parameters:
                X2:
                    min: 5
                    max: 10
                X3:
                    min: 5
                    max: 10

crkrenn added 4 commits March 5, 2023 08:54
- Update `RANDOM_SCHEMA` and `BEST_CANDIDATE_SCHEMA` to match
- Disable the "previous_samples" check
- Add an error check to the schema
- Add a test for min and max values in the parameters
- Add a function to calculate the Manhattan distance between two points
- Change the format of the output from `get_samples` to a dictionary of dictionaries
- Convert samples

[scisample/schema.py]
- Remove `previous_samples` from `RANDOM_SCHEMA`
- Add `cost_variable`, `downselect_ratio`, `voxel_overlap` to `BEST_CANDIDATE_SCHEMA`
- Update `BEST_CANDIDATE_SCHEMA` to match `RANDOM_SCHEMA`
[scisample/random_sampler.py]
- Disable the "previous_samples" check
- Add an error check to the schema
- Add a test for min and max values in the parameters
[scisample/utils.py]
- Add a function to calculate the Manhattan distance between two points
[tests/test_utils.py]
- Add manhattan_distance function
- Change the tolerance of parse_parameters to accept a list of floats
[scisample/base_sampler.py]
- Change the format of the output from `get_samples` to a dictionary of dictionaries
- Convert samples to parameter dictionary in a format convenient for maestrowf
- Add a new OpenAI API for completions
- Lower the numeric tolerance for test files
- Add two tests for the inclusive string split function
- Refactor `downselect` function in `base_sampler.py` to allow for optional argument `previous_samples`
- Change `previous_samples` path in `test_samplers.py`
- Change `X1` constants to range and `X2` and `X3` ranges in `test_samplers.py`
- Add a check to ensure `previous_samples` is a DataFrame in

[tests/test_samplers.py]
- Change `previous_samples` path
- Change `X1` constants to range
- Change `X2` and `X3` ranges
[scisample/base_sampler.py]
- Allow for optional argument `previous_samples` in `downselect` function
- Refactor `downselect` function to handle `previous_samples` argument
- Add a check to ensure `previous_samples` is a DataFrame
- Added `return_indices` parameter to `downselect` function in `base_sampler.py`
- Changed `columns` variable to use `df.columns.tolist()` instead of `self.parameters`
- Added optional return of indices in `downselect` function

[scisample/base_sampler.py]
- Added `return_indices` parameter to `downselect` function
- Changed `columns` variable to use `df.columns.tolist()` instead of `self.parameters`
- Added optional return of indices in `downselect` function
- Update `__version__` and `VERSION` variables to `1.0.3`
- Add `encoding='utf-8'` to `open` calls
- Change argument name of `manhattan_distance` from `x` and `y` to `point_1` and `point_2`
- Check for duplicates in variables
- Add a `pointless-statement` disable comment
- Add constants to the samples
- Add

[scisample/__init__.py]
- Add a docstring to the `__init__.py` file
- Update the `__version__` and `VERSION` variables to `1.0.3`
[scisample/utils.py]
- Add `encoding='utf-8'` to `open` calls
- Change the argument name of `manhattan_distance` from `x` and `y` to `point_1` and `point_2`
[scisample/column_list_sampler.py]
- Check for duplicates in variables
- Add a `pointless-statement` disable comment
- Add constants to the samples
- Add parameter samples to the samples
[scisample/random_sampler.py]
- Replace `i` with `_` in loop for generating random samples
- Move `octokit` initialization to separate file
- Add `with suppress` blocks to catch `KeyError`
- Update `new_sample` with `constants` and `random_list`
[scisample/custom_sampler.py]
- Move the sample function initialization to a separate line

daub1 (Contributor) commented Mar 7, 2023

@crkrenn it might make sense to include a threshold for your cost function rather than require me to decide what fraction of the points I want to use.
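A rough sketch of what a threshold-based selection could look like, next to the ratio-based one. Both `downselect_by_threshold` and the `cost_threshold` argument are hypothetical names for illustration and are not part of this PR.

```python
def downselect_by_threshold(rows, cost_variable, cost_threshold):
    """Keep every row whose cost is at or below the threshold (hypothetical)."""
    return [row for row in rows if row[cost_variable] <= cost_threshold]
```

One trade-off: a threshold can return zero rows or all of them, whereas a ratio always returns a predictable number of points.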
