This repository shows you how to run a hyperparameter optimization (HPO) system as an Outerbounds project.
This README.md will explain why you'd want to connect these concepts, and will show you how to launch HPO jobs for:
- classical ML models
- deep learning models
- end-to-end system tuning
If you have never deployed an Outerbounds project, please read the Outerbounds documentation before continuing.
Change the platform in obproject.toml to match your Outerbounds deployment.
```bash
uv init
uv add outerbounds optuna numpy pandas "psycopg[binary]>=3.2.0" scikit-learn torch torchvision
```

Ensure you've run your `outerbounds configure ...` command.
Then, run flows!
```bash
cd flows/tree
uv run python flow.py --environment=fast-bakery run --with kubernetes
```

For more information about the `fast-bakery` technology, see Fast Bakery: Automatic Containerization.
To begin, copy the structure in /flows/nn or /flows/tree, which represent sample use cases:
- `config.json` contains system and hyperparameter config options.
- `flow.py` defines the workflow structure. This should change little across use cases.
- `objective_fn.py` is the key piece of the puzzle for a new use case - the function that converts samples from hyperparameter space into the measures the system optimizes. See examples at https://github.com/optuna/optuna-examples/tree/main.
- `utils.py` contains small project-specific helpers.
- `interactive.ipynb` is a starter notebook for running and analyzing hyperparameter tuning runs in a REPL.
- A symlink to `obproject.toml` at the root of the repository.
If desired, you can directly modify one of these sub-directories.
The key aspect of customization is defining the objective function.
Check out the examples and reach out for assistance if you do not know how to parameterize your task as a tunable optimization problem.
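As a rough illustration of what an objective function for a classical ML use case might look like - the dataset and hyperparameter ranges below are placeholders, not this repository's actual `objective_fn.py`:

```python
import optuna
from sklearn.datasets import load_breast_cancer  # placeholder dataset
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def objective(trial: optuna.Trial) -> float:
    # Sample hyperparameters from the search space.
    n_estimators = trial.suggest_int("n_estimators", 50, 500)
    max_depth = trial.suggest_int("max_depth", 2, 16)

    X, y = load_breast_cancer(return_X_y=True)
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)

    # Return the measure the study optimizes: mean cross-validated accuracy.
    return cross_val_score(model, X, y, cv=3).mean()
```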
From there, determine the dependencies needed for running the objective function
and update the config.json values accordingly, most notably the Python packages
section which flow.py will use when building consistent environments across
compute backends.
The Outerbounds app that will run your Optuna dashboard is defined in ./deployments/optuna-dashboard/config.yml.
When you push to the main branch of this repository, the obproject-deployer will create the application in your Outerbounds project branch.
If you'd like to manually deploy the application:
```bash
cd deployments/optuna-dashboard
uv run outerbounds app deploy --config-file config.yml
```

From your laptop or Outerbounds workstation, run:
```bash
uv init
uv add outerbounds optuna numpy pandas "psycopg[binary]>=3.2.0" scikit-learn torch torchvision
```

Configure your Outerbounds token. Ask in Slack if you're not sure.
```bash
cd flows/tree
# cd flows/nn
```

Before running or deploying the workflows, investigate the relationship between the flow and the config.json file.
As long as you haven't changed anything when deploying the application hosting the Optuna dashboard, you do not need to change anything in that file, but it is useful to be familiar with its contents and the way the configuration files interact with Metaflow code.
There are two demos implemented in this project, in flows/tree and flows/nn.
Each workflow template defines:
- a `flow.py` containing a `FlowSpec`,
- a single `config.json` to set system variables and hyperparameter configurations,
- an `hpo_client.py` containing entrypoints to run and trigger the flow,
- notebooks showing how to run and analyze results of hyperparameter tuning runs, and
- a modular, fully customizable objective function.
For the rest of this section, we'll use the flows/nn template, as everything else is the same as for flows/tree.
```bash
cd flows/nn
uv run python flow.py --environment=fast-bakery run --with kubernetes
uv run python flow.py --environment=fast-bakery argo-workflows create
uv run python flow.py --environment=fast-bakery argo-workflows trigger
```

The examples also include a convenience wrapper around the workflows in hpo_client.py.
You can use this:
- for running HPO jobs from notebooks, the CLI, or other Metaflow flows, or
- as an example for creating your own experiment entrypoint abstractions.
```bash
uv run python hpo_client.py -m 1 # blocking
uv run python hpo_client.py -m 2 # async
uv run python hpo_client.py -m 3 # trigger deployed flow
```

There are three client modes:
- Blocking - `python hpo_client.py -m 1`
- Async - `python hpo_client.py -m 2`
- Trigger - `python hpo_client.py -m 3`
  - The trigger option also accepts a `--namespace`/`-n` parameter, which determines the namespace within which this code path checks for already-deployed flows.
This system is an integration between Optuna, a feature-rich, open-source hyperparameter optimization framework, and Outerbounds. It leverages functionality built into your Outerbounds deployment to run a persistent relational database that tasks and applications can communicate with. The Optuna dashboard runs as an Outerbounds app, enabling sophisticated analysis of hyperparameter tuning runs.
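For context, open-source Optuna persists studies to a relational database through a storage URL. A minimal sketch with a placeholder connection string (in this project the database connection is managed by the Outerbounds deployment, so you typically won't configure it by hand):

```python
import optuna

# Placeholder connection string; in this project the database is provisioned
# and wired up by the Outerbounds deployment rather than configured manually.
storage = "postgresql+psycopg://user:password@db-host:5432/optuna"

study = optuna.create_study(
    study_name="example-study",
    storage=storage,
    load_if_exists=True,  # reattach to the study if it already exists
)
```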
The implementation wraps the standard Optuna interface, aiming to balance two goals:
- Provide full expressiveness and compatibility with open-source Optuna features.
- Provide an opinionated and streamlined interface for launching HPO studies as Metaflow flows.
Typically, Optuna programs are developed in Python scripts.
An objective function returns 1 or 2 values.
Its argument is a trial,
representing a single execution of the objective function; in other words, a sample drawn from the hyperparameter search space.
```python
def objective(trial):
    x = trial.suggest_float("x", -100, 100)
    y = trial.suggest_categorical("y", [-1, 0, 1])
    f1 = x**2 + y
    f2 = -((x - 2) ** 2 + y)
    return f1, f2
```

The key task of the user who wishes to use the `from outerbounds.hpo import HPORunner` abstraction this project affords is to determine:
- How to define the objective function?
- What data, model, and code does the objective function depend on?
- How many trials do you want to run per study?
With answers to these questions, you'll be ready to adapt your objective functions as demonstrated in the example flows/ and call the HPORunner interface to automate HPO workflows.
Notice that with Optuna, the user imperatively defines the hyperparameter space through how the trial object is used within the objective function.
The number of variables for which trial.suggest_* is called defines the dimensionality of the search space.
Be judicious about adding parameters: many algorithms, especially Bayesian optimization, suffer performance degradation when many more than 5-10 parameters are tuned simultaneously.
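One common open-source Optuna pattern keeps the effective dimensionality in check by only suggesting parameters on the branch that is actually taken; the optimizer names and the simulated_loss helper below are illustrative placeholders:

```python
import optuna


def objective(trial: optuna.Trial) -> float:
    # At most three parameters are sampled per trial: each branch suggests
    # only the parameters it actually needs, keeping the search space small.
    optimizer = trial.suggest_categorical("optimizer", ["sgd", "adam"])
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    if optimizer == "sgd":
        momentum = trial.suggest_float("momentum", 0.0, 0.99)
        return simulated_loss(lr, momentum=momentum)
    return simulated_loss(lr)


def simulated_loss(lr: float, momentum: float = 0.0) -> float:
    # Hypothetical stand-in for a real training-and-validation routine.
    return (lr - 0.01) ** 2 + (momentum - 0.9) ** 2
```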
To optimize the hyperparameters, we create a study.
Optuna implements many optimization algorithm families as optuna.samplers. These include grid, random, tree-structured Parzen estimators, evolutionary algorithms (CMA-ES, NSGA-II), Gaussian processes, quasi-Monte Carlo methods, and more.
For example, if you wanted to sample the hyperparameter space purely at random - no learning throughout the study - 10 times, you'd run:

```python
study = optuna.create_study(sampler=optuna.samplers.RandomSampler())
study.optimize(objective, n_trials=10)
```

Sometimes it is desirable to stop unpromising trials early. The mechanism for doing this in Optuna is optuna.pruners, which use intermediate objective values reported during a trial, compared against those of previous trials, to determine whether the trial should be pruned.
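As a rough sketch of how pruning looks in open-source Optuna, using a MedianPruner; the train_one_epoch helper below is a hypothetical stand-in for a real training loop:

```python
import optuna


def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    accuracy = 0.0
    for epoch in range(20):
        accuracy = train_one_epoch(lr, epoch)  # hypothetical training step
        # Report the intermediate value so the pruner can compare this trial
        # against earlier trials at the same step.
        trial.report(accuracy, step=epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return accuracy


def train_one_epoch(lr: float, epoch: int) -> float:
    # Placeholder for a real training-and-validation routine.
    return (1.0 - (lr - 0.01) ** 2) * (epoch + 1) / 20


study = optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=30)
```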
To resume a study, simply pass in the name of the previous study.
If leveraging the Metaflow versioning scheme, which uses the Metaflow Run pathspec as the study name - in other words, not overriding the study name via configs or CLI - then
you can set this value in the config and resume the study. You can also override it on the command line using hpo_client's --resume-study/-r option:
```bash
python hpo_client.py -m 1 -r TreeModelHpoFlow/argo-hposystem.prod.treemodelhpoflow-7ntvz
```

- Benchmark gRPC vs. pure RDB scaling thresholds. When is it worth it to do gRPC? How hard is that to implement? How do costs scale in each mode?