This repository is a template for structuring (empirical) machine-learning projects, especially for bachelor and master thesis work. It shows how to organize experiments, manage configurations, run evaluations, and generate results in a clean and reproducible way.
Planning experiments at the start adds minimal overhead, clarifies objectives, and accelerates scalable, reproducible research.
We recommend starting with a clear problem definition of what you are trying to do: think first, code second. Some things you might want to take into account:
- Entities & functions: What models, datasets, vector spaces, and mappings are involved in your problem?
- Goals & metrics: What do you want to achieve, and how will performance be measured?
- Parameters to vary: Which hyperparameters, models, processes, or data splits will change?
- Sanity checks: How will you validate results and avoid errors?
- Infrastructure & results: Where will runs execute, and how will outputs be stored and aggregated?
Answering these upfront defines a clear, repeatable roadmap for any experiment.
Hydra minimizes setup effort and maximizes flexibility:
- Centralize configuration in YAML files.
- Override any parameter at runtime via CLI.
- Use multirun mode to launch parameter sweeps automatically.
- Record each run’s config snapshot for full reproducibility.
- Scale locally or on HPC using launcher plugins with no code changes.
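Conceptually, a `key=value` override just replaces one field in a nested configuration. The toy function below is not part of Hydra; `apply_overrides` and the example config are made up purely to illustrate the idea in plain Python:

```python
# Toy sketch of Hydra-style "key=value" overrides: defaults live in a
# nested config, and dotted CLI arguments replace individual fields.
# (Hydra itself does far more; this only illustrates the concept.)

def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply dotted key=value overrides to a nested config dict."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})
        # Keep it simple: interpret integers, otherwise keep the string
        try:
            value = int(raw_value)
        except ValueError:
            value = raw_value
        node[leaf] = value
    return config

defaults = {"model": "net2", "training": {"epochs": 5, "lr": "1e-3"}}
config = apply_overrides(defaults, ["training.epochs=10", "model=net5"])
print(config)  # {'model': 'net5', 'training': {'epochs': 10, 'lr': '1e-3'}}
```

Hydra performs this kind of merge for you, and additionally records the resolved config of every run.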
.
├── modules/ # Code for training, evaluation, and utilities
│ ├── training/
│ └── utils/
├── data/ # Data, model checkpoints, and generated reports
│ ├── datasets/
│ ├── models/
│ └── reports/
├── config/ # Hydra configuration files
│ ├── train_model.yaml
│ └── report.yaml
├── runs/ # Scripts to run experiments and workflows
│ ├── train.py
│ ├── report.py
│ └── run_all_tasks.*
└── env_setup/ # Environment setup files
├── Dockerfile
└── requirements.txt

- Separates code, data, and configuration.
- Makes debugging and extension easier.
- Helps other people understand your work quickly.
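If you want to reproduce this layout in a new project, a small script can create it. The `scaffold` helper below is a sketch, not part of this repository; the folder names are taken from the tree above:

```python
from pathlib import Path

# Folder layout from the template above; adapt it to your own project.
LAYOUT = [
    "modules/training",
    "modules/utils",
    "data/datasets",
    "data/models",
    "data/reports",
    "config",
    "runs",
    "env_setup",
]

def scaffold(root: str) -> None:
    """Create the template's directory tree under `root` (idempotent)."""
    for relative in LAYOUT:
        (Path(root) / relative).mkdir(parents=True, exist_ok=True)
```

Calling `scaffold("my_thesis_project")` creates the whole tree in one go; `exist_ok=True` makes it safe to re-run.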
A well-defined environment makes the code portable to different machines (local PC, lab server, or HPC cluster).
All required packages are listed in ./env_setup/requirements.txt.
Use either Conda or Virtualenv to create an isolated environment.
Using Conda
conda create --prefix ./.venv python=3.10.4
conda activate ./.venv
pip install -r ./env_setup/requirements.txt
Using Virtualenv
python -m venv .venv
source .venv/bin/activate # Linux/Mac
.venv\Scripts\activate # Windows
pip install -r ./env_setup/requirements.txt
- Keeps dependencies under control.
- Prevents conflicts with other Python projects.
In this repository, a task is any shell call that triggers one logical step, such as a single training run or a report-generation job.
Run single tasks:
python runs/train.py # one training run
python runs/report.py # aggregate previous runs and build a report
Run the entire pipeline:
# Linux/Mac
./run_all_tasks.sh
# Windows
run_all_tasks.bat
- Lets you test one step at a time or run everything in one command.
- Saves time when launching many experiments.
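The `run_all_tasks.*` files are plain shell and batch scripts. As a cross-platform alternative, the same idea can be sketched in Python; the `run_all` helper and `TASKS` list below are illustrative, not the repository's actual scripts:

```python
import subprocess
import sys

# The task list mirrors the scripts in runs/: one command per logical step.
TASKS = [
    [sys.executable, "runs/train.py"],
    [sys.executable, "runs/report.py"],
]

def run_all(tasks: list[list[str]]) -> None:
    """Run each task in order; stop the pipeline at the first failure."""
    for command in tasks:
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if a task exits non-zero
        subprocess.run(command, check=True)
```

From the project root, `run_all(TASKS)` executes the whole pipeline, and failing early keeps broken runs from silently polluting later steps.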
Hydra lets you launch Python functions from the command line and manage configuration files.
- Every YAML file in ./config/ defines default settings for one part of the project (model, data, training, launcher).
- You can override any field directly in the shell with key=value.
- Multirun mode runs many configurations back-to-back and stores each result in its own folder.
python runs/train.py # uses defaults in config/
Pass command-line overrides:
python runs/train.py training.epochs=10 model=net5
python runs/train.py --multirun model=net2,net5 training.epochs=2,5
Hydra creates one sub-folder per setting.
python runs/train.py +experiment=sweep_models
The file ./config/experiment/sweep_models.yaml lists all overrides for this sweep.
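As a rough illustration, such an experiment file might look like the sketch below. It follows Hydra's documented experiment pattern (the `# @package _global_` header and `hydra.sweeper.params` keys), but it is a hypothetical example, not the actual contents of sweep_models.yaml:

```yaml
# @package _global_
# Hypothetical sketch of config/experiment/sweep_models.yaml.
# Adapt the parameter names to your own config groups.
hydra:
  sweeper:
    params:
      model: net2,net5
      training.epochs: 2,5
```

Keeping sweeps in versioned experiment files makes them repeatable, instead of living only in your shell history.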
Launchers let you run many jobs in parallel on one machine or an HPC cluster.
Launcher configs live in ./config/launcher/.
See the Hydra launcher docs: https://hydra.cc/docs/advanced/launcher_plugins/.
# Local CPU parallelism with joblib
python runs/train.py --multirun +launcher=joblib
# Slurm cluster
python runs/train.py --multirun +launcher=slurm
# Slurm with GPUs
python runs/train.py --multirun +launcher=slurmgpu
- You can explore many settings with a single command.
- Hydra records every config, so you know exactly what produced each result.
- The same code you run on your machine can run on a cluster with minimal changes and few coding rabbit holes (mostly).
Besides Python and Hydra, two tools are worth adding to your workflow.
Docker packages your code, environment, and dependencies into one container, so it runs the same on any operating system.
docker build -t example ./env_setup
docker run -d --rm --name example --gpus all -v $(pwd):/home/example example bash
- Handy for sharing your work or moving it to a server.
- Removes “works on my machine” problems.
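For orientation, a minimal env_setup/Dockerfile could look like the sketch below. The base image, Python version, and working directory are assumptions for illustration, not the repository's actual file:

```dockerfile
# Hypothetical sketch of env_setup/Dockerfile.
FROM python:3.10-slim

WORKDIR /home/example

# Install pinned dependencies first so Docker caches this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# The project code is mounted at run time via -v $(pwd):/home/example
# (matching the docker run command above), so no COPY of the code is needed.
CMD ["bash"]
```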
rclone moves large datasets between your workstation and remote storage (e.g., S3, Google Drive).
rclone config
rclone sync ./data/datasets remote:bucket/path -P --transfers=8
- Keeps local disks clean and your data backed up.
- Speeds up transfers to HPC clusters.
This project is meant to help students organize and run machine-learning experiments. What's important is the abstract idea of organizing your thoughts and your code, not this specific implementation of it.
- Hydra Documentation
- Reproducibility in Machine Learning
- Ten Simple Rules for Reproducible Research
- ML Experiment Tracking Tools
- Hydra Launcher Plugins
- Docker Official Docs
- Rclone Documentation
The exact folder names and tools can change, but the key idea stays the same: a clear, automated workflow makes large-scale experimentation faster, easier to debug, and easier for others to reproduce.