A project for the HackSeoul 2025 Hackathon.
This repository provides a complete, configuration-driven pipeline for generating synthetic object detection datasets and using them to fine-tune a YOLOv8 model. The entire workflow is managed through an interactive CLI, making it easy to generate data and kick off training.
This project was recently refactored to improve structure and usability.
- Interactive CLI: A user-friendly command-line interface to guide you through dataset generation and training.
- Synthetic Dataset Generation: Automatically combine transparent object images with various backgrounds to create a rich dataset from scratch.
- Extensive Data Augmentation: Leverages the `albumentations` library for a wide range of powerful image augmentation techniques.
- Negative Sample Support: Include background images without objects (negative samples) to improve model robustness.
- Configuration-Driven: All settings, from file paths and class names to augmentation and training parameters, are controlled via a single `config.yaml` file.
- Automated Training Pipeline: Seamlessly transitions from dataset generation to model training with YOLOv8.
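The core of synthetic dataset generation is compositing: pasting a transparent (RGBA) object image onto a background and recording its bounding box. The following NumPy sketch illustrates the idea; the function name and signature are illustrative only, not the project's actual API, and the real generator also handles scaling, rotation, and augmentation.

```python
import numpy as np

def paste_object(background, obj_rgba, x, y):
    """Alpha-blend an RGBA object onto an RGB background at (x, y).

    Illustrative sketch only. Returns a YOLO-style bounding box
    (x_center, y_center, width, height), normalized to [0, 1].
    """
    h, w = obj_rgba.shape[:2]
    region = background[y:y + h, x:x + w].astype(np.float32)
    rgb = obj_rgba[..., :3].astype(np.float32)
    alpha = obj_rgba[..., 3:4].astype(np.float32) / 255.0  # per-pixel opacity
    blended = alpha * rgb + (1.0 - alpha) * region
    background[y:y + h, x:x + w] = blended.astype(np.uint8)
    H, W = background.shape[:2]
    return ((x + w / 2) / W, (y + h / 2) / H, w / W, h / H)
```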
```
/
├── config.yaml          # Main configuration file for the entire pipeline.
├── main.py              # Entry point: runs the interactive CLI.
├── README.md            # This file.
└── src/
    ├── data_generator.py  # Handles synthetic data generation and augmentation.
    ├── trainer.py         # Manages the YOLOv8 model training process.
    └── utils.py           # Helper functions for file loading and image processing.
```
First, install the necessary Python packages. It is recommended to use a virtual environment.
```bash
pip install ultralytics questionary opencv-python pyyaml numpy albumentations tqdm
```

Modify the `config.yaml` file to match your project setup. This is the most important step.
- Define Classes: Under `classes`, define each object you want to detect.
  - `class_name`: The name of the object.
  - `class_id`: A unique integer ID, starting from 0.
  - `object_images_dir`: Crucially, provide the path to a folder containing transparent PNG images of this object.
- Set Paths:
  - `background_images_dir`: Provide the path to a folder of background images.
  - `negative_images_dir`: (Optional) Provide the path to a folder of images with no objects.
- Adjust Parameters:
  - Review augmentation parameters in `augmentation_params`.
  - Review training parameters in `training_params` (e.g., `epochs`, `batch_size`, `device`).
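Putting these steps together, a minimal `config.yaml` might look like the sketch below. The top-level keys come from this README; all paths and values are placeholders, and your copy of the config may use additional or slightly different keys:

```yaml
dataset_name: my_dataset       # Output folder name (placeholder).
train_ratio: 0.8               # 80% train / 20% val.
canvas_size: 640               # Generate 640x640 images.

classes:
  - class_name: widget         # Placeholder object.
    class_id: 0
    object_images_dir: assets/widget_pngs   # Transparent PNGs of this object.

background_images_dir: assets/backgrounds
negative_images_dir: assets/negatives       # Optional.

augmentation_params: {}        # See the Albumentations documentation.
training_params:
  epochs: 50
  batch_size: 16
  device: 0
```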
Once `config.yaml` is set up, run the interactive CLI from your terminal:
```bash
python main.py
```

You will be presented with the following options:
- Generate dataset: Creates the synthetic dataset based on your config. The output will be in a new folder named after `dataset_name` in your config.
- Train YOLO model: Starts training using an existing dataset. It will first generate the required `dataset.yaml` for YOLO.
- Generate & Train All: A full-pipeline option that first generates the data and then immediately starts training.
- Exit: Close the program.
Follow the on-screen prompts to complete your desired task.
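For reference, the `dataset.yaml` generated for training follows the standard Ultralytics format. A typical file looks roughly like the sketch below; the root path and class name are placeholders, and the exact layout this project writes may differ:

```yaml
path: my_dataset      # Dataset root (placeholder).
train: images/train   # Training images, relative to path.
val: images/val       # Validation images, relative to path.
names:
  0: widget           # class_id -> class_name, taken from config.yaml.
```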
- `dataset_name`: The name of the folder where your final dataset will be stored.
- `train_ratio`: The split ratio between training and validation data (e.g., 0.8 for 80% train, 20% val).
- `canvas_size`: The resolution of the generated images (e.g., 640 for 640x640).
- `classes`: A list of objects to be learned. Each object needs a path to its source images.
- `augmentation_params`: Fine-tune the data augmentation pipeline. See the Albumentations documentation for more options.
- `training_params`: Control the YOLOv8 training process, including the base model, epochs, batch size, and hardware settings.
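As an illustration of how a `train_ratio` setting is typically applied, the stdlib-only sketch below shuffles a list of generated image files and splits it into train and validation sets. The helper name and fixed seed are hypothetical, not the project's actual code:

```python
import random

def split_files(files, train_ratio=0.8, seed=42):
    """Shuffle files deterministically and split into (train, val) lists.

    Hypothetical helper: a fixed seed keeps the split reproducible
    across runs.
    """
    rng = random.Random(seed)
    files = list(files)
    rng.shuffle(files)
    cut = int(len(files) * train_ratio)
    return files[:cut], files[cut:]

train, val = split_files([f"img_{i}.png" for i in range(10)], train_ratio=0.8)
```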
Developed for HackSeoul 2025.