Improved Training Loop: Add Validation and Best Checkpoint Saving #23

DevManpreet5 · 2025-05-27T05:02:23Z

Changes Made

Added validation dataset support via val_dataloader
Introduced evaluate_model() to compute the validation loss after each epoch
Checkpoints are now saved whenever the validation loss improves
The best model checkpoint is stored in the checkpoints/ directory
The best and final models are automatically pushed to the Hugging Face Hub
Step-wise and epoch-wise metrics are logged to Weights & Biases (wandb)
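A minimal, framework-agnostic sketch of the two core mechanisms listed above: a per-epoch validation loss averaged over samples, and saving a checkpoint only when that loss improves. The `total_samples` counter in the train.py excerpt quoted in the review suggests sample-weighted averaging, but the PR's actual code is not shown here, so treat this as an illustration, not the PR's implementation. The batch format, `loss_fn`, `save_fn`, `BestCheckpoint` and the toy `mse` loss are all hypothetical. In the real PyTorch loop the evaluation would also run under `torch.no_grad()`, and `save_fn` would write to checkpoints/ and push to the Hub.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

# Hypothetical batch format for illustration: (predictions, targets).
Batch = Tuple[Sequence[float], Sequence[float]]


def evaluate_model(
    loss_fn: Callable[[Batch], float],
    val_batches: Iterable[Batch],
) -> float:
    """Return the validation loss averaged over samples, not batches.

    loss_fn is assumed to return the *mean* loss of one batch, so each batch
    is weighted by its size; otherwise a short final batch would count as
    much as a full one.
    """
    total_loss = 0.0
    total_samples = 0
    for batch in val_batches:
        batch_size = len(batch[0])
        total_loss += loss_fn(batch) * batch_size
        total_samples += batch_size
    if total_samples == 0:
        raise ValueError("validation set is empty")
    return total_loss / total_samples


class BestCheckpoint:
    """Invoke save_fn only when the validation loss strictly improves."""

    def __init__(self, save_fn: Callable[[int, float], None]) -> None:
        self.save_fn = save_fn
        self.best_loss: Optional[float] = None
        self.best_epoch: Optional[int] = None

    def update(self, epoch: int, val_loss: float) -> bool:
        """Record this epoch's loss; return True if a checkpoint was saved."""
        if self.best_loss is None or val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.save_fn(epoch, val_loss)
            return True
        return False


def mse(batch: Batch) -> float:
    """Toy per-batch mean squared error, standing in for the model's loss."""
    preds, targets = batch
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    val = [([1.0, 2.0], [1.0, 0.0]), ([3.0], [3.0])]
    print(evaluate_model(mse, val))  # 4.0 over 3 samples, about 1.333

    saved: List[int] = []
    tracker = BestCheckpoint(lambda epoch, loss: saved.append(epoch))
    for epoch, loss in enumerate([0.9, 0.7, 0.8, 0.6]):
        tracker.update(epoch, loss)
    print(saved)  # [0, 1, 3]
```

Weighting by batch size only matters when batches differ in size: in the example, the plain mean of the two batch losses would be 1.0 rather than 4/3. The strict `<` comparison means a tie does not overwrite the existing best checkpoint.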

sergiopaniego

Nice! Thanks for the addition 😄
Left a comment for improvement!

sergiopaniego · 2025-05-27T13:59:12Z

train.py

+    total_samples = 0
+
+    with torch.no_grad():
+        for batch in val_dataloader:


Can we see a training run with the evaluation support enabled?
It may run into issues here, since this code differs from the training procedure.

DevManpreet5 added 2 commits May 27, 2025 10:29

Improved Training Loop in train.py

162cb3f

added project_name for w&b tracking

bbf31ab

sergiopaniego reviewed May 27, 2025


sergiopaniego mentioned this pull request May 27, 2025

[Contributions Welcome] Improving Our Fine-Tuning Pipeline #12

Open

10 tasks

ajaymin28 mentioned this pull request Jun 17, 2025

lora/qlora training w/wo unsloth on NVIDIA L4 (~20GB VRAM usage) #33

Closed
