This project implements a multi-stage training pipeline for optimizing meeting recommendations at conferences using agentic reinforcement learning techniques. You'll build an intelligent agent that learns to predict successful meeting outcomes and make optimal recommendations.
This starter code provides the framework for implementing supervised fine-tuning (SFT) and reinforcement learning fine-tuning (RLFT) using Direct Preference Optimization (DPO).
## Requirements

```text
torch>=2.0.0
transformers>=4.36.0
peft>=0.7.0
trl>=0.7.0
datasets>=2.14.0
pandas>=2.0.0
numpy>=1.24.0
npcpy>=0.1.0
```

## Setup

- Clone this repository to your local machine
- Create a virtual environment (recommended):

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install required dependencies:

```bash
pip install torch transformers peft trl datasets pandas numpy npcpy
```

- Verify installation by checking imports:

```bash
python -c "import torch, transformers, peft, trl, npcpy; print('All dependencies installed successfully')"
```

## Project Structure

```text
starter/
├── data_classes.py            # Core data models and simulation logic
├── starter_sft.py             # Stage 1: Supervised Fine-Tuning implementation
├── starter_agentic_traces.py  # Stage 2: Agent trace collection
└── starter_agentic_rlft.py    # Stage 3: Reinforcement Learning Fine-Tuning
```
## Stage 1: Supervised Fine-Tuning (SFT)

- Open `starter/starter_sft.py`
- Complete the `SFTConfig` dataclass by filling in the `'YOUR CODE HERE'` placeholders (a hedged configuration sketch follows this list):
  - Set appropriate LoRA parameters (r, alpha, dropout)
  - Configure training hyperparameters (epochs, learning rate, weight decay)
  - Determine the overfitting threshold
  - Set `max_new_tokens` for generation
- Run the SFT training:

```bash
python starter/starter_sft.py
```

- Monitor training loss and validation metrics
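One possible way to fill in the `SFTConfig` placeholders. The field names and the specific values (LoRA rank, learning rate, thresholds) are illustrative assumptions, not required settings; match the field names to the actual dataclass in `starter_sft.py`.

```python
from dataclasses import dataclass

@dataclass
class SFTConfig:
    # LoRA parameters (illustrative starting points)
    lora_r: int = 16            # low-rank dimension
    lora_alpha: int = 32        # scaling factor, commonly 2 * r
    lora_dropout: float = 0.05

    # Training hyperparameters (conservative defaults to limit overfitting)
    num_epochs: int = 3
    learning_rate: float = 2e-4
    weight_decay: float = 0.01

    # Stop training if loss falls below this value (see Tips)
    overfitting_threshold: float = 0.01

    # Generation length when producing validation predictions
    max_new_tokens: int = 64
```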
## Stage 2: Agent Trace Collection

- Open `starter/starter_agentic_traces.py`
- Review the agent personas and their decision-making strategies
- Implement any missing tool functions (marked with `'YOUR CODE HERE'`)
- Generate agent traces:

```bash
python starter/starter_agentic_traces.py
```

- Examine the generated CSV file with trace data (a quick inspection sketch follows this list)
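A minimal sketch for inspecting the generated traces with pandas. The CSV filename and column names are assumptions based on the trace fields used elsewhere in this README; adjust them to whatever `starter_agentic_traces.py` actually writes.

```python
import pandas as pd

# Hypothetical output path; change to the file produced by starter_agentic_traces.py
traces = pd.read_csv("agent_traces.csv")

print(traces.shape)
print(traces.columns.tolist())

# Rough health checks on the collected traces
if "completed_naturally" in traces.columns:
    print("Completion rate:", traces["completed_naturally"].mean())
if "tools_used" in traces.columns:
    # tools_used is assumed to be a list serialized to a string, e.g. "['tool1']"
    print("Traces with tool calls:", (traces["tools_used"].astype(str).str.len() > 2).mean())
```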
## Stage 3: Reinforcement Learning Fine-Tuning (RLFT)

- Open `starter/starter_agentic_rlft.py`
- Complete the `calculate_reward()` function with appropriate reward values
- Implement the preference pairing strategy in `create_preference_dataset_from_traces()` (a hedged sketch of both functions follows this list)
- Run RLFT with DPO:

```bash
python starter/starter_agentic_rlft.py
```

- Evaluate the improved model's performance
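One possible shape for the two functions, based on the trace fields shown in the testing snippet below (`final_recommendation_parsed`, `tools_used`, `completed_naturally`, `ground_truth`). The reward weights, the 0.5 decision threshold, and the `prompt`/`response` fields used for pairing are assumptions to tune against the real trace schema.

```python
def calculate_reward(trace: dict) -> float:
    """Score a single agent trace in [-1.0, 1.0]; the weights are illustrative."""
    reward = 0.0

    # Agreement between the YES/NO recommendation and the ground-truth probability
    recommendation = trace["final_recommendation_parsed"].get("recommendation")
    should_recommend = trace["ground_truth"] >= 0.5
    reward += 0.6 if (recommendation == "YES") == should_recommend else -0.6

    # Smaller bonuses for using tools and finishing without being cut off
    reward += 0.2 if trace["tools_used"] else -0.2
    reward += 0.2 if trace["completed_naturally"] else -0.2

    return max(-1.0, min(1.0, reward))


def create_preference_dataset_from_traces(traces: list[dict]) -> list[dict]:
    """Pair high- and low-reward traces sharing a prompt into DPO preference examples."""
    by_prompt: dict[str, list[dict]] = {}
    for trace in traces:
        by_prompt.setdefault(trace["prompt"], []).append(trace)

    pairs = []
    for prompt, group in by_prompt.items():
        ranked = sorted(group, key=calculate_reward, reverse=True)
        if len(ranked) >= 2 and calculate_reward(ranked[0]) > calculate_reward(ranked[-1]):
            pairs.append({
                "prompt": prompt,
                "chosen": ranked[0]["response"],     # highest-reward completion
                "rejected": ranked[-1]["response"],  # lowest-reward completion
            })
    return pairs
```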
## Testing

Test individual components:

```python
# Test the reward function on a hand-built trace
from starter_agentic_rlft import calculate_reward

trace = {
    "final_recommendation_parsed": {"recommendation": "YES"},
    "tools_used": ["tool1"],
    "completed_naturally": True,
    "ground_truth": 0.8,
}
reward = calculate_reward(trace)
assert -1.0 <= reward <= 1.0  # rewards must stay within [-1, 1]
```

Then check the end-to-end artifacts:

- Verify the SFT model saves correctly to `models/sft_prediction_model_gemma_270m` (a reload sketch follows this list)
- Check that agent traces are saved to CSV with all required fields
- Confirm the RLFT model shows improved accuracy over the SFT baseline
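A minimal sketch for reloading the saved SFT checkpoint as a sanity check, assuming it was written as a PEFT/LoRA adapter with the tokenizer saved alongside it; the prompt string is purely hypothetical.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

path = "models/sft_prediction_model_gemma_270m"
model = AutoPeftModelForCausalLM.from_pretrained(path)  # loads the base model plus adapter
tokenizer = AutoTokenizer.from_pretrained(path)         # assumes tokenizer files were saved here

prompt = "Predict the probability that this meeting succeeds:"  # hypothetical prompt format
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```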
## Evaluation Metrics

- SFT: correlation and MAE between predicted and ground-truth probabilities (see the sketch after this list)
- Agent traces: completion rate, tool usage, and recommendation accuracy
- RLFT: final accuracy improvement over the SFT baseline
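A short sketch of the SFT metrics, assuming you have collected predicted and ground-truth probabilities as arrays (the values below are made up for illustration):

```python
import numpy as np

# Illustrative arrays of predicted vs. ground-truth meeting-success probabilities
predicted = np.array([0.7, 0.2, 0.9, 0.4])
ground_truth = np.array([0.8, 0.1, 0.85, 0.5])

correlation = np.corrcoef(predicted, ground_truth)[0, 1]  # Pearson correlation
mae = np.abs(predicted - ground_truth).mean()             # mean absolute error

print(f"Correlation: {correlation:.3f}, MAE: {mae:.3f}")
```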
## Deliverables

- Completed code: all `'YOUR CODE HERE'` placeholders filled with working implementations
- Trained models: SFT and RLFT model checkpoints
- Training report: a document including:
  - Chosen hyperparameters and their justification
  - Training curves (loss, validation metrics)
  - Performance comparison between stages
  - Analysis of agent behavior patterns
  - Lessons learned and challenges faced
## Tips

- Start with conservative hyperparameters to avoid overfitting
- Monitor training loss carefully and stop if it drops too low (below 0.01 is a suggested threshold); a callback sketch follows this list
- Experiment with different reward structures in RLFT
- Use smaller batch sizes if running on limited hardware
- Save checkpoints frequently during long training runs
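One way to enforce the loss threshold from the tip above, sketched as a Hugging Face `TrainerCallback`; wiring it into whichever trainer the starter scripts construct is an assumption left to you.

```python
from transformers import TrainerCallback

class StopOnLowLossCallback(TrainerCallback):
    """Stop training once the logged loss falls below a threshold (overfitting guard)."""

    def __init__(self, threshold: float = 0.01):
        self.threshold = threshold

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and logs.get("loss", float("inf")) < self.threshold:
            control.should_training_stop = True
        return control

# Usage, assuming a transformers/trl trainer instance named `trainer`:
# trainer.add_callback(StopOnLowLossCallback(threshold=0.01))
```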
## Dependencies

- PyTorch - Deep learning framework
- Transformers - Pre-trained models and training utilities
- PEFT - Parameter-efficient fine-tuning
- TRL - Transformer Reinforcement Learning
- NPCPy - Agent framework for NPC interactions