David DavidBellamy

Hi, I'm David Bellamy.

I am a Post-Training & Reasoning Researcher and a Harvard PhD Statistician.

I am currently on an engineering sabbatical, building bare-metal reasoning stacks to understand the numerics of Reinforcement Learning for LLMs. My focus is on scaling inference-time compute and designing non-parametric value aggregation methods.

Featured Projects

| Project | Description | Stack |
| --- | --- | --- |
| grpo-gsm8k | DeepSeek-R1 Reproduction: A bare-metal implementation of Group Relative Policy Optimization (GRPO) on GSM8k. Decoupled training loop (Torch) and inference (vLLM) on distributed GPUs. | `PyTorch` `vLLM` |
| suttonbarto | RL Theory: Rigorous Python implementations and mathematical proofs for exercises in Sutton & Barto's Reinforcement Learning. | `Python` `LaTeX` `NumPy` |
| Labrador | ML4H Best Paper: Code for Limits of Masked Language Modeling, benchmarking Transformers vs. XGBoost on tabular EHR data. | `TensorFlow` |
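
For readers new to GRPO, the core idea in the grpo-gsm8k entry above can be sketched in a few lines: each prompt gets a group of sampled completions, and every completion's reward is normalized against its own group instead of a learned value function. This is a minimal illustration, not code from the repository. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are all assumptions made for this sketch.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per sampled completion of a single prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, four sampled completions, reward 1.0 when the final answer is correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every completion gets the same reward carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Correct completions end up with positive advantages and incorrect ones with negative advantages, while a group with identical rewards yields all zeros.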

Research Highlights

DeepSeek-R1 Replication: Achieved 83.2% Pass@1 on GSM8k using GRPO, matching SFT baselines while recovering reasoning capabilities. Read the W&B Report.
Optimization: Implemented length-aware batch packing for SFT, reducing padding overhead from 50% to 21%.
Best Paper Award (ML4H 2024): Demonstrated empirical limits of transfer learning in medical tabular data.
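
To make the padding-overhead highlight above concrete, here is a rough sketch of one common length-aware strategy: sort sequences by length before batching, so each batch pads to a longest sequence that is close to its neighbors. This is sort-based bucketing chosen for illustration and may differ from the packing scheme used in the actual project. The helper names `padding_fraction` and `chunk`, the batch size of 8, and the synthetic length distribution are all hypothetical, and the sketch is not meant to reproduce the reported numbers.

```python
import random


def padding_fraction(batches):
    """Fraction of token slots that are padding when each batch pads to its longest sequence."""
    total_slots = sum(len(batch) * max(batch) for batch in batches)
    real_tokens = sum(sum(batch) for batch in batches)
    return 1.0 - real_tokens / total_slots


def chunk(lengths, batch_size):
    """Split a list of sequence lengths into consecutive batches."""
    return [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]


rng = random.Random(0)
lengths = [rng.randint(16, 512) for _ in range(1024)]  # synthetic token counts (assumed)

naive = chunk(lengths, 8)             # batches in arrival order
bucketed = chunk(sorted(lengths), 8)  # batches of similar-length sequences

print(f"naive padding:    {padding_fraction(naive):.1%}")
print(f"bucketed padding: {padding_fraction(bucketed):.1%}")
```

In practice, fully sorted batches are usually shuffled at the batch level so that training does not see lengths in a fixed order.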
