I am a Post-Training & Reasoning Researcher and a Harvard PhD Statistician.

I am currently on an engineering sabbatical, building bare-metal reasoning stacks to understand the numerics of reinforcement learning for LLMs. My focus is on scaling inference-time compute and designing non-parametric value aggregation methods.

| Project | Description | Stack |
|---|---|---|
| grpo-gsm8k | DeepSeek-R1 Reproduction: A bare-metal implementation of Group Relative Policy Optimization (GRPO) on GSM8K. Decoupled training loop (PyTorch) and inference (vLLM) on distributed GPUs. | PyTorch, vLLM |
| suttonbarto | RL Theory: Rigorous Python implementations and mathematical proofs for exercises in Sutton & Barto's Reinforcement Learning. | Python, LaTeX, NumPy |
| Labrador | ML4H Best Paper: Code for Limits of Masked Language Modeling, benchmarking Transformers vs. XGBoost on tabular EHR data. | TensorFlow |
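
For readers new to GRPO, the group-relative step can be sketched as follows. This is a minimal illustration, not the code in grpo-gsm8k: the function name, the `eps` value, the use of the population standard deviation, and the toy 0/1 rewards are all assumptions made for the example.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled completions.

    Each completion is scored relative to the other samples for the same
    prompt, so no learned value function is needed. `eps` (an assumed
    constant) keeps the division finite when every reward is identical.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Toy group: four sampled answers to one prompt, 1.0 = correct, 0.0 = incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct samples receive positive advantages and incorrect ones negative; a group in which every sample gets the same reward yields all zeros and therefore contributes no gradient signal.
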
- DeepSeek-R1 Replication: Achieved 83.2% Pass@1 on GSM8K using GRPO, matching SFT baselines while recovering reasoning capabilities. Read the W&B Report.
- Optimization: Implemented length-aware batch packing for SFT, reducing padding overhead from 50% to 21%.
- Best Paper Award (ML4H 2024): Demonstrated empirical limits of transfer learning in medical tabular data.
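
The idea behind length-aware batching can be sketched in a few lines. This is a simplified illustration that only sorts sequences by length before batching; the actual packing scheme in the project may differ, and the helper names and token counts below are hypothetical toy values, not the project's data.

```python
def make_batches(lengths, batch_size, sort_by_length):
    """Split sequence lengths into fixed-size batches, optionally sorted first."""
    order = sorted(lengths) if sort_by_length else list(lengths)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padding_fraction(batches):
    """Fraction of token slots that are padding when each batch is padded to its longest sequence."""
    padded_slots = sum(max(batch) * len(batch) for batch in batches)
    real_tokens = sum(sum(batch) for batch in batches)
    return 1 - real_tokens / padded_slots


# Toy token counts mixing short and long examples (assumed values).
lengths = [10, 200, 12, 190, 11, 210, 9, 205]

naive = padding_fraction(make_batches(lengths, 4, sort_by_length=False))
length_aware = padding_fraction(make_batches(lengths, 4, sort_by_length=True))
```

Grouping sequences of similar length means each batch is padded to a maximum that is close to its typical length, so `length_aware` comes out well below `naive` on this toy input.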

