
Python PyTorch vLLM Kubernetes Docker W&B

Hi, I'm David Bellamy.

I am a Post-Training & Reasoning Researcher and a Harvard PhD Statistician.

I am currently on an engineering sabbatical, building bare-metal reasoning stacks to understand the numerics of reinforcement learning (RL) for LLMs. My focus is on scaling inference-time compute and on designing non-parametric value aggregation methods.


Featured Projects

| Project | Description | Stack |
|---|---|---|
| grpo-gsm8k | DeepSeek-R1 Reproduction: A bare-metal implementation of Group Relative Policy Optimization (GRPO) on GSM8K, with the training loop (PyTorch) decoupled from inference (vLLM) across distributed GPUs. | PyTorch, vLLM |
| suttonbarto | RL Theory: Rigorous Python implementations and mathematical proofs for the exercises in Sutton & Barto's Reinforcement Learning. | Python, LaTeX, NumPy |
| Labrador | ML4H Best Paper: Code for "Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data", benchmarking Transformers against XGBoost on tabular EHR data. | TensorFlow |
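As a rough illustration of what makes GRPO "group relative": for each prompt, a group of completions is sampled and each completion's reward is normalized against that group's mean and standard deviation, so no learned value model is needed to estimate advantages. The sketch below assumes scalar rewards (for GSM8K, for example, 1.0 for a correct final answer and 0.0 otherwise) and uses the population standard deviation plus a small `eps`; the function name is hypothetical and not taken from the grpo-gsm8k code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: normalize rewards within one group of completions.

    All rewards in `rewards` belong to completions sampled for the SAME prompt.
    Each advantage is (r_i - mean(group)) / (std(group) + eps); `eps` guards
    against division by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std (an assumption; implementations vary)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two correct and two incorrect completions for one prompt.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no policy-gradient signal, which is why group size and task difficulty matter in practice.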

Research Highlights

  • DeepSeek-R1 Replication: Achieved 83.2% Pass@1 on GSM8K using GRPO, matching SFT baselines while recovering reasoning capabilities. Read the W&B Report.
  • Optimization: Implemented length-aware batch packing for SFT, reducing padding overhead from 50% to 21%.
  • Best Paper Award (ML4H 2024): Demonstrated empirical limits of transfer learning in medical tabular data.
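One common way to cut padding in SFT batches, sketched here as an assumption rather than the author's implementation: sort examples by token length, then greedily group neighbors so that each batch's padded size (longest sequence times batch size) stays under a token budget. The `max_tokens` budget and both helper names are hypothetical, and the 50% and 21% figures above come from the author's own setup; this toy example does not reproduce them.

```python
def length_aware_batches(lengths: list[int], max_tokens: int) -> list[list[int]]:
    """Hypothetical helper: group example indices into length-sorted batches.

    Sorting keeps similarly sized sequences together, so each batch pads to a
    length close to that of its members. A batch is closed when adding the next
    example would push (max length in batch) * (batch size) above `max_tokens`.
    A single sequence longer than `max_tokens` still gets its own batch.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: list[list[int]] = []
    current: list[int] = []
    current_max = 0
    for i in order:
        new_max = max(current_max, lengths[i])
        # Padded token count of the batch if example i were added to it.
        if current and new_max * (len(current) + 1) > max_tokens:
            batches.append(current)
            current = []
            new_max = lengths[i]
        current.append(i)
        current_max = new_max
    if current:
        batches.append(current)
    return batches


def padding_fraction(lengths: list[int], batches: list[list[int]]) -> float:
    """Fraction of padded token slots that hold padding rather than real tokens."""
    padded = sum(max(lengths[i] for i in b) * len(b) for b in batches)
    real = sum(lengths[i] for b in batches for i in b)
    return 1 - real / padded


lengths = [10, 100, 12, 95, 11, 98]
batches = length_aware_batches(lengths, max_tokens=300)
print(batches, padding_fraction(lengths, batches))
print(padding_fraction(lengths, [list(range(len(lengths)))]))  # one naive batch
```

In a real training loop you would typically shuffle the order of the resulting batches each epoch, since otherwise they arrive sorted from shortest to longest.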

Pinned

  1. grpo-gsm8k (Public)

    RL post-training of open LLMs for math reasoning

    Python 1

  2. suttonbarto (Public)

    Solutions to the exercises in Sutton & Barto's textbook Reinforcement Learning: An Introduction

    Python

  3. labrador (Public)

    Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data.

    Python 14 2

  4. beamlab-hsph/Neural-Moment-Matching-Regression (Public)

    Code for our NeurIPS 2022 work titled "Deep Learning Methods for Proximal Inference via Maximum Moment Restriction"

    Python 4 3