I'm an empirical AI safety researcher and ML systems engineer. I build benchmarks and evaluation pipelines that test how robust, interpretable, and reliable language models actually are — fine-tuning attack resistance, mechanistic interpretability, scaling laws, and adversarial robustness — and I publish what I find, including the results that don't confirm my hypotheses.
I'm the founder of Plum AI Labs, where this research lives alongside two first-author publications. I also write about LLM evaluation and AI safety at Applied Alignment.
- Fine-tuning attack resistance & safety evaluation: does alignment hold up under adversarial fine-tuning, and do our evaluations actually measure what we think they measure?
- Mechanistic interpretability: logit lens, activation patching, sparse autoencoders on transformers built from scratch
- Scaling laws & GPU systems: recovering empirical scaling relationships, benchmarking `torch.compile` and Triton kernels against eager PyTorch
- Production ML systems: pipelines, monitoring, and infrastructure that actually ship

| Project | What it does |
|---|---|
| SafetyLens | Measures divergence between eval-framed and deployment-framed model behaviour across model scale |
| AudioGuard | Adversarial robustness evaluation for audio classifiers — FGSM, PGD, adversarial training |
| JAX Interpretability | Mechanistic interpretability on a transformer built from scratch — logit lens, activation patching, SAEs |
| ScaleTrace | Empirically recovers Chinchilla-style scaling laws from a small training grid |
| KernelBench | GPU kernel benchmarking: PyTorch eager vs. `torch.compile` vs. Triton |
| e2e-ml-pipeline | Production-grade ML pipeline with DVC, MLflow, and Dockerized FastAPI deployment |

Infrastructure & AI Systems Engineer by day, independent researcher the rest of the time. My engineering background (Python, PyTorch, JAX, Docker, Kubernetes, AWS, distributed data systems) is what lets me build and run the experiments behind the research, not just theorize about them.
If you're working on AI safety evaluation, interpretability, or adversarial robustness — or just want to talk shop about empirical ML research — I'd love to connect.


