Senior AI Product Manager focused on evaluating, monitoring, and hardening machine learning systems in production.
I build tools for:
- Medical AI model monitoring (real-world drift detection, calibration analysis, post-market evaluation)
- Statistical evaluation frameworks for calibration, threshold selection, and robust performance analysis (see the calibration sketch after this list)
- LLM behavioral reliability (drift, safety boundaries, consistency) — in progress
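To make the calibration piece concrete, here is a minimal sketch (the binning scheme and variable names are assumptions, not code from these projects): expected calibration error bins predictions by predicted probability and compares each bin's mean confidence to its observed positive rate.

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE for a binary classifier: bin predictions by predicted probability
    and compare each bin's mean confidence to its observed positive rate."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Assign each prediction to a bin; clip keeps p == 1.0 in the top bin.
    bin_ids = np.clip((y_prob * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(y_prob[mask].mean() - y_true[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of samples
    return ece

# Toy example: four predictions, one per bin; ECE here is 0.15.
print(expected_calibration_error([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
```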
My work sits at the intersection of research and product, with an emphasis on making AI systems measurable, predictable, and operationally safe.
- Model Eval & Drift Lab: Tools for detecting distribution shift, assessing calibration, and stress-testing deployed ML systems (see the drift-check sketch below).
- GPT-Drift: Lightweight behavioral fingerprinting to detect silent behavior changes in LLM APIs (see the fingerprinting sketch below).
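As an illustration of the simplest check in this space (a sketch under assumed inputs and thresholds, not Model Eval & Drift Lab's actual API), a per-feature two-sample Kolmogorov-Smirnov test compares a reference window against live data and flags features whose distribution has shifted:

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_report(reference: dict, live: dict, alpha: float = 0.01) -> dict:
    """Compare each feature's live distribution against a reference window.

    reference, live: feature name -> 1-D array of raw values.
    Returns feature name -> (KS statistic, p-value, drift flag).
    """
    report = {}
    for name, ref_values in reference.items():
        stat, p_value = ks_2samp(ref_values, live[name])
        report[name] = (stat, p_value, p_value < alpha)
    return report

rng = np.random.default_rng(seed=0)
reference = {"age": rng.normal(50, 10, 5_000), "bmi": rng.normal(27, 4, 5_000)}
live = {"age": rng.normal(55, 10, 5_000), "bmi": rng.normal(27, 4, 5_000)}  # "age" has shifted
print(drift_report(reference, live))
```

In practice a raw p-value cut-off is too blunt on its own; multiple-testing correction and effect-size thresholds usually sit on top of a check like this.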
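And a minimal sketch of the fingerprinting idea behind GPT-Drift (the probe prompts, the `ask` callable, and the normalization step are illustrative assumptions, not the project's real interface): run a fixed probe set through the model, normalize the outputs, and hash them so later runs can be compared for silent behavior changes.

```python
import hashlib
import json

PROBES = [
    "List three risks of deploying an uncalibrated medical classifier.",
    "Explain 'distribution shift' in one sentence.",
]

def normalize(text: str) -> str:
    """Collapse case and whitespace so trivial formatting changes don't trip the alarm."""
    return " ".join(text.lower().split())

def fingerprint(ask, probes=PROBES) -> str:
    """ask: callable that sends a prompt to the LLM API and returns its response text."""
    responses = [normalize(ask(prompt)) for prompt in probes]
    payload = json.dumps(responses, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Record the fingerprint alongside the model/version in use; the same probe set
# producing a different hash later is a signal that behavior has silently changed.
baseline = fingerprint(lambda prompt: "stub response")  # swap in a real API call
assert baseline == fingerprint(lambda prompt: "stub response")
```

Exact hashing is deliberately strict; with stochastic decoding, comparing repeated samples via embeddings or token-level statistics is the more forgiving variant.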