I am a Research Engineer building evaluation and safety infrastructure for multimodal and agent systems, with prior experience on large-scale medical and multimodal benchmarks. My work spans scientific AI, VLM post-training (SFT, DPO, RLHF-style preference optimisation), VLM/RAG evaluation, diffusion models, and geometry-aware learning, and I have industry experience at GE HealthCare, Novartis, and QuantCo.
I hold a PhD from the University of Oxford (Visual Geometry Group), where I published at top venues and built production-quality codebases for multimodal learning and generative models.
Pip-installable Python package for evaluating multimodal RAG systems. Measures retrieval quality, hallucination rate, answer faithfulness, and cross-modal alignment. Supports checkpoint comparison and automated regression testing for CI integration.
pip install mmeval-vrag
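The package's own API is not reproduced here. As a hypothetical sketch of the checkpoint-comparison idea, the snippet below assumes each checkpoint's evaluation has already been reduced to a dictionary of scalar scores, and flags any metric that worsens by more than a tolerance. The metric names, the `find_regressions` helper and the tolerance value are illustrative assumptions, not part of `mmeval-vrag`.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical metric names for which a lower score is better.
# Every other metric is treated as higher-is-better.
LOWER_IS_BETTER = {"hallucination_rate"}


@dataclass
class Regression:
    metric: str
    baseline: float
    candidate: float


def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.01,
) -> list[Regression]:
    """Return the metrics on which `candidate` is worse than `baseline` by more than `tolerance`."""
    regressions = []
    for name, base in baseline.items():
        cand = candidate[name]
        # Signed change oriented so that a positive value always means "got worse".
        worsening = cand - base if name in LOWER_IS_BETTER else base - cand
        if worsening > tolerance:
            regressions.append(Regression(name, base, cand))
    return regressions


# Hypothetical scores for two checkpoints on the same evaluation set.
baseline = {"retrieval_recall@5": 0.82, "faithfulness": 0.91, "hallucination_rate": 0.05}
candidate = {"retrieval_recall@5": 0.84, "faithfulness": 0.88, "hallucination_rate": 0.05}

for r in find_regressions(baseline, candidate):
    print(f"REGRESSION {r.metric}: {r.baseline:.3f} -> {r.candidate:.3f}")
```

With these made-up numbers only `faithfulness` is flagged. In a CI job, a non-empty result would typically be turned into a non-zero exit code so the build fails.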
Framework for auditing the physical state-transition commitments of vision-language models. Instead of scoring only the final answer, it elicits a typed reasoning trace, verifies it with a hybrid checker, and surfaces hidden inconsistencies. Includes WMW-TRACEBANK: 200 validated traces across 17 physics families, plus 3,200 preference pairs.
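The framework's actual trace schema and hybrid checker are not described here. As a hypothetical illustration of what surfacing a hidden inconsistency can mean, the sketch below assumes a trace is a list of typed state transitions and implements only one simple symbolic check: each step's starting state must match the state the trace previously committed to. `Transition` and `find_inconsistencies` are names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """Hypothetical typed trace step: property `prop` of `obj` goes from `before` to `after`."""
    obj: str
    prop: str
    before: str
    after: str


def find_inconsistencies(trace: list[Transition]) -> list[tuple[int, str]]:
    """Flag steps whose `before` state contradicts a state the trace committed to earlier."""
    committed: dict[tuple[str, str], str] = {}
    issues = []
    for i, step in enumerate(trace):
        key = (step.obj, step.prop)
        if key in committed and committed[key] != step.before:
            issues.append(
                (i, f"{step.obj}.{step.prop}: trace committed to {committed[key]!r}, "
                    f"step assumes {step.before!r}")
            )
        committed[key] = step.after  # the trace now commits to this state
    return issues


# A trace whose final answer might still be judged correct, but whose steps disagree.
trace = [
    Transition("ice_cube", "phase", "solid", "liquid"),
    Transition("ice_cube", "phase", "solid", "gas"),  # assumes "solid" again
]
print(find_inconsistencies(trace))
```

An answer-only metric would not see this contradiction; checking the trace step by step does.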
End-to-end pipeline generating 1B+ labelled masks across 48K+ datasets for generalisable 3D segmentation, using semi-supervised learning with automated quality control.
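The blurb does not say how quality control is implemented. One common pattern in semi-supervised segmentation is to keep a pseudo-label mask only when the model is confident on most voxels; the sketch below assumes that approach, and the `qc_pseudo_label` helper and both thresholds are hypothetical.

```python
import numpy as np


def qc_pseudo_label(probs, conf_thresh=0.9, min_confident_frac=0.8):
    """Convert a voxel-wise foreground probability map into a binary pseudo-label mask.

    Returns None (mask rejected by quality control) if fewer than `min_confident_frac`
    of the voxels are predicted with a confidence of at least `conf_thresh`.
    """
    probs = np.asarray(probs, dtype=float)
    # Confidence of the predicted class: p for foreground, 1 - p for background.
    confidence = np.maximum(probs, 1.0 - probs)
    if (confidence >= conf_thresh).mean() < min_confident_frac:
        return None
    return probs >= 0.5


rng = np.random.default_rng(0)
confident_volume = np.where(rng.random((8, 8, 8)) > 0.5, 0.97, 0.02)
uncertain_volume = np.full((8, 8, 8), 0.6)

print(qc_pseudo_label(confident_volume) is not None)  # kept
print(qc_pseudo_label(uncertain_volume) is None)      # rejected
```

At the scale quoted above, a filter like this would run per mask so that uncertain predictions never become training targets.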
Vision-language adaptation using frozen CLIP/BLIP embeddings. Achieves strong multimodal performance while training significantly fewer parameters than full fine-tuning. Designed for fast iteration and low compute cost.
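The general idea of adapting on frozen embeddings can be sketched as a linear probe: the encoders are never updated, and only a small head is trained on their cached outputs. This is a minimal illustration, not the project's architecture; random vectors stand in for CLIP/BLIP features, and the 32-dimensional size, concatenation fusion and logistic-regression head are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for frozen, precomputed embeddings (hypothetical 32-d image and text
# vectors). In a real setup these would come from a frozen CLIP/BLIP encoder and be cached.
n, d = 200, 32
img_emb = rng.normal(size=(n, d))
txt_emb = rng.normal(size=(n, d))
labels = (img_emb[:, 0] + txt_emb[:, 0] > 0).astype(float)  # synthetic binary target

# Fuse the two modalities by concatenation; the embeddings themselves are never updated.
x = np.concatenate([img_emb, txt_emb], axis=1)

# The only trainable parameters: a logistic-regression head (weights + bias).
w = np.zeros(x.shape[1])
b = 0.0
lr = 0.5

for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probabilities
    err = p - labels  # per-sample gradient of the log-loss w.r.t. each logit
    w -= lr * (x.T @ err) / n
    b -= lr * err.mean()

accuracy = float(((x @ w + b > 0) == (labels == 1)).mean())
print(f"trainable parameters: {w.size + 1}, training accuracy: {accuracy:.2f}")
```

Because the embeddings can be computed once and cached, each training step touches only the head, which is what keeps iteration fast and compute cost low.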
Retrieval-augmented generation system with grounding diagnostics, retrieval quality metrics, and evaluation of multimodal outputs. Built for decision-support use cases.
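Which retrieval quality metrics the system reports is not specified. Two standard ones are recall@k and mean reciprocal rank (MRR), sketched below over item IDs; the helper names and the example run are hypothetical, and the IDs could refer equally to text chunks or images.

```python
from __future__ import annotations


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear among the top-k retrieved items."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one pair per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


# Hypothetical retrieval run for two queries.
queries = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    (["img4", "img9"], {"img4"}),
]
print([recall_at_k(r, rel, k=3) for r, rel in queries])
print(mean_reciprocal_rank(queries))
```

Recall@k measures coverage of the evidence a grounded answer needs, while MRR rewards placing the first useful item near the top of the context window.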
Cross-sectional diffusion model generating complete 3D volumes from sparse inputs. Production-oriented codebase with reproducible training and inference pipelines.
Scalable multi-view generation pipeline for 3D object reconstruction from single images with flexible viewpoint conditioning.