I build multimodal ML systems end-to-end, from large-scale data pipelines and model training to evaluation infrastructure and production deployment. My work spans VLM post-training (SFT, DPO, RLHF-style preference optimisation), hallucination and faithfulness measurement, and evaluation tooling for vision-language models at scale.
I hold a PhD from the University of Oxford (Visual Geometry Group), where I published at top venues and built production-quality codebases for multimodal learning, 3D reconstruction, and generative models.
Pip-installable Python package for evaluating multimodal RAG systems. Measures retrieval quality, hallucination rate, answer faithfulness, and cross-modal alignment. Supports checkpoint comparison and automated regression testing for CI integration.
pip install mmeval-vrag
Agentic evaluation toolkit for multimodal scientific reasoning. Generates verified reasoning traces and SFT/DPO preference data for post-training pipelines. Designed for integration into automated model-improvement loops.
End-to-end pipeline generating 1B+ labelled masks across 48K+ datasets for generalisable 3D segmentation. Semi-supervised learning with automated quality control.
Vision-language adaptation using frozen CLIP/BLIP embeddings. Strong multimodal performance with significantly fewer trainable parameters. Designed for fast iteration and low compute cost.
Retrieval-augmented generation system with grounding diagnostics, retrieval quality metrics, and evaluation of multimodal outputs. Built for decision-support use cases.
Cross-sectional diffusion model generating complete 3D volumes from sparse inputs. Production-oriented codebase with reproducible training and inference pipelines.
Scalable multi-view generation pipeline for 3D object reconstruction from single images with flexible viewpoint conditioning.