I specialize in building multimodal AI systems that bridge vision, language, and 3D understanding. My research focuses on vision-language models, LLM fine-tuning, and generative AI, turning cutting-edge research into production-ready solutions. Previously, I completed my Ph.D. at the University of Oxford's Visual Geometry Group (VGG), working on multimodal learning, generative models, and 3D reconstruction.
Production-ready CLIP-based multimodal model that trains only a lightweight fusion network, achieving 85–95% of SOTA performance with 10× fewer parameters.
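The idea of training only a small fusion head on top of frozen encoders can be sketched as follows. This is a minimal illustration, not the project's actual code: the embedding dimensions, network sizes, and class count are all illustrative, and the random arrays stand in for precomputed CLIP embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for frozen CLIP encoder outputs: in practice these embeddings
# are precomputed once and the backbone is never updated during training.
image_emb = rng.normal(size=(4, 512))  # batch of 4 image embeddings
text_emb = rng.normal(size=(4, 512))   # matching text embeddings

# Only this lightweight fusion network is trainable, a tiny fraction of
# the parameter count of the full vision-language backbone.
W1 = rng.normal(scale=0.02, size=(1024, 256))
W2 = rng.normal(scale=0.02, size=(256, 10))  # illustrative 10-way head

def fuse(img, txt):
    x = np.concatenate([img, txt], axis=-1)  # (batch, 1024)
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    return h @ W2                            # (batch, 10) logits

logits = fuse(image_emb, text_emb)
```

Because gradients only flow through `W1` and `W2`, training cost and memory drop sharply while the frozen backbone's representations do most of the work.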
Production RAG system for decision support, integrating multimodal inputs.
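The retrieval step at the core of any RAG pipeline can be sketched in a few lines. This is a toy illustration under assumed details: the documents, embedding dimension, and query are invented, and the random vectors stand in for outputs of a real (multimodal) encoder.

```python
import numpy as np

docs = ["Q3 revenue grew 12%", "Churn fell to 4%", "Chart shows rising costs"]

rng = np.random.default_rng(1)
# Hypothetical document embeddings; a real system would use an encoder model.
doc_emb = rng.normal(size=(3, 64))
doc_emb /= np.linalg.norm(doc_emb, axis=1, keepdims=True)

def retrieve(query_emb, k=2):
    # Cosine similarity against the index, then keep the top-k documents.
    q = query_emb / np.linalg.norm(query_emb)
    scores = doc_emb @ q
    top = np.argsort(-scores)[:k]
    return [docs[i] for i in top]

# A query embedding close to the first document (simulated with small noise).
query = doc_emb[0] + 0.1 * rng.normal(size=64)
context = retrieve(query)

# Retrieved passages are spliced into the LLM prompt as grounding context.
prompt = "Answer using:\n" + "\n".join(context) + "\n\nQ: How did revenue change?"
```

In production the flat dot product is typically replaced by an approximate nearest-neighbour index, but the contract stays the same: embed, retrieve top-k, assemble the prompt.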
Conditional diffusion models for generating realistic synthetic financial time series.
Safe Reinforcement Learning with Normalizing Flows for Uncertainty Quantification in Time Series.
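What makes normalizing flows attractive for uncertainty quantification is that the Jacobian log-determinant, and hence the exact likelihood, is cheap to compute. Below is a minimal RealNVP-style affine coupling layer as an illustration; the tiny weight matrices stand in for learned scale and shift networks and are not the project's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
Ws = rng.normal(scale=0.1, size=(2, 2))  # toy "scale network" weights
Wt = rng.normal(scale=0.1, size=(2, 2))  # toy "shift network" weights

def coupling_forward(x):
    # Split dimensions: x1 passes through unchanged, x2 is transformed
    # conditioned on x1 — this is what keeps the layer invertible.
    x1, x2 = x[:, :2], x[:, 2:]
    s = np.tanh(x1 @ Ws)   # log-scale, bounded for numerical stability
    t = x1 @ Wt            # shift
    y2 = x2 * np.exp(s) + t
    log_det = s.sum(axis=1)  # exact Jacobian log-determinant, O(d)
    return np.concatenate([x1, y2], axis=1), log_det

x = rng.normal(size=(5, 4))
y, log_det = coupling_forward(x)
```

Stacking such layers gives a tractable density over returns or transition dynamics, and the resulting likelihoods can gate a safe-RL policy away from low-density (high-uncertainty) states.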
Building a dataset of one billion labeled masks for generalizable 3D segmentation across diverse domains, enabling large-scale training of AI models.
Automated framework using Vision Transformers to estimate 3D shape from single 2D images, achieving state-of-the-art reconstruction accuracy.
Cross-sectional diffusion model for generating complete 3D volumes from single slices, setting new benchmarks in volumetric synthesis.