Research Map
Align
Using structure and theory to make vision-language models generalize beyond familiar distributions.
Compose
Understanding how multimodal models forget, merge, and reuse specialized capabilities.
Reason
Eliciting reliable multimodal reasoning through lightweight reinforcement learning and curated supervision.