Research Map

01

Align

Using structure and theory to make vision-language models generalize beyond familiar distributions.

VLMs, Prompt Tuning, Neural Collapse

02

Compose

Understanding how multimodal models forget, merge, and reuse specialized capabilities.

Model Tailor, REMEDY, LoRA Merging

03

Reason

Eliciting reliable multimodal reasoning through lightweight reinforcement learning and curated supervision.

MLLMs, RL, Reasoning