Didi Zhu (朱迪迪)

01

Using structure and theory to make vision-language models generalize beyond familiar distributions.

VLMs, Prompt Tuning, Neural Collapse

02

Understanding how multimodal models forget, merge, and reuse specialized capabilities.

Model Tailor, REMEDY, LoRA Merging

03

Eliciting reliable multimodal reasoning through lightweight reinforcement learning and curated supervision.

MLLMs, RL, Reasoning

Research Map