I'm Xingjian Diao, a Ph.D. candidate in Computer Science at Dartmouth College 🌲, co-advised by Prof. Soroush Vosoughi and Prof. Jiang Gui. During my Ph.D. at Dartmouth, I interned twice at Amazon on computer vision and robotics (Summer 2025) and VLM systems (Summer 2026), and at Samsung Research America on parametric memory (Spring 2026).
Previously, I completed my M.S. in Computer Science at Northwestern University 💜, advised by Prof. Nabil Alshurafa. I received my B.S. in Computer Science from the University of Pittsburgh 💙, graduating cum laude.
My research focuses on multimodal learning for video, audio, and language understanding. I develop methods for multimodal reasoning, efficient multimodal learning, and VLM-based GUI agents, with the goal of building scalable and generalizable models for multimodal question answering and agentic decision-making in complex, dynamic real-world environments. Highlights of my work include:
- Doc-to-Atom: Learning to Compile and Compose Memory Atoms
  arXiv Preprint 2026
  Xingjian Diao, Wenbo Li, Yashas Malur Saidutta, Avinash Amballa, Lazar Valkov, Srinivas Chappidi
- Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization
  Findings of ACL 2026
  Xingjian Diao, Zheyuan Liu, Chunhui Zhang, Weiyi Wu, Keyi Kong, Lin Shi, Kaize Ding, Soroush Vosoughi, Jiang Gui
- SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models
  EMNLP 2025 (Oral Presentation, top 4.35%)
  Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Peijun Qing, Soroush Vosoughi, Jiang Gui
- ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering
  EMNLP 2025 (Oral Presentation, top 4.35%)
  Xingjian Diao, Weiyi Wu, Keyi Kong, Peijun Qing, Xinwen Xu, Ming Cheng, Soroush Vosoughi, Jiang Gui
- Temporal Working Memory: Query-Guided Temporal Segment Refinement for Enhanced Multimodal Understanding
  Findings of NAACL 2025, Guarini Graduate Student Travel Award (Dartmouth College)
  Xingjian Diao, Chunhui Zhang, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui
- Learning Musical Representations for Music Performance Question Answering
  Findings of EMNLP 2024, BMDS Travel Award (Dartmouth College)
  Xingjian Diao, Chunhui Zhang, Tingxuan Wu, Ming Cheng, Zhongyu Ouyang, Weiyi Wu, Jiang Gui
- Learning Sparsity for Effective and Efficient Music Performance Question Answering
  ACL 2025
  Xingjian Diao, Tianzhen Yang, Chunhui Zhang, Weiyi Wu, Ming Cheng, Jiang Gui
- FT2TF: First-Person Statement Text-To-Talking Face Generation
  WACV 2025
  Xingjian Diao, Ming Cheng, Wayner Barrios, SouYoung Jin

- Amazon Science (Jun 2026 – Sept 2026)
  Applied Scientist Intern, Sunnyvale, CA
  Research on vision language models.
- Samsung Research America (Mar 2026 – Jun 2026)
  NLP Research Intern, Mountain View, CA
  Research on agentic memories.
- Amazon Science (Jun 2025 – Sept 2025)
  Applied Scientist Intern, Santa Cruz, CA
  Research on computer vision and robotics.

