Boshen Xu

About

My research interests include video understanding and embodied AI. I am currently focusing on egocentric vision and related topics that benefit VR/AR and embodied AI.

I am a second-year PhD student at Renmin University of China (RUC), supervised by Professor Qin Jin at the AIM3 Lab. Before joining RUC, I received my bachelor's degree from the School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), with a GPA of 3.97/4.00 (90.41/100), ranking 2nd out of 65 in my major.

Email / CV / GitHub / Google Scholar


Publications

* denotes equal contribution.
Unveiling Visual Biases in Audio-Visual Localization Benchmarks
Liangyu Chen, Zihao Yue, Boshen Xu, Qin Jin
ECCV AVGenL Workshop, 2024
arxiv

We reveal that current audio-visual source localization benchmarks (VGG-SS, Epic-Sounding-Object) can be easily hacked by vision-only models, and therefore call for a benchmark that relies more heavily on audio cues.

EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?
Boshen Xu, Ziheng Wang*, Yang Du*, Zhinan Song, Sipeng Zheng*, Qin Jin
arXiv, 2024
code / arxiv

We propose EgoNCE++, an asymmetric contrastive pretraining objective that addresses the lack of open-vocabulary EgoHOI recognition ability in egocentric video-language models (EgoVLMs).

SPAFormer: Sequential 3D Part Assembly with Transformers
Boshen Xu, Sipeng Zheng, Qin Jin
3DV, 2025
code / arxiv

We present SPAFormer, a transformer-based framework that leverages assembly-sequence constraints and three part encodings to address the combinatorial explosion challenge in the 3D part assembly (3D-PA) task.

Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
Boshen Xu, Sipeng Zheng, Qin Jin
ACM MM, 2023
project page / code / arxiv

We propose POV, a view adaptation framework that enables transfer learning from multi-view third-person videos to egocentric videos.

Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework
Sipeng Zheng, Boshen Xu, Qin Jin
CVPR, 2023

We introduce OpenCat, a language modeling framework that reformulates HOI prediction as a sequence generation task.

Awards

  • 2023, Outstanding Graduate, Sichuan, China
  • 2021, Tencent Special Scholarship, UESTC & Tencent
  • 2021, Second Prize, China Undergraduate Mathematical Contest in Modeling, China
  • 2020, National Scholarship, UESTC, China

Services

  • Conference Reviewer for ICLR, ACM MM, ACCV.
  • Teaching Assistant for Multimedia Application Technology (RUC 2024 Fall).
