Publications
* denotes equal contribution.
Unveiling Visual Biases in Audio-Visual Localization Benchmarks
Liangyu Chen, Zihao Yue, Boshen Xu, Qin Jin
ECCV AVGenL Workshop, 2024
arxiv
We reveal that current audio-visual source localization benchmarks (VGG-SS, Epic-Sounding-Object) are easily hacked by vision-only models, which calls for benchmarks that require more audio cues.
EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?
Boshen Xu, Ziheng Wang*, Yang Du*, Zhinan Song, Sipeng Zheng*, Qin Jin
arxiv, 2024
code / arxiv
We propose EgoNCE++, an asymmetric contrastive pretraining objective that addresses EgoVLMs' lack of open-vocabulary EgoHOI recognition ability.
SPAFormer: Sequential 3D Part Assembly with Transformers
Boshen Xu, Sipeng Zheng, Qin Jin
arxiv, 2024
code / arxiv
We present SPAFormer, a transformer-based framework that leverages assembly-sequence constraints together with three part encodings to address the combinatorial explosion challenge in the 3D part assembly (3D-PA) task.
Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
Boshen Xu, Sipeng Zheng, Qin Jin
ACM MM, 2023
project page / code / arxiv
We propose POV, a view adaptation framework that enables transfer learning from multi-view third-person videos to egocentric videos.
Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework
Sipeng Zheng, Boshen Xu, Qin Jin
CVPR, 2023
We introduce OpenCat, a language modeling framework that reformulates HOI prediction as sequence generation.
Awards
- 2023, Outstanding Graduate, Sichuan, China
- 2021, Tencent Special Scholarship, UESTC & Tencent
- 2021, Second Prize in the China Undergraduate Mathematical Contest in Modeling, China
- 2020, National Scholarship, UESTC, China
Services
- Conference Reviewer for ICLR, ACM MM, ACCV.