Hongwei Xue

I am a 5th-year PhD student at the University of Science and Technology of China (USTC), advised by Prof. Jiebo Luo and Prof. Houqiang Li. I received my B.S. degree from the School of the Gifted Young, USTC, in 2019.

I was fortunate to intern at Tencent WeChat, Microsoft Azure Cognitive Services, Shanghai AI Lab, and Microsoft Research Asia (MSRA).

Email  /  CV  /  Google Scholar  /  Github

profile photo
Research

I'm interested in computer vision, machine learning, and multimedia. Much of my research focuses on Vision-and-Language Pre-training. Representative papers are highlighted. (* indicates equal contribution)

Stare at What You See: Masked Image Modeling without Reconstruction
Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo
CVPR, 2023
[PDF] [arXiv] [Code]

We propose an efficient MIM paradigm named MaskAlign. MaskAlign simply learns the consistency between visible patch features extracted by the student model and intact image features extracted by the teacher model.

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
Hongwei Xue*, Yuchong Sun*, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, Jiebo Luo
ICLR, 2023
[PDF] [arXiv] [Code] [PaperWithCode]

We adapt image-text pre-trained models to video-text pre-training (i.e., post-pretraining). In this work, we propose CLIP-ViP, an Omnisource Cross-modal Learning method equipped with a Video Proxy mechanism, built on top of CLIP.

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu
NeurIPS, 2022
[PDF] [arXiv] [Code]

We introduce a Long-Form VIdeo-LAnguage pre-training model (LF-VILA) and train it on a large-scale long-form video and paragraph dataset constructed from HD-VILA-100M.

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue*, Tiankai Hang*, Yanhong Zeng*, Yuchong Sun*, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo
CVPR, 2022
[PDF] [arXiv] [Code]

We collect a large-scale dataset that is both the first high-resolution one of its kind, with 371.5k hours of 720p videos, and the most diversified, covering 15 popular YouTube categories.

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training
Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo
NeurIPS, 2021
[PDF] [arXiv] [Supp] [Presentation]

We propose a fully Transformer-based model for Vision-and-Language pre-training and use it to study inter-modal interaction.

Learning Fine-Grained Motion Embedding for Landscape Animation
Hongwei Xue, Yupan Huang, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo
ACM MM Oral, 2021
[PDF] [arXiv]

We propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding for Landscape Animation.

Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu
ACM MM, 2021
[PDF] [arXiv] [Code]

In this work, we propose a unified image-and-text generative framework based on a single multimodal model to jointly study the bi-directional tasks.

Semantic Tag Augmented XlanV Model for Video Captioning
Yiqing Huang*, Hongwei Xue*, Jiansheng Chen, Huimin Ma, Hongbing Ma
ACM MM, 2021

We propose to leverage semantic tags to bridge the gap between the vision and language modalities, rather than directly concatenating or attending over visual and linguistic features.

Sed-Net: Detecting Multi-Type Edits of Images
Hongwei Xue, Haomiao Liu, Jun Li, Houqiang Li, Jiebo Luo
ICME, 2020

We propose a deep Siamese network to classify different types of edits between an original image and an edited image.

Misc
  • Reviewer for ACM MM 2021, ICMR 2021, ICME 2022, TMM 2022, CVPR 2023, ICCV 2023.
  • National Scholarship in 2021.
  • Scholarship for Excellent Students in 2016, 2017, and 2018; Freshman Scholarship in 2015.

Based on Jon Barron's website.