Long Chen

Assistant Professor
Department of Computer Science and Engineering (CSE)
School of Engineering (SENG)
The Hong Kong University of Science and Technology (HKUST)
Email: longchen A~T ust.hk
Office: Room CYT-3003, Cheng Yu Tung Building, HKUST, Clear Water Bay

Dr. Long CHEN (Chinese: 陈隆) is an assistant professor in the Department of CSE at the Hong Kong University of Science and Technology (HKUST). He leads a computer vision and machine learning research group, the LONG Group. Before joining HKUST, he was a postdoctoral research scientist at the DVMM Lab, Columbia University. He obtained his Ph.D. degree in Computer Science from the DCD Lab, Zhejiang University. During his Ph.D. studies, he was also a visiting student at the MReal Lab, Nanyang Technological University (NTU), and the NExT Center, National University of Singapore (NUS). He obtained his B.Eng. degree from Dalian University of Technology. He was a senior research scientist at Tencent AI Lab.

His primary research directions are Computer Vision, Machine Learning, Multimedia, and Artificial Intelligence.

Recent Research Directions:

Foundation Models Efficient Finetuning: Parameter-efficient Tuning ([IterIS, CVPR’25], [ComPro, IJCV’24]), Memory-efficient Tuning ([UniPT, CVPR’24], [SHERL, ECCV’24]), Modality-efficient Tuning ([PathWeave, NeurIPS’24]), Reinforcement Learning with Human/AI Feedback (RLHF/RLAIF) ([B2-DiffuRL, CVPR’25], [Fast RL, EMNLP’24], [R3HF, arXiv’24]).
Visual Generation and Editing: Image Generation/Editing ([CLIPDrag, ICLR’25], [Free-Event, ICML’25]), Video Generation/Editing ([DisPose, ICLR’25], [Ca2-VDM, ICML’25]), 3D Mesh Generation ([Nautilus, arXiv’25]), 3D Gaussian Editing ([VcEdit, ECCV’24]).
Open-world/vocabulary Perception: Object Detection ([Survey, TPAMI’24], [CCKT-Det, ICLR’25]), Scene Graph Generation ([NICEST, TPAMI’24], [INOVA, arXiv’25], [RECORD, NeurIPS’23]), Compositional Classification ([PLO, arXiv’23]), Image Classification ([Diff-II, CVPR’25]), Pose Estimation ([Di2Pose, NeurIPS’24]), Situation Recognition ([LEX, ACMMM’24]).
Multimodal Understanding and Reasoning: RL for Reasoning ([Relation-R1, arXiv’25]), Interleaved Generation ([CoMM, CVPR’25]), Hallucination ([DCD, arXiv’25]), Multimodal Editing ([DECap, ECCV’24]), Visual Question Answering ([IdealGPT, EMNLP’23 Findings]).

News

May, 2025	I will give a talk at VALSE 2025: Vision Foundation Model Workshop (视觉通用模型).
Apr, 2025	I will serve as an Associate Editor for ACM Trans. on Multimedia Computing, Communications & Applications.
Feb, 2025	I will serve as an Associate Editor for IEEE Transactions on Image Processing (TIP).
Feb, 2025	I will serve as an Area Chair for NeurIPS 2025 and ACM MM 2025.
Feb, 2025	We will organize a tutorial on Multimodal LLMs at CVPR 2025.
Dec, 2024	I will serve as an Area Chair for ICML 2025.
Dec, 2024	I will give a talk in the AAAI 2025 New Faculty Highlights program.
Nov, 2024	I will serve as a Senior PC for IJCAI 2025.
Nov, 2024	Our research group held its 4th group outing: hiking the MacLehose Trail (Section 2), again!
Sep, 2024	I was ranked among the World’s Top 2% Most-cited Scientists (in the single year 2023) by Stanford University.
Sep, 2024	I will serve as an Area Chair for CVPR 2025.
Aug, 2024	I will serve as an Area Chair for ICLR 2025.

Recent Publications

arXiv

[New!!] Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension

Lin Li, Wei Chen, Jiahui Li, Kwang-Ting Cheng, and Long Chen

arXiv preprint (arXiv) , arXiv , Codes
arXiv

[New!!] Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image

Yuxuan Wang, Xuanyu Yi, Qingshan Xu, Yuan Zhou, Long Chen, and Hanwang Zhang

arXiv preprint (arXiv) , arXiv
arXiv

[New!!] Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

Wei Chen, Xin Yan, Bin Wen, Fan Yang, Tingting Gao, Di Zhang, and Long Chen

arXiv preprint (arXiv) , arXiv
arXiv

[New!!] Physics-Aware Human-Object Rendering from Sparse Views via 3D Gaussian Splatting

Weiquan Wang, Jun Xiao, Yueting Zhuang, and Long Chen

arXiv preprint (arXiv) , arXiv
arXiv

[New!!] Taking A Closer Look at Interacting Objects: Interaction-Aware Open Vocabulary Scene Graph Generation

Lin Li, Chuhan Zhang, Dong Zhang, Chong Sun, Chen Li, and Long Chen

arXiv preprint (arXiv) , arXiv
arXiv

[New!!] A Survey on Multimodal Benchmarks: In the Era of Large AI Models

Lin Li, Guikun Chen, Hanrong Shi, Jun Xiao, and Long Chen

arXiv preprint (arXiv) , arXiv , Codes
ICCV

[New!!] Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Yuxuan Wang, Xuanyu Yi, Haohan Weng, Qingshan Xu, Xiaokang Wei, Xianghui Yang, Chunchao Guo, Long Chen, and Hanwang Zhang

International Conference on Computer Vision (ICCV) , 2025 , Codes , Website
ICML

[New!!] Event-Customized Image Generation

Zhen Wang, Yilei Jiang, Dong Zheng, Jun Xiao, and Long Chen

International Conference on Machine Learning (ICML) , 2025
ICML

[New!!] Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

Kaifeng Gao, Jiaxin Shi, Hanwang Zhang, Chunping Wang, Jun Xiao, and Long Chen

International Conference on Machine Learning (ICML) , 2025 , Codes
CVPR

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Wei Chen, Lin Li, Yongqi Yang, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, and Long Chen

Computer Vision and Pattern Recognition (CVPR) , 2025 , Codes , Highlight presentation
CVPR

IterIS: Iterative Inference-Solving Alignment for LoRA Merging

Hongxu Chen, Zhen Wang, Runshi Li, Bowei Zhu, and Long Chen

Computer Vision and Pattern Recognition (CVPR) , 2025 , Codes
CVPR

Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification

Yanghao Wang and Long Chen

Computer Vision and Pattern Recognition (CVPR) , 2025 , Codes
ICLR

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Hongxiang Li, Yaowei Li, Yuhang Yang, Junjie Gao, Zhihong Zhu, Xuxin Cheng, and Long Chen

International Conference on Learning Representations (ICLR) , 2025 , Codes
ICLR

CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing

Ziqi Jiang, Zhen Wang, and Long Chen

International Conference on Learning Representations (ICLR) , 2025 , Codes
ICLR

Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection

Chuhan Zhang, Chaoyang Zhu, Pingcheng Dong, Long Chen, and Dong Zhang

International Conference on Learning Representations (ICLR) , 2025 , Codes
NeurIPS

Di2Pose: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

Weiquan Wang, Jun Xiao, Chunping Wang, Wei Liu, Zhao Wang, and Long Chen

Neural Information Processing Systems (NeurIPS) , 2024
NeurIPS

LLMs Can Evolve Continually on Modality for X-Modal Reasoning

Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, and Long Chen

Neural Information Processing Systems (NeurIPS) , 2024 , Codes
EMNLP

Optimizing Language Models with Fair and Stable Reward Composition in Reinforcement Learning

Jiahui Li, Hanlin Zhang, Fengda Zhang, Tai-Wei Chang, Kun Kuang, Long Chen, and Jun Zhou

Empirical Methods in Natural Language Processing (EMNLP) , 2024
ECCV

View-Consistent 3D Editing with Gaussian Splatting

Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, and Hanwang Zhang

European Conference on Computer Vision (ECCV) , 2024 , Website
ECCV

DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism

Zhen Wang, Xinyun Jiang, Jun Xiao, Tao Chen, and Long Chen

European Conference on Computer Vision (ECCV) , 2024
CVPR

UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory

Haiwen Diao, Bo Wan, Ying Zhang, Xu Jia, Huchuan Lu, and Long Chen

Computer Vision and Pattern Recognition (CVPR) , 2024 , Codes
ICLR

SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos

Yulei Niu, Wenliang Guo, Long Chen, Xudong Lin, and Shih-Fu Chang

International Conference on Learning Representations (ICLR) , 2024 , Codes
TPAMI

A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Chaoyang Zhu and Long Chen

IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI) , 2024 , Codes
TPAMI

NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation

Lin Li, Jun Xiao, Hanrong Shi, Hanwang Zhang, Yi Yang, Wei Liu, and Long Chen

IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI) , 2024 , Codes , extension of CVPR’22 work
TPAMI

CrossFormer++: A Versatile Vision Transformer Hinging on Cross-scale Attention

Wenxiao Wang, Wei Chen, Qibo Qiu, Long Chen, Boxi Wu, Binbin Lin, Xiaofei He, and Wei Liu

IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI) , 2024 , Codes , extension of ICLR’22 work