Long Chen

Assistant Professor
Department of Computer Science and Engineering (CSE)
School of Engineering (SENG)
The Hong Kong University of Science and Technology (HKUST)
Email: longchen A~T ust.hk
Office: Room CYT-3003, Cheng Yu Tung Building, HKUST, Clear Water Bay

Dr. Long CHEN (Chinese: 陈隆) is an assistant professor in the Department of CSE at the Hong Kong University of Science and Technology (HKUST). He leads a computer vision and machine learning research group, LONG Group. Before joining HKUST, he was a postdoctoral research scientist at the DVMM Lab, Columbia University. He obtained his Ph.D. degree in Computer Science from the DCD Lab, Zhejiang University. During his Ph.D. studies, he was also a visiting student at the MReal Lab, Nanyang Technological University (NTU), and the NExT Center, National University of Singapore (NUS). He obtained his B.Eng. degree from Dalian University of Technology. He was also a senior research scientist at Tencent AI Lab.

His primary research directions are Computer Vision, Machine Learning, Multimedia, and Artificial Intelligence.

General Research Interests: Specifically, he aims to build an efficient multimodal AI system capable of “human-like” multimodal understanding and generation. By “human-like”, we mean that vision systems should have three abilities: 1) Explainable: the model should rely on the right explicit evidence when making decisions, i.e., be right for the right reasons. 2) Robust: the model should remain robust when only low-quality training data are available (e.g., biased, noisy, or limited training samples). 3) Universal: the model design should be relatively universal, i.e., effective across various tasks. Meanwhile, with the rapid development of foundation models, such as large language models (LLMs), vision-language models (VLMs), and vision generation models (e.g., diffusion models), our group is also very interested in several relevant cutting-edge directions: 4) building more explainable, robust, and universal vision models with the help of pretrained models (LLMs, diffusion models); 5) designing more efficient and stronger multimodal LLMs; and 6) studying the inherent weaknesses of existing LLMs and diffusion models.

Recent Research Directions:

Efficient Finetuning of Foundation Models: Parameter-efficient Tuning ([IterIS, CVPR’25], [ComPro, IJCV’24]), Memory-efficient Tuning ([UniPT, CVPR’24], [SHERL, ECCV’24]), Modality-efficient Tuning ([PathWeave, NeurIPS’24]), Reinforcement Learning with Human/AI Feedback (RLHF/RLAIF) ([B2-DiffuRL, CVPR’25], [Fast RL, EMNLP’24], [R3HF, arXiv’24]).
Visual Generation and Editing: Image Generation/Editing ([CLIPDrag, ICLR’25], [Free-Event, ICML’25]), Video Generation/Editing ([DisPose, ICLR’25], [Ca2-VDM, ICML’25]), 3D Mesh Generation ([Nautilus, arXiv’25]), 3D Gaussian Editing ([VcEdit, ECCV’24]).
Open-world/vocabulary Perception: Object Detection ([Survey, TPAMI’24], [CCKT-Det, ICLR’25]), Scene Graph Generation ([NICEST, TPAMI’24], [INOVA, arXiv’25], [RECORD, NeurIPS’23]), Compositional Classification ([PLO, arXiv’23]), Image Classification ([Diff-II, CVPR’25]), Pose Estimation ([Di2Pose, NeurIPS’24]), Situation Recognition ([LEX, ACMMM’24]).
Multimodal Understanding and Reasoning: RL for Reasoning ([DyME, arXiv’25], [Relation-R1, arXiv’25]), Interleaved Generation ([CoMM, CVPR’25]), Hallucination ([DCD, arXiv’25]), Multimodal Editing ([DECap, ECCV’24]), Visual Question Answering ([IdealGPT, EMNLP’23 Findings]).

Research Group: LONG Group @ HKUST CSE

1. Given the current funding situation, we have only very limited openings for postdocs, research assistants, and visiting students. (Please highlight any other funding sources or support you have.)
2. For Ph.D. and M.Phil. positions, we have openings all year round.
3. To further increase diversity, Ph.D./M.Phil. applicants from overseas and Hong Kong are strongly encouraged to apply.

News

May, 2025	I will give a talk at VALSE 2025: Vision Foundation Model Workshop (视觉通用模型).
Apr, 2025	I will serve as an Associate Editor for ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM).
Feb, 2025	I will serve as an Associate Editor for IEEE Transactions on Image Processing (TIP).
Feb, 2025	I will serve as an Area Chair for NeurIPS 2025 and ACM MM 2025.
Feb, 2025	We will organize a tutorial on Multimodal LLMs at CVPR 2025.
Dec, 2024	I will serve as an Area Chair for ICML 2025.
Dec, 2024	I will give a talk at the AAAI 2025 New Faculty Highlights program.
Nov, 2024	I will serve as a Senior PC member for IJCAI 2025.
Nov, 2024	Our research group had its 4th group outing: hiking the MacLehose Trail (Section 2) again!
Sep, 2024	I was ranked among the World’s Top 2% Most-cited Scientists (single year 2023) by Stanford University.
Sep, 2024	I will serve as an Area Chair for CVPR 2025.
Aug, 2024	I will serve as an Area Chair for ICLR 2025.