Yu Sun

I am a Machine Learning Scientist at Meshcapade. Previously, I was a doctoral student at Harbin Institute of Technology (HIT) and an intern at the JDAI CV lab in China.

My research focuses on monocular 3D human motion estimation. At Meshcapade, I work with many talented colleagues on the ML team, which is led by Naureen Mahmood, Talha Zaman, Michael J. Black, and Nicolas Heron.

At HIT, I was supervised by Wenpeng Gao and Yili Fu. At JDAI, I was supervised by Qian Bao, Wu Liu, Yun Ye, Xiaolei Lv, and Tao Mei. It has been an honor to work with Michael J. Black while exploring this field.

Email / Google Scholar / Twitter / Github

	TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments Yu Sun, Qian Bao, Wu Liu, Tao Mei, Michael J. Black CVPR, 2023 project page / arXiv / code / dataset / video / bib With a holistic 5D representation, TRACE tracks the subjects present in the first frame through time and recovers their 3D trajectories in global coordinates. It does so in one shot.
	Putting People in their Place: Monocular Regression of 3D People in Depth Yu Sun, Wu Liu, Qian Bao, Yili Fu, Tao Mei, Michael J. Black CVPR, 2022 project page / arXiv / code / dataset / video / bib BEV adopts an imaginary Bird's-Eye-View representation to explicitly reason about depth relationships between people.
	Monocular, One-stage, Regression of Multiple 3D People Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei ICCV, 2021 arXiv / code / video / bib ROMP is the first one-stage method for monocular regression of multiple 3D people. It can run at over 20 FPS on a 1070Ti GPU.
	Learning Monocular Regression of 3D People in Crowds via Scene-Aware Blending and De-Occlusion Yu Sun, Lubing Xu, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, T-MM, 2023 paper / bib
	Learning Monocular Mesh Recovery of Multiple Body Parts Via Synthesis Yu Sun, Tianyu Huang, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, ICASSP, 2022 paper / bib
	Human Mesh Recovery from Monocular Images via a Skeleton-disentangled Representation Yu Sun, Yun Ye, Wu Liu, Wenpeng Gao, Yili Fu, Tao Mei ICCV, 2019 arXiv / code / video results / bib DSD-SATN attempts to disentangle the pose-related features from the others (e.g. shape, camera) and learn temporal dynamics via sorting shuffled frames.

Collaborative Research

	PromptHMR: Promptable Human Mesh Recovery Yufu Wang, Yu Sun, Priyanka Patel, Kostas Daniilidis, Michael J. Black, Muhammed Kocabas CVPR, 2025 project page / arXiv / code / video PromptHMR adopts multi-modal prompts, such as 2D detection, segmentation, and text, to improve HPS estimation.
	TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Yao Feng, Michael J. Black CVPR, 2024 project page / arXiv / code / video TokenHMR adopts a discrete-token-based representation for SMPL-based regression.
	ChatPose: Chatting about 3D Human Pose Yao Feng, Jing Lin, Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Michael J. Black CVPR, 2024 project page / arXiv / code ChatPose adopts LLMs to reason about SMPL-based regression from images or textual descriptions.
	WOC: A Handy Webcam-based 3D Online Chatroom Chuanhang Yan, Yu Sun, Qian Bao, Jinhui Pang, Wu Liu, Tao Mei * equal contribution. MM demo, 2022 arXiv / bib WOC captures the 3D motion of users with a single camera and drives their individual 3D virtual avatars in real-time in an online chatroom.
	Recent Advances of Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective Wu Liu, Qian Bao, Yu Sun, Tao Mei ACM Computing Surveys (CSUR), 2022 arXiv / bib A survey of monocular 2D and 3D human pose estimation.
