W-HMR: Human Mesh Recovery in World Space with Weak-supervised Camera Calibration and Orientation Correction

1 Nanjing University of Science and Technology, 2 Beijing Normal University

Overview of our approach, Human Mesh Recovery in World Space (W-HMR).

Abstract

Methods for reconstructing 3D human bodies from monocular images have long simplified the task by minimizing the influence of the camera. A coarse focal length setting leaves the reconstructed bodies poorly aligned with distorted images, and ignoring camera rotation yields unrealistic body poses in world space. Consequently, existing methods are confined to controlled environments and struggle to achieve accurate and plausible reconstruction in world space on complex and diverse in-the-wild images. To address these issues, we propose W-HMR, which decouples global body recovery into camera calibration, local body recovery, and global body orientation correction. We design the first weakly supervised camera calibration method for body distortion, which eliminates the dependence on focal length labels and achieves finer mesh-image alignment. We also propose a novel orientation correction module that keeps the reconstructed human body plausible in world space. Decoupling body orientation from body pose lets our model account for accuracy in camera coordinates and plausibility in world coordinates simultaneously, which broadens its range of applications. As a result, W-HMR achieves high-quality reconstruction in both coordinate systems, particularly in challenging scenes.
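To make the distinction between camera coordinates and world coordinates concrete, the following is a minimal sketch of the underlying change of coordinates only. It is not the W-HMR orientation correction module, whose internals are not described on this page. It assumes the world-to-camera rotation is already known, and all function names (`rot_x`, `matmul`, `transpose`, `correct_orientation`) and the 30-degree pitch are hypothetical choices for illustration.

```python
import math


def rot_x(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the x-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix (the inverse of a rotation matrix)."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def correct_orientation(r_world_to_cam, r_body_cam):
    """Express a body orientation given in camera coordinates in world coordinates.

    Assumption: r_world_to_cam maps world-frame vectors to camera-frame vectors,
    so its transpose maps camera-frame quantities back to the world frame.
    """
    return matmul(transpose(r_world_to_cam), r_body_cam)


# Hypothetical setup: the camera is pitched by 30 degrees and the person
# stands upright in the world (identity orientation).
r_w2c = rot_x(math.radians(30.0))
r_body_world_true = rot_x(0.0)

# A method that ignores camera rotation reports the orientation in the camera
# frame, where the upright person appears tilted.
r_body_cam = matmul(r_w2c, r_body_world_true)

# Undoing the camera rotation recovers the upright orientation in world space.
r_body_world = correct_orientation(r_w2c, r_body_cam)
```

In this toy case `r_body_cam` is a tilted orientation even though the person is upright, which is the kind of world-space error the abstract attributes to ignoring camera rotation. Only the global orientation changes; the local body pose is untouched, which mirrors the decoupling described above.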


Videos

Demo (frame-by-frame reconstruction, smoothed by SmoothNet [1])

Comparison with SOTA Work[2]

Better Mesh-image Alignment and More Realistic Poses

References

[1] Ailing Zeng, Lei Yang, Xuan Ju, Jiefeng Li, Jianyi Wang, and Qiang Xu. SmoothNet: A plug-and-play network for refining human poses in videos. In European Conference on Computer Vision (ECCV), volume 13665, pages 625–642, 2022.

[2] Muhammed Kocabas, Chun-Hao P. Huang, Joachim Tesch, Lea Müller, Otmar Hilliges, and Michael J. Black. SPEC: Seeing people in the wild with an estimated camera. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 11035–11045, 2021.