Full-Body Human-Object-Scene Interaction Generation

Abstract

Generating high-fidelity full-body human interactions with dynamic objects and static scenes remains a critical challenge in computer graphics and animation. Existing methods for human-object interaction often neglect scene context, leading to implausible penetrations, while human-scene interaction approaches struggle to coordinate fine-grained manipulations with long-range navigation. To address these limitations, we propose HOSIG, a novel framework for synthesizing full-body interactions through hierarchical scene perception. Our method decouples the task into three key components: 1) a scene-aware grasp pose generator that ensures collision-free whole-body postures with precise hand-object contact by integrating local geometry constraints, 2) a heuristic navigation algorithm that autonomously plans obstacle-avoiding paths in complex indoor environments via compressed 2D floor maps and dual-component spatial reasoning, and 3) a scene-guided motion diffusion model that generates trajectory-controlled, full-body motions with finger-level accuracy by incorporating spatial anchors and dual-space classifier-free guidance. Extensive experiments on the TRUMANS dataset demonstrate superior performance over state-of-the-art methods. Notably, our framework supports unlimited motion length through autoregressive generation and requires minimal manual intervention. This work bridges the critical gap between scene-aware navigation and dexterous object manipulation, advancing the frontier of embodied interaction synthesis. Code will be released after publication.
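To make the second component more concrete, the following is a minimal sketch of obstacle-avoiding path planning on a 2D floor map. It is not the HOSIG navigation algorithm, whose heuristic and dual-component spatial reasoning are not described on this page. It swaps in standard A* search over a 4-connected occupancy grid, and the names `plan_path` and `floor_map` are hypothetical, introduced only for illustration.

```python
import heapq


def plan_path(grid, start, goal):
    """Standard A* on a 2D occupancy grid (1 = obstacle, 0 = free).

    Assumption: the grid stands in for a compressed 2D floor map.
    Cells are (row, col) tuples and moves are 4-connected with unit cost.
    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for 4-connected unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(
                        open_heap, (new_cost + heuristic(nxt), new_cost, nxt)
                    )
    return None


# Hypothetical floor map: a wall across the middle row with a gap on the right.
floor_map = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = plan_path(floor_map, (0, 0), (2, 0))
```

In a pipeline like the one described above, the waypoints of such a path could serve as trajectory control for a motion generator, although how HOSIG actually connects its planner to the diffusion model is not specified here.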

Comparison with SOTA Methods: LINGO [1], CHOIS [2]

Complex Interaction Display

BibTeX


@article{yao2025hosig,
  title={HOSIG: Full-Body Human-Object-Scene Interaction Generation with Hierarchical Scene Perception},
  author={Yao, Wei and Sun, Yunlian and Zhang, Hongwen and Liu, Yebin and Tang, Jinhui},
  journal={arXiv preprint arXiv:2506.01579},
  year={2025}
}

References

[1] Jiang, Nan, et al. "Autonomous character-scene interaction synthesis from text instruction." SIGGRAPH Asia 2024 Conference Papers. 2024.

[2] Li, Jiaman, et al. "Controllable human-object interaction synthesis." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024.
