Humanoid Touch Dream
Learning Versatile Humanoid Manipulation with Touch Dreaming
Yaru Niu¹,³, Zhenlong Fang¹, Binghong Chen¹, Shuai Zhou¹, Revanth Senthilkumaran¹,³, Hao Zhang¹,², Bingqing Chen³, Chen Qiu³, H. Eric Tseng², Jonathan Francis¹,³, Ding Zhao¹
¹Carnegie Mellon University, ²UT Arlington, ³Bosch Center for AI

Humanoid Touch Dream provides a whole-body learning framework for versatile contact-rich humanoid loco-manipulation.

Abstract.
Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, dexterous hands, and contact-aware perception under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based whole-body controller that provides stable lower-body and torso execution during complex manipulation. On top of this controller, we build a whole-body humanoid data collection system that combines VR-based teleoperation with human-to-humanoid motion mapping, enabling efficient collection of real-world demonstrations. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder–decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by touch dreaming: in addition to predicting action chunks, the policy predicts future hand-joint forces and future tactile latents, encouraging the shared Transformer trunk to learn contact-aware representations for dexterous interaction. Across five contact-rich tasks (Insert-T, Book Organization, Towel Folding, Cat Litter Scooping, and Tea Serving), HTD achieves a 90.9% relative improvement in average success rate over the stronger of the two ACT baselines. Ablation results further show that latent-space tactile prediction is more effective than raw tactile prediction, yielding a 30% relative gain in success rate. These results demonstrate that combining robust whole-body execution, scalable humanoid data collection, and predictive touch-centered learning enables versatile, high-dexterity humanoid manipulation in the real world.
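To make the single-stage objective concrete, the sketch below shows one way the behavioral-cloning-plus-touch-dreaming loss described above could be written in PyTorch. All module names, head dimensions, and loss weights (TouchDreamTransformer, w_force, w_tactile, etc.) are illustrative assumptions, not the released implementation.

```python
# Minimal PyTorch-style sketch of a single-stage BC + touch-dreaming objective.
# Names, dimensions, and weights are assumptions for illustration only.
import torch
import torch.nn as nn

class TouchDreamTransformer(nn.Module):
    """Shared encoder-decoder trunk with three prediction heads."""
    def __init__(self, d_model=512, horizon=16):
        super().__init__()
        self.trunk = nn.Transformer(d_model=d_model, batch_first=True)
        self.action_head = nn.Linear(d_model, 32)    # action chunk, per step
        self.force_head = nn.Linear(d_model, 12)     # future hand-joint forces
        self.tactile_head = nn.Linear(d_model, 64)   # future tactile latents
        self.horizon = horizon

    def forward(self, obs_tokens, queries):
        # obs_tokens: (B, S, d_model) fused vision/touch/proprioception tokens
        # queries:    (B, horizon, d_model) decoder queries
        h = self.trunk(obs_tokens, queries)          # (B, horizon, d_model)
        return self.action_head(h), self.force_head(h), self.tactile_head(h)

def htd_loss(model, obs_tokens, queries, a_gt, f_gt, z_gt,
             w_force=0.1, w_tactile=0.1):
    """Behavioral cloning augmented with touch dreaming: the shared trunk is
    supervised to predict action chunks, future forces, and future tactile
    latents jointly, in one training stage."""
    a_hat, f_hat, z_hat = model(obs_tokens, queries)
    bc = nn.functional.l1_loss(a_hat, a_gt)              # action-chunk BC term
    dream_force = nn.functional.mse_loss(f_hat, f_gt)    # future force term
    dream_tactile = nn.functional.mse_loss(z_hat, z_gt)  # latent tactile term
    return bc + w_force * dream_force + w_tactile * dream_tactile
```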
Autonomous Policies

Policy Performance. Comparison of success rate and task score across five contact-rich tasks. HTD (Ours) consistently outperforms ACT baselines with and without touch input, achieving the highest average success rate and task score.

Touch Dreaming Visualization
Explore touch dreaming predictions interactively. The left panel shows the robot's head-camera view. The right panel visualizes predicted vs. ground-truth touch signals; switch between Force, Latent Tactile, and Raw Tactile modes. For the latent tactile heatmaps, each latent dimension is normalized independently over the episode, with a minimum-range threshold derived from the most active dimension across all fingers to separate active from inactive latent contact regions. Note that this per-dimension normalization amplifies subtle changes and prediction errors in the latent space for better visibility.
[Video player: Head Camera (Right Eye)]
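For readers reproducing the heatmaps, here is a minimal NumPy sketch of the per-dimension normalization described above; the episode tensor layout and the exact threshold rule (a fraction of the most active dimension's range) are assumptions for illustration.

```python
# Sketch of episode-wise, per-dimension latent normalization with a
# minimum-range threshold. Shapes and min_range_frac are assumptions.
import numpy as np

def normalize_latents(latents, min_range_frac=0.1):
    """latents: (T, n_fingers, n_dims) latent tactile signals for one episode."""
    lo = latents.min(axis=0)                        # (n_fingers, n_dims)
    rng = latents.max(axis=0) - lo                  # per-dimension episode range
    # Threshold derived from the most active dimension across all fingers:
    # dimensions whose range falls below this fraction of the global maximum
    # are treated as inactive and shown flat.
    thresh = min_range_frac * rng.max()
    norm = (latents - lo) / np.maximum(rng, 1e-8)   # independent [0, 1] scaling
    norm[:, rng < thresh] = 0.0                     # mask inactive dimensions
    return norm
```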

Ablation Study. Effect of touch dreaming on Insert-T and Towel Folding tasks. Dream Latent Tactile achieves the best overall performance, demonstrating the benefit of predicting future latent tactile representations.
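The two dreaming variants differ only in the regression target for the tactile head. A minimal sketch of target construction, assuming a frozen tactile encoder produces the latent targets (the encoder handle and tensor shapes are hypothetical):

```python
# Sketch of the two touch-dreaming target variants compared in the ablation.
# `tactile_encoder` and shapes are assumptions; raw tactile frames are taken
# as (B, horizon, n_taxels) arrays of per-taxel readings.
import torch

def dreaming_targets(raw_tactile_future, tactile_encoder, variant="latent"):
    if variant == "raw":
        # Dream Raw Tactile: regress high-dimensional taxel readings directly.
        return raw_tactile_future
    # Dream Latent Tactile: regress compact encoder latents, which may give a
    # smoother, less noise-sensitive target space than raw taxel values.
    with torch.no_grad():
        return tactile_encoder(raw_tactile_future)
```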

Whole-Body Controller
| Metric | Ours | AMO | FALCON |
|---|---|---|---|
| E_v, linear velocity (m/s) | **0.1420 ± 0.0568** | 0.1779 ± 0.0642 | 0.1641 ± 0.0309 |
| E_ω, angular velocity (rad/s) | 0.1806 ± 0.0534 | **0.1540 ± 0.0316** | 0.1874 ± 0.0263 |
| E_h, height (m) | **0.0280 ± 0.0438** | 0.0568 ± 0.0814 | 0.1299 ± 0.0082 |
| E_y, yaw (rad) | **0.0126 ± 0.0051** | 0.1540 ± 0.0534 | 0.1215 ± 0.0111 |
| E_p, pitch (rad) | **0.0487 ± 0.1796** | 0.1519 ± 0.1254 | not tracked |
| E_r, roll (rad) | **0.0157 ± 0.0065** | 0.0735 ± 0.0447 | not tracked |

Tracking Error Comparison. Our whole-body controller achieves the lowest tracking error across most metrics compared to AMO (Li et al., RSS 2025) and FALCON (Zhang et al., L4DC 2026). Bold values indicate the best result in each row.
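For context, errors of this form are typically reported as the mean ± standard deviation of the absolute difference between commanded and measured quantities over evaluation rollouts; the sketch below computes them under that assumption (the exact evaluation protocol is not specified here, and the example data is synthetic).

```python
# Sketch of mean ± std tracking-error computation; the error definition
# (per-step absolute command-vs-measurement difference) is an assumption.
import numpy as np

def tracking_error(commanded, measured):
    """commanded, measured: (T,) arrays over one rollout, e.g. base height in
    m, yaw in rad, or linear velocity in m/s."""
    err = np.abs(commanded - measured)
    return err.mean(), err.std()

# Example: E_h for a rollout tracking a constant 0.74 m height command.
t = np.linspace(0, 10, 500)
cmd = np.full_like(t, 0.74)
meas = 0.74 + 0.02 * np.sin(t)       # synthetic measured response
mean, std = tracking_error(cmd, meas)
print(f"E_h = {mean:.4f} ± {std:.4f} m")
```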

BibTeX
@misc{niu2026htd,
      title={Learning Versatile Humanoid Manipulation with Touch Dreaming},
      author={Yaru Niu and Zhenlong Fang and Binghong Chen and Shuai Zhou and Revanth Senthilkumaran and Hao Zhang and Bingqing Chen and Chen Qiu and H. Eric Tseng and Jonathan Francis and Ding Zhao},
      year={2026},
      eprint={2604.13015},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2604.13015},
}