Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects

¹Autonomous Learning Robots (ALR), Karlsruhe Institute of Technology (KIT); ²Bosch Center for Artificial Intelligence
This paper was published at ICLR 2025 (Oral).

Abstract

Manipulating objects with varying geometries and deformable objects is a major challenge in robotics. Tasks such as insertion with different objects or cloth hanging require precise control and effective modelling of complex dynamics. In this work, we frame this problem through the lens of a heterogeneous graph that comprises smaller sub-graphs, such as actuators and objects, accompanied by different edge types describing their interactions. This graph representation serves as a unified structure for both rigid and deformable object tasks, and can be extended further to tasks comprising multiple actuators. To evaluate this setup, we present a novel and challenging reinforcement learning benchmark, including rigid insertion of diverse objects, as well as rope and cloth manipulation with multiple end-effectors. These tasks present a large search space, as both the initial and target configurations are uniformly sampled in 3D space. To address this issue, we propose a novel graph-based policy model, dubbed Heterogeneous Equivariant Policy (HEPi), which uses SE(3) equivariant message passing networks as the main backbone to exploit the geometric symmetry. In addition, by modeling heterogeneity explicitly, HEPi outperforms Transformer-based and non-heterogeneous equivariant policies in terms of average returns, sample efficiency, and generalization to unseen objects.

Robotic Manipulation as Heterogeneous Graphs



Left: A Cloth-Hanging task represented by a heterogeneous graph that comprises two disjoint node sets, objects and actuators, connected through directed, fully-connected inter-edges. Intra-edges occur within each set (both objects and actuators) to capture relationships within clusters. Information is aggregated from objects to actuators via inter-edges. The target distance is absorbed into the feature representation rather than treated as a separate node type. Right: Overview of the Heterogeneous Equivariant Policy (HEPi). Multiple Equivariant Message Passing Networks (EMPNs) process the graph, and their outputs are aggregated to generate the final action.
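
The graph structure in the caption can be illustrated with a minimal, dependency-free Python sketch. It is not the HEPi implementation: the function names (`build_hetero_edges`, `aggregate_to_actuators`, `rotate_z_and_shift`) are hypothetical, the intra-edges are assumed to be fully connected (the caption does not specify their connectivity), and the message function is a toy distance-weighted sum of relative vectors standing in for an EMPN. The sketch only shows the three edge types and why an aggregation built from relative positions rotates with the scene and ignores translations.

```python
import math
from itertools import permutations, product


def build_hetero_edges(n_obj, n_act):
    """Return (source, target) index pairs for each edge type.

    Assumption: intra-edges are fully connected without self-loops.
    Inter-edges are directed from every object node to every actuator node.
    """
    return {
        ("object", "intra", "object"): list(permutations(range(n_obj), 2)),
        ("actuator", "intra", "actuator"): list(permutations(range(n_act), 2)),
        ("object", "inter", "actuator"): list(product(range(n_obj), range(n_act))),
    }


def aggregate_to_actuators(obj_pos, act_pos, inter_edges):
    """Toy object-to-actuator aggregation (a stand-in, not an EMPN).

    Each message is a relative vector scaled by a weight that depends only
    on distance, so the per-actuator mean rotates with the input and is
    unaffected by translations.
    """
    sums = [[0.0, 0.0, 0.0] for _ in act_pos]
    counts = [0] * len(act_pos)
    for src, dst in inter_edges:
        rel = [o - a for o, a in zip(obj_pos[src], act_pos[dst])]
        dist = math.sqrt(sum(r * r for r in rel))
        weight = math.exp(-dist)  # invariant scalar
        for k in range(3):
            sums[dst][k] += weight * rel[k]  # equivariant vector
        counts[dst] += 1
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def rotate_z_and_shift(points, angle, shift=(0.0, 0.0, 0.0)):
    """Rotate 3D points about the z-axis by `angle`, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        [c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2]]
        for x, y, z in points
    ]
```

Under these assumptions, applying `rotate_z_and_shift` to both the object and actuator positions before calling `aggregate_to_actuators` should give the same result as rotating the original output (without the shift), which is the symmetry an SE(3) equivariant backbone is meant to exploit.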

BibTeX

@inproceedings{
  hoang2025geometryaware,
  title={Geometry-aware {RL} for Manipulation of Varying Shapes and Deformable Objects},
  author={Tai Hoang and Huy Le and Philipp Becker and Vien Anh Ngo and Gerhard Neumann},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=7BLXhmWvwF}
}