Abstract

Learning from expert demonstrations is a promising approach for training robotic manipulation policies from limited data. However, imitation learning algorithms require a number of design choices, including the input modality, the training objective, and the 6-DoF end-effector pose representation. Diffusion-based methods have gained popularity because they can predict long-horizon trajectories and handle multimodal action distributions. Recently, Conditional Flow Matching (CFM), or Rectified Flow, has been proposed as a more flexible generalization of diffusion models. In this paper, we investigate the application of CFM to robotic policy learning and specifically study its interplay with the other design choices required to build an imitation learning algorithm. We show that CFM gives the best performance when combined with point cloud input observations. Additionally, we study the feasibility of a CFM formulation on the SO(3) manifold and evaluate its suitability with a simplified example. We perform extensive experiments on RLBench, which demonstrate that our proposed PointFlowMatch approach achieves a state-of-the-art average success rate of 67.8% over eight tasks, double the performance of the next best method.
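For readers unfamiliar with CFM, the following is a minimal illustrative sketch of the idea in its common linear-path (rectified flow) form. It is not the PointFlowMatch implementation: the function names, the three-dimensional toy action, and the oracle velocity field standing in for a trained network are all assumptions made for illustration, and plain Python is used instead of PyTorch to keep the example self-contained.

```python
import random


def cfm_training_pair(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t, and its target velocity."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]  # constant along a straight path
    return x_t, v_target


def cfm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(3)]  # x0: sample from the prior
action = [0.5, -0.2, 0.1]                        # x1: hypothetical expert action
t = rng.random()
x_t, v_target = cfm_training_pair(noise, action, t)


def oracle(x, t):
    """Hypothetical stand-in for a trained network: returns the exact target velocity."""
    return v_target


sample = euler_sample(oracle, noise)
```

In training, a network would be asked to predict `v_target` from `x_t`, `t`, and the observation, with `cfm_loss` as the objective; at inference, integrating the learned velocity field from noise yields an action, as `euler_sample` does here with the oracle field.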

Code

For academic usage, a PyTorch-based software implementation of this project is available in our GitHub repository and is released under the GPLv3 license. For any commercial purpose, please contact the authors.

Publications

If you find our work useful, please consider citing our paper:

Eugenio Chisari, Nick Heppert, Max Argus, Tim Welschehold, Thomas Brox, Abhinav Valada

Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching
Conference on Robot Learning, 2024.

Authors

Eugenio Chisari

University of Freiburg

Nick Heppert

University of Freiburg

Max Argus

University of Freiburg

Tim Welschehold

University of Freiburg

Thomas Brox

University of Freiburg

Abhinav Valada

University of Freiburg

Acknowledgment

This work was funded by the Carl Zeiss Foundation through the ReScaLe project and by the German Research Foundation (DFG): 417962828.