Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching

Abstract

Learning from expert demonstrations is a promising approach for training robotic manipulation policies from limited data. However, imitation learning algorithms require a number of design choices, ranging from the input modality and training objective to the 6-DoF end-effector pose representation. Diffusion-based methods have gained popularity because they can predict long-horizon trajectories and handle multimodal action distributions. Recently, Conditional Flow Matching (CFM) (or Rectified Flow) has been proposed as a more flexible generalization of diffusion models. In this paper, we investigate the application of CFM to robotic policy learning and specifically study its interplay with the other design choices required to build an imitation learning algorithm. We show that CFM gives the best performance when combined with point cloud input observations. Additionally, we study the feasibility of a CFM formulation on the SO(3) manifold and evaluate its suitability on a simplified example. Extensive experiments on RLBench demonstrate that our proposed PointFlowMatch approach achieves a state-of-the-art average success rate of 67.8% over eight tasks, double the performance of the next best method.
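The abstract describes CFM only at a high level. As an illustration, here is a minimal NumPy sketch of the generic conditional flow matching (rectified flow) recipe applied to an action trajectory: a straight-line path from Gaussian noise to an expert trajectory, the velocity-regression target, and Euler integration at inference time. It is not the PointFlowMatch implementation. The names `velocity_fn`, `cfm_training_sample`, `cfm_loss` and `sample_actions`, the trajectory shape, and the step count are hypothetical placeholders, and the point cloud encoder is reduced to an opaque `obs` argument.

```python
import numpy as np


def cfm_training_sample(x1, rng):
    """Build one (x_t, t, u_t) regression sample for the CFM loss.

    x1: expert action trajectory, shape (horizon, action_dim).
    Assumes a standard-normal source x0 and the straight-line path
    x_t = (1 - t) * x0 + t * x1, whose target velocity is x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform()  # t ~ U[0, 1)
    xt = (1.0 - t) * x0 + t * x1
    ut = x1 - x0
    return xt, t, ut


def cfm_loss(velocity_fn, obs, x1, rng):
    """Mean squared error between predicted and target velocity.

    velocity_fn(x, t, obs) is a hypothetical stand-in for a learned network
    conditioned on an observation (for example a point cloud embedding).
    """
    xt, t, ut = cfm_training_sample(x1, rng)
    pred = velocity_fn(xt, t, obs)
    return float(np.mean((pred - ut) ** 2))


def sample_actions(velocity_fn, obs, shape, rng, num_steps=10):
    """Draw noise, then Euler-integrate dx/dt = v(x, t, obs) from t=0 to t=1."""
    x = rng.standard_normal(shape)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt, obs)
    return x


# Toy check with a single hypothetical demonstration: for one data point the
# straight-line path has the exact velocity field (x1 - x) / (1 - t), so an
# "oracle" model has zero CFM loss and integrating it recovers the demo.
rng = np.random.default_rng(0)
demo = rng.standard_normal((16, 7))  # hypothetical 16-step, 7-D trajectory


def oracle(x, t, obs):
    # obs is ignored here; a real policy would condition on the observation.
    return (demo - x) / (1.0 - t)


actions = sample_actions(oracle, None, demo.shape, rng)
print(np.allclose(actions, demo))
```

In a real policy, `oracle` would be replaced by a trained network, and the loss above would be averaged over many demonstrations and noise draws; the single-demo oracle only serves to check that the target velocity and the sampler are consistent with each other.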

Code

For academic use, a software implementation of this project based on PyTorch can be found in our GitHub repository and is released under the GPLv3 license. For any commercial purpose, please contact the authors.

Publications

If you find our work useful, please consider citing our paper:

Eugenio Chisari, Nick Heppert, Max Argus, Tim Welschehold, Thomas Brox, Abhinav Valada

Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching
Conference on Robot Learning, 2024.


Authors

Eugenio Chisari

University of Freiburg

Nick Heppert

University of Freiburg & Zuse School ELIZA

Max Argus

University of Freiburg

Tim Welschehold

University of Freiburg

Thomas Brox

University of Freiburg

Abhinav Valada

University of Freiburg

Acknowledgment

This work was funded by the Carl Zeiss Foundation with the ReScaLe project and the German Research Foundation (DFG): 417962828. Nick Heppert is supported by the Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA) through the DAAD programme Konrad Zuse Schools of Excellence in Artificial Intelligence, sponsored by the Federal Ministry of Education and Research.