Undergraduate Thesis: Generalizable Robot End-Effector 6D Pose Estimation

Sheldon Liang, Peking University
Advised by Prof. Hao Dong

We present a marker-free, training-free method that can estimate and track the pose of an arbitrary end-effector given its CAD model.

Abstract

One of the fundamental problems in robot manipulation is estimating the pose of a robot with respect to an external camera. This estimate is what allows the camera's visual observations to be transferred into the robot's operating space, so that the robot can act on them. Traditionally, this is done with fiducial markers of known size and pattern, whose poses are computed with respect to both the robot and the camera. This tedious process can only be performed offline, which calls for methods that can perform calibration online. As a step towards online calibration, this thesis presents a method that, given a CAD model of the target object, estimates the 6D pose of a novel, unseen robot end-effector.
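
Because the robot's forward kinematics already gives the end-effector pose in the robot base frame, a camera-frame estimate of that same pose is, in principle, enough to relate the camera to the robot. The sketch below shows this frame composition under stated assumptions: `t_cam_ee` (the end-effector pose in the camera frame, as a pose estimator would return it) and `t_base_ee` (from forward kinematics) are hypothetical names, and poses are plain 4x4 homogeneous matrices stored as nested lists. It illustrates the standard frame algebra, not the thesis implementation.

```python
# Minimal sketch (assumed interface, not the thesis code): recover the camera
# pose in the robot base frame from one end-effector pose estimate.
# t_cam_ee  : hypothetical 4x4 end-effector pose in the camera frame
# t_base_ee : hypothetical 4x4 end-effector pose in the base frame (forward kinematics)

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(t: Matrix) -> Matrix:
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # transpose of the rotation block
    p = [t[i][3] for i in range(3)]
    p_inv = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def camera_in_base(t_base_ee: Matrix, t_cam_ee: Matrix) -> Matrix:
    """T_base_cam = T_base_ee @ inv(T_cam_ee): maps camera-frame points into the base frame."""
    return matmul(t_base_ee, invert_rigid(t_cam_ee))


def transform_point(t: Matrix, p: list[float]) -> list[float]:
    """Map a 3D point through a 4x4 homogeneous transform."""
    return [sum(t[i][k] * p[k] for k in range(3)) + t[i][3] for i in range(3)]
```

For example, a point observed in the camera frame can be mapped into the robot base frame with `transform_point(camera_in_base(t_base_ee, t_cam_ee), p_cam)`. In practice, estimates from several frames would usually be combined to reduce noise.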

Method

Given a textured 3D model of an end-effector, we first render RGB-D templates from multiple viewpoints offline. For every target image, we then extract features with a foundation model, retrieve the top-K most similar reference templates, and compute 2D-3D correspondences between the target and these templates to obtain pose candidates. To address ambiguities caused by occlusion, we introduce a global memory pool that records keyframes and the corresponding robot states for pose optimization. To resolve ambiguities caused by symmetry, we propose a symmetry disambiguation module that eliminates incorrect matches.
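
The retrieval step can be sketched as a nearest-neighbour search over feature vectors. This summary does not state which similarity measure is used, so the sketch assumes cosine similarity and assumes that the target image and every rendered template have already been encoded into fixed-length vectors (the feature extractor is not shown). The function names are hypothetical.

```python
# Minimal sketch of top-K template retrieval, assuming cosine similarity over
# precomputed feature vectors (hypothetical names; not the thesis code).
import heapq
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def top_k_templates(
    target: Sequence[float], templates: Sequence[Sequence[float]], k: int
) -> list[tuple[int, float]]:
    """Return (template_index, similarity) for the k most similar templates, best first."""
    scored = ((i, cosine(target, t)) for i, t in enumerate(templates))
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

Each retrieved index would then point to a rendered RGB-D template whose known depth and camera pose supply the 3D side of the 2D-3D correspondences; that matching step is not shown here. `heapq.nlargest` avoids sorting every template score when the template set is large and K is small.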

Results

Synthetic data samples and model predictions: green bounding boxes indicate ground-truth poses, and red bounding boxes indicate predicted poses.

High-precision targeting with the proposed method: with calibration from 15 frames, our method achieves errors within 1 mm.
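
The error metric behind the 1 mm figure is not defined in this summary. As an assumption, the sketch below uses two common 6D pose metrics: translation error as the Euclidean distance between predicted and ground-truth positions (given in metres, reported in millimetres), and rotation error as the geodesic angle between two rotation matrices. The function names are hypothetical.

```python
# Minimal sketch of common 6D pose error metrics (assumed; the thesis metric
# may differ). Translations are 3-vectors in metres, rotations are 3x3 lists.
import math
from typing import Sequence

Rotation = Sequence[Sequence[float]]


def translation_error_mm(t_pred: Sequence[float], t_gt: Sequence[float]) -> float:
    """Euclidean distance between two translations in metres, returned in mm."""
    return 1000.0 * math.dist(t_pred, t_gt)


def rotation_error_deg(r_pred: Rotation, r_gt: Rotation) -> float:
    """Geodesic angle arccos((trace(R_pred^T R_gt) - 1) / 2), in degrees."""
    # trace(A^T B) equals the sum of element-wise products of A and B
    trace = sum(r_pred[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding error
    return math.degrees(math.acos(cos_angle))
```

Under this assumption, "errors within 1 mm" would correspond to `translation_error_mm` returning values below 1.0.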