3D Pose Estimation

BB8: 3D Poses Estimator

About BB8

BB8 is a novel method for 3D object detection and pose estimation from color images only. It predicts the 3D poses of the objects in the form of 2D projections of the 8 corners of their 3D bounding boxes. The full approach is also scalable, as a single network can be trained for multiple objects simultaneously. This work was supported by the Christian Doppler Laboratory for Semantic 3D Computer Vision, funded in part by Qualcomm Inc.
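To make the pose representation concrete, here is a minimal sketch of how the 8 corners of a 3D bounding box map to 2D image points under a standard pinhole camera. The helper names (`bounding_box_corners`, `project_corners`), the box size, and the camera parameters are hypothetical values chosen for illustration; they are not taken from the BB8 code.

```python
from itertools import product


def bounding_box_corners(size):
    """Return the 8 corners of an axis-aligned 3D box centered at the origin."""
    half = [s / 2.0 for s in size]
    return [(sx * half[0], sy * half[1], sz * half[2])
            for sx, sy, sz in product((-1.0, 1.0), repeat=3)]


def project_corners(corners, rotation, translation, focal, center):
    """Project 3D corners (object frame) to pixels with a pinhole camera.

    rotation is a 3x3 matrix given as a list of rows, translation a 3-vector;
    together they map object coordinates to camera coordinates.
    """
    pixels = []
    for corner in corners:
        # Rigid transform into the camera frame: X_cam = R * X_obj + t.
        x, y, z = (sum(r * c for r, c in zip(row, corner)) + t
                   for row, t in zip(rotation, translation))
        # Perspective division, then focal length and principal point.
        pixels.append((focal * x / z + center[0], focal * y / z + center[1]))
    return pixels


# Hypothetical example: a 10 x 20 x 30 cm box, 1 m in front of the camera.
corners = bounding_box_corners((0.1, 0.2, 0.3))
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
projections = project_corners(corners, identity, (0.0, 0.0, 1.0),
                              focal=500.0, center=(320.0, 240.0))

# Flattening the 8 (u, v) pairs gives a 16-number description of the pose.
target = [value for uv in projections for value in uv]
```

Under these assumptions, a pose is described by 16 numbers (8 corners, 2 coordinates each), and changing the rotation or translation moves the projected corners accordingly.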

BB8 on Mobile Device

Our 3D object pose estimation method runs in real time on a mobile device, using only color images.

The mobile implementation was done by Markus-Philipp Gherman and Mahdi Rad.

Technical Description

We first use segmentation to detect the objects of interest in 2D, even in the presence of partial occlusions and cluttered backgrounds. The method then takes a "holistic" approach: it applies to each detected object a Convolutional Neural Network (CNN) trained to predict its 3D pose in the form of the 2D projections of the corners of its 3D bounding box. On the LINEMOD dataset, this improves the state of the art from 73.7% to 89.3% of correctly registered RGB frames; the per-sequence results are listed in the table below. We are also the first to report results on the Occlusion dataset using color images only.

Sequence	Brachmann et al. (%)	Ours (%)
Ape	85.2	96.6
Bench Vise	67.9	90.1
Camera	58.7	86.0
Can	70.8	91.2
Cat	84.2	98.8
Driller	73.9	80.9
Duck	73.1	92.2
Egg Box	83.1	91.0
Glue	74.2	92.3
Hole Puncher	78.9	95.3
Iron	83.6	84.8
Lamp	64.0	75.8
Phone	60.6	85.3
Average	73.7	89.3
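
As a quick consistency check, the averages in the last row can be recomputed from the per-sequence scores. The snippet below is only a sketch that copies the numbers from the table above; the variable names are chosen for illustration.

```python
# Per-sequence scores copied from the table: (Brachmann et al., Ours), in percent.
scores = {
    "Ape": (85.2, 96.6),
    "Bench Vise": (67.9, 90.1),
    "Camera": (58.7, 86.0),
    "Can": (70.8, 91.2),
    "Cat": (84.2, 98.8),
    "Driller": (73.9, 80.9),
    "Duck": (73.1, 92.2),
    "Egg Box": (83.1, 91.0),
    "Glue": (74.2, 92.3),
    "Hole Puncher": (78.9, 95.3),
    "Iron": (83.6, 84.8),
    "Lamp": (64.0, 75.8),
    "Phone": (60.6, 85.3),
}

# Unweighted mean over the 13 sequences for each method.
brachmann_avg = sum(b for b, _ in scores.values()) / len(scores)
ours_avg = sum(o for _, o in scores.values()) / len(scores)

print(f"Brachmann et al.: {brachmann_avg:.1f}  Ours: {ours_avg:.1f}")
```

Rounded to one decimal place, the two means agree with the Average row (73.7 and 89.3).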

We also handle challenging objects that exhibit an axis of rotational symmetry, such as those from the recent T-LESS dataset. These objects make training the CNN difficult because two images of such an object under two different poses can look identical. We solve this problem by restricting the range of poses used for training, and by introducing a classifier that identifies the range of a pose at run-time before estimating it. We obtain 54% of frames passing the Pose 6D criterion on average on several sequences of the T-LESS dataset, compared to 67% for the state of the art on the same sequences, which uses both color and depth.
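
The idea of restricting the pose range can be illustrated with a simplified sketch. It assumes an object that looks identical after a rotation by a known angle about its symmetry axis, folds any rotation angle into a single interval, and labels which half of that interval the angle falls in (the kind of label a range classifier could be trained to predict). The function names and the choice of two half-intervals are assumptions made for illustration, not details stated on this page.

```python
import math


def restrict_angle(angle, symmetry_angle):
    """Fold a rotation about the symmetry axis into [0, symmetry_angle)."""
    return angle % symmetry_angle


def range_label(angle, symmetry_angle):
    """Return 0 or 1 for the lower or upper half of the restricted interval."""
    folded = restrict_angle(angle, symmetry_angle)
    return 0 if folded < symmetry_angle / 2.0 else 1


# Hypothetical object that looks identical after a 180 degree turn.
alpha = math.pi

# Two poses that produce the same image map to the same restricted angle,
# so the network is never asked to output two different targets for one image.
pose_a = math.radians(30.0)
pose_b = pose_a + alpha
same_target = math.isclose(restrict_angle(pose_a, alpha),
                           restrict_angle(pose_b, alpha))
```

With this folding, visually identical poses share one training target, which removes the ambiguity described above.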

Code and Data

This is the code for the core part of BB8: generating the data and predicting the 2D projections of the 3D bounding box. https://github.com/radmahdi/BB8

Publications

BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth
Mahdi Rad and Vincent Lepetit
In Proc. IEEE Int'l Conf. on Computer Vision (ICCV), 2017