This is the code for our CVPR'15 paper "Learning Descriptors for Object Recognition and 3D Pose Estimation". It is distributed in two packages:
There is no documentation yet other than the readme file, which explains the basics. We will provide more details over time as we receive feedback. Meanwhile, if you have questions, just contact us.
The data used in the paper is essentially the LineMOD dataset created by Stefan Hinterstoisser. However, we render the synthetic images ourselves with Blender and apply a median-inpainting filter to the real-world Kinect depth data.
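The paper itself does not spell out the inpainting filter, so the following is only a minimal sketch of what a median-inpainting step for Kinect depth maps typically looks like: missing (zero-valued) depth pixels are repeatedly filled with the median of the valid depth values in a small neighborhood. The function name, window size, and iteration scheme are our own assumptions, not the authors' exact implementation.

```python
import numpy as np

def median_inpaint(depth, ksize=5, max_iter=10):
    """Fill zero-valued (missing) depth pixels with the median of the
    valid depth values in a ksize x ksize window, iterating until no
    holes remain or max_iter passes are done.

    NOTE: a hedged sketch of median inpainting for Kinect depth maps,
    not the authors' exact filter."""
    depth = depth.astype(np.float32).copy()
    r = ksize // 2
    for _ in range(max_iter):
        holes = np.argwhere(depth == 0)
        if holes.size == 0:
            break  # no missing pixels left
        filled = depth.copy()
        for y, x in holes:
            # Gather the local window, clipped at the image border.
            patch = depth[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            valid = patch[patch > 0]
            if valid.size:
                filled[y, x] = np.median(valid)
        depth = filled
    return depth
```

Iterating lets the fill grow inward from the hole border, so even larger dropout regions eventually receive a value derived from real measurements.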
Thus, here we provide our version of the data for unmodified use with the code above: ape
Additionally, you can download the Blender file we used to render the synthetic data.
Update (Oct 15th 2015):
As stated above, the data was taken from the LineMOD dataset. There, the origin of each object was defined as a central point on the ground plane the object stands on (really, the center of the marker board on which the sequences were captured). For cropping the training and test images, we defined the "center point" at which the camera looks to be the point (0,0,5) (in cm, i.e. 5cm above the ground).
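To make the cropping convention concrete, the crop center in the image is simply the projection of that world-space "center point" through the ground-truth pose and camera intrinsics. The sketch below assumes a 4x4 world-to-camera pose (translation in meters, so 5cm becomes 0.05) and a generic 3x3 intrinsics matrix K; neither the function name nor the intrinsic values are from the original code.

```python
import numpy as np

def crop_center_pixel(pose, K, center_world=(0.0, 0.0, 0.05)):
    """Project the 'center point' (0, 0, 5cm above ground in the original
    LineMOD-based convention) into the image to obtain the crop center.

    pose: 4x4 world-to-camera transform, translation in meters (assumed).
    K:    3x3 camera intrinsics (hypothetical values, not from the paper).
    """
    p_world = np.array([*center_world, 1.0])     # homogeneous world point
    p_cam = pose @ p_world                       # camera coordinates
    uvw = K @ p_cam[:3]                          # pinhole projection
    return uvw[0] / uvw[2], uvw[1] / uvw[2]      # pixel coordinates (u, v)
```

A square window around the returned (u, v) would then give the training/test crop.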
Recently, however, for further work Wadim Kehl decided to go for a more practical scheme and centered all objects. You can download the updated Blender file
and the corresponding ground-truth poses for the real-world data sequences here. The poses download contains one poseXXXX.txt file per image, holding a homogeneous 4x4 transformation matrix that maps world coordinates to camera coordinates. The translation part of these matrices is in meters!
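Reading and applying such a pose file can be sketched as follows. We assume here that each poseXXXX.txt simply contains the 4x4 matrix as four whitespace-separated rows; the helper names are ours.

```python
import numpy as np

def load_pose(path):
    """Read a poseXXXX.txt file: a homogeneous 4x4 world-to-camera
    transform with translation in meters (file layout assumed to be
    four whitespace-separated rows of four numbers)."""
    M = np.loadtxt(path)
    assert M.shape == (4, 4), "expected a 4x4 pose matrix"
    return M

def world_to_camera(pose, pts_world):
    """Map Nx3 world-coordinate points into camera coordinates."""
    pts = np.atleast_2d(np.asarray(pts_world, dtype=np.float64))
    homo = np.hstack([pts, np.ones((pts.shape[0], 1))])  # to homogeneous
    return (pose @ homo.T).T[:, :3]                      # back to 3D
```

Because translations are in meters, a point transformed this way is directly comparable to metric Kinect depth values.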
Also, if you do not want to render the images yourself, but want to use the images exactly as we cropped them, you can download the set of images cropped at the ground-truth locations and rescaled to 64x64.
New code to work with this data will follow. Again, the readme file contains information about how to use this data with the code.