Affordable RGB-D cameras have attracted broad research attention in computer vision. This imaging modality provides registered 3-D and RGB data of the perceived environment in the point cloud data format. RGB-D cameras and point clouds allow object recognition to be integrated seamlessly into mobile robotics applications. Point clouds not only offer potential for novel algorithms; the geometric information they provide also helps to estimate suitable positions for mobile manipulation, an action that often follows the recognition process in mobile robotics. However, research on vision algorithms based on 3-D data is far less advanced than research on 2-D images. RGB-D cameras also pose technical challenges: because of the way they work, their measurements contain errors, their range is limited, and their resolution is much lower than that of RGB cameras. In this thesis, I will focus on the object classification and detection parts of vision pipelines for mobile robotics. The main research question is whether point clouds from low-cost RGB-D cameras can be used to solve these tasks reliably. I decided to use a traditional algorithm for my investigations. Traditional algorithms usually demand fewer computational resources for training and require less data to build a model for inference. For these and other reasons, some application scenarios might restrict or prohibit the use of methods based on deep neural networks (DNN). The presented point cloud processing pipeline is a non-parametric approach to object classification and detection. It is inspired by the local Naive-Bayes Nearest Neighbor (NBNN) and the Implicit Shape Model (ISM) algorithms, both originally introduced for 2-D images. During the training stage, local feature descriptors are used to construct a spatial codebook. In the test stage, this codebook is used in a Hough voting scheme to generate object hypotheses. I will carefully adapt ideas from the above methods and extend several pipeline steps with novel contributions.
Throughout, I will target fast processing on limited computational resources. In a first step, I will focus on classifying isolated objects to gain insights into using point clouds for vision. Subsequently, the presented pipeline will be extended to handle noisy data for object detection in cluttered environments. The contributions of this thesis include an efficient sampling method for finding suitable locations for local descriptors and the creation of a descriptive codebook with ranked feature descriptors. Hypothesis generation is followed by an elaborate hypothesis verification step and an additional verification with global feature descriptors in an ensemble classifier. Further, I introduce modifications to two popular local descriptors and also extend them to the global scale. The developed approaches are evaluated on publicly available datasets with simulated and real sensor data. In addition, the pipeline was evaluated on the mobile service robot Lisa during several competitions, where it achieved excellent results. The results of this work enable fast and reliable shape classification of isolated objects, as well as object detection in cluttered environments. The complete pipeline is open source and published online in a software repository under a permissive license.
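To make the codebook and Hough voting idea more concrete, the following is a minimal sketch in Python. It is not the implementation from the thesis. All function names, the data layout, and the `bin_size` parameter are assumptions chosen for illustration, and steps such as keypoint sampling, descriptor ranking, and hypothesis verification are left out. Each codebook entry stores a descriptor, its class label, and the offset from the keypoint to the object centroid; at test time, every scene descriptor votes for an object center through its nearest codebook entry.

```python
from collections import defaultdict
import math


def train_codebook(training_objects):
    """Build a spatial codebook (illustrative layout, not the thesis format).

    training_objects: list of (label, centroid, features), where features is
    a list of (descriptor, keypoint_position) tuples.
    """
    codebook = []
    for label, centroid, features in training_objects:
        for descriptor, position in features:
            # Store the vector that points from the keypoint to the centroid.
            offset = tuple(c - p for c, p in zip(centroid, position))
            codebook.append((descriptor, label, offset))
    return codebook


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cast_votes(codebook, scene_features, bin_size=0.5):
    """Accumulate one vote per scene feature in a discretized Hough space."""
    votes = defaultdict(float)
    for descriptor, position in scene_features:
        # Nearest-neighbor lookup (brute force; a k-d tree would be faster).
        _, label, offset = min(
            codebook, key=lambda entry: squared_distance(entry[0], descriptor)
        )
        # Predicted object center = keypoint position + stored offset.
        center = tuple(p + o for p, o in zip(position, offset))
        cell = tuple(math.floor(c / bin_size) for c in center)
        votes[(label, cell)] += 1.0
    return votes


def best_hypothesis(votes):
    """Return (label, cell, score) of the strongest maximum in the voting space."""
    (label, cell), score = max(votes.items(), key=lambda item: item[1])
    return label, cell, score
```

In this sketch, an object hypothesis is simply the voting cell with the highest score. Binning the predicted centers is one simple way to find maxima in the voting space, and other maxima detection strategies are equally possible.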
Viktor Seib studied this question.