Motion and appearance are two essential quantities for perceiving and interacting with the real world. As technology has progressed, numerous types of sensors have been invented to measure these two quantities directly or indirectly, such as cameras, LiDARs and Inertial Measurement Units (IMUs). Among these sensors, cameras have become the most ubiquitous tools for recording and perceiving the world, largely because their output data can be intuitively understood by humans. Accordingly, recovering motion (including camera ego-motion and object movement) and reconstructing scene appearance from images have long been central research topics in computer vision and robotics. To this end, researchers have introduced a variety of concepts and techniques to parameterize motion and appearance, so that they can be estimated with optimization algorithms or deep learning models. For example, Lie group theory is commonly used to represent camera motion, while optical flow describes the motion of points in the world projected onto the image plane. Intensity images describe the appearance on the image plane; panoramas display the global scene appearance captured by a purely rotating camera; and point clouds represent the sparse 3D structure of the scene. The estimation of motion and appearance using traditional cameras is typically performed in an alternating way, because images directly capture the static scene appearance at specific timestamps. The camera motion is first calculated from the data association between consecutive images (local appearances), and the global appearance is then recovered by stitching these local appearances together. Such a pipeline, usually called Visual Odometry (VO), works reliably and accurately after decades of research. However, algorithmic advancements alone are insufficient to transcend the inherent physical limitations of sensors. 
For instance, motion blur resulting from rapid motion, or underexposure caused by challenging lighting conditions, can degrade image quality and ultimately lead to the failure of downstream algorithms. The advent of event cameras provides a potential solution for overcoming these limitations. Event cameras are newly developed bio-inspired visual sensors that capture the brightness changes at each pixel and output an asynchronous event stream. Their unique working principle endows event cameras with great advantages over conventional cameras in terms of high dynamic range, low temporal latency and low power consumption. Meanwhile, the completely different data format requires novel algorithms to process the data, a challenge that has been partially addressed by converting events into image-like representations or by processing events in batches, as in the contrast maximization framework. However, the other key challenge has long been ignored: the entanglement of motion and appearance encoded in the event data. More specifically, the motion-dependent nature of event cameras means that appearance and motion are inherently linked: either both are present and recorded in the event data, or neither is captured. Most previous research treats the recovery of these two visual quantities as separate tasks, which does not fit this nature of event cameras and overlooks the inherent relations between them. Starting from first principles, this thesis investigates algorithms to perform joint estimation of motion and appearance using an event camera. The contributions of this thesis are as follows:
1. A comprehensive and systematic comparison of state-of-the-art event-based rotational motion estimation methods in terms of theory, accuracy and efficiency.
2. The first event-based rotation-only bundle adjustment (BA) method to refine the continuous-time trajectory of an event camera while reconstructing a sharp panoramic map of scene edges.
3. An event-based rotation-only Simultaneous Localization and Mapping (SLAM) system called CMax-SLAM, comprising both a front-end and a back-end for the first time.
4. The first event-only photometric rotational bundle adjustment approach that jointly refines the camera motion and a panoramic intensity map of the scene.
5. A novel event-only mosaicing bundle adjustment method, which refines the event camera's orientation trajectory and the gradient map, producing a high-quality grayscale panoramic map of the scene.
6. The first attempt to leverage matrix sparsity to speed up optimization in event-based bundle adjustment.
7. The first unsupervised learning framework for the joint estimation, with a single network, of event-based optical flow and image intensity. Its working principle fits naturally with the characteristics of event data.
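To make the contrast maximization idea mentioned above more concrete, the following is a minimal toy sketch, not the method developed in the thesis. It assumes a 1D sensor, point-like edges translating at a constant velocity, noise-free events, and a simple grid search; all function names and parameters are hypothetical and chosen only for illustration. Events are warped back to a reference time with a candidate velocity, accumulated into an image of warped events, and the candidate that yields the highest image variance (contrast) is selected.

```python
def make_events(edge_positions, velocity, n_steps=50, dt=0.01):
    """Synthetic events (x, t): each edge fires one event per time step
    at its current position while moving at `velocity` (pixels/second)."""
    return [(x0 + velocity * k * dt, k * dt)
            for x0 in edge_positions
            for k in range(n_steps)]


def warped_image(events, velocity, width=64):
    """Warp every event back to t = 0 with a candidate velocity and
    count how many warped events land on each pixel."""
    image = [0] * width
    for x, t in events:
        px = round(x - velocity * t)  # nearest-pixel accumulation
        if 0 <= px < width:
            image[px] += 1
    return image


def contrast(image):
    """Variance of the image of warped events (higher means sharper)."""
    mean = sum(image) / len(image)
    return sum((v - mean) ** 2 for v in image) / len(image)


def estimate_velocity(events, candidates):
    """Grid search: return the candidate velocity with maximum contrast."""
    return max(candidates, key=lambda v: contrast(warped_image(events, v)))


# Two edges starting at pixels 10 and 30, moving at 20 px/s (assumed values).
events = make_events([10.0, 30.0], velocity=20.0)
candidates = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
best_velocity = estimate_velocity(events, candidates)
sharp_image = warped_image(events, best_velocity)
```

In this toy setting, the sharpest image of warped events is obtained at the true velocity, and that image doubles as a map of the edge locations. This gives a small-scale picture of why motion and appearance are coupled in event data: the edge map only becomes sharp once the motion is explained correctly.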
Shuang Guo (Thu,) studied this question.
www.synapsesocial.com/papers/69df2b2ce4eeef8a2a6b018f — DOI: https://doi.org/10.14279/depositonce-25561
Shuang Guo