Refine
Document Type
- Doctoral Thesis (2)
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2)
Keywords
- SLAM (2) (remove)
Institute
- Informatik (1)
- Informatik und Mathematik (1)
Driving can be dangerous. Humans become inattentive when performing a monotonous task like driving. Also the risk implied while multi-tasking, like using the cellular phone while driving, can break the concentration of the driver and increase the risk of accidents. Others factors like exhaustion, nervousness and excitement affect the performance of the driver and the response time. Consequently, car manufacturers have developed systems in the last decades which assist the driver under various circumstances. These systems are called driver assistance systems. Driver assistance systems are meant to support the task of driving, and the field of action varies from alerting the driver, with acoustical or optical warnings, to taking control of the car, such as keeping the vehicle in the traffic lane until the driver resumes control. For such a purpose, the vehicle is equipped with on-board sensors which allow the perception of the environment and/or the state of the vehicle. Cameras are sensors which extract useful information about the visual appearance of the environment. Additionally, a binocular system allows the extraction of 3D information. One of the main requirements for most camera-based driver assistance systems is the accurate knowledge of the motion of the vehicle. Some sources of information, like velocimeters and GPS, are of common use in vehicles today. Nevertheless, the resolution and accuracy usually achieved with these systems are not enough for many real-time applications. The computation of ego-motion from sequences of stereo images for the implementation of driving intelligent systems, like autonomous navigation or collision avoidance, constitutes the core of this thesis. This dissertation proposes a framework for the simultaneous computation of the 6 degrees of freedom of ego-motion (rotation and translation in 3D Euclidean space), the estimation of the scene structure and the detection and estimation of independently moving objects. The input is exclusively provided by a binocular system and the framework does not call for any data acquisition strategy, i.e. the stereo images are just processed as they are provided. Stereo allows one to establish correspondences between left and right images, estimating 3D points of the environment via triangulation. Likewise, feature tracking establishes correspondences between the images acquired at different time instances. When both are used together for a large number of points, the result is a set of clouds of 3D points with point-to-point correspondences between clouds. The apparent motion of the 3D points between consecutive frames is caused by a variety of reasons. The most dominant motion for most of the points in the clouds is caused by the ego-motion of the vehicle; as the vehicle moves and images are acquired, the relative position of the world points with respect to the vehicle changes. Motion is also caused by objects moving in the environment. They move independently of the vehicle motion, so the observed motion for these points is the sum of the ego-vehicle motion and the independent motion of the object. A third reason, and of paramount importance in vision applications, is caused by correspondence problems, i.e. the incorrect spatial or temporal assignment of the point-to-point correspondence. Furthermore, all the points in the clouds are actually noisy measurements of the real unknown 3D points of the environment. Solving ego-motion and scene structure from the clouds of points requires some previous analysis of the noise involved in the imaging process, and how it propagates as the data is processed. Therefore, this dissertation analyzes the noise properties of the 3D points obtained through stereo triangulation. This leads to the detection of a bias in the estimation of 3D position, which is corrected with a reformulation of the projection equation. Ego-motion is obtained by finding the rotation and translation between the two clouds of points. This problem is known as absolute orientation, and many solutions based on least squares have been proposed in the literature. This thesis reviews the available closed form solutions to the problem. The proposed framework is divided in three main blocks: 1) stereo and feature tracking computation, 2) ego-motion estimation and 3) estimation of 3D point position and 3D velocity. The first block solves the correspondence problem providing the clouds of points as output. No special implementation of this block is required in this thesis. The ego-motion block computes the motion of the cameras by finding the absolute orientation between the clouds of static points in the environment. Since the cloud of points might contain independently moving objects and outliers generated by false correspondences, the direct computation of the least squares might lead to an erroneous solution. The first contribution of this thesis is an effective rejection rule that detects outliers based on the distance between predicted and measured quantities, and reduces the effects of noisy measurement by assigning appropriate weights to the data. This method is called Smoothness Motion Constraint (SMC). The ego-motion of the camera between two frames is obtained finding the absolute orientation between consecutive clouds of weighted 3D points. The complete ego-motion since initialization is achieved concatenating the individual motion estimates. This leads to a super-linear propagation of the error, since noise is integrated. A second contribution of this dissertation is a predictor/corrector iterative method, which integrates the clouds of 3D points of multiple time instances for the computation of ego-motion. The presented method considerably reduces the accumulation of errors in the estimated ego-position of the camera. Another contribution of this dissertation is a method which recursively estimates the 3D world position of a point and its velocity; by fusing stereo, feature tracking and the estimated ego-motion in a Kalman Filter system. An improved estimation of point position is obtained this way, which is used in the subsequent system cycle resulting in an improved computation of ego-motion. The general contribution of this dissertation is a single framework for the real time computation of scene structure, independently moving objects and ego-motion for automotive applications.
This dissertation is concerned with the task of map-based self-localization, using images of the ground recorded with a downward-facing camera. In this context, map-based (self-)localization is the task of determining the position and orientation of a query image that is to be localized. The map used for this purpose consists of a set of reference images with known positions and orientations in a common coordinate system. For localization, the considered methods determine correspondences between features of the query image and those of the reference images.
In comparison with localization approaches that use images of the surrounding environment, we expect that using images of the ground has the advantage that, unlike the surrounding, the visual appearance of the ground is often long-term stable. Also, by using active lighting of the ground, localization becomes independent of external lighting conditions.
This dissertation includes content of several published contributions, which present research on the development and testing of methods for feature-based localization of ground images. Our first contribution examines methods for the extraction of image features that have not been designed to be used on ground images. This survey shows that, with appropriate parametrization, several of these methods are well suited for the task.
Based on this insight, we develop and examine methods for various subtasks of map-based localization in the following contributions. We examine global localization, where all reference images have to be considered, as well as local localization, where an approximation of the query image position is already known, which allows for disregarding reference images with a large distance to this position.
In our second contribution, we present the first systematic comparison of state-of-the-art methods for ground texture based localization. Furthermore, we present a method, which is characterized by its usage of our novel feature matching technique. This technique is called identity matching, as it matches only those features with identical descriptors, in contrast to the state-of-the-art that also matches features with similar descriptors. We show that our method is well suited for global and local localization, as it has favorable scaling with the number of reference images considered during the localization process. In another contribution, we develop a variant of our localization method that is significantly faster to compute, as it applies a sampling approach to determine the image positions at which local features are extracted, instead of using classical feature detectors.
Two further contributions are concerned with global localization. The first one introduces a prediction model for the global localization performance, based on an evaluation of the local localization performance. This allows us to quickly evaluate any considered parameter settings of global localization methods. The second contribution introduces a learning-based method that computes compact descriptors of ground images. This descriptor can be used to retrieve the overlapping reference images of a query image from a large set of reference images with little computational effort.
The most recent contribution included in this dissertation presents a new ground image database, which was recorded with a dedicated platform using a downward-facing camera. In addition to the data, we also explain our guidelines for the construction of the platform. In comparison with existing databases, our database contains more images and presents a larger variety of ground textures. Furthermore, this database enables us to perform the first systematic evaluation of how localization performance is affected by the time interval between the point in time at which the reference images are recorded and the point in time at which the query image is recorded. We find out that for outdoor areas all ground texture based localization methods have reliability issues, if the time interval between the recording of the query and reference images is large, and also if there are different weather conditions. These findings point to remaining challenges in ground texture base localization that should be addressed in future work.