Modern driver assistance systems already contribute substantially to making individual mobility in road traffic safer and more comfortable. For this purpose, modern vehicles are equipped with a multitude of sensors and actuators that perceive, interpret and react to the vehicle's environment. In order to reach the next set of goals along this path, for example to assist the driver in increasingly complex situations or to reach a higher degree of autonomy of driver assistance systems, a detailed understanding of the vehicle environment and especially of other moving traffic participants is necessary.
It is known that motion information plays a key role in human object recognition [Spelke, 1990]. However, full 3D motion information is mostly not taken into account for stereo-vision-based object segmentation in the literature. In this thesis, novel approaches for motion-based object segmentation of stereo image sequences are proposed, from which a generic environmental model is derived that contributes to a more precise analysis and understanding of the respective traffic scene. The aim of the environmental model is to yield a minimal scene description in terms of a few moving objects and stationary background such as houses, crash barriers or parked vehicles. A minimal scene description aggregates as much information as possible and is characterized by its stability, precision and efficiency.
Instead of dense stereo and optical flow information, the proposed object segmentation builds on the so-called Stixel World, an efficient superpixel-like representation of space-time stereo data. As it turns out, this step substantially increases the stability of the segmentation and reduces the computational time by several orders of magnitude, thus enabling real-time automotive use in the first place. Besides the efficient, real-time capable optimization, the object segmentation has to cope with significant noise that is due to the measurement principle of the stereo camera system used. For that reason, in order to obtain an optimal solution under the given extreme conditions, the segmentation task is formulated as a Bayesian optimization problem, which allows regularizing prior knowledge and redundancies to be incorporated into the object segmentation.
Object segmentation as discussed here means unsupervised segmentation, since typically the number of objects in the scene and their individual object parameters are not known in advance. This information has to be estimated from the input data as well.
For inference, two approaches with their individual pros and cons are proposed, evaluated and compared. The first approach is based on dynamic programming. Its key advantage is the possibility to take into account non-local priors such as shape or object size information, which is impossible, or prohibitively expensive, with more local, conventional graph optimization approaches such as graph cut or belief propagation.
In the first instance, the dynamic programming approach is limited to one-dimensional data structures, in this case to the first Stixel row. A possible extension to capture multiple Stixel rows is discussed at the end of this thesis.
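To make the idea concrete, the following minimal sketch segments a single row of values (standing in for Stixel disparities) by one-dimensional dynamic programming. The squared-deviation data cost and the constant per-segment penalty are illustrative assumptions, not the Bayesian model of the thesis:

```python
import numpy as np

def dp_segment(d, seg_penalty=2.0):
    """Minimal 1D segmentation by dynamic programming: split the
    sequence d (e.g. one row of Stixel disparities) into
    piecewise-constant segments. Cost per segment: squared deviation
    from the segment mean plus a constant penalty (an illustrative,
    MDL-like prior -- not the thesis's actual Bayesian model)."""
    n = len(d)
    s1 = np.concatenate(([0.0], np.cumsum(d)))
    s2 = np.concatenate(([0.0], np.cumsum(np.square(d))))

    def seg_cost(i, j):  # cost of one segment covering d[i..j]
        m = j - i + 1
        s = s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - s * s / m + seg_penalty

    best = np.full(n + 1, np.inf)   # best[k]: optimal cost for d[:k]
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for k in range(1, n + 1):
        for i in range(k):
            c = best[i] + seg_cost(i, k - 1)
            if c < best[k]:
                best[k], back[k] = c, i
    cuts, k = [], n                 # backtrack the segment boundaries
    while k > 0:
        cuts.append((back[k], k - 1))
        k = back[k]
    return cuts[::-1]

# two objects at clearly different disparities
print(dp_segment(np.array([5.0, 5.1, 4.9, 20.0, 20.2, 19.8])))
```

Non-local priors such as object size fit naturally into this scheme, since `seg_cost` sees each candidate segment as a whole.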
Further novel contributions include a special outlier concept to handle gross stereo errors associated with so-called stereo tear-off edges. Additionally, object-object interactions are taken into account by explicitly modeling object occlusions. In practice, these extensions yield dramatic improvements.
This first approach is compared with a second approach that is based on an alternating optimization of the Stixel segmentation and of the relevant object parameters in an expectation maximization (EM) sense. The labeling step is performed by means of the α-expansion graph cut algorithm; the parameter estimation step is done via one-dimensional sampling and multidimensional gradient descent. By using the Stixel World and thanks to an efficient implementation, one step of the optimization takes only about one millisecond on a standard single CPU core. To the knowledge of the author, at the time of development there was no faster global optimization in a demonstrator car.
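The alternating scheme can be caricatured in a few lines: a labeling step and a parameter step, iterated until convergence. In this hypothetical sketch the graph cut labeling is replaced by a simple nearest-model assignment and the parameter step by a mean update; the real method optimizes a full Bayesian energy:

```python
import numpy as np

def alternate_segmentation(features, n_objects=2, iters=10, seed=0):
    """Toy alternating optimization in the EM spirit described above.
    Labeling step: nearest-model assignment (standing in for the
    alpha-expansion graph cut). Parameter step: mean update (standing
    in for sampling / gradient descent). Illustrative only."""
    rng = np.random.default_rng(seed)
    params = features[rng.choice(len(features), n_objects, replace=False)]
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # labeling: assign every stixel to the best-fitting object
        dists = np.linalg.norm(features[:, None] - params[None], axis=2)
        labels = dists.argmin(axis=1)
        # parameters: re-estimate each object from its assigned stixels
        for k in range(n_objects):
            if np.any(labels == k):
                params[k] = features[labels == k].mean(axis=0)
    return labels, params
```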
For both approaches, various testing scenarios have been carefully selected, allowing the proposed methods to be examined thoroughly under different real-world conditions with limited ground truth at hand. As an additional innovative application, the first approach was successfully implemented in a demonstrator car that drove the so-called Bertha Benz Memorial Route from Mannheim to Pforzheim autonomously in real traffic.
At the end of this thesis, the limits of the proposed systems are discussed and an outlook on possible future work is given.
In order to investigate the role of neuronal synchronization in perceptual grouping, a new method was developed to record selectively from multiple cortical sites of known functional specificity, as determined by optical imaging of intrinsic signals. To this end, a matrix of closely spaced guide tubes was developed in cooperation with a company providing the essential manufacturing technique RMPD® (Rapid Micro Product Development). The matrix was embedded into a framework of hardware and software that allowed each guide tube to be mapped onto the cortical site an electrode would reach if inserted into that guide tube. With these developments, it was possible to determine the functional layout of the cortex by optical imaging and subsequently perform targeted recordings with multiple electrodes in parallel. The method was tested for its accuracy and found to target the electrodes to the desired cortical locations with a precision of 100 µm.

Using the developed technique, neuronal activity was recorded from area 18 of anesthetized cats. For stimulation, Gabor patches in different geometrical configurations were placed over the recorded receptive fields, merging into visual objects appropriate for testing the hypothesis of feature binding by synchrony. Synchronization strength was measured by the height of the cross-correlation centre peaks. All pairwise synchronizations were summarized in a correlation index, defined as the mean difference of the correlation strengths between conditions in which recording sites should or should not fire in synchrony according to the binding hypothesis. The correlation index deviated significantly from zero for several of these configurations, further supporting the hypothesis that synchronization plays an important role in the process of perceptual grouping. Furthermore, direct evidence was found for the independence of the synchronization strength from the neuronal firing rate, and for neurons that dynamically change the ensemble they participate in.

In parallel to the experimental approach, mechanisms of oscillatory long-range synchronization were studied by network simulations. To this end, a biologically plausible model was implemented using pyramidal and basket cells with Hodgkin-Huxley-like conductances. Several columns were built from these cells, and intra- and inter-columnar connections were mimicked from physiological data. When activated by independent Poisson spike trains, the columns showed oscillatory activity in the gamma frequency range. Correlation analysis revealed a tendency to locally synchronize the oscillations among the columns, but a rapid phase transition occurred with increasing cortical distance. This finding suggests that the present view of the inter-columnar connectivity does not fully explain oscillatory long-range synchronization, and predicts that other processes such as top-down influences are necessary for long-range synchronization phenomena.
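For illustration, the two central measures of the analysis can be sketched as follows. The binning, z-scoring and lag window are assumptions; the actual study derived correlation strengths from cross-correlograms of the recorded spike trains:

```python
import numpy as np

def centre_peak(x, y, max_lag=10):
    """Peak of the cross-correlogram of two binned, z-scored spike
    trains x, y within +/- max_lag bins of zero lag (assumed bin
    size and lag window; stand-in for the centre-peak height)."""
    x = (x - x.mean()) / (x.std() + 1e-12)
    y = (y - y.mean()) / (y.std() + 1e-12)
    cc = [np.mean(x[max(0, -l):len(x) - max(0, l)] *
                  y[max(0, l):len(y) - max(0, -l)])
          for l in range(-max_lag, max_lag + 1)]
    return max(cc)

def correlation_index(sync_strengths, nosync_strengths):
    """Mean difference of correlation strengths between conditions in
    which sites should vs. should not fire in synchrony."""
    return np.mean(sync_strengths) - np.mean(nosync_strengths)
```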
The technology of advanced driver assistance systems (ADAS) has developed rapidly over the last few decades. The current level of assistance provided by ADAS technology makes driving significantly safer through driver protection systems such as automatic obstacle avoidance and automatic emergency braking. With the use of ADAS, driving becomes not only safer but also easier, as ADAS can take over some routine tasks from the driver, e.g. through automatic lane keeping and automatic parking. With the continuous advancement of ADAS technology, fully autonomous cars are predicted to become a reality in the near future.
One of the most important tasks in autonomous driving is to accurately localize the ego-car and continuously track its position. The module which performs this task, namely odometry, can be built using different kinds of sensors: camera, LIDAR, GPS, etc. This dissertation covers the topic of visual odometry using a camera. While stereo visual odometry frameworks are widely used and dominate the KITTI odometry benchmark (Geiger, Lenz and Urtasun 2012), the accuracy and performance of monocular visual odometry are much less explored.
In this dissertation, a new monocular visual odometry framework is proposed, namely Predictive Monocular Odometry (PMO). PMO employs a prediction-and-correction mechanism in the different steps of its implementation. PMO falls into the category of sparse methods: it detects and chooses keypoints in images and tracks them over subsequent frames. The relative pose between two consecutive frames is first pre-estimated using pitch-yaw-roll estimation based on the far-field view (Barnada, Conrad, Bradler, Ochs and Mester 2015) and statistical motion prediction based on the vehicle motion model (Bradler, Wiegand and Mester 2015). The correction and optimization of the relative pose estimates are carried out by minimizing the photometric error of the keypoint matches using the joint epipolar tracking method (Bradler, Ochs, Fanani and Mester 2017).
The monocular absolute scale is estimated by employing a new approach to ground plane estimation. The camera height over ground is assumed to be known. The scale is first estimated using propagation-based scale estimation. Both sparse and dense matching of the ground features between two consecutive frames are then employed to refine the scale estimates. Additionally, street masks from a convolutional neural network (CNN) are utilized to reject non-ground objects in the region of interest.
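The core of such a scale estimation step can be sketched as follows, assuming triangulated (up-to-scale) ground points in the camera frame and a known camera height; the plain least-squares plane fit is a stand-in for PMO's refinement with sparse/dense ground matching and CNN street masks, and the height value is hypothetical:

```python
import numpy as np

def absolute_scale(ground_points, camera_height_m=1.65):
    """Recover the monocular scale from up-to-scale triangulated
    ground points in the camera frame, given the true camera height
    over ground (assumed known). The plane is fitted by SVD on the
    centered points; the smallest singular direction is the normal."""
    c = ground_points.mean(axis=0)
    _, _, Vt = np.linalg.svd(ground_points - c)
    n = Vt[-1]                 # unit plane normal
    dist = abs(n @ c)          # distance of camera origin to the plane
    return camera_height_m / dist
```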
PMO also includes a method to detect independently moving objects (IMOs). This is important for visual odometry frameworks because the localization of the ego-car should be estimated based on static objects only. The IMO candidate masks are provided by a CNN. The case of crossing IMOs is handled by checking epipolar consistency. Parallel-moving IMOs, which are epipolar-conformant, are identified by checking the depth consistency against the depth maps from the CNN.
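The epipolar consistency check can be illustrated in a few lines: given the fundamental matrix between two frames, matches on crossing IMOs lie far from their epipolar lines. The threshold is an assumed value:

```python
import numpy as np

def epipolar_residual(F, x1, x2):
    """Distance (in pixels) of a match x1 -> x2 (homogeneous image
    coordinates) from the epipolar line F @ x1 in the second image."""
    l = F @ x1
    return abs(x2 @ l) / np.hypot(l[0], l[1])

def crossing_imo_mask(F, matches, thresh_px=1.5):
    """Flag matches that violate epipolar geometry. Parallel-moving
    IMOs pass this test and need the additional depth-consistency
    check against the CNN depth maps."""
    return [epipolar_residual(F, x1, x2) > thresh_px for x1, x2 in matches]
```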
In order to evaluate the accuracy of PMO, a full evaluation on the KITTI odometry dataset was performed. PMO achieved the best accuracy among the published monocular frameworks when it was submitted to the KITTI odometry benchmark in July 2017. As of January 2018, it is still one of the leading monocular methods on the KITTI odometry benchmark.
It is important to note that PMO was developed without employing random sample consensus (RANSAC), which has long been considered one of the irreplaceable components of a visual odometry framework. In this sense, PMO introduces a new style of visual odometry framework. PMO was also developed without a multi-frame bundle adjustment step. This reflects the high potential of PMO once such a multi-frame optimization scheme is also taken into account.
The main topic of the present thesis is scene flow estimation in a monocular camera system. Scene flow describes the joint representation of the 3D positions and 3D motions of the scene. A special focus is placed on approaches that combine two kinds of information: deep-learning-based single-view depth estimation and model-based multi-view geometry.
The first part addresses single-view depth estimation, focusing on a method that provides single-view depth information in a form advantageous for monocular scene flow estimation. A convolutional neural network, called ProbDepthNet, is proposed, which provides pixel-wise, well-calibrated depth distributions. The experiments show that different strategies for quantifying the measurement uncertainty provide overconfident estimates due to overfitting effects. Therefore, a novel recalibration technique is integrated as part of ProbDepthNet, which is validated to improve the calibration of the uncertainty measures. The monocular scene flow methods presented in the subsequent parts confirm that the integration of single-view depth information results in the best performance if the neural network provides depth distributions instead of single depth values and includes a recalibration.
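As a rough illustration of why recalibration helps, the following sketch rescales predicted depth uncertainties by a single factor fitted on held-out data. ProbDepthNet's actual recalibration is learned inside the network; this stand-in only conveys the principle:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_variance_scale(mu, sigma, depth_gt):
    """Fit one global factor s for the predicted standard deviations
    by minimizing the Gaussian negative log-likelihood on held-out
    data. Overconfident predictions yield s > 1."""
    def nll(log_s):
        var = (np.exp(log_s) * sigma) ** 2
        return np.mean(0.5 * np.log(2 * np.pi * var)
                       + 0.5 * (depth_gt - mu) ** 2 / var)
    res = minimize_scalar(nll, bounds=(-3.0, 3.0), method="bounded")
    return np.exp(res.x)  # multiply predicted sigmas by this factor
```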
Three methods for monocular scene flow estimation are presented, each designed to combine multi-view geometry-based optimization with deep-learning-based single-view depth estimation such as ProbDepthNet. While the first method, SVD-MSfM, performs the motion and depth estimation as two subsequent steps, the second method, Mono-SF, jointly optimizes the motion estimates and the depth structure. Both methods are tailored to scenes where the objects and motions can be represented by a set of rigid bodies; dynamic traffic scenes essentially fulfill this characteristic. The third method, Mono-Stixel, uses an even more specialized scene model for traffic scenes, the so-called Stixel World, as its underlying scene representation.
The proposed methods set a new state of the art for monocular scene flow estimation, with Mono-SF being the first and leading monocular method on the KITTI scene flow benchmark at the time of submission of the present thesis. The experiments validate that both kinds of information, the multi-view geometric optimization and the single-view depth estimates, contribute to the monocular scene flow estimates and are necessary to achieve the new state-of-the-art accuracy.
At present, there is a huge gap between artificial and biological information processing systems in terms of their capability to learn. This gap could certainly be reduced by gaining more insight into the higher functions of the brain, such as learning and memory. For instance, the primate visual cortex is thought to provide the long-term memory for visual objects acquired by experience. The visual cortex effortlessly handles arbitrarily complex objects by rapidly decomposing them into constituent components of much lower complexity along hierarchically organized visual pathways. How this processing architecture self-organizes into a memory domain that employs such compositional object representation by learning from experience remains to a large extent a riddle.

The study presented here approaches this question by proposing a functional model of a self-organizing hierarchical memory network. The model is based on hypothetical neuronal mechanisms involved in cortical processing and adaptation. The network architecture comprises two consecutive layers of distributed, recurrently interconnected modules. Each module is identified with a localized cortical cluster of fine-scale excitatory subnetworks. A single module performs competitive unsupervised learning on the incoming afferent signals to form a suitable representation of the locally accessible input space. The network employs an operating scheme in which ongoing processing is made of discrete successive fragments termed decision cycles, presumably identifiable with the fast gamma rhythms observed in the cortex. The cycles are synchronized across the distributed modules, which produce highly sparse activity within each cycle by instantiating a local winner-take-all-like operation.

Equipped with adaptive mechanisms of bidirectional synaptic plasticity and homeostatic activity regulation, the network is exposed to natural face images of different persons. The images are presented incrementally, one per cycle, to the lower network layer as a set of Gabor filter responses extracted from local facial landmarks, without any person identity labels. In the course of unsupervised learning, the network simultaneously creates vocabularies of reusable local face appearance elements, captures relations between the elements by associatively linking those parts that encode the same face identity, develops higher-order identity symbols for the memorized compositions, and projects this information back onto the vocabularies in a generative manner. This learning corresponds to the simultaneous formation of bottom-up, lateral and top-down synaptic connectivity within and between the network layers. In the mature connectivity state, the network thus holds a full compositional description of the experienced faces in the form of sparse memory traces residing in the feed-forward and recurrent connectivity. Due to the generative nature of the established representation, the network is able to recreate the full compositional description of a memorized face in terms of all its constituent parts given only its higher-order identity symbol or a subset of its parts. In the test phase, the network successfully proves its ability to recognize the identity and gender of persons from alternative face views not shown before. An intriguing feature of the emerging memory network is its ability to self-generate activity spontaneously in the absence of external stimuli.
In this sleep-like off-line mode, the network shows a self-sustaining replay of the memory content formed during the previous learning. Remarkably, the recognition performance is boosted tremendously after this off-line memory reprocessing. The performance boost is more pronounced for those face views that deviate more from the original view shown during learning. This indicates that the off-line memory reprocessing during the sleep-like state specifically improves the generalization capability of the memory network. The positive effect turns out to be surprisingly independent of synapse-specific plasticity, relying completely on the synapse-unspecific, homeostatic activity regulation across the memory network. The developed network thus demonstrates functionality not shown by any previous neuronal modeling approach: it forms and maintains a memory domain for compositional, generative object representation in an unsupervised manner through experience with natural visual images, using both on-line ("wake") and off-line ("sleep") learning regimes. This functionality offers a promising departure point for further studies aiming at deeper insight into the learning mechanisms employed by the brain and their implementation in artificial adaptive systems for solving complex tasks not tractable so far.
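A single decision cycle of one module might be sketched as follows. This minimal winner-take-all step with a Hebbian weight update is an illustrative reduction; the bidirectional plasticity and homeostatic regulation that the full model relies on are omitted:

```python
import numpy as np

def decision_cycle(W, x, lr=0.05):
    """One decision cycle of a single module, minimally sketched:
    a winner-take-all competition over the module's units, then a
    Hebbian update moving the winner's afferent weights towards the
    input (weights row-normalized to keep the competition bounded)."""
    k = int(np.argmax(W @ x))            # sparse, WTA-like activity
    W[k] += lr * (x - W[k])              # Hebbian-style adaptation
    W[k] /= np.linalg.norm(W[k]) + 1e-12
    return k                             # index of the winning unit
```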
Bayesian Methods for Estimating Phylogenetic Trees with Divergence Times from Molecular Data
(2009)
A major goal of evolutionary biology is to reconstruct the evolutionary history of species. Historically, systematists used morphological and anatomical characters for this purpose. With the steadily growing amount of available sequence data, methods that enable reconstruction on the basis of molecular data are increasingly being developed and applied. Current research focuses on the application and further development of Bayesian methods. These methods enjoy great popularity because, in combination with Markov chain Monte Carlo procedures, they can be used to estimate a phylogenetic tree for a given set of species and to quantify its variability.

Within the scope of this dissertation, the extensible software TreeTime was developed. TreeTime offers interfaces for integrating molecular evolution models and rate-change models, and provides newly developed methods for reconstructing phylogenetic trees with divergence times. In TreeTime, the molecular data and the temporal information, such as fossil records, are taken into account simultaneously in a Bayesian procedure in order to date the times of speciation more precisely.

Applying Bayesian methods to the reconstruction of phylogenetic trees requires a stochastic model that describes the evolution of the molecular sequences along the edges of a tree. The mutation process of the sequences is defined by a molecular evolution model. Using the classical molecular evolution models implies the assumption of a constant evolutionary rate of the sequences throughout the tree. This assumption is known as the molecular clock hypothesis and forms the basis for estimating the divergence times of the tree. The divergence time at which two species split in the tree is reflected in the similarity of the corresponding molecular sequences: the older the divergence, the larger the number of differing positions in the sequences. Frequently, however, the molecular clock assumption is violated, such that an increased evolutionary rate can be demonstrated in certain parts of a tree. If violations of constant evolutionary rates cannot be ruled out, varying mutation rates should be modeled explicitly. Various rate-change models have been proposed for this purpose; so far, only a few of them are available in software packages, and their properties have not been sufficiently explored.

The aim of this work is the development and provision of Bayesian models and methods for estimating phylogenetic trees with divergence times, applicable even when evolutionary rates vary across the tree. A new rate-change model is presented, along with a new way of specifying flexible constraints on the tree topology and the use of these constraints for temporal calibration. The new rate-change model as well as the topological and temporal constraints are embedded in a modular software design. Thanks to this extensible design, existing and future molecular evolution models and rate-change models can be integrated into the software and used. The presented models and methods are incorporated into the newly developed program TreeTime according to this software design and implemented efficiently.
In addition, existing models that are not available in other software packages are implemented and integrated. Furthermore, a new method is developed and applied to assess the goodness of fit of a model for the prior distribution on the set of tree topologies. This method is used to select suitable models by evaluating the observed tree topologies in the TreeBASE database. Subsequently, the software TreeTime is employed in a simulation study to compare the properties of the implemented rate-change models. Finally, the software is used to reconstruct the phylogenetic tree of 38 species from the lizard family Lacertidae. Since the corresponding molecular data deviate from the molecular clock hypothesis, different rate-change models are used in the reconstruction and evaluated in conclusion.
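To illustrate the link between divergence time and sequence similarity that underlies the molecular clock assumption, the following example computes the classic Jukes-Cantor distance between two aligned sequences; TreeTime itself works with pluggable substitution and rate-change models rather than this simple formula:

```python
import numpy as np

def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned sequences: under a
    molecular clock, an older split leaves a larger fraction p of
    differing positions. d = -(3/4) * ln(1 - 4p/3)."""
    p = np.mean([a != b for a, b in zip(seq_a, seq_b)])
    if p >= 0.75:
        return float("inf")          # model saturates at p = 3/4
    return -0.75 * np.log(1.0 - 4.0 * p / 3.0)

print(jukes_cantor_distance("ACGTACGTAC", "ACGAACGTTC"))  # ~0.23
```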
Visual perception has grown increasingly important in the robotics domain during the last decades. Mobile robots have to localize themselves in known environments and carry out complex navigation tasks. This thesis presents an appearance-based or view-based approach to robot self-localization and robot navigation using holistic, spherical views obtained by cameras with large fields of view.

For view-based methods, it is crucial to have a compressed image representation in which different views can be stored and compared efficiently. Our approach relies on the spherical Fourier transform, which transforms a signal defined on the sphere into a small set of coefficients, approximating the original signal by a weighted sum of orthonormal basis functions, the so-called spherical harmonics. The truncated low-order expansion of the image signal allows input images to be compared efficiently, and the mathematical properties of spherical harmonics also allow the rotation between two views to be estimated, even in 3D. Since no geometrical measurements need to be made, modest quality of the vision system is sufficient. All experiments shown in this thesis are purely based on visual information to show the applicability of the approach.

The research presented on robot self-localization focused on demonstrating the usability of the compressed spherical harmonics representation to solve the well-known kidnapped robot problem. The basic idea is to compare the current view to a set of images from a known environment to obtain a likelihood of robot positions. To localize the robot, one could choose the most probable position from the likelihood map; however, it is more beneficial to apply standard methods that integrate information over time while the robot moves, that is, particle or Kalman filters. The first step was to design a fast expansion method that obtains coefficient vectors directly in image space, achieved by back-projecting basis functions onto the input image. The next steps were to develop a dissimilarity measure, an estimator for rotations between coefficient vectors, and a rotation-invariant dissimilarity measure, all of them purely based on the compact signal representation. With all these techniques at hand, generating likelihood maps is straightforward, but first experiments indicated a strong dependence on illumination conditions. This is obviously a challenge for all holistic methods, in particular for a spherical harmonics approach, since local changes usually affect every single element of the coefficient vector. To cope with illumination changes, we investigated preprocessing steps leading to feature images (e.g. edge images, depth images), which bring together our holistic approach and classical feature-based methods. Furthermore, we concentrated on building a statistical model for typical changes of the coefficient vectors under changes in illumination. This task is more demanding but leads to even better results.

The second major topic of this thesis is appearance-based robot navigation. I present a view-based approach called Optical Rails (ORails), which leads a robot along a prerecorded track. The robot navigates in a network of known locations denoted as waypoints. At each waypoint, we store a compressed view representation. A visual servoing method is used to reach the current target waypoint based on the stored appearance and the current camera image.
Navigating in a network of views is achieved by reaching a sequence of stopover locations, one after another. The main contribution of this work is a model which allows the best driving direction of the robot to be deduced purely from the coefficient vectors of the current and the target image. It is based on image registration in the spirit of the classical Lucas-Kanade method, but transferred to the spectral domain, which allows for a great speedup. ORails also includes a waypoint selection strategy and a module for steering our nonholonomic robot.

As for our self-localization algorithm, dependence on illumination changes is also problematic in ORails. Furthermore, occlusions have to be handled for ORails to work properly. I present a solution based on the optimal expansion, which is able to deal with incomplete image signals. To handle dynamic occlusions, i.e. objects appearing in an arbitrary region of the image, we use the linearity of the expansion process and cut the image into segments. These segments can be treated separately, and finally we merge the results. At this point, we can decide to disregard certain segments. Slicing the view also allows for local illumination compensation, which is inherently non-robust if applied to the whole view. In conclusion, this approach addresses the most important criticisms of holistic view-based approaches, namely occlusions and illumination changes, and consequently improves the performance of Optical Rails.
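The flavor of the compressed representation can be sketched as follows: a discrete back-projection of spherical harmonics onto a spherical image grid, and a rotation-invariant signature built from the per-degree energies of the coefficients. The grid layout and quadrature are simplifying assumptions relative to the fast expansion developed in the thesis:

```python
import numpy as np
from scipy.special import sph_harm  # scipy's order: (m, l, azimuth, colatitude)

def sh_coefficients(signal, colat, azim, L=8):
    """Project a spherical image signal onto spherical harmonics up
    to degree L by discrete back-projection. colat in [0, pi] and
    azim in [0, 2*pi) are 2D meshgrids matching signal; sin(colat)
    weights account for the sphere's surface element."""
    d_area = (np.sin(colat) * (np.pi / colat.shape[0])
              * (2 * np.pi / colat.shape[1]))
    return {(l, m): np.sum(signal * np.conj(sph_harm(m, l, azim, colat)) * d_area)
            for l in range(L + 1) for m in range(-l, l + 1)}

def rotation_invariant_signature(coeffs, L=8):
    """Per-degree energies are unchanged by 3D rotations of the view,
    giving a rotation-invariant descriptor for comparing views."""
    return np.array([sum(abs(coeffs[l, m]) ** 2 for m in range(-l, l + 1))
                     for l in range(L + 1)])
```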
In this thesis, we opened the door towards a novel estimation theory for homogeneous vectors and have taken several steps into this new and uncharted territory. The present state of the art for homogeneous estimation problems treats such vectors $p \in \mathbb{P}^n$ as unit vectors embedded in $\mathbb{R}^{n+1}$ and approximates the unit hypersphere by a tangent plane (which is an $n$-dimensional real space, thus having the same number of degrees of freedom as $\mathbb{P}^n$). This approach allows known and established methods from real space to be used (e.g. the variational approach which leads to the FNS algorithm), but it only works well for small errors and has several drawbacks:
• The unit sphere is a two-sheeted covering space of the projective space. Embedding approaches cannot model this fact and can therefore cause a degradation of estimation quality.
• Linearization breaks down if distributions are not highly concentrated (e.g. if data configurations approach degenerate situations).
• While estimation in tangential planes is possible with little error, the characterization of uncertainties with covariance matrices is much more problematic. Covariance matrices are not suited for modelling axial uncertainties if distributions are not concentrated.

Therefore, we linked approaches from directional statistics and estimation theory together. (Homogeneous) TLS estimation could be identified as the central model for homogeneous estimation, and links to axial statistics were established. In the first chapters, a unified estimation theory for point data and axial data was developed. In contrast to present approaches, we identified axial data as a specific data model (and not just as directional data with a symmetric probability density function); this led to the development of novel terms like axial mean vectors, axial variances and axial expectation values.

Like a tunnel which is constructed from both ends simultaneously, we also drilled from the parameter estimation side towards directional/axial statistics in the second part. The presentation of parameter estimation given in this thesis deviates strongly from all known textbooks by presenting homogeneous estimation problems as a distinguished class of problems which calls for different estimation tools. Using the results from the first part, the TLS solution can be interpreted as the weighted anti-mean vector of an axial sample. This link allows our results from axial statistics to be used; for instance, the certainty of the anti-mode (i.e. of the TLS solution!) can be described with a weighted Bingham distribution (see (3.91)). While present approaches are only interested in one eigenvector of some matrix, we can now exploit the whole mean scatter matrix to describe the TLS solution and its certainty. Algorithms like FNS, HEIV or renormalization were presented in a common context and linked to each other. One central result is that all iterative homogeneous estimation algorithms essentially minimize a series of evolving Rayleigh coefficients, which corresponds to a series of (converging?) cost functions. Statistical optimization is only possible if we clearly identify every step as what it exactly is. For instance, the vague statement "solving $Xp \approx 0$" means nothing but setting $\hat{p} := \arg\min_p \frac{p^T X p}{p^T p}$. We identified the most complex scenario for which closed-form optimal solutions are possible (in terms of axial statistics: the type-I matrix weighted model).
The IETLS approach developed in this thesis then solves general type-II matrix weighted problems through an iterative solution of a series of type-I matrix weighted problems. This approach also allows converging schemes to be built, including robust and/or constrained estimation, in contrast to other approaches which can have severe convergence problems even without such extensions if error levels are not low. Chapter 6 then is another big step forward: we presented the theoretical background of homogeneous estimation by introducing novel concepts like singular vector unbiasedness of random matrices, and solved the problem of optimal estimation for correlated data. For instance, these results could be used for better estimation of local image orientation / optical flow (see section 7.2). At the end of this thesis, simulations and experiments for a few computer vision applications were presented; besides orientation estimation, especially the results for robust and constrained estimation of fundamental matrices are impressive. The novel algorithms are applicable to many other applications not presented here, for instance camera calibration, factorization algorithms for multi-view structure from motion, or conic fitting. The fact that this work paved the way for a lot of further research is certainly a good sign.
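The basic, unweighted TLS building block that all of this refines can be stated in a few lines; the matrix-weighted models and the Bingham-based uncertainty description of the thesis go far beyond this sketch:

```python
import numpy as np

def tls_homogeneous(A):
    """Basic homogeneous TLS: the unit vector p minimizing the
    Rayleigh quotient p^T (A^T A) p / (p^T p), i.e. the eigenvector
    of A^T A for the smallest eigenvalue -- the 'anti-mean' axis of
    the axial sample formed by the rows of A."""
    _, V = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
    return V[:, 0]                   # solves A p ~ 0 in the TLS sense
```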
Driving can be dangerous. Humans become inattentive when performing a monotonous task like driving. Also, the risk implied by multi-tasking, such as using a cellular phone while driving, can break the concentration of the driver and increase the risk of accidents. Other factors like exhaustion, nervousness and excitement affect the performance of the driver and the response time. Consequently, car manufacturers have developed systems over the last decades which assist the driver under various circumstances: driver assistance systems. Driver assistance systems are meant to support the task of driving, and their field of action varies from alerting the driver with acoustical or optical warnings to taking control of the car, such as keeping the vehicle in the traffic lane until the driver resumes control. For such purposes, the vehicle is equipped with on-board sensors which allow the perception of the environment and/or the state of the vehicle. Cameras are sensors which extract useful information about the visual appearance of the environment; additionally, a binocular system allows the extraction of 3D information.

One of the main requirements for most camera-based driver assistance systems is accurate knowledge of the motion of the vehicle. Some sources of information, like velocimeters and GPS, are in common use in vehicles today. Nevertheless, the resolution and accuracy usually achieved with these systems are not sufficient for many real-time applications. The computation of ego-motion from sequences of stereo images for the implementation of intelligent driving systems, like autonomous navigation or collision avoidance, constitutes the core of this thesis.

This dissertation proposes a framework for the simultaneous computation of the 6 degrees of freedom of ego-motion (rotation and translation in 3D Euclidean space), the estimation of the scene structure, and the detection and estimation of independently moving objects. The input is provided exclusively by a binocular system, and the framework does not call for any data acquisition strategy, i.e. the stereo images are processed just as they are provided. Stereo allows one to establish correspondences between left and right images, estimating 3D points of the environment via triangulation. Likewise, feature tracking establishes correspondences between the images acquired at different time instances. When both are used together for a large number of points, the result is a set of clouds of 3D points with point-to-point correspondences between clouds.

The apparent motion of the 3D points between consecutive frames has a variety of causes. The most dominant motion for most of the points in the clouds is caused by the ego-motion of the vehicle: as the vehicle moves and images are acquired, the relative position of the world points with respect to the vehicle changes. Motion is also caused by objects moving in the environment. They move independently of the vehicle motion, so the observed motion for these points is the sum of the ego-vehicle motion and the independent motion of the object. A third cause, of paramount importance in vision applications, is correspondence problems, i.e. the incorrect spatial or temporal assignment of the point-to-point correspondences. Furthermore, all the points in the clouds are actually noisy measurements of the real, unknown 3D points of the environment.
Solving ego-motion and scene structure from the clouds of points requires some prior analysis of the noise involved in the imaging process and of how it propagates as the data is processed. Therefore, this dissertation analyzes the noise properties of the 3D points obtained through stereo triangulation. This leads to the detection of a bias in the estimation of 3D position, which is corrected with a reformulation of the projection equation. Ego-motion is obtained by finding the rotation and translation between two clouds of points. This problem is known as absolute orientation, and many solutions based on least squares have been proposed in the literature. This thesis reviews the available closed-form solutions to the problem.

The proposed framework is divided into three main blocks: 1) stereo and feature tracking computation, 2) ego-motion estimation, and 3) estimation of 3D point position and 3D velocity. The first block solves the correspondence problem, providing the clouds of points as output; no special implementation of this block is required in this thesis. The ego-motion block computes the motion of the cameras by finding the absolute orientation between the clouds of static points in the environment. Since the cloud of points might contain independently moving objects and outliers generated by false correspondences, the direct computation of the least squares solution might be erroneous. The first contribution of this thesis is an effective rejection rule that detects outliers based on the distance between predicted and measured quantities, and reduces the effects of noisy measurements by assigning appropriate weights to the data. This method is called the Smoothness Motion Constraint (SMC).

The ego-motion of the camera between two frames is obtained by finding the absolute orientation between consecutive clouds of weighted 3D points. The complete ego-motion since initialization is obtained by concatenating the individual motion estimates. This leads to a super-linear propagation of the error, since noise is integrated. A second contribution of this dissertation is a predictor/corrector iterative method which integrates the clouds of 3D points of multiple time instances for the computation of ego-motion. The presented method considerably reduces the accumulation of errors in the estimated ego-position of the camera. Another contribution is a method which recursively estimates the 3D world position of a point and its velocity by fusing stereo, feature tracking and the estimated ego-motion in a Kalman filter system. An improved estimation of point position is obtained this way, which is used in the subsequent system cycle, resulting in an improved computation of ego-motion. The overall contribution of this dissertation is a single framework for the real-time computation of scene structure, independently moving objects and ego-motion for automotive applications.
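The closed-form absolute orientation step at the heart of the ego-motion block can be sketched with the SVD-based (Kabsch-type) variant of the least-squares solutions reviewed in the thesis; the per-point weights stand in for those produced by the SMC:

```python
import numpy as np

def absolute_orientation(P, Q, w=None):
    """Closed-form weighted least-squares estimate of rotation R and
    translation t with Q ~ R @ P + t, for corresponding 3D point
    clouds P, Q (n x 3 arrays). SVD/Kabsch-type solution; w are the
    per-point weights (e.g. from an outlier rejection rule)."""
    w = np.ones(len(P)) if w is None else np.asarray(w, float)
    w = w / w.sum()
    cp, cq = w @ P, w @ Q                      # weighted centroids
    H = (P - cp).T @ ((Q - cq) * w[:, None])   # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                         # nearest proper rotation
    return R, cq - R @ cp
```

Concatenating such per-frame estimates integrates their noise, which is exactly why the thesis adds the multi-frame predictor/corrector scheme described above.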
In the context of information theory, the term mutual information was first formulated by Claude Elwood Shannon. Information theory is the consistent mathematical description of technical communication systems. To this day, it is the basis of numerous applications in modern communications engineering and has become indispensable in this field. This work is concerned with the development of a concept for nonlinear feature selection from scalar, multivariate data on the basis of mutual information.

From the viewpoint of modelling, the successful construction of a realistic model depends highly on the quality of the employed data. In the ideal case, high-quality data simply consists of the features relevant for deriving the model. In this context, it is important to possess a suitable method for measuring the degree of the mostly nonlinear dependencies between input and output variables. By means of such a measure, the relevant features could be specifically selected. During the course of this work, it will become evident that mutual information is a valuable and feasible measure for this task and hence the method of choice for practical applications.

Basically, and without the claim of being exhaustive, there are two constellations that recommend the application of feature selection. On the one hand, feature selection plays an important role if the computability of a derived system model cannot be guaranteed due to a multitude of available features. On the other hand, the existence of very few data points with a significant number of features also recommends the employment of feature selection. The latter constellation is closely related to the so-called "curse of dimensionality": it is important to reduce the dimensionality of the data, since the coverage of the data space decreases exponentially, for a constant number of data points, with the dimensionality of the available data. In the context of mapping between input and output space, this goal is ideally reached by selecting only the relevant features from the available data set.

The basic idea for this work has its origin in the rather practical field of automotive engineering. It was motivated by the goals of a complex research project in which the nonlinear, dynamic dependencies among a multitude of sensor signals were to be identified. The final goal of such activities was to derive so-called virtual sensors from the identified dependencies among the installed automotive sensors. This enables the real-time computation of the required variable without the expense of additional hardware. The prospect of doing without additional computing hardware is a strong motive, particularly in automotive engineering. In this context, the major problem was to find a feasible method to capture the linear as well as the nonlinear dependencies.

As mentioned before, the goal of this work is the development of a flexibly applicable system for nonlinear feature selection. The important point here is to guarantee the practical computability of the developed method even for high-dimensional data spaces, which are rather realistic in technical environments. The employed measure for the feature selection process is based on the sophisticated concept of mutual information.
The property of mutual information regarding its high sensitivity and specificity to linear and nonlinear statistical dependencies makes it the method of choice for the development of a highly flexible, nonlinear feature selection framework. In addition to the mere selection of relevant features, the developed framework is also applicable to the nonlinear analysis of the temporal influences of the selected features. Hence, a subsequent dynamic modelling can be performed more efficiently, since the proposed feature selection algorithm additionally provides information about the temporal dependencies between input and output variables. In contrast to feature extraction techniques, the feature selection algorithm developed in this work has another considerable advantage: in the case of cost-intensive measurements, the variables with the highest information content can be selected in a prior feasibility study. Hence, the developed method can also be employed to avoid redundancy in the acquired data and thus prevent additional costs.
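As a minimal illustration of mutual-information-based selection, the following sketch ranks candidate features by their estimated mutual information with the target using scikit-learn; the framework developed in this work additionally addresses high-dimensional estimation and temporal dependencies, which this stand-in omits:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mi_feature_ranking(X, y, n_select=5):
    """Rank the columns of X by their estimated mutual information
    with the target y and keep the n_select most informative ones.
    A plain relevance ranking: redundancy between the selected
    features is not treated here."""
    mi = mutual_info_regression(X, y, random_state=0)
    order = np.argsort(mi)[::-1][:n_select]
    return order.tolist(), mi[order]
```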