Monocular scene flow estimation for dynamic traffic scenes

  • The main topic of the present thesis is scene flow estimation in a monocular camera system. Scene flow describes the joint representation of 3D positions and motions of the scene. A special focus is placed on approaches that combine two kinds of information, deep-learning-based single-view depth estimation and model-based multi-view geometry. The first part addresses single-view depth estimation focussing on a method that provides single-view depth information in an advantageous form for monocular scene flow estimation methods. A convolutional neural network, called ProbDepthNet, is proposed, which provides pixel-wise well-calibrated depth distributions. The experiments show that different strategies for quantifying the measurement uncertainty provide overconfident estimates due to overfitting effects. Therefore, a novel recalibration technique is integrated as part of the ProbDepthNet, which is validated to improve the calibration of the uncertainty measures. The monocular scene flow methods presented in the subsequent parts confirm that the integration of single-view depth information results in the best performance if the neural network provides depth distributions instead of single depth values and contains a recalibration. Three methods for monocular scene flow estimation are presented, each one designed to combine multi-view geometry-based optimization with deep learning-based single-view depth estimation such as ProbDepthNet. While the first method, SVD-MSfM, performs the motion and depth estimation as two subsequent steps, the second method, Mono-SF, jointly optimizes the motion estimates and the depth structure. Both methods are tailored to address scenes, where the objects and motions can be represented by a set of rigid bodies. Dynamic traffic scenes are one kind of scenes that essentially fulfill this characteristic. The method, Mono-Stixel, uses an even more specialized scene model for traffic scenes, called stixel world, as underlying scene representation. The proposed methods provide new state of the art for monocular scene flow estimation with Mono-SF being the first and leading monocular method on the KITTI scene flow benchmark at the time of submission of the present thesis. The experiments validate that both kind of information, the multi-view geometric optimization and the single-view depth estimates, contribute to the monocular scene flow estimates and are necessary to achieve the new state of the art accuracy.

Download full text files

Export metadata

Author:Fabian Brickwedde
Place of publication:Frankfurt am Main
Referee:Rudolf MesterORCiD, Kostas Alexis
Document Type:Doctoral Thesis
Date of Publication (online):2022/01/09
Year of first Publication:2021
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Granting Institution:Johann Wolfgang Goethe-Universität
Date of final exam:2021/12/09
Release Date:2022/02/16
Tag:Computer Vision; Monocular Scene Flow; Traffic Scenes
Page Number:212
Institutes:Informatik und Mathematik
Dewey Decimal Classification:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
Licence (German):License LogoDeutsches Urheberrecht