Applications of spherical harmonics in robot vision

  • Visual perception has become increasingly important in robotics over the last decades. Mobile robots have to localize themselves in known environments and carry out complex navigation tasks. This thesis presents an appearance-based (view-based) approach to robot self-localization and robot navigation using holistic, spherical views obtained by cameras with large fields of view. For view-based methods, it is crucial to have a compressed image representation in which different views can be stored and compared efficiently. Our approach relies on the spherical Fourier transform, which maps a signal defined on the sphere to a small set of coefficients, approximating the original signal by a weighted sum of orthonormal basis functions, the so-called spherical harmonics. The truncated low-order expansion of the image signal makes it possible to compare input images efficiently, and the mathematical properties of spherical harmonics also allow the rotation between two views to be estimated, even in 3D. Since no geometrical measurements are needed, a vision system of modest quality is sufficient. All experiments shown in this thesis are based purely on visual information to demonstrate the applicability of the approach. The research on robot self-localization focused on demonstrating that the compressed spherical harmonics representation can be used to solve the well-known kidnapped robot problem. To address this problem, the basic idea is to compare the current view to a set of images from a known environment to obtain a likelihood of robot positions. To localize the robot, one could choose the most probable position from the likelihood map; however, it is more beneficial to apply standard methods that integrate information over time while the robot moves, such as particle or Kalman filters. The first step was to design a fast expansion method to obtain coefficient vectors directly in image space. 
This was achieved by back-projecting the basis functions onto the input image. The next steps were to develop a dissimilarity measure, an estimator for rotations between coefficient vectors, and a rotation-invariant dissimilarity measure, all of them based purely on the compact signal representation. With all these techniques at hand, generating likelihood maps is straightforward, but initial experiments indicated a strong dependence on illumination conditions. This is a challenge for all holistic methods, and in particular for a spherical harmonics approach, since local changes usually affect every element of the coefficient vector. To cope with illumination changes, we investigated preprocessing steps leading to feature images (e.g. edge images, depth images), which bring together our holistic approach and classical feature-based methods. Furthermore, we concentrated on building a statistical model for typical changes of the coefficient vectors in the presence of illumination changes. This task is more demanding but leads to even better results. The second major topic of this thesis is appearance-based robot navigation. I present a view-based approach called Optical Rails (ORails), which leads a robot along a prerecorded track. The robot navigates in a network of known locations, which are denoted as waypoints. At each waypoint, we store a compressed view representation. A visual servoing method is used to reach the current target waypoint based on its appearance and the current camera image. Navigating in a network of views is achieved by reaching a sequence of stopover locations, one after another. The main contribution of this work is a model that makes it possible to deduce the best driving direction of the robot based purely on the coefficient vectors of the current and the target image. Like the classical Lucas-Kanade method, it is based on image registration, but it has been transferred to the spectral domain, which allows for a great speedup. 
ORails also includes a waypoint selection strategy and a module for steering our nonholonomic robot. As with our self-localization algorithm, dependence on illumination changes is also problematic in ORails. Furthermore, occlusions have to be handled for ORails to work properly. I present a solution based on the optimal expansion, which is able to deal with incomplete image signals. To handle dynamic occlusions, i.e. objects appearing in an arbitrary region of the image, we use the linearity of the expansion process and cut the image into segments. These segments can be treated separately, and the results are merged at the end. At this point, we can decide to disregard certain segments. Slicing the view also allows for local illumination compensation, which is inherently non-robust if applied to the whole view. In conclusion, this approach addresses the most important criticisms of holistic view-based approaches, namely occlusions and illumination changes, and consequently improves the performance of Optical Rails.
  • Visual perception of the environment is essential not only for humans but also for mobile robots. Autonomous vehicles often face the problem of having to localize themselves in a known environment or to navigate to a goal. The data delivered by the camera should be used as efficiently as possible, especially since mobile robots often have only limited computing power. This thesis considers the application of spherical harmonics (Kugelflächenfunktionen) to these tasks. These functions make it possible to store the input image in a highly compressed form by representing it as a superposition of different spherical harmonics. This expansion into spherical basis functions yields spectral coefficients comparable to those of the Fourier transform. The expansion in spherical harmonics is therefore often referred to as the spherical Fourier transform. A low-dimensional approximation of the input image built from a few basis functions can be described with only a few numbers and still contains essential information about the original image signal. The motivation of this work is to make the compact representation of the camera image usable for typical applications in robotics. These are self-localization and, later, navigation tasks, whose implementation is based exclusively on computations in the space of spectral coefficients. I would like to concentrate on these two areas of robot navigation, but point out that spherical harmonics are also widely used in computer graphics, for example in shading or as a representation for three-dimensional objects...
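
The abstract describes expanding a spherical view into a short coefficient vector and comparing views with a plain and a rotation-invariant dissimilarity measure. The following is a minimal sketch of that idea, not the thesis's implementation. It assumes real spherical harmonics truncated at degree 2, a simple latitude-longitude quadrature, Euclidean distance as the plain measure, and per-degree energies as the rotation-invariant signature. All function names (`real_sh_basis`, `expand`, `rotated`, and so on) are hypothetical.

```python
import math


def real_sh_basis(x, y, z):
    """Real orthonormal spherical harmonics of degree 0..2 at a unit vector."""
    c0 = 0.5 * math.sqrt(1.0 / math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 0.5 * math.sqrt(15.0 / math.pi)
    return [
        c0,                                                     # degree 0
        c1 * y, c1 * z, c1 * x,                                 # degree 1
        c2 * x * y,                                             # degree 2
        c2 * y * z,
        0.25 * math.sqrt(5.0 / math.pi) * (3.0 * z * z - 1.0),
        c2 * x * z,
        0.25 * math.sqrt(15.0 / math.pi) * (x * x - y * y),
    ]


def expand(signal, n_theta=128, n_phi=64):
    """Approximate the 9 expansion coefficients of signal(x, y, z) on the sphere.

    Assumption: midpoint rule in the polar angle, uniform sampling in azimuth.
    """
    coeffs = [0.0] * 9
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        weight = math.sin(theta) * d_theta * d_phi  # area element
        for j in range(n_phi):
            phi = j * d_phi
            x = math.sin(theta) * math.cos(phi)
            y = math.sin(theta) * math.sin(phi)
            z = math.cos(theta)
            value = signal(x, y, z)
            for k, basis in enumerate(real_sh_basis(x, y, z)):
                coeffs[k] += value * basis * weight
    return coeffs


def dissimilarity(a, b):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def degree_energies(coeffs):
    """Sum of squared coefficients per degree; unchanged by 3D rotations."""
    return [
        coeffs[0] ** 2,
        sum(c * c for c in coeffs[1:4]),
        sum(c * c for c in coeffs[4:9]),
    ]


def rotation_invariant_dissimilarity(a, b):
    """Compare per-degree amplitudes instead of raw coefficients."""
    ea, eb = degree_energies(a), degree_energies(b)
    return math.sqrt(sum((math.sqrt(p) - math.sqrt(q)) ** 2
                         for p, q in zip(ea, eb)))


def rotated(signal, yaw, pitch):
    """Return the signal evaluated at rotated coordinates.

    The coordinates are rotated about the x axis (pitch), then about z (yaw).
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)

    def view(x, y, z):
        y1, z1 = cp * y - sp * z, sp * y + cp * z
        x2, y2 = cy * x - sy * y1, sy * x + cy * y1
        return signal(x2, y2, z1)

    return view


if __name__ == "__main__":
    # Hypothetical smooth "image" on the sphere, band-limited to degree 2.
    scene = lambda x, y, z: 1.0 + 2.0 * x + 0.5 * y * z + z * z
    here = expand(scene)
    turned = expand(rotated(scene, yaw=1.1, pitch=0.7))
    print("plain dissimilarity:    ", dissimilarity(here, turned))
    print("rotation-invariant one: ", rotation_invariant_dissimilarity(here, turned))
```

In this sketch the plain distance reacts strongly to the change in orientation, while the per-degree energies stay nearly the same, up to quadrature error. Estimating the rotation itself, as mentioned in the abstract, would need an additional step that is not shown here.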

Download full text files

  • dissertation-friedrich.pdf
    deu

Metadata
Author: Holger Friedrich
URN: urn:nbn:de:hebis:30-115286
Referee: Rudolf Mester
Document Type: Doctoral Thesis
Language: English
Date of Publication (online): 2011/09/14
Year of first Publication: 2011
Publishing Institution: Universitätsbibliothek Johann Christian Senckenberg
Granting Institution: Johann Wolfgang Goethe-Universität
Date of final exam: 2011/04/13
Release Date: 2011/09/14
Note:
Unfortunately, for copyright reasons, the full text of this dissertation is not available outside the university library; the CD-ROM can be borrowed from UB Frankfurt am Main (also via interlibrary loan).
HeBIS-PPN: 425318249
Institutes: Computer Science and Mathematics / Computer Science
Dewey Decimal Classification: 0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Collections: University publications
Licence (German): Archive copy for reading-room use, § 52b UrhG