Visual perception has increasingly grown important during the last decades in the robotics domain. Mobile robots have to localize themselves in known environments and carry out complex navigation tasks. This thesis presents an appearance-based or view-based approach to robot self-localization and robot navigation using holistic, spherical views obtained by cameras with large fields of view. For view-based methods, it is crucial to have a compressed image representation where different views can be stored and compared efficiently. Our approach relies on the spherical Fourier transform, which transforms a signal defined on the sphere to a small set of coefficients, approximating the original signal by a weighted sum of orthonormal basis functions, the so-called spherical harmonics. The truncated low order expansion of the image signal allows to compare input images efficiently, and the mathematical properties of spherical harmonics also allow for estimating rotation between two views, even in 3D. Since no geometrical measurements need to be done, modest quality of the vision system is sufficient. All experiments shown in this thesis are purely based on visual information to show the applicability of the approach. The research presented on robot self localization was focused on demonstrating the usability of the compressed spherical harmonics representation to solve the well-known kidnapped robot problem. To address this problem, the basic idea is to compare the current view to a set of images from a known environment to obtain a likelihood of robot positions. To localize the robot, one could choose the most probable position from the likelihood map; however, it is more beneficial to apply standard methods to integrate information over time while the robot moves, that is, particle or Kalman filters. The first step was to design a fast expansion method to obtain coefficient vectors directly in image space. This was achieved by back-projecting basis functions on the input image. The next steps were to develop a dissimilarity measure, an estimator for rotations between coefficient vectors, and a rotation-invariant dissimilarity measure, all of them purely based on the compact signal representation. With all these techniques at hand, generating likelihood maps is straightforward, but first experiments indicated strong dependence on illumination conditions. This is obviously a challenge for all holistic methods, in particular for a spherical harmonics approach, since local changes usually affect each single element of the coefficient vector. To cope with illumination changes, we investigated preprocessing steps leading to feature images (e.g. edge images, depth images), which bring together our holistic approach and classical feature-based methods. Furthermore, we concentrated on building a statistical model for typical changes of the coefficient vectors in presence of changes in illumination. This task is more demanding but leads to even better results. The second major topic of this thesis is appearance-based robot navigation. I present a view-based approach called Optical Rails (ORails), which leads a robot along a prerecorded track. The robot navigates in a network of known locations which are denoted as waypoints. At each waypoint, we store a compressed view representation. A visual servoing method is used to reach a current target waypoint based on the appearance and the current camera image. Navigating in a network of views is achieved by reaching a sequence of stopover locations, one after another. The main contribution of this work is a model which allows to deduce the best driving direction of the robot based purely on the coefficient vectors of the current and the target image. It is based on image registration as the classical method by Lucas-Kanade, but has been transferred to the spectral domain, which allows for great speedup. ORails also includes a waypoint selection strategy and a module for steering our nonholonomic robot. As for our self-localization algorithm, dependance on illumination changes is also problematic in ORails. Furthermore, occlusions have to be handled for ORails to work properly. I present a solution based on the optimal expansion, which is able to deal with incomplete image signals. To handle dynamic occlusions, i.e. objects appearing in an arbitrary region of the image, we use the linearity of the expansion process and cut the image into segments. These segments can be treated separately, and finally we merge the results. At this point, we can decide to disregard certain segments. Slicing the view allows for local illumination compensation, which is inherently non-robust if applied to the whole view. In conclusion, this approach allows to handle the most important criticism to holistic view-based approaches, that is, occlusions and illumination changes, and consequently improves the performance of Optical Rails.