-
How do we approach intrinsic motivation computationally? A commentary on "What is intrinsic motivation? A typology of computational approaches" by Pierre-Yves Oudeyer and Frederic Kaplan
(2008)
- What is the energy function guiding behavior and learning? Representation-based approaches like maximum entropy, generative models, sparse coding, or slowness principles can account for unsupervised learning of biologically observed structure in sensory systems from raw sensory data. However, they do not relate to behavior. Behavior-based approaches like reinforcement learning explain animal behavior in well-described situations. However, they rely on high-level representations which they cannot extract from raw sensory data. Combinations of multiple goal functions seem to be the methodology of choice for understanding the complexity of the brain. But what is the set of possible goals? Focusing on the reinforcement learning framework, this question is addressed in the article "What is intrinsic motivation? A typology of computational approaches" by Pierre-Yves Oudeyer and Frederic Kaplan. It lists and classifies equations which extend the traditional concept of a "reward function". Our behavior is driven not only by external rewards such as food but also by a variety of intrinsic motivations. Some are aimed at exploration and so ensure delivery of rich sensory data, aiding unsupervised learning by active data acquisition, where the learning progress of the sensory system becomes the goal. A novice reader may first want to become familiar with an example of a motivation function implemented in a model and applied in some scenario. A fun example is Schmidhuber (2006), which would be classified as "Learning Progress Motivation" (LPM) in the article of Oudeyer and Kaplan. The model consists of a predictor and a controller, also known as critic and actor, respectively. The critic is a sensory system that rewards the actor whenever the critic's own learning progresses.
The actor hence learns to act in such a way that the critic is presented with data that lead to the critic's learning progress. This reward signal allows the actor's parameters to be learned by a reinforcement learning algorithm. The structure, parameters and learning paradigm of the critic are not specified, but unsupervised learning, in the sense of learning to predict, would be suitable. The broad overview of intrinsic motivation functions offered by Oudeyer and Kaplan leads to novel ways of conceptualizing, and gaining new insights into, the variety of computational mechanisms driving behavior and learning. A possible extension of the typology could include goal functions of unsupervised learning. An assessment of the relations between all relevant goal functions may then provide a well-founded systems view of the brain.
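The critic/actor loop described above can be illustrated with a minimal sketch. It is not the model of Schmidhuber (2006) or any equation from Oudeyer and Kaplan; everything in it is an assumption made for illustration. The critic is a one-weight LMS predictor, learning progress is measured as the drop in mean squared prediction error between two adjacent windows, the actor is a simple epsilon-greedy chooser between two hypothetical sensory sources ("learnable" and "noise"), and all names and parameter values are invented.

```python
import random
from collections import deque


class Critic:
    """Predictor for one sensory source; learns y ~ w * x with an LMS rule."""

    def __init__(self, lr=0.1, window=5):
        self.w = 0.0
        self.lr = lr
        self.window = window
        self.sq_errors = deque(maxlen=2 * window)  # recent squared errors

    def observe(self, x, y):
        """Update on one sample and return the intrinsic reward."""
        err = y - self.w * x
        self.w += self.lr * err * x
        self.sq_errors.append(err * err)
        return self.learning_progress()

    def learning_progress(self):
        """Mean error of the older half-window minus that of the newer one."""
        if len(self.sq_errors) < 2 * self.window:
            return 0.0  # not enough history yet
        errs = list(self.sq_errors)
        older = sum(errs[: self.window]) / self.window
        recent = sum(errs[self.window:]) / self.window
        return older - recent


def run(steps=500, epsilon=0.1, alpha=0.1, seed=0):
    """Actor picks a source each step; returns how often each was chosen."""
    rng = random.Random(seed)
    # Hypothetical sources: one learnable mapping, one pure noise.
    sources = {
        "learnable": lambda x: 2.0 * x,
        "noise": lambda x: rng.gauss(0.0, 1.0),
    }
    critics = {name: Critic() for name in sources}
    value = {name: 0.0 for name in sources}  # actor's reward estimates
    counts = {name: 0 for name in sources}
    names = list(sources)
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.choice(names)  # explore
        else:
            best = max(value.values())  # exploit, breaking ties at random
            choice = rng.choice([n for n in names if value[n] == best])
        x = rng.uniform(-1.0, 1.0)
        reward = critics[choice].observe(x, sources[choice](x))
        value[choice] += alpha * (reward - value[choice])
        counts[choice] += 1
    return counts
```

Calling `run()` returns a dictionary of choice counts per source. Under these assumptions the reward for a source is positive only while its prediction error is still falling, so a source that is already learned and a source that cannot be learned should both yield little sustained reward. How strongly the actor ends up favoring one source depends on the seed and the parameters.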
-
Learning the optimal control of coordinated eye and head movements
(2011)
- Various optimality principles have been proposed to explain the characteristics of coordinated eye and head movements during visual orienting behavior. At the same time, researchers have suggested several neural models of saccade generation, but these do not include online learning as a mechanism of optimization. Here, we suggest an open-loop neural controller with a local adaptation mechanism that minimizes a proposed cost function. Simulations show that the characteristics of coordinated eye and head movements generated by this model match the experimental data in many aspects, including the relationship between amplitude, duration and peak velocity in head-restrained conditions and the relative contribution of eye and head to the total gaze shift in head-free conditions. Our model is a first step towards bringing together an optimality principle and an incremental local learning mechanism into a unified control scheme for coordinated eye and head movements.
