This thesis investigates the development of early cognition in infancy using neural network models. Fundamental events in visual perception such as caused motion, occlusion, object permanence, tracking of moving objects behind occluders, object unity perception and sequence learning are modeled in a unifying computational framework while staying close to experimental data from the developmental psychology of infancy. In the first project, the development of causality and occlusion perception in infancy is modeled using a simple, three-layered, recurrent network trained with error backpropagation to predict future inputs (Elman network). The model unifies two infant studies on causality and occlusion perception. Subsequently, in the second project, the established framework is extended to a larger prediction network that models the development of object unity, object permanence and occlusion perception in infancy. It is shown that these different phenomena can be unified into a single theoretical framework, thereby explaining experimental data from 14 infant studies. The framework shows that these developmental phenomena can be explained by accurately representing and predicting statistical regularities in the visual environment. The models assume (1) different neuronal populations processing different motion directions of visual stimuli in the visual cortex of the newborn infant, an assumption supported by neuroscientific evidence, and (2) available learning algorithms that are guided by the goal of predicting future events. Specifically, the models demonstrate that no innate force notions, motion analysis modules, common motion detectors, specific perceptual rules or abilities to "reason" about entities, which have been widely postulated in the developmental literature, are necessary for the explanation of the discussed phenomena.
Since the prediction of future events turned out to be fruitful for the theoretical explanation of various developmental phenomena and as a guideline for learning in infancy, the third model addresses the development of visual expectations themselves. A self-organising, fully recurrent neural network model that forms internal representations of input sequences and maps them onto eye movements is proposed. The reinforcement learning architecture (RLA) of the model learns to perform anticipatory eye movements as observed in a range of infant studies. The model suggests that the goal of maximizing the looking time at interesting stimuli guides infants' looking behavior, thereby explaining the occurrence and development of anticipatory eye movements and reaction times. In contrast to classical neural network modelling approaches in the developmental literature, the model uses local learning rules and contains several biologically plausible elements such as excitatory and inhibitory spiking neurons, spike-timing dependent plasticity (STDP), intrinsic plasticity (IP) and synaptic scaling. It is also novel from a technical point of view, as it uses a dynamic recurrent reservoir shaped by various plasticity mechanisms and combines it with reinforcement learning. The model accounts for twelve experimental studies and predicts, among other things, anticipatory behavior for arbitrary sequences and facilitated reacquisition of already learned sequences. All models emphasize the development of the perception of the discussed phenomena, thereby addressing the questions of how and why this developmental change takes place, questions that are difficult to assess experimentally. Despite the diversity of the discussed phenomena, all three projects rely on the same principle: the prediction of future events.
This principle suggests that cognitive development in infancy may largely be guided by building internal models and representations of the visual environment and using those models to predict its future development.
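The Elman network named in this abstract (a recurrent network trained with error backpropagation to predict its next input) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the function name `train_elman`, the hidden layer size, the learning rate, and the toy one-hot input sequence, which stands in for the visual stimuli used in the infant studies.

```python
import numpy as np

def train_elman(sequence, n_hidden=8, epochs=200, lr=0.5, seed=0):
    """Train a toy Elman network to predict sequence[t + 1] from sequence[t].

    Returns the mean squared prediction error per epoch.
    All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_in = sequence.shape[1]
    w_in = rng.normal(0.0, 0.5, (n_hidden, n_in))       # input -> hidden
    w_ctx = rng.normal(0.0, 0.5, (n_hidden, n_hidden))  # context -> hidden
    w_out = rng.normal(0.0, 0.5, (n_in, n_hidden))      # hidden -> output
    losses = []
    for _ in range(epochs):
        context = np.zeros(n_hidden)  # context layer starts empty each epoch
        total = 0.0
        for t in range(len(sequence) - 1):
            x, target = sequence[t], sequence[t + 1]
            hidden = np.tanh(w_in @ x + w_ctx @ context)
            output = 1.0 / (1.0 + np.exp(-(w_out @ hidden)))
            err = output - target
            total += 0.5 * float(err @ err)
            # One-step backpropagation: the context is treated as a fixed
            # extra input, so no gradient flows back through time.
            d_out = err * output * (1.0 - output)
            d_hid = (w_out.T @ d_out) * (1.0 - hidden ** 2)
            w_out -= lr * np.outer(d_out, hidden)
            w_in -= lr * np.outer(d_hid, x)
            w_ctx -= lr * np.outer(d_hid, context)
            context = hidden  # copy hidden state into the context layer
        losses.append(total / (len(sequence) - 1))
    return losses

# Toy input: a repeating cycle of four one-hot "events".
sequence = np.tile(np.eye(4), (5, 1))
losses = train_elman(sequence)
print(f"first epoch error: {losses[0]:.3f}, last epoch error: {losses[-1]:.3f}")
```

The sketch only shows the prediction principle: the error between the predicted and the actual next input is the sole training signal.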
This thesis will first introduce in more detail the Bayesian theory and its use in integrating multiple information sources. I will briefly talk about models and their relation to the dynamics of an environment, and how to combine multiple alternative models. Following that I will discuss the experimental findings on multisensory integration in humans and animals. I start with psychophysical results on various forms of tasks and setups that show that the brain uses and combines information from multiple cues. Specifically, the discussion will focus on the finding that humans integrate this information in a way that is close to the theoretical optimal performance. Special emphasis will be put on results about the developmental aspects of cue integration, highlighting experiments that could show that children do not perform similarly to the Bayesian predictions. This section also includes a short summary of experiments on how subjects handle multiple alternative environmental dynamics. I will also talk about neurobiological findings of cells receiving input from multiple receptors, both in dedicated brain areas and in primary sensory areas. I will proceed with an overview of existing theories and computational models of multisensory integration. This will be followed by a discussion on reinforcement learning (RL). First I will talk about the original theory, including the two main approaches, model-free and model-based reinforcement learning. The important variables will be introduced as well as different algorithmic implementations. Secondly, a short review on the mapping of those theories onto brain and behaviour will be given. I mention the most influential papers that showed correlations between the activity in certain brain regions and RL variables, most prominently between dopaminergic neurons and temporal difference errors.
I will try to motivate why I think that this theory can help to explain the development of near-optimal cue integration in humans. The next main chapter will introduce our model that learns to solve the task of audio-visual orienting. Many of the results in this section have been published in [Weisswange et al. 2009b, Weisswange et al. 2011]. The model agent starts without any knowledge of the environment and acts based on predictions of rewards, which will be adapted according to the reward signaling the quality of the performed action. I will show that after training this model performs similarly to the prediction of a Bayesian observer. The model can also deal with more complex environments in which it has to deal with multiple possible underlying generating models (perform causal inference). In these experiments I use different formulations of Bayesian observers for comparison with our model, and find that it is most similar to the fully optimal observer doing model averaging. Additional experiments using various alterations to the environment show the ability of the model to react to changes in the input statistics without explicitly representing probability distributions. I will close the chapter with a discussion on the benefits and shortcomings of the model. The thesis continues with a report on an application of the learning algorithm introduced before to two real-world cue integration tasks on a robotic head. For these tasks our system outperforms a commonly used approximation to Bayesian inference, reliability-weighted averaging. The approximation is handy because of its computational simplicity, but it relies on certain assumptions that are usually controlled for in a laboratory setting and are often not true for real-world data. This chapter is based on the paper [Karaoguz et al. 2011]. Our second modeling approach tries to address the neuronal substrates of the learning process for cue integration.
I again use a reward-based training scheme, but this time implemented as a modulation of synaptic plasticity mechanisms in a recurrent network of binary threshold neurons. I start the chapter with an additional introduction section to discuss recurrent networks and especially the various forms of neuronal plasticity that I will use in the model. The performance on a task similar to that of chapter 3 will be presented together with an analysis of the influence of different plasticity mechanisms on it. Again, benefits and shortcomings and the general potential of the method will be discussed. I will close the thesis with a general conclusion and some ideas about possible future work.
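Reliability-weighted averaging, the baseline mentioned in this abstract, can be sketched in a few lines. The sketch uses the standard textbook form, in which each cue is weighted by its inverse variance. It assumes independent Gaussian noise on each cue and known standard deviations; these are stated here as assumptions and may not match the exact formulation used in the thesis. The function name and the example numbers are hypothetical.

```python
def reliability_weighted_average(estimates, sigmas):
    """Fuse single-cue estimates by weighting each with its reliability.

    Reliability is taken as inverse variance (1 / sigma**2), assuming
    independent Gaussian noise with known standard deviations.
    Returns the fused estimate and its standard deviation.
    """
    if not estimates or len(estimates) != len(sigmas):
        raise ValueError("need one positive sigma per estimate")
    if any(s <= 0 for s in sigmas):
        raise ValueError("sigmas must be positive")
    precisions = [1.0 / (s * s) for s in sigmas]
    total = sum(precisions)
    fused = sum(p * x for p, x in zip(precisions, estimates)) / total
    fused_sigma = (1.0 / total) ** 0.5
    return fused, fused_sigma

# Hypothetical example: a precise visual cue at 0 degrees and a noisier
# auditory cue at 10 degrees; the fused estimate lies close to the visual one.
position, uncertainty = reliability_weighted_average([0.0, 10.0], [1.0, 3.0])
print(position, uncertainty)
```

When the noise is not Gaussian, or the cues do not share a common cause, these weights are no longer optimal, which is the kind of situation the abstract describes for real-world data.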
At present, there is a huge lag between artificial and biological information processing systems in terms of their capability to learn. This lag could certainly be reduced by gaining more insight into higher functions of the brain such as learning and memory. For instance, the primate visual cortex is thought to provide the long-term memory for visual objects acquired by experience. The visual cortex effortlessly handles arbitrarily complex objects by rapidly decomposing them into constituent components of much lower complexity along hierarchically organized visual pathways. How this processing architecture self-organizes into a memory domain that employs such a compositional object representation by learning from experience remains to a large extent a riddle. The study presented here approaches this question by proposing a functional model of a self-organizing hierarchical memory network. The model is based on hypothetical neuronal mechanisms involved in cortical processing and adaptation. The network architecture comprises two consecutive layers of distributed, recurrently interconnected modules. Each module is identified with a localized cortical cluster of fine-scale excitatory subnetworks. A single module performs competitive unsupervised learning on the incoming afferent signals to form a suitable representation of the locally accessible input space. The network employs an operating scheme in which ongoing processing is composed of discrete successive fragments termed decision cycles, presumably identifiable with the fast gamma rhythms observed in the cortex. The cycles are synchronized across the distributed modules, which produce highly sparse activity within each cycle by instantiating a local winner-take-all-like operation. Equipped with adaptive mechanisms of bidirectional synaptic plasticity and homeostatic activity regulation, the network is exposed to natural face images of different persons.
The images are presented incrementally, one per cycle, to the lower network layer as a set of Gabor filter responses extracted from local facial landmarks. The images are presented without any person identity labels. In the course of unsupervised learning, the network simultaneously creates vocabularies of reusable local face appearance elements, captures relations between the elements by associatively linking those parts that encode the same face identity, develops higher-order identity symbols for the memorized compositions and projects this information back onto the vocabularies in a generative manner. This learning corresponds to the simultaneous formation of bottom-up, lateral and top-down synaptic connectivity within and between the network layers. In the mature connectivity state, the network thus holds a full compositional description of the experienced faces in the form of sparse memory traces that reside in the feed-forward and recurrent connectivity. Due to the generative nature of the established representation, the network is able to recreate the full compositional description of a memorized face in terms of all its constituent parts given only its higher-order identity symbol or a subset of its parts. In the test phase, the network successfully recognizes the identity and gender of the persons from alternative face views not shown before. An intriguing feature of the emerging memory network is its ability to self-generate activity spontaneously in the absence of external stimuli. In this sleep-like off-line mode, the network shows a self-sustaining replay of the memory content formed during the previous learning. Remarkably, the recognition performance is tremendously boosted after this off-line memory reprocessing. The performance boost is more pronounced for those face views that deviate more from the original view shown during learning.
This indicates that the off-line memory reprocessing during the sleep-like state specifically improves the generalization capability of the memory network. The positive effect turns out to be surprisingly independent of synapse-specific plasticity, relying completely on the synapse-unspecific, homeostatic activity regulation across the memory network. The developed network thus demonstrates functionality not shown by any previous neuronal modeling approach. It forms and maintains a memory domain for compositional, generative object representation in an unsupervised manner through experience with natural visual images, using both on-line ("wake") and off-line ("sleep") learning regimes. This functionality offers a promising departure point for further studies aiming for deeper insight into the learning mechanisms employed by the brain and their subsequent implementation in artificial adaptive systems for solving complex tasks not tractable so far.
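The combination of competitive unsupervised learning, a winner-take-all operation per decision cycle and homeostatic activity regulation described in this abstract can be illustrated with a generic textbook-style sketch of a single module. It is not the network from the thesis: the function name `run_wta_module`, the update rules, the parameter values and the random inputs (standing in for Gabor filter responses) are all assumptions made for illustration.

```python
import numpy as np

def run_wta_module(inputs, n_units=4, lr=0.1, homeo_lr=0.05, seed=0):
    """Competitive learning in one module with a hard winner-take-all.

    Each row of `inputs` is treated as one decision cycle. Exactly one unit
    is active per cycle. A homeostatic threshold per unit rises when the
    unit wins and falls otherwise, pushing every unit toward the same
    average activity (1 / n_units).
    """
    rng = np.random.default_rng(seed)
    weights = rng.random((n_units, inputs.shape[1]))
    thresholds = np.zeros(n_units)
    target_rate = 1.0 / n_units
    winners = []
    for x in inputs:
        drive = weights @ x - thresholds
        winner = int(np.argmax(drive))  # hard winner-take-all
        activity = np.zeros(n_units)
        activity[winner] = 1.0
        # Only the winner moves its weight vector toward the input.
        weights[winner] += lr * (x - weights[winner])
        # Homeostatic regulation acts on all units, independent of synapses.
        thresholds += homeo_lr * (activity - target_rate)
        winners.append(winner)
    return weights, thresholds, winners

# Hypothetical inputs: 2000 random 8-dimensional feature vectors.
inputs = np.random.default_rng(1).random((2000, 8))
weights, thresholds, winners = run_wta_module(inputs)
print(np.bincount(winners, minlength=4))  # how often each unit won
```

In this sketch the threshold update is the synapse-unspecific part: it changes how easily a unit becomes active without touching any individual weight.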