With every glimpse of our eyes, we sample only a small and incomplete fragment of the visual world, which needs to be contextualized and integrated into a coherent scene representation. Here we show that the visual system achieves this contextualization by exploiting spatial schemata, that is, our knowledge about the composition of natural scenes. We measured fMRI and EEG responses to incomplete scene fragments and used representational similarity analysis to reconstruct their cortical representations in space and time. We observed a sorting of representations according to the fragments' place within the scene schema, which occurred during perceptual analysis in the occipital place area and within the first 200 ms of vision. This schema-based coding operates flexibly across visual features (as measured by a deep neural network model) and different types of environments (indoor and outdoor scenes). This flexibility highlights the mechanism's ability to efficiently organize incoming information under dynamic real-world conditions.
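The core logic of representational similarity analysis mentioned above can be sketched in a few lines: build a representational dissimilarity matrix (RDM) for each measurement and correlate their lower triangles. The matrix sizes and random data below are toy stand-ins for illustration, not the study's actual fMRI/EEG data:

```python
import numpy as np

def rdm(responses):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between response patterns (rows = conditions, columns = features)."""
    return 1.0 - np.corrcoef(responses)

def rsa_score(rdm_a, rdm_b):
    """Compare two RDMs by correlating their lower triangles."""
    idx = np.tril_indices_from(rdm_a, k=-1)
    return np.corrcoef(rdm_a[idx], rdm_b[idx])[0, 1]

rng = np.random.default_rng(0)
brain = rng.normal(size=(6, 50))                       # 6 conditions x 50 voxels (toy)
model = brain + rng.normal(scale=0.5, size=(6, 50))    # a noisy "model" of the same data
score = rsa_score(rdm(brain), rdm(model))
print(round(score, 2))
```

Since the toy model is just a noisy copy of the brain patterns, the two RDMs should correlate positively; with real data, the model RDM would come from, e.g., a deep neural network's features.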
Dual coding theories of knowledge suggest that meaning is represented in the brain by a double code, which comprises language-derived representations in the anterior temporal lobe and sensory-derived representations in perceptual and motor regions. This approach predicts that concrete semantic features should activate both codes, whereas abstract features rely exclusively on the linguistic code. Using magnetoencephalography (MEG), we adopted a temporally resolved multiple regression approach to identify the contribution of abstract and concrete semantic predictors to the underlying brain signal. Results showed early involvement of anterior-temporal and inferior-frontal brain areas in encoding both abstract and concrete semantic information. At later stages, occipito-temporal regions showed greater responses to concrete than to abstract features. The present findings shed new light on the temporal dynamics of abstract and concrete semantic representations in the brain and suggest that the concreteness of words is processed first with a transmodal/linguistic code, housed in frontotemporal brain systems, and only afterwards with an imagistic/sensorimotor code in perceptual and motor regions.
Our mind has the function of representing the physical and social world we inhabit, so that we can efficiently interact with it. This results in a constant and dynamic interaction between mind and world, which reaches a balance when representations are at once accurate with respect to what the world communicates to our organism and compatible with how our mind works.
A paradigmatic case of this interaction is offered by perception, the mental function that represents contingent aspects of the world based on what is captured by our senses. Indeed, the dominant philosophical view in cognitive science is that our perceptual states are representations of the world, not direct access to it. These representational perceptual states therefore encode the aspects of the world they represent, the very aspects that initiate perception by stimulating our sensory organs.
Perceptual representations are built using information from the sensory system, i.e., bottom-up information, but are also integrated with information previously acquired, i.e., top-down information, so that perception interacts with memory through language and other mental functions. Such organization is believed to reflect a general mechanism of our mind/brain, which is to acquire and use information to make efficient predictions about the future, continuously updating older information with present information.
This predictive processing works because the world is not random, but shows a regular structure from which reliable expectations can be built. One way our minds make these predictions is by adapting to the structure of the world in an implicit, automatic and unconscious way, a process that has been called Implicit Statistical Learning (ISL). ISL is a learning process that does not require awareness and happens in an incidental and spontaneous way, through mere exposure to the statistical regularities of the world. This is what happens when we learn a language during early childhood: it allows us to become implicitly sensitive to the phonological structure of speech, and to associate speech patterns with objects and events in order to learn word meanings.
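The kind of regularity ISL is sensitive to can be illustrated with transition probabilities between adjacent syllables, the classic statistic from speech segmentation studies. The syllable stream below is an invented toy example, not a stimulus from the work discussed here:

```python
from collections import Counter, defaultdict

def transition_probabilities(stream):
    """Estimate P(next | current) for adjacent items in a sequence."""
    pair_counts = defaultdict(Counter)
    for cur, nxt in zip(stream, stream[1:]):
        pair_counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(counter.values()) for nxt, n in counter.items()}
        for cur, counter in pair_counts.items()
    }

# Toy stream built from the "words" bida, kupa and golu:
stream = "bi da ku pa go lu bi da go lu ku pa bi da".split()
probs = transition_probabilities(stream)
print(probs["bi"])  # within-word transition is deterministic: {'da': 1.0}
print(probs["da"])  # across word boundaries it is not: {'ku': 0.5, 'go': 0.5}
```

High within-word and low between-word transition probabilities are exactly the contrast that lets a learner segment continuous speech without explicit instruction.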
A specific case of ISL is the learning of spatial configurations in the visual world, which applies to abstract arrays of items but, most importantly, also to more ecological settings such as the visual scenes we are immersed in during our everyday life. The knowledge we acquire about the structure of visual scenes has been called "Scene Grammar", because it informs us about the presence and position of objects in a similar way to how linguistic grammar informs us about the presence and position of words. Thus, we implicitly acquire the semantics of scenes, learning which objects are consistent with a certain scene, as well as the syntax of scenes, learning where objects are consistently positioned within a certain scene.
More recent developments have proposed that scene grammar knowledge is organized hierarchically: objects are arranged in the scene, which offers the most general context, but within a scene we can identify distinct spatial and functional clusters of objects, called "phrases", that offer a second level of context; within every phrase, objects then have different statuses, with usually one object (the "anchor object") strongly predicting which other objects appear within the phrase and where (the "local objects"). However, these finer aspects of the organization of objects in scenes remain poorly understood.
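The proposed scene > phrase > anchor/local hierarchy can be made concrete as a nested data structure. The scene, phrases and object names below are invented for illustration only:

```python
# Toy sketch of the hierarchy: a scene contains phrases, each organized
# around an anchor object that predicts its local objects.
scene = {
    "name": "bathroom",
    "phrases": [
        {"anchor": "sink", "local_objects": ["toothbrush", "soap", "towel"]},
        {"anchor": "toilet", "local_objects": ["toilet paper", "toilet brush"]},
    ],
}

def objects_predicted_by(scene, anchor):
    """Return the local objects an anchor object predicts within its phrase."""
    for phrase in scene["phrases"]:
        if phrase["anchor"] == anchor:
            return phrase["local_objects"]
    return []

print(objects_predicted_by(scene, "sink"))  # ['toothbrush', 'soap', 'towel']
```

The point of the structure is that knowing the anchor ("sink") narrows down both the identity and the likely location of the surrounding local objects, which is what makes the hierarchy predictively useful.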
Another problem relates to the way we measure the structure of scenes in order to compare the organization of the visual world with its organization in the mind. Typically, to decide whether an object appears in a certain scene, and whether it appears in a certain position within that scene, researchers have based their decisions on intuition and common sense, at best validating those decisions with independent raters. However, it has been shown that such decisions are often limited and that more complex information about how objects are arranged in scenes can be lost.
A potential solution to this problem is to use large sets of real-world images with object annotations and segmentations to measure statistics about how objects are arranged in the environment. This idea exploits the growing availability of such datasets, driven by advances in computer vision, and parallels the established use of large text corpora in language research.
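One of the simplest statistics such annotated datasets support is object co-occurrence: how often two object categories appear in the same image. A minimal sketch, assuming annotations are available as per-image sets of object labels (the images and labels below are toy stand-ins, not a real dataset):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(annotated_images):
    """Count how often each pair of object categories appears in the same image.

    `annotated_images` is a list of per-image sets of object labels,
    a simplified stand-in for full segmentation annotations.
    """
    counts = Counter()
    for labels in annotated_images:
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts

# Toy annotations standing in for a real labelled image dataset:
images = [
    {"sink", "soap", "mirror"},
    {"sink", "soap", "towel"},
    {"bed", "lamp", "pillow"},
]
counts = cooccurrence_counts(images)
print(counts[("sink", "soap")])  # 2: sink and soap co-occur in two images
```

Real segmentation annotations would additionally give object positions and sizes, allowing the same approach to quantify spatial regularities rather than mere presence.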
The goals of the current investigation were to extract object statistics from these image datasets and test whether they reliably predict behavioural responses during object processing, and to use these statistics to investigate more complex aspects of scene grammar, such as its hierarchical organization, in order to see whether this organization is reflected in the organization of objects in our mind.