Objects that are semantically related to the visual scene context are typically better recognized than unrelated objects. While context effects on object recognition are well studied, the question of which particular visual information of an object's surroundings modulates its semantic processing is still unresolved. Typically, one would expect contextual influences to arise from high-level, semantic components of a scene, but what if even low-level features could modulate object processing? Here, we generated seemingly meaningless textures of real-world scenes, which preserved similar summary statistics but discarded spatial layout information. In Experiment 1, participants categorized such textures better than colour controls that lacked higher-order scene statistics, while original scenes resulted in the highest performance. In Experiment 2, participants recognized briefly presented consistent objects on scenes significantly better than inconsistent objects, whereas on textures, consistent objects were recognized only slightly more accurately. In Experiment 3, we recorded event-related potentials and observed a pronounced mid-central negativity in the N300/N400 time windows for inconsistent relative to consistent objects on scenes. Critically, inconsistent objects on textures also triggered N300/N400 effects with a comparable time course, though less pronounced. Our results suggest that a scene's low-level features contribute to the effective processing of objects in complex real-world environments.
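To make the texture manipulation concrete, the sketch below shows one simple way to discard a scene's spatial layout while preserving some of its low-level image statistics: Fourier phase scrambling. This is only a stand-in for the richer summary-statistics texture synthesis used in such studies; the function name and the random "scene" array are placeholders.

```python
# Simplified illustration (not the study's texture-synthesis method):
# randomizing the phase spectrum keeps the amplitude spectrum (power at each
# spatial frequency) but destroys recognizable spatial structure.
import numpy as np

def phase_scramble(image, seed=None):
    """Return a phase-scrambled version of a 2-D greyscale image."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    random_phase = np.exp(1j * rng.uniform(0, 2 * np.pi, image.shape))
    scrambled = np.fft.ifft2(amplitude * random_phase)
    return np.real(scrambled)

# Random data standing in for a real scene photograph:
scene = np.random.rand(256, 256)
texture_control = phase_scramble(scene, seed=1)
```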
The marketing materials of remote eye-trackers suggest that data quality is invariant to the position and orientation of the participant as long as the eyes of the participant are within the eye-tracker's headbox, the area where tracking is possible. As such, remote eye-trackers are marketed as allowing the reliable recording of gaze from participant groups that cannot be restrained, such as infants, schoolchildren and patients with muscular or brain disorders. Practical experience and previous research, however, tell us that eye-tracking data quality, e.g. the accuracy of the recorded gaze position and the amount of data loss, deteriorates (compared to well-trained participants in chinrests) when the participant is unrestrained and assumes a non-optimal pose in front of the eye-tracker. How then can researchers working with unrestrained participants choose an eye-tracker? Here we investigated the performance of five popular remote eye-trackers from EyeTribe, SMI, SR Research, and Tobii in a series of tasks where participants took on non-optimal poses. We report that the tested systems varied in the amount of data loss and systematic offsets observed during our tasks. The EyeLink and EyeTribe in particular had large problems. Furthermore, the Tobii eye-trackers reported data for two eyes when only one eye was visible to the eye-tracker. This study provides practical insight into how popular remote eye-trackers perform when recording from unrestrained participants. It furthermore provides a testing method that researchers can use to evaluate whether a tracker is suitable for studying a certain target population, and that manufacturers can use during the development of new eye-trackers.
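As a concrete illustration of the data-quality measures discussed here, the sketch below computes a systematic offset (accuracy) and the proportion of data loss from raw gaze samples. The data layout, coordinate units, and NaN coding of lost samples are assumptions, not the procedure used in the study.

```python
# Minimal sketch of two common eye-tracking data-quality measures.
import numpy as np

def accuracy(gaze_x, gaze_y, target_x, target_y):
    """Mean Euclidean offset between recorded gaze and the fixated target,
    in the unit of the input (e.g. degrees of visual angle).
    Lost samples (NaN) are ignored."""
    dx = np.asarray(gaze_x) - target_x
    dy = np.asarray(gaze_y) - target_y
    return np.nanmean(np.sqrt(dx ** 2 + dy ** 2))

def data_loss(gaze_x, gaze_y):
    """Proportion of samples for which no gaze position was reported
    (coded here as NaN)."""
    lost = np.isnan(gaze_x) | np.isnan(gaze_y)
    return lost.mean()

# Usage with made-up samples recorded while a participant fixates (5, 5):
gx = np.array([5.1, 5.3, np.nan, 4.8])
gy = np.array([5.0, 4.9, np.nan, 5.2])
print(accuracy(gx, gy, 5.0, 5.0), data_loss(gx, gy))
```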
Object recognition is such an everyday task that it seems almost mundane. We look at the spaces around us and name things seemingly effortlessly. Yet understanding how the process of object recognition unfolds is a great challenge to vision science. Models derived from abstract stimuli have little predictive power for the way people explore "naturalistic" scenes and the objects in them. Naturalistic here refers to unaltered photographs of real scenes. This thesis therefore focuses on the process of recognition of the objects in such naturalistic scenes. People can, for instance, find objects in scenes much more efficiently than models derived from abstract stimuli would predict. To explain this kind of behavior, we describe scenes not solely in terms of physical characteristics (colors, contrasts, lines, orientations, etc.) but by the meaning of the whole scene (kitchen, street, bathroom, etc.) and of the objects within the scene (oven, fire hydrant, soap, etc.). Object recognition then refers to the process of the visual system assigning meaning to the object.
The relationship between objects in a naturalistic scene is far from random. Objects do not typically float in mid-air and cannot take up the same physical space. Moreover, certain scenes typically contain certain objects. A fire hydrant in the kitchen would seem like an anomaly to the average observer. These "rules" can be described as the "grammar" of the scene. Scene grammar is involved in multiple aspects of scene and object perception. There is, for instance, evidence that the overall scene category influences the identification of individual objects. Typically, experiments that directly target object recognition do not involve eye movements, and studies that involve eye movements are not directly aimed at object recognition but at gaze allocation. But eye movements are abundant in everyday life; they happen roughly four times per second. Here we therefore present two studies that use eye movements to investigate when object recognition takes place while people move their eyes from object to object in a scene. The third study is aimed at the application of novel methods for analyzing data from combined eye movement and neurophysiology (EEG) measurements.
One way to study object perception is to violate the grammar of a scene by placing an object in a scene it does not typically occur in and measuring how long people look at the so-called semantic inconsistency, compared to an object that one would expect in the given scene. Typically, people look at semantic inconsistencies longer and more often, signaling that they require extra processing. In Study 1 we make use of this behavior to ask whether object recognition still happens when it is not necessary for the task. We designed a search task that made it unnecessary to register object identities. Still, participants looked at the inconsistent objects longer than at consistent objects, signaling that they did indeed process object and scene identities. Interestingly, the inconsistent objects were not remembered better than the consistent ones. We conclude that object and scene identities (their semantics) are processed in an obligatory fashion, even when people are engaged in a task that does not require it. In Study 2, we investigate more closely when the first signs of object semantic processing are visible while people make eye movements.
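For illustration, the sketch below computes the dwell-time measure described above, total fixation time on a target object per consistency condition, from a hypothetical fixation table; the column names and values are made up.

```python
# Hypothetical sketch of a dwell-time comparison between consistent and
# inconsistent target objects, given a fixation table with assumed columns.
import pandas as pd

fixations = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2],
    "object_consistency": ["consistent", "inconsistent", "inconsistent",
                           "consistent", "inconsistent"],
    "fixation_duration_ms": [210, 260, 240, 190, 300],
})

# Dwell time = summed fixation durations on the target object, per participant
# and condition; condition means are then compared across participants.
dwell = (fixations
         .groupby(["participant", "object_consistency"])["fixation_duration_ms"]
         .sum()
         .groupby("object_consistency")
         .mean())
print(dwell)
```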
Although the finding that semantic inconsistencies are looked at longer and more often has been replicated many times, many of these replications examine gaze duration over a whole trial. The question of when during a trial differences between consistency conditions occur has yielded mixed results. Some studies only report effects of semantic consistency that accumulate over whole trials, whereas others report influences already on the duration of the very first fixations on inconsistent objects. In Study 2 we argue that prior studies reporting first fixation durations may have suffered from methodological shortcomings, such as small trial numbers and sample sizes, in addition to the use of non-robust statistics and data descriptions. We show that a subset of fixations may be influenced more than others (as indicated by more skewed fixation duration distributions). Further analyses show that the relationship between the effect of object semantics on fixation durations and its effect on oft-replicated cumulative measures is not straightforward (fixation duration distributions do not predict dwell effects), but the effects on both measures may be related in a different way. Possibly, the processing of object meaning unfolds over multiple fixations when a single fixation does not suffice. However, it would be very valuable to be able to study how processing continues after a fixation ends.
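As an illustration of the more robust, distribution-level descriptions argued for here, the sketch below summarizes fixation-duration samples by their median, interquartile range, and skewness; the data and condition labels are invented.

```python
# Hedged sketch: describing first-fixation durations with robust,
# distribution-sensitive summaries instead of means alone.
import numpy as np
from scipy import stats

def describe_durations(durations_ms):
    """Return robust summaries of a fixation-duration sample."""
    d = np.asarray(durations_ms, dtype=float)
    return {
        "median": np.median(d),
        "iqr": stats.iqr(d),
        "skewness": stats.skew(d),
    }

consistent = [180, 200, 210, 220, 250, 600]      # made-up samples
inconsistent = [190, 230, 260, 300, 480, 900]    # made-up samples
print(describe_durations(consistent))
print(describe_durations(inconsistent))
```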
Study 3 aims to make such a measure possible by combining EEG recordings with eye tracking measurements. Analyzing combined eye-tracking and EEG data is difficult because neural responses vary with eye movement characteristics. Moreover, fixations follow one another in short succession, causing the neural responses to successive fixations to overlap in time. These issues make the well-established approach of averaging single-trial EEG data into ERPs problematic. As an alternative, we propose the use of multiple regression, explicitly modeling both the temporal overlap and the eye movement parameters. In Study 3 we show that such a method successfully estimates the influence of the covariates it is meant to control for. Moreover, we discuss and explore which additional covariates may be modeled, and in what way, in order to obtain confound-free estimates of EEG differences between conditions. One important finding is that stimulus properties of physically variable stimuli, such as complex scenes, can influence EEG signals and deserve close consideration during experimental design or modeling efforts. Overall, the method compares favorably to averaging methods.
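The sketch below illustrates the core regression idea on simulated data, not the authors' implementation: each fixation onset contributes a set of time-lagged predictor columns, so that ordinary least squares recovers an overlap-corrected fixation-related waveform. Eye movement covariates such as saccade amplitude could be added as further columns; the sampling rate, window length, and toy "ERP" are assumptions.

```python
# Toy demonstration of overlap correction by time-expanded (FIR) regression
# for one EEG channel.
import numpy as np

def build_fir_design(n_samples, onsets, window):
    """One column per latency in `window` (in samples), one row per EEG
    sample; an entry is 1 if a fixation occurred `latency` samples earlier."""
    X = np.zeros((n_samples, len(window)))
    for j, lag in enumerate(window):
        idx = onsets + lag
        idx = idx[(idx >= 0) & (idx < n_samples)]
        X[idx, j] = 1.0
    return X

rng = np.random.default_rng(0)
fs, n_samples = 500, 10_000                      # assumed sampling rate / length
onsets = np.sort(rng.choice(n_samples - 300, 40, replace=False))
window = np.arange(0, 300)                       # 0-600 ms after fixation onset

true_response = np.sin(np.linspace(0, np.pi, 300))   # toy fixation-related "ERP"
eeg = np.zeros(n_samples)
for o in onsets:                                 # overlapping responses + noise
    eeg[o:o + 300] += true_response
eeg += rng.normal(scale=0.5, size=n_samples)

X = build_fir_design(n_samples, onsets, window)
beta, *_ = np.linalg.lstsq(X, eeg, rcond=None)   # overlap-corrected waveform
```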
From the studies in this thesis, we learn directly that object recognition is a process that happens in an obligatory fashion, even when the task does not require it. We also learn that only a subset of first fixations on objects is affected by the processing of object meaning and its fit to its surroundings. Comparisons between first-fixation and first-dwell effects suggest that, in active vision, object semantics processing sometimes unfolds over multiple fixations. And finally, we learn that regression-based methods for combined eye tracking–EEG analysis provide a plausible way forward for investigating how object recognition unfolds in active vision.
Although in real life people frequently perform visual search together, in lab experiments this social dimension is typically left out. Here, we investigate individual, collaborative and competitive visual search with visualization of search partners' gaze. Participants were instructed to search a grid of Gabor patches while being eye tracked. For collaboration and competition, searchers were shown in real time at which element the paired searcher was looking. To promote collaboration or competition, points were rewarded or deducted for correct or incorrect answers. Early in collaboration trials, searchers rarely fixated the same elements. Reaction times of couples were roughly halved compared with individual search, while error rates did not increase. This indicates that searchers formed an efficient collaboration strategy. Overlap, the proportion of dwells that landed on hexagons that the other searcher had already looked at, was lower than expected from the simulated overlap of two searchers who are blind to the behavior of their partner. The proportion of overlapping dwells correlated positively with ratings of the quality of collaboration. During competition, overlap increased earlier in time, indicating that competitors divided the search space less efficiently. Analysis of the entropy of the dwell locations and scan paths revealed that the competition condition exhibited a less fixed looking pattern than the collaborative and individual search conditions. We conclude that participants can efficiently search together when provided only with information about their partner's gaze position, by dividing up the search space. Competitive search exhibited more random gaze patterns, potentially reflecting increased interaction between searchers in this condition.
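To make two of the reported measures concrete, the sketch below computes the proportion of a searcher's dwells that land on elements the partner had already visited ("overlap") and the Shannon entropy of dwell locations; the dwell representation and element labels are hypothetical.

```python
# Hypothetical sketch of the "overlap" and dwell-entropy measures.
import numpy as np
from collections import Counter

def overlap_proportion(own_dwells, partner_dwells):
    """own_dwells / partner_dwells: time-sorted lists of (time, element) tuples.
    Returns the proportion of own dwells on elements the partner had already
    visited at an earlier time."""
    hits = 0
    for t, element in own_dwells:
        if any(pt < t and pe == element for pt, pe in partner_dwells):
            hits += 1
    return hits / len(own_dwells)

def dwell_entropy(dwells):
    """Shannon entropy (bits) of the distribution of dwelled-on elements."""
    counts = np.array(list(Counter(e for _, e in dwells).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

searcher_a = [(0.2, "h3"), (0.6, "h7"), (1.1, "h2")]   # made-up dwells
searcher_b = [(0.3, "h9"), (0.8, "h7"), (1.4, "h3")]
print(overlap_proportion(searcher_a, searcher_b), dwell_entropy(searcher_a))
```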
When mapping eye-movement behavior to the visual information presented to an observer, Areas of Interest (AOIs) are commonly employed. For static stimuli (screens without moving elements), this requires that one AOI set be constructed for each stimulus, a possibility in most eye-tracker manufacturers' software. For moving stimuli (screens with moving elements), however, it is often a time-consuming process, as AOIs have to be constructed for each video frame. A popular use case for such moving AOIs is to study gaze behavior to moving faces. Although it is technically possible to construct AOIs automatically, the standard in this field is still manual AOI construction. This is likely because automatic AOI-construction methods are (1) technically complex, or (2) not effective enough for empirical research. To aid researchers in this field, we present and validate a method that automatically achieves AOI construction for videos containing a face. The fully automatic method uses an open-source toolbox for facial landmark detection and a Voronoi-based AOI-construction method. We compared the positions of AOIs obtained using our new method, and the eye-tracking measures derived from them, to a recently published semi-automatic method. The differences between the two methods were negligible. The presented method is therefore both effective (as effective as previous methods) and efficient: no researcher time is needed for AOI construction. The software is freely available from https://osf.io/zgmch/.
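The sketch below illustrates the general idea, not the published toolbox: assigning a gaze sample to the AOI of the nearest facial landmark is equivalent to looking up its cell in a Voronoi partition of the frame. The landmark coordinates here are invented; in practice they would come from a facial-landmark detector for each video frame.

```python
# Minimal sketch of Voronoi-style AOI assignment around facial landmarks.
import numpy as np

landmarks = {                  # hypothetical landmark centres (x, y) in pixels
    "left_eye": (220, 180),
    "right_eye": (320, 180),
    "nose": (270, 250),
    "mouth": (270, 330),
}

def voronoi_aoi(gaze_x, gaze_y, landmarks):
    """Return the AOI label whose landmark is nearest to the gaze sample;
    nearest-neighbour assignment is equivalent to looking up the gaze point's
    Voronoi cell."""
    names = list(landmarks)
    points = np.array([landmarks[n] for n in names], dtype=float)
    d = np.hypot(points[:, 0] - gaze_x, points[:, 1] - gaze_y)
    return names[int(np.argmin(d))]

print(voronoi_aoi(300, 190, landmarks))   # -> "right_eye"
```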