Natural sounds convey perceptually relevant information over multiple timescales, and the necessary extraction of multi-timescale information requires the auditory system to work over distinct ranges. The simplest hypothesis suggests that temporal modulations are encoded in an equivalent manner within a reasonable intermediate range. We show that the human auditory system selectively and preferentially tracks acoustic dynamics concurrently at 2 timescales corresponding to the neurophysiological theta band (4–7 Hz) and gamma band ranges (31–45 Hz) but, contrary to expectation, not at the timescale corresponding to alpha (8–12 Hz), which has also been found to be related to auditory perception. Listeners heard synthetic acoustic stimuli with temporally modulated structures at 3 timescales (approximately 190-, approximately 100-, and approximately 30-ms modulation periods) and identified the stimuli while undergoing magnetoencephalography recording. There was strong intertrial phase coherence in the theta band for stimuli of all modulation rates and in the gamma band for stimuli with corresponding modulation rates. The alpha band did not respond in a similar manner. Classification analyses also revealed that oscillatory phase reliably tracked temporal dynamics but not equivalently across rates. Finally, mutual information analyses quantifying the relation between phase and cochlear-scaled correlations also showed preferential processing in 2 distinct regimes, with the alpha range again yielding different patterns. The results support the hypothesis that the human auditory system employs (at least) a 2-timescale processing mode, in which lower and higher perceptual sampling scales are segregated by an intermediate temporal regime in the alpha band that likely reflects different underlying computations.
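The intertrial phase coherence (ITPC) measure referred to above has a standard definition: the length of the mean unit phase vector across trials. Below is a minimal sketch of how band-limited ITPC could be computed from single-trial recordings; the band edges, sampling rate, and variable names are illustrative assumptions, not details taken from the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def intertrial_phase_coherence(trials, fs, band):
    """ITPC over time for one sensor.

    trials : array, shape (n_trials, n_samples) -- single-trial time series
    fs     : sampling rate in Hz
    band   : (low, high) band edges in Hz, e.g. (4, 7) for theta
    """
    # Band-pass filter each trial and extract the instantaneous phase
    sos = butter(4, [band[0], band[1]], btype="band", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, trials, axis=1)
    phase = np.angle(hilbert(filtered, axis=1))
    # ITPC at each time point: length of the mean unit phase vector across trials
    return np.abs(np.mean(np.exp(1j * phase), axis=0))

# Illustrative use with random data standing in for MEG trials
rng = np.random.default_rng(0)
fake_trials = rng.standard_normal((60, 1000))   # 60 trials, 1 s at 1 kHz
itpc_theta = intertrial_phase_coherence(fake_trials, fs=1000, band=(4, 7))
```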
The neural processing of speech and music is still a matter of debate. A long tradition that assumes shared processing capacities for the two domains contrasts with views that assume domain-specific processing. Here we contribute to this topic by investigating, in a functional magnetic resonance imaging (fMRI) study, ecologically valid stimuli that are identical in wording and differ only in that one group is typically spoken (or silently read), whereas the other is sung: poems and their respective musical settings. We focus on the melodic properties of spoken poems and their sung musical counterparts by looking at proportions of significant autocorrelations (PSA) based on pitch values extracted from their recordings. Following earlier studies, we assumed a bias of poem processing towards the left hemisphere and a bias of song processing towards the right hemisphere. Furthermore, PSA values of poems and songs were expected to explain variance in left- vs. right-temporal brain areas, while continuous liking ratings obtained in the scanner should modulate activity in the reward network. Overall, poem processing compared to song processing relied on left temporal regions, including the superior temporal gyrus, whereas song processing compared to poem processing recruited more right temporal areas, including Heschl's gyrus and the superior temporal gyrus. PSA values co-varied with activation in bilateral temporal regions for poems, and in right-dominant fronto-temporal regions for songs. Continuous liking ratings were correlated with activity in the default mode network for both poems and songs. The pattern of results suggests that the neural processing of poems and their musical settings is based on their melodic properties, supported by bilateral temporal auditory areas and an additional right fronto-temporal network known to be implicated in the processing of melodies in songs. These findings take a middle ground, providing evidence for specific processing circuits for speech and music in the left and right hemispheres, respectively, but also for shared processing of melodic aspects of both poems and their musical settings in the right temporal cortex. Thus, we demonstrate the neurobiological plausibility of assuming the importance of melodic properties in spoken and sung aesthetic language alike, along with the involvement of the default mode network in the aesthetic appreciation of these properties.
Pitch peaks tend to be higher at the beginning of longer than of shorter sentences (e.g., ‘A farmer is pulling donkeys’ vs ‘A farmer is pulling a donkey and goat’), whereas pitch valleys at the ends of sentences are rather constant for a given speaker. These data seem to imply that speakers avoid dropping their voice pitch too low by planning the height of sentence-initial pitch peaks prior to speaking. However, the length effect on sentence-initial pitch peaks appears to vary across sentence types, speakers and languages. Therefore, the notion that speakers plan sentence intonation in advance because of limitations on low voice pitch leaves part of the data unexplained. Consequently, this study suggests a complementary cognitive account of length-dependent pitch scaling. In particular, it proposes that the sentence-initial pitch rise in long sentences is related to high demands on mental resources during the early stages of sentence planning. To tap into the cognitive underpinnings of planning sentence intonation, this study adopts the methodology of recording eye movements during a picture description task, as eye movements are an established approximation of real-time planning processes. Measures of voice pitch (fundamental frequency) and of incrementality (eye movements) are used to examine the relationship between (verbal) working memory (WM), the incrementality of sentence planning, and the height of sentence-initial pitch peaks.
Research on the music-language interface has extensively investigated similarities and differences between poetic and musical meter, but has largely disregarded melody. Using a measure of melodic structure in music (autocorrelations of sound sequences consisting of discrete pitch and duration values), we show that individual poems feature distinct and text-driven pitch and duration contours, just like songs and other pieces of music. We conceptualize these recurrent melodic contours as an additional, hitherto unnoticed dimension of parallelistic patterning. Poetic speech melodies are higher-order units beyond the level of individual syntactic phrases, and also beyond the levels of individual sentences and verse lines. Importantly, autocorrelation scores for pitch and duration recurrences across stanzas are predictive of how melodious naive listeners perceive the respective poems to be, and of how likely these poems were to be set to music by professional composers. Experimentally removing classical parallelistic features characteristic of prototypical poems (rhyme, meter, and others) led to decreased autocorrelation scores for pitches, independent of spoken renditions, along with reduced ratings for perceived melodiousness. This suggests that the higher-order parallelistic feature of poetic melody strongly interacts with the other parallelistic patterns of poems. Our discovery of a genuine poetic speech melody has great potential for deepening the understanding of the music-language interface.
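To make the autocorrelation-based measure more concrete, here is a minimal sketch of how a proportion-of-significant-autocorrelations (PSA) style score could be computed from a sequence of per-syllable pitch values. The lag range, the large-sample significance bound, and the function name are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def psa(pitch_values, max_lag=None):
    """Proportion of lags with a 'significant' autocorrelation in a pitch sequence.

    pitch_values : 1-D array of pitch estimates (e.g., one value per syllable)
    max_lag      : largest lag to test (defaults to half the sequence length)
    """
    x = np.asarray(pitch_values, dtype=float)
    x = x - x.mean()
    n = len(x)
    if max_lag is None:
        max_lag = n // 2
    # Normalised autocorrelation at lags 1..max_lag
    denom = np.sum(x * x)
    ac = np.array([np.sum(x[:-k] * x[k:]) / denom for k in range(1, max_lag + 1)])
    # Simple large-sample significance bound (|r| > 1.96/sqrt(n)); illustrative only
    threshold = 1.96 / np.sqrt(n)
    return np.mean(np.abs(ac) > threshold)

# Example: a repeating pitch contour yields a higher PSA than an unstructured one
rng = np.random.default_rng(1)
contour = np.tile([200.0, 220.0, 210.0, 190.0], 8) + rng.normal(0, 2, 32)
print(psa(contour))
```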
Background
Cochlear Implants (CIs) provide near-normal speech intelligibility in quiet environments to individuals suffering from sensorineural hearing loss. However, perception of speech against competing background noise and, especially, music appraisal are still insufficient. Hence, improving speech perception in ambient noise and music intelligibility is a core challenge in CI research. Quantitatively assessing music intelligibility is a demanding task due to its inherently subjective nature. However, since previous approaches have related electrophysiological measurements to speech intelligibility, a corresponding relation to music intelligibility can be assumed. Recent studies have investigated the relation between results obtained from hearing performance tests and Spread of Excitation (SoE) measurements. SoE functions are acquired by measuring Electrically Evoked Compound Action Potentials (ECAPs), which represent the electrical response generated in the neural structures of the auditory nerve. The parameters designed to describe SoE functions are used to estimate the dispersal of the electric field in the cochlea. The quality of spatial separation of the electric fields generated by adjacent electrodes is assumed to correlate with hearing performance measures.
Aim of study
This study investigated the relation between parameters derived from ECAP measurements and perceptual measures that aim to assess the level of speech and music intelligibility in CI users. In addition, the ratings assessed in a questionnaire on self-rated music intelligibility were correlated with a test battery consisting of measures of speech reception threshold (SRT) in noise (Oldenburger Satztest (OLSA)) and of music intelligibility (Adaptive Melody-Pattern-Discrimination Test (AMPDT)). We hypothesised that results from this test battery would correlate with the subjective ratings and with measures describing SoE functions.
Methods
The patient collective comprised 17 well-experienced bilateral CI listeners (8 female, 9 male) between the ages of 14 and 77 years with a minimum CI experience of two years. Music enjoyment and self-rated musicality were evaluated by means of a questionnaire. The AMPDT included two psychoacoustic tests: timbre difference discrimination threshold (TDDT) and background contour discrimination threshold (BCDT). Accentuating harmonics in a foreground melody created a background melody; accentuation was realised by sound level increment, frequency detuning and onset asynchrony. Subjects had to detect target intervals comprising both foreground and background melody by discriminating timbre differences in a Three-Interval Three-Alternative Forced-Choice (3I3AFC) procedure. In a One-Interval Two-Alternative Forced-Choice (1I2AFC) procedure, subjects had to classify the contour of the background melody. SoE was measured via a spatial forward-masking paradigm on a basal, a medial and an apical recording electrode; probe electrodes were located one electrode position apical to the recording electrode. The width of the normalised SoE functions was calculated at their 25% and 50% levels (excitation distance, DIST). Furthermore, exponential functions were calculated for SoE profiles with more than three data points on each side. The OLSA assessed SRT in noise, with the noise presented through an array of four loudspeakers (MSNF). The Fastl noise condition permits gap listening, as its fluctuations reproduce the temporal characteristics of speech; the OLnoise condition is a continuous noise resulting in maximal masking.
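As an illustration of the DIST measure described above, the following sketch estimates the width of a normalised SoE function at its 25% and 50% levels by linear interpolation over masker electrode positions. The single-peaked example profile, electrode spacing, and function names are assumptions made for illustration, not the study's actual data or analysis code.

```python
import numpy as np

def soe_width(positions, amplitudes, level):
    """Width of a normalised, single-peaked SoE function at a relative level (e.g. 0.25, 0.5).

    positions  : masker electrode positions (e.g. in mm along the array), increasing
    amplitudes : ECAP amplitudes measured at those positions
    level      : fraction of the peak amplitude at which the width is read off
    """
    pos = np.asarray(positions, dtype=float)
    norm = np.asarray(amplitudes, dtype=float)
    norm = norm / norm.max()                     # normalise to peak = 1
    peak = np.argmax(norm)
    # Interpolate the crossing of the level on each side of the peak
    left = np.interp(level, norm[:peak + 1], pos[:peak + 1])
    right = np.interp(level, norm[peak:][::-1], pos[peak:][::-1])
    return right - left

# Illustrative SoE profile (positions in mm, amplitudes in arbitrary units)
pos = np.arange(0, 10, 1.0)
amp = np.exp(-0.5 * ((pos - 4.0) / 1.5) ** 2)    # roughly bell-shaped excitation
print(soe_width(pos, amp, 0.25), soe_width(pos, amp, 0.50))
```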
Results
We found that background melody contour classification (BCDT) is more challenging for CI users than the detection of small perceptual timbre differences (TDDT). Background melody contour classification was possible with harmonic accentuation by sound level increment, whereas accentuation by onset asynchrony was more demanding, and CI users failed in background melody contour classification based on frequency detuning. SRTs assessed with the OLSA were significantly lower in the OLnoise than in the Fastl noise masking condition. A total of N = 90 SoE functions were acquired from ECAP measurements, of which N = 48 showed a clearly present ECAP response. The DIST at the 25% and 50% levels was narrower for the basal than for the apical and medial electrodes. SoE functions showed asymmetric profiles with larger amplitudes towards the basal end of the cochlea. Correlation analysis between the AMPDT, the OLSA and the DISTs showed no significant correlation. Correlation analysis between the AMPDT, the OLSA and the questionnaire results provided no evidence that musical activities (music listening, singing or playing instruments) improve music intelligibility. However, CI supply restored the importance of music, self-rated musicality and musical enjoyment in this study's subjects.
Conclusions
The present study’s results imply that CI listeners are only able to detect distinct timbre alterations throughout the course of a musical piece, whereas they cannot discriminate background melodies hidden in a pattern of complex harmonic sounds. Furthermore, contrary to our initial hypothesis, SoE measurements do not seem to be an adequate tool to predict either speech or music intelligibility in CI listeners. This finding is consistent with a number of studies that did not find a correlation between music or speech intelligibility and channel interactions assessed by SoE measurements. It can be concluded that, although CI supply restores musical enjoyment in patients with sensorineural hearing loss, music perception remains poor and does not improve significantly with regular musical activities such as listening to music, singing or playing instruments.
Auditory and visual percepts are integrated even when they are not perfectly temporally aligned with each other, especially when the visual signal precedes the auditory signal. This window of temporal integration for asynchronous audiovisual stimuli is relatively well examined in the case of speech, while other natural action-induced sounds have been widely neglected. Here, we studied the detection of audiovisual asynchrony in three different whole-body actions with natural action-induced sounds: hurdling, tap dancing and drumming. In Study 1, we examined whether audiovisual asynchrony detection, assessed by a simultaneity judgment task, differs as a function of sound production intentionality. Based on previous findings, we expected that auditory and visual signals should be integrated over a wider temporal window for actions creating sounds intentionally (tap dancing) compared to actions creating sounds incidentally (hurdling). While percentages of perceived synchrony differed in the expected way, we identified two further factors, high event density and low rhythmicity, that also induced higher synchrony ratings. Therefore, in Study 2 we systematically varied event density and rhythmicity, this time using drumming stimuli to exert full control over these variables, with the same simultaneity judgment task. Results suggest that high event density leads to a bias to integrate rather than segregate auditory and visual signals, even at relatively large asynchronies. Rhythmicity had a similar, albeit weaker, effect when event density was low. Our findings demonstrate that shorter asynchronies and visual-first asynchronies lead to higher synchrony ratings for whole-body actions, pointing to clear parallels with audiovisual integration in speech perception. Overconfidence in the naturally expected synchrony of sound and sight was stronger for intentional (vs. incidental) sound production and for movements with high (vs. low) rhythmicity, presumably because both encourage predictive processes. In contrast, high event density appears to increase synchrony judgments simply because it makes the detection of audiovisual asynchrony more difficult. More studies using real-life audiovisual stimuli with varying event densities and rhythmicities are needed to fully uncover the general mechanisms of audiovisual integration.
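A common way to summarise simultaneity judgment data of this kind is to fit a bell-shaped curve to the proportion of "synchronous" responses as a function of audiovisual onset asynchrony (SOA) and read off the point of subjective simultaneity and the width of the temporal integration window. Below is a minimal sketch under that assumption; the SOA grid, example proportions, and parameter names are illustrative and not taken from the study.

```python
import numpy as np
from scipy.optimize import curve_fit

def synchrony_curve(soa, amplitude, center, width):
    """Gaussian-shaped psychometric curve for simultaneity judgments.

    soa    : audiovisual onset asynchrony in ms (negative = visual first)
    center : point of subjective simultaneity (PSS)
    width  : spread of the temporal integration window
    """
    return amplitude * np.exp(-0.5 * ((soa - center) / width) ** 2)

# Illustrative data: proportion of 'synchronous' responses at each SOA
soas = np.array([-400, -300, -200, -100, 0, 100, 200, 300, 400], dtype=float)
p_sync = np.array([0.15, 0.35, 0.65, 0.90, 0.95, 0.80, 0.50, 0.25, 0.10])

popt, _ = curve_fit(synchrony_curve, soas, p_sync, p0=[1.0, -50.0, 150.0])
amplitude, pss, window = popt
print(f"PSS = {pss:.0f} ms, window SD = {window:.0f} ms")
```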
Speech perception is mediated by both left and right auditory cortices but with differential sensitivity to specific acoustic information contained in the speech signal. A detailed description of this functional asymmetry is missing, and the underlying models are widely debated. We analyzed cortical responses from 96 epilepsy patients with electrode implantation in left or right primary, secondary, and/or association auditory cortex (AAC). We presented short acoustic transients to noninvasively estimate the dynamical properties of multiple functional regions along the auditory cortical hierarchy. We show remarkably similar bimodal spectral response profiles in left and right primary and secondary regions, with evoked activity composed of dynamics in the theta (around 4–8 Hz) and beta–gamma (around 15–40 Hz) ranges. Beyond these first cortical levels of auditory processing, a hemispheric asymmetry emerged, with delta and beta band (3/15 Hz) responsivity prevailing in the right hemisphere and theta and gamma band (6/40 Hz) activity prevailing in the left. This asymmetry is also present during syllable presentation, but the evoked responses in AAC are more heterogeneous, with the co-occurrence of alpha (around 10 Hz) and gamma (>25 Hz) activity bilaterally. These intracranial data provide a more fine-grained and nuanced characterization of cortical auditory processing in the 2 hemispheres, shedding light on the neural dynamics that potentially shape auditory and speech processing at different levels of the cortical hierarchy.
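One simple way to expose the kind of spectral response profile described above is to average the single-trial responses of a contact and compute the power spectrum of the resulting evoked response. The following sketch does exactly that; the sampling rate, Welch parameters, and variable names are assumptions for illustration, not the authors' analysis pipeline.

```python
import numpy as np
from scipy.signal import welch

def evoked_spectral_profile(trials, fs):
    """Power spectrum of the trial-averaged evoked response for one contact.

    trials : array, shape (n_trials, n_samples) -- single-trial responses
    fs     : sampling rate in Hz
    """
    evoked = trials.mean(axis=0)                          # average over trials
    freqs, power = welch(evoked, fs=fs, nperseg=min(len(evoked), 512))
    return freqs, power

# Illustrative use: random data standing in for intracranial recordings
rng = np.random.default_rng(2)
fake_trials = rng.standard_normal((80, 1024))             # 80 trials, ~1 s at 1 kHz
freqs, power = evoked_spectral_profile(fake_trials, fs=1000)
theta_mask = (freqs >= 4) & (freqs <= 8)
theta_peak = freqs[np.argmax(power * theta_mask)]
print(f"Largest theta-band component at ~{theta_peak:.1f} Hz")
```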