Refine
Document Type
- Part of a Book (3)
- Doctoral Thesis (1)
Language
- English (4)
Has Fulltext
- yes (4) (remove)
Is part of the Bibliography
- no (4) (remove)
Keywords
- Sprachwahrnehmung (4) (remove)
Institute
Reduction in natural speech
(2009)
Natural (conversational) speech, compared to cannonical speech, is earmarked by the tremendous amount of variation that often leads to a massive change in pronunciation. Despite many attempts to explain and theorize the variability in conversational speech, its unique characteristics have not played a significant role in linguistic modeling. One of the reasons for variation in natural speech lies in a tendency of speakers to reduce speech, which may drastically alter the phonetic shape of words. Despite the massive loss of information due to reduction, listeners are often able to understand conversational speech even in the presence of background noise. This dissertation investigates two reduction processes, namely regressive place assimilation across word boundaries, and massive reduction and provides novel data from the analyses of speech corpora combined with experimental results from perception studies to reach a better understanding of how humans handle natural speech. The successes and failures of two models dealing with data from natural speech are presented: The FUL-model (Featurally Underspecified Lexicon, Lahiri & Reetz, 2002), and X-MOD (an episodic model, Johnson, 1997). Based on different assumptions, both models make different predictions for the two types of reduction processes under investigation. This dissertation explores the nature and dynamics of these processes in speech production and discusses its consequences for speech perception. More specifically, data from analyses of running speech are presented investigating the amount of reduction that occurs in naturally spoken German. Concerning production, the corpus analysis of regressive place assimilation reveals that it is not an obligatory process. At the same time, there emerges a clear asymmetry: With only very few exceptions, only [coronal] segments undergo assimilation, [labial] and [dorsal] segments usually do not. Furthermore, there seem to be cases of complete neutralization where the underlying Place of Articulation feature has undergone complete assimilation to the Place of Articulation feature of the upcoming segment. Phonetic analyses further underpin these findings. Concerning deletions and massive reductions, the results clearly indicate that phonological rules in the classical generative tradition are not able to explain the reduction patterns attested in conversational speech. Overall, the analyses of deletion and massive reduction in natural speech did not exhibit clear-cut patterns. For a more in-depth examination of reduction factors, the case of final /t/ deletion is examined by means of a new corpus constructed for this purpose. The analysis of this corpus indicates that although phonological context plays an important role on the deletion of segments (i.e. /t/), this arises in the form of tendencies, not absolute conditions. This is true for other deletion processes, too. Concerning speech perception, a crucial part for both models under investigation (X-MOD and FUL) is how listeners handle reduced speech. Five experiments investigate the way reduced speech is perceived by human listeners. Results from two experiments show that regressive place assimilations can be treated as instances of complete neutralizations by German listeners. Concerning massively reduced words, the outcome of transcription and priming experiments suggest that such words are not acceptable candidates of the intended lexical items for listeners in the absence of their proper phrasal context. Overall, the abstractionist FUL-model is found to be superior in explaining the data. While at first sight, X-MOD deals with the production data more readily, FUL provides a better fit for the perception results. Another important finding concerns the role of phonology and phonetics in general. The results presented in this dissertation make a strong case for models, such as FUL, where phonology and phonetics operate at different levels of the mental lexicon, rather than being integrated into one. The findings suggest that phonetic variation is not part of the representation in the mental lexicon.
It is well known that English children between the age of 4 and 6 display a so-called Delay of Principle B Effect (DPBE) in that they allow pronouns to refer to a local c-commanding antecedent. Their guessing pattern with pronouns contrasts with their adult-like interpretation of reflexives. The DPBE has been explained as resulting from a lack of pragmatic knowledge or insufficient cognitive resources. However, such extra-grammatical accounts cannot explain why the DPBE only shows up in particular languages and in particular syntactic environments. Moreover, such accounts fail to explain why the DPBE only emerges in comprehension and not in production. This paper hypothesizes that the presence or absence of the DPBE can be explained from the properties of the grammar. Fischer's (2004) optimality-theoretic analysis of binding, explaining cross-linguistic variation, and Hendriks and Spenader's (2005/6) optimality-theoretic account of the acquisition of pronouns and reflexives are combined into a single model. This model yields testable predictions with respect to the presence or absence of the DPBE in particular languages, in particular syntactic environments, and in comprehension and/or production.
It has been shown that visual cues play a crucial role in the perception of vowels and consonants. Conflicting consonantal stimuli presented in the visual and auditory modalities can even result in the emergence of a third perceptual unit (McGurk effect). From a developmental point of view, several studies report that newborns can associate the image of a face uttering a given vowel to the auditory signal corresponding to this vowel; visual cues are thus used by the newborns. Despite the large number of studies carried out with adult speakers and newborns, very little work has been conducted with preschool-aged children. This contribution is aimed at describing the use of auditory and visual cues by 4 and 5-year-old French Canadian speakers, compared to adult speakers, in the identification of voiced consonants. Audiovisual recordings of a French Canadian speaker uttering the sequences [aba], [ada], [aga], [ava], [ibi], [idi], [igi], [ivi] have been carried out. The acoustic and visual signals have been extracted and analysed so that conflicting and non-conflicting stimuli, between the two modalities, were obtained. The resulting stimuli were presented as a perceptual test to eight 4 and 5-year-old French Canadian speakers and ten adults in three conditions: visual-only, auditory-only, and audiovisual. Results show that, even though the visual cues have a significant effect on the identification of the stimuli for adults and children, children are less sensitive to visual cues in the audiovisual condition. Such results shed light on the role of multimodal perception in the emergence and the refinement of the phonological system in children.
It has been hypothesized that sounds which are less perceptible are more likely to be altered than more salient sounds, the rationale being that the loss of information resulting from a change in a sound which is difficult to perceive is not as great as the loss resulting from a change in a more salient sound. Kohler (1990) suggested that the tendency to reduce articulatory movements is countered by perceptual and social constraints, finding that fricatives are relatively resistant to reduction in colloquial German. Kohler hypothesized that this is due to the perceptual salience of fricatives, a hypothesis which was supported by the results of a perception experiment by Hura, Lindblom, and Diehl (1992). These studies showed that the relative salience of speech sounds is relevant to explaining phonological behavior. An additional factor is the impact of different acoustic environments on the perceptibility of speech sounds. Steriade (1997) found that voicing contrasts are more common in positions where more cues to voicing are available. The P-map, proposed by Steriade (2001a, b), allows the representation of varying salience of segments in different contexts. Many researchers have posited a relationship between speech perception and phonology. The purpose of this paper is to provide experimental evidence for this relationship, drawing on the case of Turkish /h/ deletion.