Linguistik
Research on dialectal varieties was for a long time concentrated on phonetic aspects of language. While a lot of work was done on segmental aspects, suprasegmentals remained largely unexplored until the last few years, despite the fact that prosody has been noted as a salient aspect of dialectal variants by linguists and by naive speakers. Current research on dialectal prosody in the German-speaking area often uses discourse-analytic methods, correlating intonation curves with communicative functions (P. Auer et al. 2000, P. Gilles & R. Schrambke 2000, R. Kehrein & S. Rabanus 2001). The project I present here has another focus. It looks at general prosodic aspects, abstracted from actual situations. These global structures are modelled and integrated in a speech synthesis system. Today, it is mostly intonation that is being investigated; rhythm, the temporal organisation of speech, is not at the core of current research on prosody. But there is evidence that temporal organisation is one of the main structuring elements of speech (B. Zellner 1998, B. Zellner Keller 2002). Following this approach developed for speech synthesis, I will present the modelling of the timing of two Swiss German dialects (Bernese and Zurich dialect) that are considered quite different on the prosodic level. These models are part of the project on the "development of basic knowledge for research on Swiss German prosody by means of speech synthesis modelling" funded by the Swiss National Science Foundation.
A model is proposed that interprets a variety of connected speech processes as resulting from prosodic modulations at different tiers of functional speech motor control along the hypo-hyper dimension [10]. The general background of the model is given by the trichotomy of A-, B- and C-prosodic phenomena [15] that together constitute the acoustic makeup of any speech utterance (with regard to their respective time domains at the utterance/phrase level, the syllabic level and the segmental level).
This paper is an inductive look at the constituents found in a randomly selected Tagalog text, Bob Ong’s Alamat ng Gubat (Makati City, MM: Visual Print Enterprises, 2004). The analysis is based on the full text, but we will only be able to go through the first few lines of the text here, which we will do one by one, and discuss the structures found in each line of the text in bullet format after the relevant line. At the end of the paper we will bring up some important questions about the structures found in Tagalog based on this text.
We present an effort for the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible for the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
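The idea of partial matching in annotation evaluation can be illustrated with a minimal sketch. This is not the evaluation tool described in the abstract; the span format `(start, end, label)`, the function names and the sample data are all assumptions made for illustration. Under strict matching a predicted entity counts only if its boundaries equal those of a gold entity, whereas under partial matching any overlap with a gold entity of the same label is accepted.

```python
def matches(gold, pred, partial):
    """Compare two hypothetical (start, end, label) spans; end is exclusive."""
    if gold[2] != pred[2]:
        return False
    if partial:
        # Partial match: the two character ranges overlap.
        return gold[0] < pred[1] and pred[0] < gold[1]
    # Strict match: identical boundaries.
    return gold[:2] == pred[:2]


def precision_recall(gold, predicted, partial=False):
    """Return (precision, recall) for predicted spans against gold spans."""
    hit_pred = sum(any(matches(g, p, partial) for g in gold) for p in predicted)
    hit_gold = sum(any(matches(g, p, partial) for p in predicted) for g in gold)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    return precision, recall


# Invented example data: the first prediction covers only part of the gold span.
gold = [(0, 12, "PERSON"), (20, 26, "LOCATION")]
predicted = [(0, 6, "PERSON"), (20, 26, "LOCATION")]

print(precision_recall(gold, predicted))                # (0.5, 0.5)
print(precision_recall(gold, predicted, partial=True))  # (1.0, 1.0)
```

The difference between the two scores shows how many errors are boundary errors rather than missed or spurious entities.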
In this paper we show an approach to the customization of GermaNet to the German HPSG grammar lexicon developed in the Verbmobil project. GermaNet has a broad coverage of the German base vocabulary and a fine-grained semantic classification, while the HPSG grammar lexicon is comparatively small and has a coarse-grained semantic classification. In our approach, we have developed a mapping algorithm to relate the synsets in GermaNet with the semantic sorts in HPSG. The evaluation result shows that this approach is useful for the lexical extension of our deep grammar development to cope with real-world text understanding.
What role does language play in the development of numerical cognition? In the present paper I argue that the evolution of symbolic thinking (as a basis for language) laid the grounds for the emergence of a systematic concept of number. This concept is grounded in the notion of an infinite sequence and encompasses number assignments that can focus on cardinal aspects ("three pencils"), ordinal aspects ("the third runner"), and even nominal aspects ("bus #3"). I show that these number assignments are based on a specific association of relational structures, and that it is the human language faculty that provides a cognitive paradigm for such an association, suggesting that language played a pivotal role in the evolution of systematic numerical cognition.
This paper advances a purely presuppositional analysis of intonation. I first show that an inspiring recent article by Geurts and van der Sandt (Theoretical Linguistics, 2004) that pursues the same goal cannot account for multiple foci. Then, I show that if it is assumed that destressed rather than focussed material is semantically marked, multiple foci are accounted for correctly.
Twenty years ago I discussed the oldest isoglosses in the South Slavic linguistic area (1982). Subscribing to Van Wijk’s view that the bundle of isoglosses which separates Bulgarian from Serbo-Croatian was the result of an early split in South Slavic and that the transitional dialects originated from a later mixture of Serbian and Bulgarian dialects when the contact between the two languages had been restored (1927), I argued that the shared innovations of Bulgarian and Serbo-Croatian must be dated to a period when the dialects were still spoken in the original Trans-Carpathian homeland of the Slavs. I concluded that there is no evidence for common innovations of South Slavic which were posterior to the end of what I have called the Late Middle Slavic period, which I dated to the 4th through 6th centuries AD. At that time, the major dialect divisions of Slavic were already established.
We present a broad coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages.
Evaluating phonological status : significance of paradigm uniformity vs. prosodic group effects
(2007)
A central concern of linguistic phonetics is to define criteria for determining the phonological status of sounds or sound properties observed in phonetic surface form. Based on acoustic measurements we show that the occurrence of syllabic sonorants vs. schwa-sonorant sequences in German is determined exclusively by segmental and prosodic structure, with no paradigm uniformity effects. We argue that these findings are consistent with a uniform representation of syllabic sonorants as schwa-sonorant sequences in the lexicon. The stability of schwa in CVC-suffixes (e.g. the German diminutive suffix -chen), as opposed to its phonetic absence in a segmentally comparable underived context, is argued to be conditioned by the prosodic organisation of such suffixes external to the phonological word of the stem.
Expletives as features
(2000)
Expletives have always been a central topic of theoretical debate and subject to different analyses within the different stages of the Principles and Parameters theory (see Chomsky 1981, 1986, 1995; Lasnik 1992, 1995; Frampton and Gutman 1997; among others). However, most analyses center on the question of how to explain the behavior of expletives in A-chains (such as there in English or það in Icelandic). No account relates wh-expletives (as one finds them in so-called partial wh-movement constructions in languages such as Hungarian, Romani, and German) to expletives in A-chains. In this paper, I argue that the framework of the Minimalist Program opens up the possibility of accounting for expletive-associate relations in A-/A'-chains in a unified manner. The main idea of the unitary analysis is that an expletive is an overtly realized feature bundle that is (sub)extracted from its associate DP. There in an expletive-associate chain is a moved D-feature which originates inside the associate DP. Similarly, in A'-chains, the wh-expletive originates as a focus-/wh-feature in the wh-phrase with which it is associated. This analysis provides evidence for the feature-checking theory in Chomsky (1995). The paper is organized as follows. Section 2 contains the discussion of expletive there. In section 3 I suggest an analysis for wh-expletives, and I also explore whether this analysis can be extended to relations between X°-categories such as auxiliary and participle complexes.
This article examines the expression of natural gender in Icelandic nouns denoting human beings. Particular attention will be paid to the system's symmetry with regards to nouns denoting women and men. Our society consists more or less exactly of half women and half men. One would therefore assume that systems for terms denoting persons would also be symmetrically organised. Yet this assumption could not be further from the truth, and not just in single isolated cases, but in many languages: I will attempt to show that Icelandic has numerous methods for referring to women, but also many barriers and idiosyncrasies.
In our presentation we will outline the verb system of Lelemi and concentrate on certain “focal” aspects which are of primary interest to us. Lelemi has two TAMP paradigms: one constituting the so-called “simple tenses”, the other the so-called “relative tenses” (Allan 1973), although not every “simple tense” has a counterpart in the “relative tenses”. The simple paradigm is formed by subject prefixes (prefixed pronouns for 1st or 2nd person and noun class pronouns for 3rd persons) and the verb form, whereas the relative paradigm is built up by the obligatory use of an external subject noun, an invariable verb prefix, and the verb form. While the simple paradigm is used in a wide range of syntactic environments, the relative paradigm only shows up in relative clauses with the subject being the head, as well as in subject and sentence focus constructions, including questions concerning the subject. We will show some interesting interactions between the grammatical expression of focus and the verb system and sketch the grammaticalisation path of the morpheme nà.
Focus expressions in Foodo
(2006)
Focus expressions in Yom
(2005)
Focus in Gur and Kwa
(2006)
The project investigates focus phenomena in the two genetically related West African Gur and Kwa language groups of the Niger-Congo phylum. Most of their members are tone languages; they are similar with respect to word order typology (all are SVO languages), but of divergent morphological type (agglutinating Gur versus isolating Kwa).
0. Introduction
1. Observations concerning the structure of morphosyntactically marked focus constructions
1.1 First observation: SF vs. NSF asymmetry
1.2 Second observation: NSF-NAR parallelism
1.3 Affirmative ex-situ focus constructions (SF, NSF), and narrative clauses (NAR)
2. Grammaticalization
2.1 Cleft hypothesis
2.2 Movement hypothesis
2.3 Narrative hypothesis
2.3.1 Back- or Foregrounding?
2.3.2 Converse directionality of FM and conjunction
3. Language specific analysis
4. Conclusionary remarks
References
This demo abstract describes the SmartWeb Ontology-based Information Extraction System (SOBIE). A key feature of SOBIE is that all information is extracted and stored with respect to the SmartWeb ontology. In this way, other components of the systems, which use the same ontology, can access this information in a straightforward way. We will show how information extracted by SOBIE is visualized within its original context, thus enhancing the browsing experience of the end user.
Guess how?
(1996)
Japanese is often taken to be strictly head-final in its syntax. In our work on a broad-coverage, precision implemented HPSG for Japanese, we have found that while this is generally true, there are nonetheless a few minor exceptions to the broad trend. In this paper, we describe the grammar engineering project, present the exceptions we have found, and conclude that this kind of phenomenon motivates on the one hand the HPSG type hierarchical approach which allows for the statement of both broad generalizations and exceptions to those generalizations and on the other hand the usefulness of grammar engineering as a means of testing linguistic hypotheses.
Rawang (Rvwàng) is a Tibeto-Burman language spoken in the far north of Myanmar (Burma), and is closely related to the Dulong language spoken in China. Rawang manifests a kind of hierarchical person marking on the predicate which marks first person primarily (in several different ways - suffixes, change of final consonant, vowel length - and up to five times within one verb complex), and second person indirectly with a sort of marking similar to the inverse marking found in some North American languages: it appears when there is a first person participant, but that referent is not the actor, and when the second person is a participant. This system is quite different from those that reflect semantic role (e.g. Qiang) or grammatical relations (e.g. English).
This article discusses the divergent status of the two particles lé and lá in the grammar of Konkomba, a Gur language (Niger-Congo) of the Gurma subgroup. While previous studies claim that both particles are focus markers, this author argues that only the particle lá should be analyzed as a pure pragmatic device. Distributional studies suggest that the use of particle lé, on the other hand, is only required under specific focus conditions, and primarily represents a syntactic device.
Hybrid robust deep and shallow semantic processing for creativity support in document production
(2004)
The research performed in the DeepThought project (http://www.project-deepthought.net) aims at demonstrating the potential of deep linguistic processing if added to existing shallow methods that ensure robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. We use this approach to demonstrate the feasibility of three ambitious applications, one of which is a tool for creativity support in document production and collective brainstorming. This application is described in detail in this paper. Common to all three applications, and the basis for their development, is a platform for integrated linguistic processing. This platform is based on a generic software architecture that combines multiple NLP components and on robust minimal recursive semantics (RMRS) as a uniform representation language.
While the sortal constraints associated with Japanese numeral classifiers are well-studied, less attention has been paid to the details of their syntax. We describe an analysis implemented within a broad-coverage HPSG that handles an intricate set of numeral classifier construction types and compositionally relates each to an appropriate semantic representation, using Minimal Recursion Semantics.
We present a solution for the representation of Japanese honorific information in the HPSG framework. Basically, there are three dimensions of honorification. We show that a treatment is necessary that involves both the syntactic and the contextual level of information. The Japanese grammar is part of a machine translation system.
In linguistics and the philosophy of language, the mass/count distinction has traditionally been regarded as a bi-partition on the nominal domain, where typical instances are nouns like "beef" (mass) vs. "cow" (count). In the present paper, we argue that this partition reveals a system that is based on both syntactic features and conceptual features, and present experimental evidence suggesting that the discrimination of the two kinds of features has a psychological reality.
Experimental data shows that adult learners of an artificial language with a phonotactic restriction learned this restriction better when being trained on word types (e.g. when they were presented with 80 different words twice each) than when being trained on word tokens (e.g. when presented with 40 different words four times each) (Hamann & Ernestus submitted). These findings support Pierrehumbert’s (2003) observation that phonotactic co-occurrence restrictions are formed across lexical entries, since only lexical levels of representation can be sensitive to type frequencies.
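The contrast between the two training regimes can be made concrete with a small sketch. The word lists below are invented placeholders, not the stimuli of the study, and the function name is an assumption. The point is only that both regimes expose the learner to the same number of tokens (160), while the number of distinct word types differs (80 vs. 40).

```python
from collections import Counter


def type_token_counts(presentations):
    """Return (number of distinct word types, total number of tokens)."""
    counts = Counter(presentations)
    return len(counts), sum(counts.values())


# Hypothetical stimuli: 80 different words twice each vs. 40 words four times each.
type_condition = [f"word{i}" for i in range(80)] * 2
token_condition = [f"word{i}" for i in range(40)] * 4

print(type_token_counts(type_condition))   # (80, 160)
print(type_token_counts(token_condition))  # (40, 160)
```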
Several articulatory strategies are available during the production of /u/, all resulting in a similar acoustic output. /u/ has two main constrictions, at the velum and at the lips. A perturbation of either constriction can be compensated at the other one, e.g. a wider constriction at the velum by more lip protrusion, or a wider lip opening by more tongue retraction. This study investigates whether speakers use this relation under perturbation. Six speakers were provided with palatal prostheses which were worn for two weeks. Speakers were instructed to make a serious attempt to produce normal speech. Their speech was recorded via EMA and acoustics several times over the adaptation period. Formant values of /u/-productions were measured. Velar constriction width and lip protrusion were estimated. For four speakers a correlation between constriction width and lip protrusion was found. A negative correlation between lip protrusion and F1 or F2 could sometimes be observed, but no correlation occurred between constriction size and either of the formants. The results show that under perturbation speakers use motor equivalent strategies in order to adapt. The correlation between constriction size and lip protrusion is stronger than in studies investigating unperturbed speech. This could be because under perturbation speakers are inclined to try out several strategies in order to reach the acoustic target and the co-variability might thus be greater.
A common topic in recent literature on phonology is the question of whether phonological processes and segments are licensed by prosodic position or by perceptual cues. The former is the traditional view, as represented by e.g. Lombardi (1995) and Beckman (1998), and holds that segments occur in specific prosodic positions such as the coda. In a licensing by cue approach, as represented by Steriade (1995, 1999), on the other hand, segments are assumed to occur in those positions only where their perceptual cues are prominent, independent of the prosodic position. In positions where the cues are not salient, neutralization occurs.
Friedrich Schlegel's lasting contribution to linguistics is usually seen in the impact that his book "Über die Sprache und Weisheit der Indier" from 1808 left on comparative linguistics and on the study of Sanskrit. Schlegel was one of the first European scholars to have studied Sanskrit extensively and he made a number of translations of Sanskrit literature into German which make up one third of "Über die Sprache und Weisheit der Indier". Schlegel's book is widely regarded as a founding document both of comparative linguistics and of indology, a fact which is quite remarkable in light of the development of Schlegel's thought after this text. His interest in Indian studies ceased more or less directly with the publication of this work, while his thoughts on language became more and more suffused by transcendental philosophy.
This paper provides an analysis of an alternative strategy to A´-movement in both German and Dutch where the extracted constituent is preceded by a preposition and a coreferential pronoun appears in the extraction site. The construction has properties of both binding and movement: Whereas reconstruction effects suggest movement out of the embedded clause, there is strong evidence that the operator constituent is linked to an A-position in the matrix clause; this paradox is resolved by assuming a Control-like approach that involves movement from the embedded clause into a theta-position in the matrix clause with subsequent short A´-movement. The coreferential pronoun is interpreted as a resumptive heading a Big-DP which hosts the antecedent in its specifier.
In this paper I show that Clitic Climbing (CC) in Spanish and Long Scrambling (LS) in German (and Polish) are (im-)possible out of the same environments. For an explanation of this fact I propose a feature-oriented analysis of incorporation phenomena. The idea is that restructuring is a phenomenon of syntactic incorporation. In German and Polish, Agro incorporates covertly into the matrix clause and licenses LS out of the infinitival into the matrix clause. Similarly, the clitic in Spanish, which is analysed as an Agro-head, incorporates into the matrix clause. I argue that this movement is necessary for reasons of feature-checking, i.e. for checking of a [+R]- or Restructuring-feature. In section 2 I discuss several differences between CC and LS. For example, the proposed analysis correctly predicts that clitics, in contrast to scrambled phrases, are subject to several serialization restrictions. Throughout the paper I use the term restructuring only in a descriptive sense, in order to describe the phenomenon in question.
In terms of the direction of development, I referred to Johanna Nichols' work on head-marking vs. dependent-marking. Nichols did not make reference to any languages in Tibeto-Burman, but all of the Tibeto-Burman languages that do not have verb agreement systems are solidly dependent-marking (i.e., they have marking on the nouns for case or pragmatic function); those languages with verb agreement systems, a type of head marking, also have many dependent-marking features (of the same types as the non-pronominalized languages). The question, then, is which is older, the dependent-marking type or the head-marking (actually mixed) type?
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.
The Deep Linguistic Processing with HPSG Initiative (DELPH-IN) provides the infrastructure needed to produce open-source semantic transfer-based machine translation systems. We have made available a prototype Japanese-English machine translation system built from existing resources including parsers, generators, bidirectional grammars and a transfer engine.
Based on a detailed case study of parallel grammar development distributed across two sites, we review some of the requirements for regression testing in grammar engineering, summarize our approach to systematic competence and performance profiling, and discuss our experience with grammar development for a commercial application. If possible, the workshop presentation will be organized around a software demonstration.
In this paper we revise the claim made by Halle & Stevens [1] and Maddieson & Ladefoged [2] that the Polish alveolo-palatal fricatives [ɕ, ʑ] are palatalized postalveolars [ʃʲ, ʒʲ]. On the basis of perceptual experiments we show that alveolo-palatal fricatives and palatalized postalveolars are two separate sounds which are distinguished not only by Polish native speakers but also by German ones. This claim is partly attested by centre of gravity measurements of the two sibilants.
This paper examines the development of periphrastic constructions involving auxiliary "have" and "be" with a past participle in the history of English, on the basis of parsed electronic corpora. It is argued that the two constructions represented distinct syntactic and semantic structures: while the one with have developed into a true perfect in the course of Middle English, the one with be remained a stative resultative throughout its history. In this way, it is explained why the be construction was rarely or never used in a number of contexts, including past counterfactuals, iteratives, duratives, certain kinds of infinitives and various other utterance types that cannot be characterized as perfects of result. When the construction with have became a true perfect, it was used in such contexts, regardless of the identity of the main verb, leading to the appearance of have with verbs like come which had previously only taken be. Crucially, however, have was not spreading at the expense of be, as the be perfect had never been used in such contexts, but rather at the expense of the old simple past. At least until the end of the Early Modern English period, the shift in the relative frequency of have and be perfects is to be explained in terms of the expansion of the former into new contexts, while the latter remained stable. A formal analysis is proposed, taking as its starting point a comparison with German which shows that the older English be perfect indeed behaves more like the German stative passive than its haben and sein perfects.
The present study poses the question on what phonetic and phonological grounds postalveolar fricatives in Polish can be analyzed as retroflex and whether postalveolar fricatives in other Slavic languages are retroflex as well. Velarization and incompatibility with front vowels are introduced as articulatory criteria for retroflexion, based on crosslinguistic data. According to these criteria, Polish and Russian have retroflex fricatives, whereas Bulgarian and Czech do not. In a phonological representation of these Slavic retroflexes, the necessity of perceptual features is shown. Lastly, it is illustrated that palatalization of retroflex fricatives both in Slavic languages and more generally causes a phonetic and phonological change to a non-retroflex sound.