Linguistik
A contrast to a trace
(2001)
For movement, such as quantifier raising, the three different structures illustrated in (1) are discussed in the recent literature.
(1) A girl danced with every boy
a. [every boy]x a girl danced with x (copy + replace)
b. [every boy]x a girl danced with [every boy] (copy)
c. [every boy]x a girl danced with [thex boy] (copy + modify)
In this paper, I'll call the proposal illustrated by (1a) the copy+replace theory since the movement is analyzed as first copying the moving phrase followed by replacing the moving phrase with a trace in the base position of movement. Chomsky (1993) and Fox (1999) argue against the copy+replace theory (1a) on the basis of Condition C data that show that moved material can behave as if it occupied the base position of movement. This behavior would, for example, be expected on the copy theory of movement illustrated by (1b), which also seems conceptually simpler than the copy+replace theory since it involves only copying without replacement. This conceptual advantage, however, is probably only apparent since a theory of the interpretation of structures like (1b) would probably be more complicated than for (1a). Standard assumptions about interpretation, at least, don't predict the right meaning when applied to (1b). For this reason, Chomsky and Fox propose what I'll call the copy+modify-theory illustrated in (1c). This proposes that copying is followed by a trace modification operation that replaces the determiner of the moved DP with something else. I assume that this is an indexed definite determiner, the interpretation of which is to be clarified below.
In terms of the direction of development, I referred to Johanna Nichols' work on head-marking vs. dependent-marking. Nichols did not make reference to any languages in Tibeto-Burman, but all of the Tibeto-Burman languages that do not have verb agreement systems are solidly dependent-marking (i.e., they have marking on the nouns for case or pragmatic function); those languages with verb agreement systems, a type of head marking, also have many dependent-marking features (of the same types as the non-pronominalized languages). The question, then, is which is older, the dependent-marking type or the head-marking (actually mixed) type?
The aim of this paper is to give a unified account of the way that German demonstrative pronouns (henceforth: D-pronouns) like der, die and das behave (a) in sentences where they receive a coreferential interpretation, and (b) in sentences where they receive a covarying interpretation because they are in some way dependent on a quantificational expression – either via direct binding or indirectly, because the value they receive varies with the value that is assigned to the variable bound by an indefinite determiner.
Japanese is often taken to be strictly head-final in its syntax. In our work on a broad-coverage, precision implemented HPSG for Japanese, we have found that while this is generally true, there are nonetheless a few minor exceptions to the broad trend. In this paper, we describe the grammar engineering project, present the exceptions we have found, and conclude that this kind of phenomenon motivates on the one hand the HPSG type hierarchical approach which allows for the statement of both broad generalizations and exceptions to those generalizations and on the other hand the usefulness of grammar engineering as a means of testing linguistic hypotheses.
We present a broad coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages.
Based on a detailed case study of parallel grammar development distributed across two sites, we review some of the requirements for regression testing in grammar engineering, summarize our approach to systematic competence and performance profiling, and discuss our experience with grammar development for a commercial application. If possible, the workshop presentation will be organized around a software demonstration.
This paper proposes an annotation scheme that encodes honorifics (respectful words). Honorifics are used extensively in Japanese, reflecting the social relationship (e.g. social ranks and age) of the referents. This referential information is vital for resolving zero pronouns and improving machine translation outputs. Annotating honorifics is a complex task that involves identifying a predicate with honorifics, assigning ranks to referents of the predicate, calibrating the ranks, and connecting referents with their predicates.
We present an architecture for the integration of shallow and deep NLP components which is aimed at flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe details of the integration methods, and show how information extraction and language checking applications for real-world German text benefit from a deep grammatical analysis.
Hybrid robust deep and shallow semantic processing for creativity support in document production
(2004)
The research performed in the DeepThought project (http://www.project-deepthought.net) aims at demonstrating the potential of deep linguistic processing if added to existing shallow methods that ensure robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. We use this approach to demonstrate the feasibility of three ambitious applications, one of which is a tool for creativity support in document production and collective brainstorming. This application is described in detail in this paper. Common to all three applications, and the basis for their development, is a platform for integrated linguistic processing. This platform is based on a generic software architecture that combines multiple NLP components and on Robust Minimal Recursion Semantics (RMRS) as a uniform representation language.
In this paper we show an approach to the customization of GermaNet to the German HPSG grammar lexicon developed in the Verbmobil project. GermaNet has a broad coverage of the German base vocabulary and a fine-grained semantic classification, while the HPSG grammar lexicon is comparatively small and has a coarse-grained semantic classification. In our approach, we have developed a mapping algorithm to relate the synsets in GermaNet with the semantic sorts in HPSG. The evaluation result shows that this approach is useful for the lexical extension of our deep grammar development to cope with real-world text understanding.
Particles fulfill several distinct central roles in the Japanese language. They can mark arguments as well as adjuncts, and can be purely functional or have semantic functions. There is, however, no straightforward matching from particles to functions, as, e.g., 'ga' can mark the subject, the object or the adjunct of a sentence. Particles can co-occur. Verbal arguments that could be identified by particles can be eliminated in the Japanese sentence. And finally, in spoken language particles are often omitted. A proper treatment of particles is thus necessary to make an analysis of Japanese sentences possible. Our treatment is based on an empirical investigation of 800 dialogues. We set up a type hierarchy of particles motivated by their subcategorizational and modificational behaviour. This type hierarchy is part of the Japanese syntax in VERBMOBIL.
Preferences and defaults for definiteness and number in Japanese to German machine translation
(1996)
A significant problem when translating Japanese dialogues into German is the missing information on number and definiteness in the Japanese analysis output. The integration of the search for such information into the transfer process provides an efficient solution. General transfer includes conditions to make it possible to consider external knowledge. Thereby, grammatical and lexical knowledge of the source language, knowledge of lexical restrictions on the target language, domain knowledge and discourse knowledge are accessible.
We present a solution for the representation of Japanese honorific information in the HPSG framework. Basically, there are three dimensions of honorification. We show that a treatment is necessary that involves both the syntactic and the contextual level of information. The Japanese grammar is part of a machine translation system.
The research performed in the DeepThought project aims at demonstrating the potential of deep linguistic processing if combined with shallow methods for robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. On the basis of this approach, the feasibility of three ambitious applications will be demonstrated, namely: precise information extraction for business intelligence; email response management for customer relationship management; creativity support for document production and collective brainstorming. Common to these applications, and the basis for their development, is the XML-based, RMRS-enabled core architecture framework that will be described in detail in this paper. The framework is not limited to the applications envisaged in the DeepThought project, but can also be employed, e.g., to generate and make use of XML standoff annotation of documents and linguistic corpora, and in general for a wide range of NLP-based applications and research purposes.
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.
This demo abstract describes the SmartWeb Ontology-based Information Extraction System (SOBIE). A key feature of SOBIE is that all information is extracted and stored with respect to the SmartWeb ontology. In this way, other components of the systems, which use the same ontology, can access this information in a straightforward way. We will show how information extracted by SOBIE is visualized within its original context, thus enhancing the browsing experience of the end user.
The Deep Linguistic Processing with HPSG Initiative (DELPH-IN) provides the infrastructure needed to produce open-source semantic transfer-based machine translation systems. We have made available a prototype Japanese-English machine translation system built from existing resources, including parsers, generators, bidirectional grammars and a transfer engine.
While the sortal constraints associated with Japanese numeral classifiers are well-studied, less attention has been paid to the details of their syntax. We describe an analysis implemented within a broad-coverage HPSG that handles an intricate set of numeral classifier construction types and compositionally relates each to an appropriate semantic representation, using Minimal Recursion Semantics.
We present an effort for the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible for the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
"Ich mag so Wasserpfeifeladen" : the interaction of grammar and information structure in Kiezdeutsch
(2008)
This article examines the expression of natural gender in Icelandic nouns denoting human beings. Particular attention will be paid to the system's symmetry with regard to nouns denoting women and men. Our society consists more or less exactly of half women and half men. One would therefore assume that systems for terms denoting persons would also be symmetrically organised. Yet this assumption could not be further from the truth, and not just in single isolated cases, but in many languages: I will attempt to show that Icelandic has numerous methods for referring to women, but also many barriers and idiosyncrasies.
Human communication takes place when one person does something that when seen or heard by another person is taken to be done with the intention to communicate, and the other person, having seen the communicator show his or her intention to communicate, then uses inference to determine what the communicator intends to communicate. This is possible because the addressee assumes that the communicator is a rational person, that is, acts with goals in mind (see Grice 1975), and so must be doing the act for a reason, and it is worth the addressee’s effort to try to determine what that reason is, that is, determine the relevance of the act.
Rawang [...] is a Tibeto-Burman language spoken by people who live in the far north of Kachin State in Myanmar (Burma), particularly along the Mae Hka ('Nmai Hka) and Maeli Hka (Mali Hka) river valleys (see map on back page); population unknown, although Ethnologue gives 100,000. In the past they had been called ‘Nung’, or (mistakenly) ‘Hkanung’, and are considered to be a sub-group of the Kachin by the Myanmar government. Until government policies put a stop to the clearing of new land in 1994, the Rawang speakers still practiced slash and burn farming on the mountainsides (they still do a bit, but only on already claimed land), in conjunction with planting paddy rice near the river. They are closely related to people on the other side of the Chinese border in Yunnan classified as either Dulong or Nu(ng) (see LaPolla 2001, 2003 on the Dulong language). In this paper, I will be discussing the word-class-changing constructions found in Rawang, using data of the Mvtwang (Mvt River) dialect of Rawang, which is considered the most central of those dialects in Myanmar and so has become something of a standard for writing and inter-group communication.
Rawang (Rvwàng) is a Tibeto-Burman language spoken in the far north of Myanmar (Burma), and is closely related to the Dulong language spoken in China. Rawang manifests a kind of hierarchical person marking on the predicate which marks first person primarily (in several different ways - suffixes, change of final consonant, vowel length - and up to five times within one verb complex), and second person indirectly with a sort of marking similar to the inverse marking found in some North American languages: it appears when there is a first person participant, but that referent is not the actor, and when the second person is a participant. This system is quite different from those that reflect semantic role (e.g. Qiang) or grammatical relations (e.g. English).
Rawang [...] is a Tibeto-Burman language spoken by people who live in the far north of Kachin State in Myanmar (Burma), particularly along the Mae Hka ('Nmai Hka) and Maeli Hka (Mali Hka) river valleys; population unknown, although Ethnologue gives 100,000. In the past they had been called ‘Nung’, or (mistakenly) ‘Hkanung’, and are considered to be a sub-group of the Kachin by the Myanmar government. They are closely related to people on the other side of the Chinese border in Yunnan classified as either Dulong or Nu (see LaPolla 2001, 2003 on the Dulong language and Sun 1988, Sun & Liu 2005 on the Anong language). In this paper, I will be discussing a particular morphological phenomenon found in Rawang, using data of the Mvtwang (Mvt River) dialect of Rawang, which is considered the most central of those dialects in Myanmar and so has become something of a standard for writing and inter-group communication.
Questions on transitivity
(2008)
This handout (it isn’t a paper) presents phenomena and questions, rather than conclusions, related to the concept of transitivity. The idea is to return to these questions at the end of the Workshop to see if we can have a clearer consensus about the best general analysis of phenomena associated with transitivity. Section 2 presents alternative analyses of transitivity and questions about transitivity in three languages I have worked on. Section 3 discusses a few of the different conceptualisations of transitivity that might be relevant to our thinking about the questions related to these languages or that bring up further questions. Section 4 presents some general questions that might be asked of individual languages.
This paper is more about presenting phenomena and questions related to the concept of transitivity in Tibeto-Burman languages that I hope will stimulate discussion, rather than presenting strong conclusions. Sections 2 and 3 present alternative analyses of transitivity and questions about transitivity in two Tibeto-Burman languages I have worked on. In Section 4 I discuss some general issues about transitivity.
This paper is an inductive look at the constituents found in a randomly selected Tagalog text, Bob Ong’s Alamat ng Gubat (Makati City, MM: Visual Print Enterprises, 2004). The analysis is based on the full text, but we will only be able to go through the first few lines of the text here, which we will do one by one, and discuss the structures found in each line of the text in bullet format after the relevant line. At the end of the paper we will bring up some important questions about the structures found in Tagalog based on this text.
Twenty years ago I discussed the oldest isoglosses in the South Slavic linguistic area (1982). Subscribing to Van Wijk’s view that the bundle of isoglosses which separates Bulgarian from Serbo-Croatian was the result of an early split in South Slavic and that the transitional dialects originated from a later mixture of Serbian and Bulgarian dialects when the contact between the two languages had been restored (1927), I argued that the shared innovations of Bulgarian and Serbo-Croatian must be dated to a period when the dialects were still spoken in the original Trans-Carpathian homeland of the Slavs. I concluded that there is no evidence for common innovations of South Slavic which were posterior to the end of what I have called the Late Middle Slavic period, which I dated to the 4th through 6th centuries AD. At that time, the major dialect divisions of Slavic were already established.
Twenty years ago (1983), I severely criticized Halle and Kiparsky’s review (1981) of Garde’s history of Slavic accentuation (1976). I concluded that Halle and Kiparsky’s theoretical framework “rests upon an unwarranted limitation of the available evidence, obscures the chronological perspective, and yields results which are partly not new and partly incorrect. It is harmful because it does not give the facts their proper due and thereby blocks the road to empirical study, giving a free hand to unrestrained speculation” (1983: 40). As Halle has recently returned to the subject (2001), it may be interesting to see if there has been some progress in his thinking over the last two decades. In the following I shall try to avoid repeating what I have said in my earlier discussion.
1. There are two classes of theories of Universal Grammar: (1) Formalist theories, such as the widespread varieties of generative grammar. These theories start from the assumption that certain strings of linguistic forms are grammatical while other strings are ungrammatical. A grammar of this type produces grammatical strings and does not produce ungrammatical ones. All theories of this class fail in the same respect: they do not account for the meaning of the strings. (2) Semiotactic theories, which describe the meaning of a string in terms of the meanings of its constituent forms and their interrelations. The only elaborate formalized theory of this class presently available is the one advanced by C.L. Ebeling (Syntax and Semantics, Leiden: Brill, 1978). I shall discuss some of its mathematical properties here.
Friedrich Schlegel's lasting contribution to linguistics is usually seen in the impact that his book "Über die Sprache und Weisheit der Indier" from 1808 left on comparative linguistics and on the study of Sanskrit. Schlegel was one of the first European scholars to have studied Sanskrit extensively and he made a number of translations of Sanskrit literature into German which make up one third of "Über die Sprache und Weisheit der Indier". Schlegel's book is widely regarded as a founding document both of comparative linguistics and of indology, a fact which is quite remarkable in light of the development of Schlegel's thought after this text. His interest in Indian studies ceased more or less directly with the publication of this work, while his thoughts on language became more and more suffused by transcendental philosophy.
Research on dialectal varieties was for a long time concentrated on phonetic aspects of language. While a lot of work was done on segmental aspects, suprasegmentals remained unexplored until the last few years, despite the fact that prosody was noted as a salient aspect of dialectal variants by linguists and by naive speakers. Current research on dialectal prosody in the German-speaking area often uses discourse-analytic methods, correlating intonation curves with communicative functions (P. Auer et al. 2000, P. Gilles & R. Schrambke 2000, R. Kehrein & S. Rabanus 2001). The project I present here has another focus. It looks at general prosodic aspects, abstracted from actual situations. These global structures are modelled and integrated in a speech synthesis system. Today, mostly intonation is being investigated. However, rhythm, the temporal organisation of speech, is not at the core of current research on prosody. But there is evidence that temporal organisation is one of the main structuring elements of speech (B. Zellner 1998, B. Zellner Keller 2002). Following this approach developed for speech synthesis, I will present the modelling of the timing of two Swiss German dialects (Bernese and Zurich dialect) that are considered quite different on the prosodic level. These models are part of the project on the "development of basic knowledge for research on Swiss German prosody by means of speech synthesis modelling" funded by the Swiss National Science Foundation.
I discuss the status of WH-words for interrogative interpretations, and show that the derivation of constituent questions evolves from a specific interplay of syntactic and semantic representations with pragmatics. I argue that WH-pronouns are not ‘interrogative’. Rather, they are underspecified elements; due to this underspecification, WH-words can form a constitutive part not only of interrogative, but also of exclamative and declarative clauses. WH-words introduce a variable of a particular conceptual domain into the semantic representation. Accordingly, they have to be specified for interpretation. Different WH-contexts give rise to different interpretations. In a cross-linguistic overview, I discuss the characteristic elements contributing to the derivation of interrogatives. I argue that specific particles or their phonologically empty counterparts in the head of CP contribute the interrogative aspect. The speech act of ‘asking’ is then carried out via an intonational contour that identifies a question. By default, this intonational contour operates on interrogative sentences; however, other sentence formats – in particular, those of declarative sentences – are possible as well. The distinction of (a) grammatical (syntactic, semantic and phonological) sentence formats for interrogative and declarative sentences, and (b) intonational contours serving the discrimination of speech acts like questions and assertions, can be related to psychological and neurological evidence.
What role does language play in the development of numerical cognition? In the present paper I argue that the evolution of symbolic thinking (as a basis for language) laid the grounds for the emergence of a systematic concept of number. This concept is grounded in the notion of an infinite sequence and encompasses number assignments that can focus on cardinal aspects ("three pencils"), ordinal aspects ("the third runner"), and even nominal aspects ("bus #3"). I show that these number assignments are based on a specific association of relational structures, and that it is the human language faculty that provides a cognitive paradigm for such an association, suggesting that language played a pivotal role in the evolution of systematic numerical cognition.
In linguistics and the philosophy of language, the mass/count distinction has traditionally been regarded as a bi-partition on the nominal domain, where typical instances are nouns like "beef" (mass) vs."cow" (count). In the present paper, we argue that this partition reveals a system that is based on both syntactic features and conceptual features, and present experimental evidence suggesting that the discrimination of the two kinds of features has a psychological reality.
The study investigates the contribution of tactile and auditory feedback in the adaptation of /s/ towards a palatal prosthesis. Five speakers were recorded via electromagnetic articulography, at first without the prosthesis, then with the prosthesis and auditory feedback masked, and finally with the prosthesis and auditory feedback available. Tongue position, jaw position and acoustic centre of gravity of productions of the sound were measured. The results show that the initial adaptation attempts without auditory feedback are dependent on the prosthesis type and directed towards reaching the original tongue palate contact pattern. Speakers with a prosthesis which retracted the alveolar ridge retracted the tongue. Speakers with a prosthesis which did not change the place of the alveolar ridge did not retract the tongue. All speakers lowered the jaw. In a second adaptation step with auditory feedback available speakers reorganised tongue and jaw movements in order to produce more subtle acoustic characteristics of the sound such as the high amplitude noise which is typical for sibilants.
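The acoustic centre of gravity mentioned above is commonly computed as a weighted mean frequency of the spectrum. The following is a minimal sketch under that assumption; the function name, the `power` parameter and the sample values are illustrative and not taken from the study, which does not state how its measure was computed.

```python
def spectral_centre_of_gravity(freqs_hz, magnitudes, power=2.0):
    """Weighted mean frequency of a magnitude spectrum.

    Each frequency bin is weighted by magnitude ** power
    (power=2.0 weights by the power spectrum). This is an
    assumed, commonly used definition, not the study's own.
    """
    if len(freqs_hz) != len(magnitudes):
        raise ValueError("freqs_hz and magnitudes must have equal length")
    weights = [m ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * w for f, w in zip(freqs_hz, weights)) / total


# Made-up two-bin spectrum: weights are 1 and 9, so the result
# is pulled towards the stronger 8 kHz component.
cog = spectral_centre_of_gravity([4000.0, 8000.0], [1.0, 3.0])
```

A higher value indicates that the frication energy is concentrated at higher frequencies.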
A two-week perturbation EMA-experiment was carried out with palatal prostheses. Articulatory effort for five speakers was assessed by means of peak acceleration and jerk during the tongue tip gestures from /t/ towards /i, e, o, y, u/. After a period of no change speakers showed an increase in these values. Towards the end of the experiment the values decreased. The results are interpreted as three phases of carrying out changes in the internal model. At first, the complete production system is shifted in relation to the palatal change, afterwards speakers explore different production mechanisms which involves more articulatory effort. This second phase can be seen as a training phase where several articulatory strategies are explored. In the third phase speakers start to select an optimal movement strategy to produce the sounds so that the values decrease.
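Peak acceleration and jerk can be estimated from a sampled sensor trajectory by repeated numerical differentiation. The sketch below assumes evenly spaced samples and uses central differences; the function names and the test trajectory are hypothetical, and the study's own signal processing (e.g. any smoothing applied before differentiation) is not described in the abstract.

```python
def derivative(samples, dt):
    """Central-difference derivative; the result is two samples shorter."""
    return [
        (samples[i + 1] - samples[i - 1]) / (2.0 * dt)
        for i in range(1, len(samples) - 1)
    ]


def peak_acceleration_and_jerk(position, dt):
    """Return (peak |acceleration|, peak |jerk|) of a 1-D trajectory.

    position: evenly spaced position samples of one sensor coordinate
    dt: sampling interval in seconds (assumed constant)
    """
    if len(position) < 7:
        raise ValueError("need at least 7 samples for three derivatives")
    velocity = derivative(position, dt)
    acceleration = derivative(velocity, dt)
    jerk = derivative(acceleration, dt)
    return max(abs(a) for a in acceleration), max(abs(j) for j in jerk)


# Made-up trajectory x(t) = t**2 sampled at dt = 1:
# constant acceleration, so the jerk is zero.
peak_acc, peak_jerk = peak_acceleration_and_jerk(
    [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0], 1.0
)
```

Real articulatory data is noisy, so in practice the signal would usually be low-pass filtered before differentiating; that step is omitted here.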
Temporal development of compensation strategies for perturbed palate shape in German /S/-production
(2006)
The palate shape of four speakers was changed by a prosthesis which either lowered the palate or retracted the alveoles. Subjects wore the prosthesis for two weeks and were recorded several times via EMA. Results of articulatory measurements show that speakers use different compensation methods at different stages of the adaptation. They lower the tongue immediately after the insertion of the prosthesis. Other compensation methods, such as lip protrusion, are only acquired after longer periods of practice. The results are interpreted as supporting the existence of different mappings between motor commands, vocal tract shape and auditory-acoustic target.
Several articulatory strategies are available during the production of /u/, all resulting in a similar acoustic output. /u/ has two main constrictions, at the velum and at the lips. A perturbation of either constriction can be compensated at the other one, e.g. a wider constriction at the velum by more lip protrusion, or a wider lip opening by more tongue retraction. This study investigates whether speakers use this relation under perturbation. Six speakers were provided with palatal prostheses which were worn for two weeks. Speakers were instructed to make a serious attempt to produce normal speech. Their speech was recorded via EMA and acoustics several times over the adaptation period. Formant values of /u/-productions were measured. Velar constriction width and lip protrusion were estimated. For four speakers a correlation between constriction width and lip protrusion was found. A negative correlation between lip protrusion and F1 or F2 could sometimes be observed, but no correlation occurred between constriction size and either of the formants. The results show that under perturbation speakers use motor equivalent strategies in order to adapt. The correlation between constriction size and lip protrusion is stronger than in studies investigating unperturbed speech. This could be because under perturbation speakers are inclined to try out several strategies in order to reach the acoustic target and the co-variability might thus be greater.
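The relation between constriction width and lip protrusion described above can be quantified per speaker with a correlation coefficient. The sketch below assumes Pearson's r, which the abstract does not specify; the function name and the measurement values are invented for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Invented per-token values for one speaker (arbitrary units):
# velar constriction width and lip protrusion for four /u/ tokens.
constriction_width = [1.0, 2.0, 3.0, 4.0]
lip_protrusion = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(constriction_width, lip_protrusion)
```

With these invented values the two measures rise together perfectly; real measurements would give a weaker coefficient and would need a significance test before being interpreted.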