Linguistik
A series of production and perception experiments investigating the prosody and well-formedness of special sentences, called Wide Focus Partial Fronting (WFPF), which consist of only one prosodic phrase and a unique initial accented argument, is reported here. The results help us to decide between different models of German prosody. The absence of a pitch height difference on the accent of the sentence speaks in favor of a relative model of prosody, in which accents are scaled relative to each other, and against models in which pitch accents are scaled in an absolute way. The results also speak for a model in which syntax, but not information structure, influences prosodic phrasing. Finally, the perception experiments show that the prosodic structure of sentences with a marked word order needs to be presented for grammaticality judgments: presenting written material alone is not enough and distorts the results.
Focus asymmetries in Bura
(2008)
This article presents the central aspects of the focus system of Bura (Chadic), which exhibits a number of asymmetries: Grammatical focus marking is obligatory only with focused subjects, where focus is marked by the particle án following the subject. Focused subjects remain in situ and the complement of án is a regular VP. With non-subject foci, án appears in a cleft structure between the fronted focus constituent and a relative clause. We present a semantically unified analysis of focus marking in Bura that treats the particle as a focus-marking copula in T that takes a property-denoting expression (the background) and an individual-denoting expression (the focus) as arguments. The article also investigates the realization of predicate and polarity focus, which are almost never marked. The upshot of the discussion is that Bura shares many characteristic traits of focus marking with other Chadic languages, but it crucially differs in exhibiting a structural difference in the marking of focus on subjects and non-subject constituents.
Freeze (1992) argued on the basis of data from several different languages that there is a close relationship between existential sentences (stating the existence of an entity) and locative sentences (stating the location of an entity). Freeze (1992) proposes that they are both derived from the same base structure and that the surface differences are rather due to distinct information structures. This paper argues against this position with data from Serbian existentials, which show clear syntactic differences from the locatives. Thus, the close relationship between existential and locative sentences that Freeze (1992) observes is conceptual, but not (necessarily) part of the syntax of the language. In order to account for the data, we propose that existential sentences originate from a different syntactic predication structure than the locative ones. The existential meaning arises, as we will show, from the interaction of this predication structure with the structure and meaning of the noun phrase.
The ACL 2008 Workshop on Parsing German features a shared task on parsing German. The goal of the shared task was to find reasons for the radically different behavior of parsers on the different treebanks and between constituent and dependency representations. In this paper, we describe the task and the data sets. In addition, we provide an overview of the test results and a first analysis.
Part-of-Speech tagging is generally performed by Markov models, based on bigram or trigram models. While Markov models concentrate strongly on the left context of a word, many languages require the inclusion of right context for correct disambiguation. We show for German that the best results are reached by a combination of left and right context. If only left context is available, then changing the direction of analysis and going from right to left improves the results. In a version of MBT (Daelemans et al., 1996) with default parameter settings, the inclusion of the right context improved POS tagging accuracy from 94.00% to 96.08%, thus corroborating our hypothesis. The version with optimized parameters reaches 96.73%.
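The contrast between left-only and bidirectional context can be illustrated with a minimal sketch; the feature layout below is an assumption made for illustration and is neither MBT's actual feature set nor its memory-based learning algorithm:

# Minimal sketch of context windows for POS tagging; illustration only,
# not the MBT system and not its actual feature set.

def left_context_features(words, tags, i):
    # Left context only: the two preceding (already assigned) tags plus the current word.
    return (
        tags[i - 2] if i >= 2 else "<s>",
        tags[i - 1] if i >= 1 else "<s>",
        words[i],
    )

def bidirectional_features(words, i):
    # Left and right context combined: neighbouring words on both sides.
    return (
        words[i - 1] if i >= 1 else "<s>",
        words[i],
        words[i + 1] if i + 1 < len(words) else "</s>",
    )

# German "der" is ambiguous between article and relative pronoun; the word to
# its right often disambiguates it, which a purely left-context model misses.
words = ["der", "Mann", ",", "der", "lacht"]
print(bidirectional_features(words, 3))   # (',', 'der', 'lacht')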
The problem of vocalization, or diacritization, is essential to many tasks in Arabic NLP. Arabic is generally written without the short vowels, which leads to one written form having several pronunciations with each pronunciation carrying its own meaning(s). In the experiments reported here, we define vocalization as a classification problem in which we decide for each character in the unvocalized word whether it is followed by a short vowel. We investigate the importance of different types of context. Our results show that the combination of using memory-based learning with only a word internal context leads to a word error rate of 6.64%. If a lexical context is added, the results deteriorate slowly.
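The classification setup can be sketched roughly as follows; the window size, padding symbol, transliteration and labels are assumptions made for illustration, not the exact configuration of the reported experiments:

# Sketch: framing vocalization as per-character classification (illustrative only).

def char_instances(unvocalized, vowel_after, window=2):
    # For each character, use a symmetric character window as features and a
    # boolean "is followed by a short vowel" as the class label.
    padded = "_" * window + unvocalized + "_" * window
    for i, label in enumerate(vowel_after):
        centre = i + window
        features = tuple(padded[centre - window: centre + window + 1])
        yield features, label

# Toy example with a transliterated, unvocalized form; the labels are invented.
word = "ktb"
labels = [True, True, False]
for feats, lab in char_instances(word, labels):
    print(feats, lab)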
How to compare treebanks
(2008)
Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question of how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new test suite for the evaluation of parsers on complex German grammatical constructions. The test suite provides a well-thought-out error classification, which enables us to compare parser output for parsers trained on treebanks with different encoding schemes and provides interesting insights into the impact of treebank annotation schemes on specific constructions like PP attachment or non-constituent coordination.
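Of the measures mentioned, the leaf-ancestor metric compares, for every terminal, the sequence of node labels on its path to the root in the gold tree and in the parser's tree. A minimal sketch of the idea, assuming trees encoded as nested lists and using a generic edit-distance ratio in place of the metric's exact weighting:

# Sketch of a leaf-ancestor style comparison (illustrative; the published metric
# uses a specific edit-distance weighting, approximated here with SequenceMatcher).
from difflib import SequenceMatcher

def leaf_paths(tree, ancestors=()):
    # Trees are nested lists: [label, child1, child2, ...]; leaves are strings.
    # Returns (leaf, path-of-labels-from-root) pairs, left to right.
    label, children = tree[0], tree[1:]
    paths = []
    for child in children:
        if isinstance(child, str):
            paths.append((child, ancestors + (label,)))
        else:
            paths.extend(leaf_paths(child, ancestors + (label,)))
    return paths

def leaf_ancestor_score(gold, parsed):
    # Average path similarity over all terminals (assumes identical leaf strings).
    gold_paths, parsed_paths = leaf_paths(gold), leaf_paths(parsed)
    scores = [SequenceMatcher(None, g[1], p[1]).ratio()
              for g, p in zip(gold_paths, parsed_paths)]
    return sum(scores) / len(scores)

gold   = ["S", ["NP", "Peter"], ["VP", "schläft"]]
parsed = ["S", ["NP", "Peter"], "schläft"]          # flat attachment error
print(round(leaf_ancestor_score(gold, parsed), 2))  # 0.83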
In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German.
This paper investigates the relation between TT-MCTAG, a formalism used in computational linguistics, and RCG. RCGs are known to describe exactly the class PTIME; simple RCGs have even been shown to be equivalent to linear context-free rewriting systems, i.e., to be mildly context-sensitive. TT-MCTAG has been proposed to model free word order languages. In general, it is NP-complete. In this paper, we will put an additional limitation on the derivations licensed in TT-MCTAG. We show that TT-MCTAG with this additional limitation can be transformed into equivalent simple RCGs. This result is interesting for theoretical reasons (since it shows that TT-MCTAG in this limited form is mildly context-sensitive) and, furthermore, even for practical reasons: we use the proposed transformation from TT-MCTAG to RCG in an actual parser that we have implemented.
TT-MCTAG lets one abstract away from the relative order of co-complements in the final derived tree, which is more appropriate than classic TAG when dealing with flexible word order in German. In this paper, we present the analyses for sentential complements, i.e., wh-extraction, that-complementation and bridging, and we work out the crucial differences between these and respective accounts in XTAG (for English) and V-TAG (for German).
Developing linguistic resources, in particular grammars, is known to be a complex task in itself, because of (among other things) redundancy and consistency issues. Furthermore, some languages can prove hard to describe because of specific characteristics, e.g. the free word order in German. In this context, we present (i) a framework that allows one to describe tree-based grammars, and (ii) an actual fragment of a core multi-component tree-adjoining grammar with tree tuples (TT-MCTAG) for German developed using this framework. This framework combines a metagrammar compiler and a parser based on range concatenation grammar (RCG) to check the consistency and the correctness of the grammar, respectively. The German grammar being developed within this framework already deals with a wide range of scrambling and extraction phenomena.
This article studies the relation between multi-component tree-adjoining grammars with tree tuples (TT-MCTAG), a formalism used in computational linguistics, and range concatenation grammars (RCG). RCGs are known to describe exactly the class PTIME; it has moreover been shown that "simple" RCGs are even equivalent to linear context-free rewriting systems (LCFRS), in other words, that they are mildly context-sensitive. TT-MCTAG has been proposed to model free word order languages. In general, these are NP-complete. In this article, we define an additional constraint on the derivations licensed by the TT-MCTAG formalism. We then show how this restricted form of TT-MCTAG can be converted into an equivalent simple RCG. The result is interesting for theoretical reasons (since it shows that the restricted form of TT-MCTAG is mildly context-sensitive), but also for practical reasons (the transformation proposed here has been used to implement a parser for TT-MCTAG).
In this paper we present a parsing architecture that allows processing of different mildly context-sensitive formalisms, in particular Tree-Adjoining Grammar (TAG), Multi-Component Tree-Adjoining Grammar with Tree Tuples (TT-MCTAG) and simple Range Concatenation Grammar (RCG). Furthermore, for tree-based grammars, the parser computes not only syntactic analyses but also the corresponding semantic representations.
In the course of the Middle English (ME) period, HAVE began to encroach on territory previously held by BE. According to Rydén and Brorström (1987) and Kytö (1997), this occurred especially in iterative and durational contexts, in the perfect infinitive, and in modal constructions. In Early Modern English (henceforth EModE), BE was increasingly restricted to the most common intransitives come and go, before disappearing entirely in the 18th and 19th centuries. This development raises a number of questions, both historical and theoretical. First, why did HAVE start spreading at the expense of BE in the first place? Second, why was the change conditioned by the factors mentioned by Rydén and Brorström (1987) and Kytö (1997)? Third, why did the change take on the order of 800 years to go to completion? Fourth, what implications does the change have for general theories of auxiliary selection? In this paper we will try to answer the first question by focusing on one of the earliest clearly identifiable advances of HAVE onto BE territory – its first appearance with the verb come, which for a number of reasons is an ideal verb to focus on. First, come is by far the most common intransitive verb, so we get large enough numbers for statistical analysis. Second, clauses containing the past participle of come with a form of BE are unambiguous perfects: they cannot be passives, and they did not continue into modern English with a stative reading like he is gone. Third, and perhaps most importantly, come selected BE categorically in the early stages of English, so the first examples we find with HAVE are clear evidence for innovation. We will present evidence from a corpus study showing that the first spread of HAVE was due to a ban on auxiliary BE in certain types of counterfactual perfects, and will propose an account for that ban in terms of Iatridou’s (2000) Exclusion theory of counterfactuals.
Verbs, nouns and affixation
(2008)
What explains the rich patterns of deverbal nominalization? Why do some nouns have argument structure, while others do not? We seek a solution in which properties of deverbal nouns are composed from properties of verbs, properties of nouns, and properties of the morphemes that relate them. The theory of each, plus the theory of how they combine, should give the explanation. In exploring this, we investigate properties of two theories of nominalization. In one, the verb-like properties of deverbal nouns result from verbal syntactic structure (a “structural model”). See, for example, van Hout & Roeper 1998, Fu, Roeper and Borer 1993, 2001, to appear, Alexiadou 2001, to appear. According to the structural hypothesis, some nouns contain VPs and/or verbal functional layers. In the other theory, the verbal properties of deverbal nouns result from the event structure and argument structure of the DPs that they head. By “event structure” we mean a representation of the elements and structure of a linguistic event, not a representation of the world. We refer to this view as the “event model”. According to the event model hypothesis, all derived nouns are represented with the same syntactic structure, the difference lying in argument structure – which in turn is critically related to event structure, in the way sketched in Grimshaw (1990), Siloni (1997) among others. In pursuing these lines of analysis, and at least to some extent disentangling their properties, we reach the conclusion that, with respect to a core set of phenomena, the two theories are remarkably similar – specifically, they achieve success with the same problems, and must resort to the same stipulations to address the remaining issues that we discuss (although the stipulations are couched in different forms).
Class features as probes
(2008)
In this article, we address (i) the form and (ii) the function of inflection class features in minimalist grammar. The empirical evidence comes from noun inflection systems involving fusional markers in German, Greek, and Russian. As for (i), we argue (based on instances of transparadigmatic syncretism) that class features are not privative; rather, class information must be decomposed into more abstract, binary features. Concerning (ii), we propose that class features qualify as the very device that brings about fusional inflection: They are uninterpretable in syntax and act as probes on stems, with matching inflection markers as goals, and thus trigger morphological Agree operations that merge stem and inflection marker before syntax is reached.
In this paper we compare the distribution of PPs introducing external arguments in nominalizations with PPs introducing external arguments in the verbal domain. We show that several mismatches exist between the behavior of PPs in nominalizations and PPs in the verbal domain. This leads us to suggest that while PPs in the verbal domain are licensed by functional structure alone, within the nominal domain, PPs can also be licensed via an interplay of the encyclopaedic meaning of the root involved and the properties of the preposition itself. This second mechanism comes into play in the absence of functional structure.
Structuring participles
(2008)
In this paper we discuss three types of adjectival participles in Greek, ending in -tos and -menos, and provide a further argument for the view that finer distinctions are necessary in the domain of participles (Kratzer 2001, Embick 2004). We further compare Greek stative participles to their German (and English) counterparts. We propose that a number of semantic as well as syntactic differences shown by these derive from differences in their respective morpho-syntactic composition.
In this paper we investigate the distribution of PPs related to external arguments (agent, causer, instrument, causing event) in Greek. We argue that their distribution supports an analysis according to which agentive/instrument and causer PPs are licensed by distinct functional heads, respectively. We argue against a conceivable alternative analysis, which links agentivity and causation to the prepositions themselves. We furthermore identify a particular type of Voice head present in Greek anticausatives, realised by non-active Voice morphology.
On the role of syntactic locality in morphological processes: the case of (Greek) derived nominals
(2008)
The paper is structured as follows. In section 2, I briefly summarize the facts on English and Greek nominalizations. In section 3, I discuss English nominal derivation in some detail. In section 4, I turn to the question of the licensing of argument structure (AS) in nominals. In section 5, I address the optionality of AS licensing in the nominal system.
This paper deals with the variable position of adjectives in the Romanian DP. Like all other Romance languages, Romanian allows adjectives to appear in both prenominal and postnominal position. In addition, however, Romanian has a third pattern: the so-called cel construction, in which the adjective in postnominal position is preceded by a determiner-like element, cel. This pattern is superficially similar to Determiner Spreading in Greek. In this paper we contrast the cel construction with Greek DS and discuss the similarities and differences between the two. We then present an analysis of cel as involving an appositive specification clause, building on de Vries (2002). We argue that the same structure is also involved in the context of nominal ellipsis, the second environment in which cel is found.
Ever since the Neogrammarians coined the notion of the sound law, almost as many sound laws have been proposed as have subsequently been questioned, refuted and, in the end, perhaps even successfully defended again. Every sound law operates within a particular period. If a sufficiently large text corpus survives from the period during which several temporally adjacent or even overlapping sound laws were active, it is easy to determine the order in which the laws applied or, in the most favourable case, even to date the period of their operation with some precision. The situation is different, however, when written records of the language under investigation are sparse or entirely lacking for the decisive epoch. Traditionally, one then has to fall back on determining the order from the only possible development of individual words that were affected by a particularly large number of the sound laws in question. This method, however, carries the risk of human error, especially in cases where a clear order can only be established by considering several words together. Researchers of past decades and centuries had no other choice here. The computers available today, however, open up unprecedented possibilities: sound laws, once rewritten in a programming language, can be applied to immense text corpora within seconds. But in order to determine, without recourse to any extra-linguistic knowledge, the one or the several possible orderings of different sound laws, it is necessary to run through all possibilities on a word corpus and to compare the respective outcomes with the actually attested results. This is the attempt undertaken in what follows. In this way, relative chronologies of sound laws that are considered long established could be put to the test once more and possibly even made more precise. After a brief history of the concept of the sound law, linguistic problems that complicate the undertaking are discussed first, followed by the selection of two language stages delimiting the period of investigation and a description of the data (word corpus and sound laws). The computer program is then described, from its requirements to its implementation. This is followed by a presentation of the insights afforded by the program's results. The concluding section summarizes the questions that remain open and those newly raised, and discusses possibilities for further research building on them.
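The brute-force procedure the abstract outlines, running every ordering of the sound laws over a word corpus and keeping only those orderings that reproduce the attested forms, could look roughly like the sketch below; the laws and word pairs are invented placeholders, not the study's data:

# Sketch of the exhaustive ordering test described above (illustrative only;
# the sound laws and word pairs are invented placeholders).
import re
from itertools import permutations

# Each "sound law" is a (pattern, replacement) regex pair over a broad transcription.
SOUND_LAWS = {
    "law_A": (r"k(?=[ei])", "ts"),   # hypothetical palatalisation
    "law_B": (r"ts", "s"),           # hypothetical assibilation
}

# Hypothetical (earlier form -> attested later form) pairs.
CORPUS = {"keli": "seli", "tsama": "sama"}

def apply_laws(form, order):
    for name in order:
        pattern, repl = SOUND_LAWS[name]
        form = re.sub(pattern, repl, form)
    return form

def consistent_orders():
    # Keep every ordering of the laws that maps all earlier forms to the attested ones.
    return [order for order in permutations(SOUND_LAWS)
            if all(apply_laws(old, order) == new for old, new in CORPUS.items())]

print(consistent_orders())   # here only ('law_A', 'law_B') survives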