Linguistik
Refine
Year of publication
Document Type
- Preprint (122) (remove)
Has Fulltext
- yes (122)
Is part of the Bibliography
- no (122)
Keywords
- Deutsch (19)
- Multicomponent Tree Adjoining Grammar (9)
- Schweizerdeutsch (9)
- Syntax (9)
- Syntaktische Analyse (8)
- Semantik (7)
- Lexicalized Tree Adjoining Grammar (6)
- Dialektologie (5)
- Optimalitätstheorie (5)
- Range Concatenation Grammar (5)
Institute
- Extern (69)
- Sprachwissenschaften (1)
This article presents linguistic features of and educational approaches to a new variety of German that has emerged in multi-ethnic urban areas in Germany: Kiezdeutsch (‘Hood German’). From a linguistic point of view, Kiezdeutsch is very interesting, as it is a multi-ethnolect that combines features of a youth language with those of a contact language. We will present examples that illustrate the grammatical productivity and innovative potential of this variety. From an educational perspective, Kiezdeutsch has also a high potential in many respects: school projects can help enrich intercultural communication and weaken derogatory attitudes. In grammar lessons, Kiezdeutsch can be a means to enhance linguistic competence by having the adolescents analyse their own language. Keywords: German, Kiezdeutsch, multi-ethnolect, migrants’ language, language change, educational proposals
In this paper, we investigate the usefulness of a wide range of features for their usefulness in the resolution of nominal coreference, both as hard constraints (i.e. completely removing elements from the list of possible candidates) as well as soft constraints (where a cumulation of violations of soft constraints will make it less likely that a candidate is chosen as the antecedent). We present a state of the art system based on such constraints and weights estimated with a maximum entropy model, using lexical information to resolve cases of coreferent bridging.
Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG however is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. I.e., this way of characterizing MCTAG does not allow to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) the MCTAG licences. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, they allow a more systematic comparison of different types of MCTAG, and, furthermore, they can be exploited for parsing.
Multicomponent Tree Adjoining Grammars (MCTAG) is a formalism that has been shown to be useful for many natural language applications. The definition of MCTAG however is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. This way of characterizing MCTAG does not allow to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) the MCTAG licences. This definition gives a better understanding of the formalism, it allows a more systematic comparison of different types of MCTAG, and, furthermore, it can be exploited for parsing.
Multicomponent Tree Adjoining Grammars (MCTAG) is a formalism that has been shown to be useful for many natural language applications. The definition of MCTAG however is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. I.e., this way of characterizing MCTAG does not allow to abstract away from the concrete order of derivation. Therefore, in this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees the MCTAG licences.
This book is a full reference grammar of Qiang, one of the minority languages of southwest China, spoken by about 70,000 Qiang and Tibetan people in Aba Tibetan and Qiang Autonomous Prefecture in northern Sichuan Province. It belongs to the Qiangic branch of Tibeto-Burman (one of the two major branches of Sino-Tibetan). The dialect presented in the book is the Northern Qiang variety spoken in Ronghong Village, Yadu Township, Chibusu District, Mao County. This book, the first book-length description of the Qiang language in English, is the result of many years of work on the language.
A hierarchy of local TDGs
(1998)
Many recent variants of Tree Adoining Grammars (TAG) allow an underspecifiaction of the parent relation between nodes in a tree, i.e. they do not deal with fully specified trees as it is the case with TAGs.Such TAG variants are for example Description Tree Grammars (DTG), Unordered Vector Grammars with Dominance Links (UVG-DL), a definition of TAGs via so-called quasi trees and Tree Description Grammars (TDG. The last TAg variant, local TDG, is an extension of TAG generating Tree Descriptions. Local TDGs even allow an underspecification of the dominance relation between node names and thereby provide the possibility to generate underspecified representations for structural ambiguities such as quantifier scope ambiguities. This abstract deals with formal properties of local TDGs. A hierarchiy of local TDGs is established together with a pumping lemma for local TDGs of a certain rank.
This paper provides an overview of current research on a hybrid and robust parsing architecture for the morphological, syntactic and semantic annotation of German text corpora. The novel contribution of this research lies not in the individual parsing modules, each of which relies on state-of-the-art algorithms and techniques. Rather what is new about the present approach is the combination of these modules into a single architecture. This combination provides a means to significantly optimize the performance of each component, resulting in an increased accuracy of annotation.
This paper investigates the class of Tree-Tuple MCTAG with Shared Nodes, TT-MCTAG for short, an extension of Tree Adjoining Grammars that has been proposed for natural language processing, in particular for dealing with discontinuities and word order variation in languages such as German. It has been shown that the universal recognition problem for this formalism is NP-hard, but so far it was not known whether the class of languages generated by TT-MCTAG is included in PTIME. We provide a positive answer to this question, using a new characterization of TT-MCTAG.
Traditionally, parsers are evaluated against gold standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and the gold standard. A particular case in point is German, for which two treebanks (TiGer and TüBa-D/Z) are available with highly different annotation schemes for the acquisition of (e.g.) PCFG parsers. The differences between the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser evaluation difficult [7, 9, 12]. The resource (TEPACOC) presented in this paper takes a different approach to parser evaluation: instead of providing evaluation data in a single annotation scheme, TEPACOC uses comparable sentences and their annotations for 5 selected key grammatical phenomena (with 20 sentences each per phenomena) from both TiGer and TüBa-D/Z resources. This provides a 2 times 100 sentence comparable testsuite which allows us to evaluate TiGer-trained parsers against the TiGer part of TEPACOC, and TüBa-D/Z-trained parsers against the TüBa-D/Z part of TEPACOC for key phenomena, instead of comparing them against a single (and potentially biased) gold standard. To overcome the problem of inconsistency in human evaluation and to bridge the gap between the two different annotation schemes, we provide an extensive error classification, which enables us to compare parser output across the two different treebanks. In the remaining part of the paper we present the testsuite and describe the grammatical phenomena covered in the data. We discuss the different annotation strategies used in the two treebanks to encode these phenomena and present our error classification of potential parser errors.
This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an on-going project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and anaphoric information. The paper discusses how this unified annotation scheme relates to other formats currently discussed in the literature, in particular the annotation graph model of Bird and Liberman (2001) and the pie-in-thesky scheme for semantic annotation.
Language universals are statements that are true of all languages, for example: “all languages have stop consonants”. But beneath this simple definition lurks deep ambiguity, and this triggers misunderstanding in both interdisciplinary discourse and within linguistics itself. A core dimension of the ambiguity is captured by the opposition “absolute vs. statistical universal”, although the literature uses these terms in varied ways. Many textbooks draw the boundary between absolute and statistical according to whether a sample of languages contains exceptions to a universal. But the notion of an exception-free sample is not very revealing even if the sample contained all known languages: there is always a chance that an as yet undescribed language, or an unknown language from the past or future, will provide an exception.
In this paper we investigate the distribution of PPs related to external arguments (agent, causer, instrument, causing event) in Greek. We argue that their distribution supports an analysis, according to which agentive/instrument and causer PPs are licensed by distinct functional heads, respectively. We argue against a conceivable alternative analysis, which links agentivity and causation to the prepositions themselves. We furthermore identify a particular type of Voice head in Greek anticausative realised by non-active Voice morphology.
We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation helps to considerably decrease the number of items in the chart.
Thirty-one years ago Tsu-lin Mei (1961) argued against the traditional doctrine that saw the subject-predicate distinction in grammar as parallel to the particular- universal distinction in logic, as he said it was a reflex of an Indo-European bias, and could not be valid, as ‘Chinese ... does not admit a distinction into subject and predicate’ (p. 153). This has not stopped linguists working on Chinese from attempting to define ‘subject’ (and ‘object’) in Chinese. Though a number of linguists have lamented the difficulties in trying to define these concepts for Chinese (see below), most work done on Chinese still assumes that Chinese must have the same grammatical features as Indo-European, such as having a subject and a direct object, though no attempt is made to justify that view. This paper challenges that view and argues that there has been no grammaticalization of syntactic functions in Chinese. The correct assignment of semantic roles to the constituents of a discourse is done by the listener on the basis of the discourse structure and pragmatics (information flow, inference, relevance, and real world knowledge) (cf. Li & Thompson 1978, 1979; LaPolla 1990).
Maschinelles Lernen wird häufig zur effzienten Annotation großer Datenmengen eingesetzt. Die Forschung zu maschinellen Lernverfahren beschränkt sich i.a. darauf unterschiedliche Lernverfahren zu vergelichen oder die optimale größe der Trainingsdaten zu bestimmen. Bisher wurde jedoch nicht untersucht, in wie weit sich linguistisches Wissen bei der Aufgabendefinition positiv auswirken kann. Dies soll hier anhand des Lernens von Base-Nominalphrasen mit drei unterschiedlichen Definitionen untersucht werden. Die Definitionen unterscheiden sich im Grad der linguistisch motivierten Erweiterungen, die zu einer eher praktisch motivierten ersten Definition hinzu kamen. Die Untersuchungen ergaben, dass sich die Anzahl der falsch klasssifizierten Wörter um ein Drittel reduzieren lässt.
Class features as probes
(2008)
In this article, we adress (i) the form and (ii) the function on inflection class features in minimalist grammar. The empirical evidence comes from noun inflection systems involving fusional markers in German, Greek, and Russian. As for (i), we argue (based on instances of transparadigmatic syncretism) that class features are not privative; rather, class information must be decomposed into more abstract, binary features. Concerning (ii), we propose that class features qualify as the very device that brings about fusional infection: They are uninterpretable in syntax and actas probes on stems, with matching inflection markers as goels, and thus trigger morphological Agree operations that merge stem and inflection marker before syntax is reached.
In this paper we investigate Greek, an optional clitic doubling language not subject to Kaynes generalization (Jaeggli 1982), and we argue that in this language, doubled DPs are in A-positions. We propose that Greek clitics are formal features that move, permitting DPs in argument positions. This leads to a typology according to which there are two types of clitic/agreement languages -configurational and nonconfigurational ones-, depending upon whether clitics are instantiations of formal features or not.
Prepositional phrase (PP) attachment is one of the major sources for errors in traditional statistical parsers. The reason for that lies in the type of information necessary for resolving structural ambiguities. For parsing, it is assumed that distributional information of parts-of-speech and phrases is sufficient for disambiguation. For PP attachment, in contrast, lexical information is needed. The problem of PP attachment has sparked much interest ever since Hindle and Rooth (1993) formulated the problem in a way that can be easily handled by machine learning approaches: In their approach, PP attachment is reduced to the decision between noun and verb attachment; and the relevant information is reduced to the two possible attachment sites (the noun and the verb) and the preposition of the PP. Brill and Resnik (1994) extended the feature set to the now standard 4-tupel also containing the noun inside the PP. Among many publications on the problem of PP attachment, Volk (2001; 2002) describes the only system for German. He uses a combination of supervised and unsupervised methods. The supervised method is based on the back-off model by Collins and Brooks (1995), the unsupervised part consists of heuristics such as ”If there is a support verb construction present, choose verb attachment”. Volk trains his back-off model on the Negra treebank (Skut et al., 1998) and extracts frequencies for the heuristics from the ”Computerzeitung”. The latter also serves as test data set. Consequently, it is difficult to compare Volk’s results to other results for German, including the results presented here, since not only he uses a combination of supervised and unsupervised learning, but he also performs domain adaptation. Most of the researchers working on PP attachment seem to be satisfied with a PP attachment system; we have found hardly any work on integrating the results of such approaches into actual parsers. The only exceptions are Mehl et al. (1998) and Foth and Menzel (2006), both working with German data. Mehl et al. report a slight improvement of PP attachment from 475 correct PPs out of 681 PPs for the original parser to 481 PPs. Foth and Menzel report an improvement of overall accuracy from 90.7% to 92.2%. Both integrate statistical attachment preferences into a parser. First, we will investigate whether dependency parsing, which generally uses lexical information, shows the same performance on PP attachment as an independent PP attachment classifier does. Then we will investigate an approach that allows the integration of PP attachment information into the output of a parser without having to modify the parser: The results of an independent PP attachment classifier are integrated into the parse of a dependency parser for German in a postprocessing step.
The work presented here addresses the question of how to determine whether a grammar formalism is powerful enough to describe natural languages. The expressive power of a formalism can be characterized in terms of i) the string languages it generates (weak generative capacity (WGC)) or ii) the tree languages it generates (strong generative capacity (SGC)). The notion of WGC is not enough to determine whether a formalism is adequate for natural languages. We argue that even SGC is problematic since the sets of trees a grammar formalism for natural languages should be able to generate is difficult to determine. The concrete syntactic structures assumed for natural languages depend very much on theoretical stipulations and empirical evidence for syntactic structures is rather hard to obtain. Therefore, for lexicalized formalisms, we propose to consider the ability to generate certain strings together with specific predicate argument dependencies as a criterion for adequacy for natural languages.
This paper compares two approaches to computational semantics, namely semantic unification in Lexicalized Tree Adjoining Grammars (LTAG) and Lexical Resource Semantics (LRS) in HPSG. There are striking similarities between the frameworks that make them comparable in many respects. We will exemplify the differences and similarities by looking at several phenomena. We will show, first of all, that many intuitions about the mechanisms of semantic computations can be implemented in similar ways in both frameworks. Secondly, we will identify some aspects in which the frameworks intrinsically differ due to more general differences between the approaches to formal grammar adopted by LTAG and HPSG.
Cet article étudie la relation entre les grammaires darbres adjoints à composantes multiples avec tuples darbres (TT-MCTAG), un formalisme utilisé en linguistique informatique, et les grammaires à concaténation dintervalles (RCG). Les RCGs sont connues pour décrire exactement la classe PTIME, il a en outre été démontré que les RCGs « simples » sont même équivalentes aux systèmes de réécriture hors-contextes linéaires (LCFRS), en dautres termes, elles sont légèrement sensibles au contexte. TT-MCTAG a été proposé pour modéliser les langages à ordre des mots libre. En général ces langages sont NP-complets. Dans cet article, nous définissons une contrainte additionnelle sur les dérivations autorisées par le formalisme TT-MCTAG. Nous montrons ensuite comment cette forme restreinte de TT-MCTAG peut être convertie en une RCG simple équivalente. Le résultat est intéressant pour des raisons théoriques (puisqu’il montre que la forme restreinte de TT-MCTAG est légèrement sensible au contexte), mais également pour des raisons pratiques (la transformation proposée ici a été utilisée pour implanter un analyseur pour TT-MCTAG).
The aim of this paper is the exploration of an optimality theoretic architecture for syntax that is guided by the concept of "correspondence": syntax is understood as the mechanism of "translating" underlying representations into a surface form. In minimalism, this surface form is called "Phonological Form" (PF). Both semantic and abstract syntactic information are reflected by the surface form. The empirical domain where this architecture is tested are minimal link effects, especially in the case of "wh"-movement. The OT constraints require the surface form to reflect the underlying semantic and syntactic representations as maximally as possible. The means by which underlying relations and properties are encoded are precedence, adjacency, surface morphology and prosodic structure. Information that is not encoded in one of these ways remains unexpressed, and gets lost unless it is recoverable via the context. Different kinds of information are often expressed by the same means. The resulting conflicts are resolved by the relative ranking of the relevant correspondence constraints.
Wer sich einmal in Deutschschweizer IRC-Chatkanälen herumgesehen hat, hat sofort bemerkt, dass neben der Standardsprache häufig Mundart verwendet wird. Eine Analyse der Varietätenverwendung bietet sich an. Es stellt sich die Frage: was bedeutet sprachliche Norm in einem Kommunikationsraum, in dem die Vorgabe, Deutsch zu schreiben, nur heißt nicht Französisch, Italienisch, Türkisch, Serbisch, Portugiesisch usw. zu schreiben, wo also die Standardsprache nur eine der akzeptierten Varietäten ist? Was bedeutet sprachliche Norm, wo Berndeutsch mit /l/-Vokalisierung neben Walliserdeutsch mit archaischen Volltonvokalen in Nebensilben vorkommt, wo für ein standardsprachliches [a:] ‹a, ah, aa, o, oh› oder ‹oo› stehen kann? Der Frage nach einer deskriptiven Norm wird hier nachgegangen, indem Möglichkeiten der Verschriftung einzelner Aspekte aufgezeigt werden und deren Nutzung in regionalen und überregionalen Chaträumen verglichen werden. Aus dem aktuellen Gebrauch wird dann versucht implizite Normen abzuleiten.
Das Zustandspassiv : grammatische Einordnung – Bildungsbeschränkungen – Interpretationsspielraum
(2005)
Distributional approximations to lexical semantics are very useful not only in helping the creation of lexical semantic resources (Kilgariff et al., 2004; Snow et al., 2006), but also when directly applied in tasks that can benefit from large-coverage semantic knowledge such as coreference resolution (Poesio et al., 1998; Gasperin and Vieira, 2004; Versley, 2007), word sense disambiguation (Mc- Carthy et al., 2004) or semantical role labeling (Gordon and Swanson, 2007). We present a model that is built from Webbased corpora using both shallow patterns for grammatical and semantic relations and a window-based approach, using singular value decomposition to decorrelate the feature space which is otherwise too heavily influenced by the skewed topic distribution of Web corpora.
The argument that I tried to elaborate on in this paper is that the conceptual problem behind the traditional competence/performance distinction does not go away, even if we abandon its original Chomskyan formulation. It returns as the question about the relation between the model of the grammar and the results of empirical investigations – the question of empirical verification The theoretical concept of markedness is argued to be an ideal correlate of gradience. Optimality Theory, being based on markedness, is a promising framework for the task of bridging the gap between model and empirical world. However, this task not only requires a model of grammar, but also a theory of the methods that are chosen in empirical investigations and how their results are interpreted, and a theory of how to derive predictions for these particular empirical investigations from the model. Stochastic Optimality Theory is one possible formulation of a proposal that derives empirical predictions from an OT model. However, I hope to have shown that it is not enough to take frequency distributions and relative acceptabilities at face value, and simply construe some Stochastic OT model that fits the facts. These facts first of all need to be interpreted, and those factors that the grammar has to account for must be sorted out from those about which grammar should have nothing to say. This task, to my mind, is more complicated than the picture that a simplistic application of (not only) Stochastic OT might draw.
Der TUSNELDA-Standard : ein Korpusannotierungsstandard zur Unterstützung linguistischer Forschung
(2001)
Die Verwendung von Standards für die Annotierung größerer Sammlungen elektronischer Texte (Korpora) ist eine Voraussetzung für eine mögliche Wiederverwendung dieser Korpora. Dieser Artikel stellt einen Korpusannotierungsstandard vor, der die Anforderungen der Untersuchung unterschiedlichster linguistischer Phänomene berücksichtigt. Der Standard wurde im SFB 441 an der Universität Tübingen entwickelt. Er geht von bestehenden Standards, insbesondere CES und TEI, aus, die sich als teilweise zu ausführlich und zu wenig restriktiv,teilweise auch als nicht ausdrucksstark genug erweisen, um den Bedürfnissen korpusbasierter linguistischer Forschung gerecht zu werden.
Developing linguistic resources, in particular grammars, is known to be a complex task in itself, because of (amongst others) redundancy and consistency issues. Furthermore some languages can reveal themselves hard to describe because of specific characteristics, e.g. the free word order in German. In this context, we present (i) a framework allowing to describe tree-based grammars, and (ii) an actual fragment of a core multicomponent tree-adjoining grammar with tree tuples (TT-MCTAG) for German developed using this framework. This framework combines a metagrammar compiler and a parser based on range concatenation grammar (RCG) to respectively check the consistency and the correction of the grammar. The German grammar being developed within this framework already deals with a wide range of scrambling and extraction phenomena.
In regionalen Schweizer Chaträumen stellt die Mundart mit Anteilen um 80% bis 90% die unmarkierte Varietät dar. Chats bieten somit einen Einblick in die individuell geprägte Verschriftung der Schweizer Dialekte, die sich einerseits regional verschieden präsentiert und andererseits fern von Vereinheitlichungstendenzen liegt. Durch diese Normierungsferne lässt sich aus den Chatdaten in groben Zügen eine Sprachgeographie nachzeichnen, wie sie im Sprachatlas der deutschen Schweiz SDS (1962–1997) festgehalten ist. Hier sollen Reflexe der sprachgeographischen Verteilung in der Verschriftung der flektierten Formen von «haben» nachgezeichnet werden. Neben der grundsätzlichen Bestätigung dieser Struktur zeigen sich in der Analyse auch systematisch Abweichungen, die unter Berücksichtigung der Verschriftungsbarriere Hinweise auf Sprachwandel geben können, die jedoch mit authentischen Daten gesprochener Sprache überprüft werden müssen.
Die Prosodie der Mundarten wurde schon früh als auffälliges und distinktes Merkmal wahrgenommen und in mehreren Arbeiten zur Grammatik des Schweizerdeutschen mittels Musiknoten festgehalten (u. a. J. Vetsch 1910, E. Wipf 1910, K. Schmid 1915, W. Clauss 1927, A. Weber 1948), wobei schon A. Weber (1948, S. 53) anmerkt, "dass sich der musikalische Gang der Rede nicht ohne Gewaltsamkeit mit der üblichen Notenschrift darstellen lässt". Da also eine adäquate Kodierung, eine theoretische Grundlage und die notwendigen phonetischen Instrumente zur Intonationsforschung fehlten, wurden diese ersten Ansätze nicht aus- und weitergeführt. Erst in der Mitte des 20. Jahrhunderts brachte die technische Entwicklung Instrumente zur Messung der Prosodie hervor, die nun durch die Popularisierung der entsprechenden Computerprogramme im Übergang zum 21. Jahrhundert für die linguistische Forschung intensiv und breit genutzt werden können.
Die Sprachen der Städte
(2008)
Die frühen Sprachkarten, für die Georg Wenker Ende des 19. Jh. in über 40.000 Schulorten des deutschen Reiches schriftliche Übersetzungen in die Mundart gesammelt hatte, dokumentieren die Sonderstellung vieler Städte im sprachlichen Raum. Zum Beispiel zeigen Berlin und die nähere Umgebung sprachliche Formen, die sonst erst weiter südlich oder in der Schriftsprache gelten.
Using a qualitative analysis of disagreements from a referentially annotated newspaper corpus, we show that, in coreference annotation, vague referents are prone to greater disagreement. We show how potentially problematic cases can be dealt with in a way that is practical even for larger-scale annotation, considering a real-world example from newspaper text.
In this paper, we investigate the role of sub-optimality in training data for part-of-speech tagging. In particular, we examine to what extent the size of the training corpus and certain types of errors in it affect the performance of the tagger. We distinguish four types of errors: If a word is assigned a wrong tag, this tag can belong to the ambiguity class of the word (i.e. to the set of possible tags for that word) or not; furthermore, the major syntactic category (e.g. "N" or "V") can be correctly assigned (e.g. if a finite verb is classified as an infinitive) or not (e.g. if a verb is classified as a noun). We empirically explore the decrease of performance that each of these error types causes for different sizes of the training set. Our results show that those types of errors that are easier to eliminate have a particularly negative effect on the performance. Thus, it is worthwhile concentrating on the elimination of these types of errors, especially if the training corpus is large.
“Comments are very welcome!” This basic attitude and the many ways of implementing it contribute immensely to the fascination of engaging in scientific research. I am grateful to Theoretical Linguistics for providing a public platform for this kind of scholarly exchange and I thank all commentators for their thoughtful, stimulating, and often challenging contributions to my target article. My response will address two main issues that are raised by the commentaries. The first issue is shaped by a cluster of questions relating to ontology. The second issue concerns questions of methodology pertaining in particular to the problem of judging data.
This paper proposes a compositional semantics for lexicalized tree adjoining grammars (LTAG). Tree-local multicompnent derivations allow seperation of semantiv contribution of a lexical item into one component contributing to the predicate argument structure and second a component contributing to scope semantics. Based on this idea a syntx-semantics interface is presented where the compositional semantics depends only on the derivation structure. It is shown that the derivation structure allows an appropriate amount of underspecification. This is illustrated by investigating underspecified representations for quantifier scpoe ambiguities and related phenomena such as adjunct scope and island constraints.
TT-MCTAG lets one abstract away from the relative order of co-complements in the final derived tree, which is more appropriate than classic TAG when dealing with flexible word order in German. In this paper, we present the analyses for sentential complements, i.e., wh-extraction, thatcomplementation and bridging, and we work out the crucial differences between these and respective accounts in XTAG (for English) and V-TAG (for German).
In this paper we will explore the similarities and differences between two feature logic-based approaches to the composition of semantic representations. The first approach is formulated for Lexicalized Tree Adjoining Grammar (LTAG, Joshi and Schabes 1997), the second is Lexical Ressource Semantics (LRS, Richter and Sailer 2004) and was first defined in Head-driven Phrase Structure Grammar. The two frameworks have several common characteristics that make them easy to compare: 1 They use languages of two sorted type theory for semantic representations. 2. They allow underspecification. LTAG uses scope constraints while LRS provides component-of contraints. 3 They use feature logics for computing semantic representations. 4. they are designed for computational applications. By comparing the two frameworks we will also point outsome characteristics and advantages of feature logic-based semantic computation in genereal.
This paper addresses the problem ofconstraints for relative quantifier sope, in partiular in inverse linking readings wherecertain scope orders are exluded. We show how to account for such restrictions in the Tree Adjoining Grammar (TAG) framework by adopting a notion offlexible composition. In the semantics we use for TAG we introduce quantifier sets that group quantifiers that are "glued" together in the sense that no other quantifieran scopally intervene between them. Theflexible composition approach allows us to obtain the desired quantifier sets and thereby the desiredconstraints for quantifier sope.
This paper is part of a research project on OT Syntax and the typology of the free relative (FR) construction. It concentrates on the details of an OT analysis and some of its consequences for OT syntax. I will not present a general discussion of the phenomenon and the many controversial issues it is famous for in generative syntax.
Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. Such larger structures are not only desirable for a deeper syntactic analysis. They also constitute a necessary prerequisite for assigning function-argument structure. The present paper offers a similaritybased algorithm for assigning functional labels such as subject, object, head, complement, etc. to complete syntactic structures on the basis of prechunked input. The evaluation of the algorithm has concentrated on measuring the quality of functional labels. It was performed on a German and an English treebank using two different annotation schemes at the level of function argument structure. The results of 89.73% correct functional labels for German and 90.40%for English validate the general approach.
In the recent literature there is growing interest in the morpho-syntactic encoding of hierarchical effects. The paper investigates one domain where such effects are attested: ergative splits conditioned by person. This type of splits is then compared to hierarchical effects in direct-inverse alternations. On the basis of two case studies (Lummi instantiating an ergative split person language and Passamaquoddy an inverse language) we offer an account that makes no use of hierarchies as a primitive. We propose that the two language types differ as far as the location of person features is concerned. In inverse systems person features are located exclusively in T, while in ergative systems, they are located in T and a particular type of v. A consequence of our analysis is that Case checking in split and inverse systems is guided by the presence/absence of specific phi-features. This in turn provides evidence for a close connection between Case and phi-features, reminiscent of Chomsky’s (2000, 2001) Agree.
In the past, a divide could be seen between ’deep’ parsers on the one hand, which construct a semantic representation out of their input, but usually have significant coverage problems, and more robust parsers on the other hand, which are usually based on a (statistical) model derived from a treebank and have larger coverage, but leave the problem of semantic interpretation to the user. More recently, approaches have emerged that combine the robustness of datadriven (statistical) models with more detailed linguistic interpretation such that the output could be used for deeper semantic analysis. Cahill et al. (2002) use a PCFG-based parsing model in combination with a set of principles and heuristics to derive functional (f-)structures of Lexical-Functional Grammar (LFG). They show that the derived functional structures have a better quality than those generated by a parser based on a state-of-the-art hand-crafted LFG grammar. Advocates of Dependency Grammar usually point out that dependencies already are a semantically meaningful representation (cf. Menzel, 2003). However, parsers based on dependency grammar normally create underspecified representations with respect to certain phenomena such as coordination, apposition and control structures. In these areas they are too "shallow" to be directly used for semantic interpretation. In this paper, we adopt a similar approach to Cahill et al. (2002) using a dependency-based analysis to derive functional structure, and demonstrate the feasibility of this approach using German data. A major focus of our discussion is on the treatment of coordination and other potentially underspecified structures of the dependency data input. F-structure is one of the two core levels of syntactic representation in LFG (Bresnan, 2001). Independently of surface order, it encodes abstract syntactic functions that constitute predicate argument structure and other dependency relations such as subject, predicate, adjunct, but also further semantic information such as the semantic type of an adjunct (e.g. directional). Normally f-structure is captured as a recursive attribute value matrix, which is isomorphic to a directed graph representation. Figure 5 depicts an example target f-structure. As mentioned earlier, these deeper-level dependency relations can be used to construct logical forms as in the approaches of van Genabith and Crouch (1996), who construct underspecified discourse representations (UDRSs), and Spreyer and Frank (2005), who have robust minimal recursion semantics (RMRS) as their target representation. We therefore think that f-structures are a suitable target representation for automatic syntactic analysis in a larger pipeline of mapping text to interpretation. In this paper, we report on the conversion from dependency structures to fstructure. Firstly, we evaluate the f-structure conversion in isolation, starting from hand-corrected dependencies based on the TüBa-D/Z treebank and Versley (2005)´s conversion. Secondly, we start from tokenized text to evaluate the combined process of automatic parsing (using Foth and Menzel (2006)´s parser) and f-structure conversion. As a test set, we randomly selected 100 sentences from TüBa-D/Z which we annotated using a scheme very close to that of the TiGer Dependency Bank (Forst et al., 2004). In the next section, we sketch dependency analysis, the underlying theory of our input representations, and introduce four different representations of coordination. We also describe Weighted Constraint Dependency Grammar (WCDG), the dependency parsing formalism that we use in our experiments. Section 3 characterises the conversion of dependencies to f-structures. Our evaluation is presented in section 4, and finally, section 5 summarises our results and gives an overview of problems remaining to be solved.
Der folgende Text betrachtet die Varietätenverwendung von Schweizer ChatterInnen und rückt dabei altersspezifische Fragen in den Vordergrund. Im Gegensatz zu vielen Versuchen, an die Sprache Jugendlicher heranzugehen, kommt hier ein quantitativer Ansatz zur Anwendung, der die Sprache der jugendlichen ChatterInnen mit der Sprache von ChatterInnen anderer Generationen vergleicht.
In der Abteilung Grammatik des Instituts für Deutsche Sprache, Mannheim, wird derzeit ein neues Projekt entwickelt, und zwar das einer Grammatik des Deutschen im europäischen Vergleich (GDE). Dieses Projekt fügt sich ein in die kontrastive Tradition des IDS, ist jedoch andererseits auch in vieler Hinsicht innovativ. Bevor ich das Projekt im Einzelnen vorstelle, versuche ich den Bogen zurück zu den kontrastiven Grammatiken zu schlagen. Gerade die Leserschaft polnischer Germanisten braucht an die Tradition kontrastiver Grammatikschreibung sicher nicht eigens erinnert zu werden. Denn diese Tradition, die untrennbar mit dem Namen Ulrich Engel verknüpft ist, ist gerade erst in der neu erschienenen deutsch-polnischen kontrastiven Grammatik kulminiert. Im Bereich der kontrastiven Grammatiken zu Sprachenpaaren, von denen das Deutsche ein Element ist, verfügt das IDS also über eine vergleichsweise reiche Tradition. Am IDS oder in Kooperation mit dem IDS wurden kontrastive Grammatiken zu den Sprachenpaaren Deutsch – Französisch (Zemb 1978), Deutsch – Serbokroatisch , Deutsch – Spanisch (Cartegena/Gauger 1989), Deutsch – Rumänisch (Engel u.a. 1993) erarbeitet. Zum Sprachenpaar Englisch – Deutsch liegt mit Hawkins 1986 eine typologisch-vergleichende Grammatik vor. Die deutsch-polnische kontrastive Grammatik, die unter der Leitung von Ulrich Engel erarbeitet wurde, ist 1999 erscheinen. Abraham 1994 und Glinz 1994 konfrontieren das Deutsche, mit durchaus unterschiedlicher Akzentsetzung, mit mehreren anderen europäischen Sprachen. An der Berliner Humboldt-Universität laufen derzeit die Vorarbeiten zu einer deutsch-russischen kontrastiven Grammatik (Initiative Wolfgang Gladrow und Michail Kotin). Die Aufgabe einer 'Grammatik des Deutschen im europäischen Kontext' ist also hinlänglich vorbereitet.
In the last decade, the Penn treebank has become the standard data set for evaluating parsers. The fact that most parsers are solely evaluated on this specific data set leaves the question unanswered how much these results depend on the annotation scheme of the treebank. In this paper, we will investigate the influence which different decisions in the annotation schemes of treebanks have on parsing. The investigation uses the comparison of similar treebanks of German, NEGRA and TüBa-D/Z, which are subsequently modified to allow a comparison of the differences. The results show that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality while a flat clause structure has a positive influence.