OPUS 4 | Search

Why is German dependency parsing more reliable than constituent parsing? (2006)

In recent years, research in parsing has extended in several new directions. One of these directions is concerned with parsing languages other than English. Treebanks have become available for many European languages, but also for Arabic, Chinese, or Japanese. However, it was shown that parsing results on these treebanks depend on the types of treebank annotations used. Another direction in parsing research is the development of dependency parsers. Dependency parsing profits from the non-hierarchical nature of dependency relations, so lexical information can be included in the parsing process in a much more natural way. Machine-learning-based approaches are especially successful. The results achieved by these dependency parsers are very competitive, although comparisons are difficult because of the differences in annotation. For English, the Penn Treebank has been converted to dependencies. For this version, Nivre et al. report an accuracy rate of 86.3%, as compared to an F-score of 92.1 for Charniak's parser. The Penn Chinese Treebank is also available in both a constituent and a dependency representation. The best results reported for parsing experiments with this treebank give an F-score of 81.8 for the constituent version and 79.8% accuracy for the dependency version. The general trend in comparisons between constituent and dependency parsers is that the dependency parser performs slightly worse than the constituent parser. The only exception occurs for German, where F-scores for constituent plus grammatical function parses range between 51.4 and 75.3, depending on the treebank, NEGRA or TüBa-D/Z. The dependency parser based on a converted version of TüBa-D/Z, in contrast, reached an accuracy of 83.4%, i.e. 12 percentage points better than the best constituent analysis including grammatical functions.

What linguists always wanted to know about German and did not know how to estimate (2006)

Hinrichs, Erhard ; Kübler, Sandra

This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper die tageszeitung (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.

Treebank profiling of spoken and written German (2005)

Hinrichs, Erhard ; Kübler, Sandra

This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper die tageszeitung (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.

The phylogenetic relationships of Morgan's Sphinx, Xanthopan morganii (Walker), the tribe Acherontiini, and allied long-tongued hawkmoths (Lepidoptera: Sphingidae, Sphinginae) (2002)

Kitching, Ian J.

A cladistic analysis is presented of the hawkmoths of the tribe Acherontiini, Morgan's Sphinx (Xanthopan morganii (Walker)), and related genera. The study aims to test the monophyly of tribe Acherontiini; the hypothesis that all taxa with extremely long probosces (some Acherontiini, Meganoton rubescens, Neococytius, Xanthopan) form a monophyletic group, or at least fall within a single reasonably compact clade; and, within this group, to determine whether Xanthopan is more closely related to Acherontiini or to Cocytius and Neococytius. The data set comprises 109 characters derived from adult and immature stage morphology, biology and behaviour. These data were analysed using equal weighting, successive approximations character weighting (SACW) and implied weighting. All weighting schemes agreed on the monophyly of Acherontiini and of a group of genera comprising Amphimoea, Cocytius and Neococytius (the Cocytius group). Several other generic and suprageneric clades were also consistently recovered. However, those hawkmoths with extremely long probosces were never recovered as a monophyletic group. The relationships of Xanthopan were also ambiguous. Equal weighting and SACW placed Xanthopan + Meganoton rubescens (Butler) as sister to the Cocytius group, while implied weighting placed Xanthopan as sister to Acherontiini. This latter relationship is based primarily on shared possession of a pilifer/palp hearing organ. Further analyses suggested the two components of this organ were not biologically independent. Downweighting this feature accordingly resulted in all weighting schemes converging on the topology found by equal weighting. Exclusion of the incomplete subset of immature stage data had no effect under implied weighting but equal weighting and SACW now recovered a Neotropical clade comprising Manduca and the Cocytius group, while Xanthopan was placed with M. rubescens and Panogena. Downweighting the pilifer/palp hearing organ under implied weighting again caused convergence with the equal weighting/SACW results. Thus, the relationships of Xanthopan remain equivocal and further data, particularly from the immature stages, will be required to elucidate its phylogenetic position further.

Towards case-based parsing : are chunks reliable indicators for syntax trees? (2006)

Kübler, Sandra

This paper presents an approach to the question of whether it is possible to construct a parser based on ideas from case-based reasoning. Such a parser would employ a partial analysis of the input sentence to select a (nearly) complete syntax tree and then adapt this tree to the input sentence. The experiments performed on German data from the TüBa-D/Z treebank and the KaRoPars partial parser show that a wide range of levels of generality can be reached, depending on which types of information are used to determine the similarity between input sentence and training sentences. The results are such that it is possible to construct a case-based parser. The optimal setting out of those presented here needs to be determined empirically.

Towards a dependency-oriented evaluation for partial parsing (2002)

Kübler, Sandra ; Telljohann, Heike

Quantitative evaluation of parsers has traditionally centered around the PARSEVAL measures of crossing brackets, (labeled) precision, and (labeled) recall. However, it is well known that these measures do not give an accurate picture of the quality of the parser's output. Furthermore, we will show that they are especially unsuited for partial parsers. In recent years, research has concentrated on dependency-based evaluation measures. We will show in this paper that such a dependency-based evaluation scheme is particularly suitable for partial parsers. TüBa-D, the treebank used here for evaluation, contains all the necessary dependency information so that the conversion of trees into a dependency structure does not have to rely on heuristics. Therefore, the dependency representations are not only reliable, they are also linguistically motivated and can be used for linguistic purposes.

The TüBa-D/Z treebank : annotating German with a context-free backbone (2004)

Telljohann, Heike ; Hinrichs, Erhard ; Kübler, Sandra

The purpose of this paper is to describe the TüBa-D/Z treebank of written German and to compare it to the independently developed TIGER treebank (Brants et al., 2002). Both treebanks, TIGER and TüBa-D/Z, use an annotation framework that is based on phrase structure grammar and that is enhanced by a level of predicate-argument structure. The comparison between the annotation schemes of the two treebanks focuses on the different treatments of free word order and discontinuous constituents in German as well as on differences in phrase-internal annotation.

The earliest Gullah/AAVE texts : a case of 19th century mesolectal variation (2003)

Troike, Rudolph C.

The earliest known extensive texts in Gullah (and perhaps African American Vernacular English as well) to appear in print were published in The Riverside Magazine for Young People in November, 1868, under the title "Negro Fables" (p. 505-507). These are four animal stories, which the editor of the magazine, Horace Elisha Scudder, described in his column only as having been "taken down from the lips of an old negro, in the vicinity of Charleston" (see Appendix for the editor's comments and the full text of the stories). The Story-Teller was evidently a genuine "man of words" (Abrahams, 1983), a true raconteur who could artistically embellish a simple traditional account (perhaps further embellished by the transcriber) in a variety of ways. That he commanded a certain range of Gullah is evident from particular signature features in the texts, but the absence of other typical Gullah features and the presence of shared Gullah/African American Vernacular English usages, together with the periodic appearance of standard English forms, demonstrate that these texts provide perhaps the earliest actual documentation (apart from early tertiary comments, cited e.g. in Feagin, 1997, p. 128-129) of register variation or style/code-switching among Gullah speakers. ...

The PaGe 2008 shared task on parsing German (2008)

Kübler, Sandra

The ACL 2008 Workshop on Parsing German features a shared task on parsing German. The goal of the shared task was to find reasons for the radically different behavior of parsers on the different treebanks and between constituent and dependency representations. In this paper, we describe the task and the data sets. In addition, we provide an overview of the test results and a first analysis.

The CoNLL 2007 shared task on dependency parsing (2007)

Nivre, Joakim ; Hall, Johan ; Kübler, Sandra ; McDonald, Ryan ; Nilsson, Jens ; Riedel, Sebastian ; Yuret, Deniz

The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task has been devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.
