Why is German dependency parsing more reliable than constituent parsing?

  • In recent years, research in parsing has extended in several new directions. One of these directions is concerned with parsing languages other than English. Treebanks have become available for many European languages, but also for Arabic, Chinese, or Japanese. However, it was shown that parsing results on these treebanks depend on the types of treebank annotations used. Another direction in parsing research is the development of dependency parsers. Dependency parsing profits from the non-hierarchical nature of dependency relations, thus lexical information can be included in the parsing process in a much more natural way. Especially machine learning based approaches are very successful (cf. e.g.). The results achieved by these dependency parsers are very competitive although comparisons are difficult because of the differences in annotation. For English, the Penn Treebank has been converted to dependencies. For this version, Nivre et al. report an accuracy rate of 86.3%, as compared to an F-score of 92.1 for Charniaks parser. The Penn Chinese Treebank is also available in a constituent and a dependency representations. The best results reported for parsing experiments with this treebank give an F-score of 81.8 for the constituent version and 79.8% accuracy for the dependency version. The general trend in comparisons between constituent and dependency parsers is that the dependency parser performs slightly worse than the constituent parser. The only exception occurs for German, where F-scores for constituent plus grammatical function parses range between 51.4 and 75.3, depending on the treebank, NEGRA or TüBa-D/Z. The dependency parser based on a converted version of Tüba-D/Z, in contrast, reached an accuracy of 83.4%, i.e. 12 percent points better than the best constituent analysis including grammatical functions.
Author:Sandra KüblerORCiDGND, Jelena Prokic
Document Type:Preprint
Year of Completion:2006
Year of first Publication:2006
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Release Date:2008/11/03
Page Number:12
Erschienen in: Proceedings of the Fifth International Workshop on Treebanks and Linguistic Theories (TLT 2006)
Source:http://jones.ling.indiana.edu/~skuebler/papers/german.pdf ; Proceedings of the Fifth International Workshop on Treebanks and Linguistic Theories - Prague, Czech Republic, December 2006.
Institutes:keine Angabe Fachbereich / Extern
Dewey Decimal Classification:4 Sprache / 40 Sprache / 400 Sprache
Linguistik-Klassifikation:Linguistik-Klassifikation: Computerlinguistik / Computational linguistics
Licence (German):License LogoDeutsches Urheberrecht