How do treebank annotation schemes influence parsing results? Or how not to compare apples and oranges

  • In the last decade, the Penn Treebank has become the standard data set for evaluating parsers. The fact that most parsers are evaluated solely on this one data set leaves unanswered the question of how much these results depend on the treebank's annotation scheme. In this paper, we investigate the influence that different decisions in treebank annotation schemes have on parsing. The investigation is based on a comparison of two similar German treebanks, NEGRA and TüBa-D/Z, which are then modified to isolate the effects of their differences. The results show that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality, while a flat clause structure has a positive influence.
Author:Sandra Kübler
Editor:Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Nicolas Nicolov, Nikolai Nikolov
Document Type:Preprint
Year of Completion:2005
Year of first Publication:2005
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Release Date:2008/10/21
Page Number:8
Published in: Galia Angelova; Kalina Bontcheva; Ruslan Mitkov; Nicolas Nicolov; Nikolai Nikolov (Eds.): International Conference Recent Advances in Natural Language Processing: Proceedings, Borovets, Bulgaria, 21-23 September 2005, Shoumen: Incoma, 2005, pp. 293-300, ISBN: 954-91743-3-6
Source: Proceedings of RANLP 2005, Borovets, 2005.
Institutes:No department specified / External
Dewey Decimal Classification:4 Language / 40 Language / 400 Language
Linguistics Classification:Computational linguistics
Licence (German):German copyright law (Deutsches Urheberrecht)