
How to compare treebanks

  • Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question of how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new test suite for the evaluation of parsers on complex German grammatical constructions. The test suite provides a well-thought-out error classification, which enables us to compare the output of parsers trained on treebanks with different encoding schemes and provides insights into the impact of treebank annotation schemes on specific constructions such as PP attachment or non-constituent coordination.
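
As an illustration of the bracket-based evaluation mentioned in the abstract, the following is a minimal sketch of labeled bracketing precision, recall, and F1 in the PARSEVAL style that EVALB reports. It is not EVALB itself and not the authors' evaluation setup: the function name `bracket_scores`, the `(label, start, end)` span representation, and the toy spans are assumptions made for this example, and the sketch omits EVALB's parameter-file options.

```python
from collections import Counter


def bracket_scores(gold, predicted):
    """Return labeled bracketing (precision, recall, F1).

    Each tree is given as a list of (label, start, end) constituent spans.
    This span format is an assumption of the sketch, not a treebank format.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a predicted bracket matches at most one gold bracket.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy spans for one five-token sentence. The two analyses
# differ only in whether the PP is attached inside the object NP.
gold = [("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 5), ("PP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 3), ("PP", 3, 5)]

precision, recall, f1 = bracket_scores(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case a single attachment decision changes one bracket out of five. Scores of this kind depend on how many brackets an annotation scheme assigns per sentence, which is one reason to complement them with other metrics when comparing treebanks with different encoding schemes.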


Metadata
Authors:Sandra Kübler, Wolfgang Maier, Ines Rehbein, Yannick Versley
URN:urn:nbn:de:hebis:30-1110595
URL:http://cl.indiana.edu/~skuebler/papers/german_parsing.pdf
ISBN:2-9517408-4-0
Editors:Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Document type:Preprint
Language:English
Year of completion:2008
Year of first publication:2008
Publishing institution:Universitätsbibliothek Johann Christian Senckenberg
Release date:21.10.2008
Number of pages:8
Note:
Published in: Nicoletta Calzolari ; Khalid Choukri ; Bente Maegaard ; Joseph Mariani ; Jan Odijk ; Stelios Piperidis ; Daniel Tapias (eds.): Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC-2008), May 28-30, 2008, Marrakech, Morocco. Paris : ELRA, 2008, pp. 2322-2329, ISBN: 2-9517408-4-0
Source:http://jones.ling.indiana.edu/~skuebler/papers/german_parsing.pdf ; (in:) Proceedings of the Sixth International Conference on Language Resources and Evaluation, LREC 2008 - Marrakesh, 2008.
HeBIS-PPN:206763840
Institutes:No department specified / External
DDC classification:4 Language / 40 Language / 400 Language
Collections:Linguistics
Linguistics classification:Computerlinguistik / Computational linguistics
License (German):Deutsches Urheberrecht (German copyright law)