Parser evaluation across text types

A statistical parser trained on one treebank is usually tested on another portion of the same treebank, partly because testing requires a comparable annotation format. But the user of a parser may not be interested in parsing sentences from the same newspaper over and over, and may instead want syntactic annotations for a slightly different text type. Gildea (2001), for instance, found that a parser trained on the WSJ portion of the Penn Treebank performs worse on the Brown corpus (the subset available in the PTB bracketing format) than a parser trained only on the Brown corpus, even though the Brown training set has only half as many sentences as the WSJ one. Additionally, a parser trained on both the WSJ and Brown corpora performs worse on the Brown corpus than on the WSJ corpus. This leads to the following questions, which we address in this paper:

  • Is there a difference in the usefulness of techniques for improving parser performance between the same-corpus and the different-corpus case?
  • Are different types of parsers (rule-based and statistical) equally sensitive to corpus variation?

To answer these questions, we compared the parse quality of a hand-crafted constraint-based parser with that of a statistical PCFG-based parser trained on a treebank of German newspaper text.

Metadata
Author: Yannick Versley
URN: urn:nbn:de:hebis:30-1111538
URL: http://www.versley.de/versley_tlt05.pdf
Document type: Preprint
Language: English
Year of completion: 2005
Year of first publication: 2005
Publishing institution: Universitätsbibliothek Johann Christian Senckenberg
Release date: 4 November 2008
Page count: 12
Note: Published in: Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories, TLT 2005, Barcelona, Spain, pp. 209-220
Source: Working paper from the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005); http://www.versley.de/versley_tlt05.pdf
HeBIS-PPN: 207012296
Institutes: No department specified / External
DDC classification: 4 Language / 40 Language / 400 Language
Collections: Linguistics
Linguistics classification: Computational linguistics
License: German copyright law