
POS tagging for German: how important is the right context?

  • Part-of-speech (POS) tagging is generally performed with Markov models based on bigrams or trigrams. While Markov models focus strongly on the left context of a word, many languages require the inclusion of right context for correct disambiguation. We show for German that the best results are reached by a combination of left and right context. If only left context is available, then changing the direction of analysis and going from right to left improves the results. In a version of MBT (Daelemans et al., 1996) with default parameter settings, the inclusion of the right context improved POS tagging accuracy from 94.00% to 96.08%, thus corroborating our hypothesis. The version with optimized parameters reaches 96.73%.
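The abstract contrasts left context (words whose tags have already been assigned) with right context (words not yet tagged). The following is a minimal Python sketch of one way to encode both as features for a memory-based tagger, and of how reversing the sentence turns a left-context-only tagger into a right-to-left one. It is an illustration under stated assumptions, not MBT's actual implementation: the toy lexicon, the STTS-style tags in it, the window sizes, and the functions `ambiguity_class`, `context_features`, `tag_sentence`, and `toy_classifier` are all hypothetical. A real system would replace `toy_classifier` with a classifier trained on an annotated German corpus.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy lexicon: word -> possible POS tags (STTS-style tags, assumed).
LEXICON: Dict[str, Tuple[str, ...]] = {
    "die": ("ART", "PRELS", "PDS"),
    "Frau": ("NN",),
    "sieht": ("VVFIN",),
    "den": ("ART", "PRELS", "PDS"),
    "Mann": ("NN",),
}


def ambiguity_class(word: str) -> str:
    """Return the word's possible tags, sorted and joined by '|', or 'UNK'."""
    tags = LEXICON.get(word)
    return "|".join(sorted(tags)) if tags else "UNK"


def context_features(words: Sequence[str], tags_so_far: Sequence[str],
                     i: int, left: int = 2, right: int = 2) -> List[str]:
    """Feature vector for position i (hypothetical feature layout).

    Left context: tags already assigned by the tagger.
    Focus: ambiguity class of words[i].
    Right context: ambiguity classes only, since those words are untagged.
    Positions outside the sentence are padded with '_'.
    """
    feats: List[str] = []
    for k in range(left, 0, -1):
        j = i - k
        feats.append(tags_so_far[j] if j >= 0 else "_")
    feats.append(ambiguity_class(words[i]))
    for k in range(1, right + 1):
        j = i + k
        feats.append(ambiguity_class(words[j]) if j < len(words) else "_")
    return feats


def tag_sentence(words: Sequence[str], classify: Callable[[List[str]], str],
                 left: int = 2, right: int = 0, reverse: bool = False) -> List[str]:
    """Tag greedily, one word at a time.

    With reverse=True the sentence is processed right to left, so the
    'left' features are the already-tagged words to the right in the
    original sentence; the output is returned in original word order.
    """
    seq = list(reversed(words)) if reverse else list(words)
    tags: List[str] = []
    for i in range(len(seq)):
        tags.append(classify(context_features(seq, tags, i, left, right)))
    return list(reversed(tags)) if reverse else tags


def toy_classifier(left: int) -> Callable[[List[str]], str]:
    """Hypothetical stand-in for a trained classifier: returns the
    alphabetically first tag of the focus word's ambiguity class."""
    def classify(feats: List[str]) -> str:
        return feats[left].split("|")[0]
    return classify
```

For example, `tag_sentence(words, toy_classifier(2), left=2, right=0, reverse=True)` sees only already-tagged words to the right of each word in the original sentence, while `reverse=False` with `right > 0` adds unresolved right context as ambiguity classes alongside the tagged left context, which is the kind of left-plus-right combination the abstract reports as best. The toy classifier ignores context, so it only demonstrates the mechanics, not the accuracy gains.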
Metadata
Authors: Steliana Ivanova, Sandra Kübler
URN: urn:nbn:de:hebis:30-1110660
URL: http://cl.indiana.edu/~skuebler/papers/postagging.pdf
ISBN: 2-9517408-4-0
ISSN: 2522-2686
Editors: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Document type: Preprint
Language: German
Year of completion: 2008
Year of first publication: 2008
Publishing institution: Universitätsbibliothek Johann Christian Senckenberg
Date made available: 21 October 2008
Free keywords / tags: Acquisition; Machine Learning; Tagging
Number of pages: 4
Note:
Published in: Nicoletta Calzolari; Khalid Choukri; Bente Maegaard; Joseph Mariani; Jan Odijk; Stelios Piperidis; Daniel Tapias (eds.): Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC-2008), May 28-30, 2008, Marrakech, Morocco. Paris: ELRA, 2008, pp. 994-997, ISBN: 2-9517408-4-0
Source: http://jones.ling.indiana.edu/~skuebler/papers/postagging.pdf; (in:) Proceedings of the Sixth International Conference on Language Resources and Evaluation, LREC 2008, Marrakech, 2008.
HeBIS-PPN: 205688187
Institute: Department not specified / External
DDC classification: 4 Language / 40 Language / 400 Language
Collections: Linguistics
Linguistics classification: Computational linguistics
License: German copyright law (Deutsches Urheberrecht)