Linguistics
Die Sprachen der Städte (The Languages of the Cities)
(2008)
The early linguistic maps, for which Georg Wenker collected written translations into the local dialect in more than 40,000 school locations across the German Empire at the end of the 19th century, document the special status of many cities within the linguistic landscape. Berlin and its immediate surroundings, for example, show linguistic forms that otherwise occur only further south or in the written language.
Part-of-speech (POS) tagging is generally performed with Markov models based on bigrams or trigrams. While Markov models concentrate heavily on the left context of a word, many languages require the right context as well for correct disambiguation. We show for German that the best results are reached by combining left and right context. If only left context is available, reversing the direction of analysis and tagging from right to left improves the results. In a version of MBT (Daelemans et al., 1996) with default parameter settings, including the right context improved POS tagging accuracy from 94.00% to 96.08%, corroborating our hypothesis. The version with optimized parameters reaches 96.73%.
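The effect of right context can be illustrated with a minimal sketch. It is not MBT itself: it replaces MBT's memory-based classifier with a plain exact-match lookup table, and the function names (`context_features`, `train`, `predict`), the padding symbol, and the two toy sentences are all assumptions made for illustration. The tags follow the STTS convention (APPR for preposition, PTKZU for the infinitival particle "zu").

```python
from collections import Counter, defaultdict

PAD = "<pad>"  # hypothetical padding symbol for sentence boundaries


def context_features(tokens, i, left=1, right=1):
    """Return (left context, focus word, right context) for position i."""
    padded = [PAD] * left + list(tokens) + [PAD] * right
    j = i + left  # index of the focus word inside the padded list
    return (tuple(padded[j - left:j]), padded[j], tuple(padded[j + 1:j + 1 + right]))


def train(tagged_sentences, left=1, right=1):
    """Store tag counts for every context window seen in training."""
    memory = defaultdict(Counter)
    for sentence in tagged_sentences:
        words = [word for word, _ in sentence]
        for i, (_, tag) in enumerate(sentence):
            memory[context_features(words, i, left, right)][tag] += 1
    return memory


def predict(memory, tokens, i, left=1, right=1, default="UNK"):
    """Return the most frequent tag stored for the window, or a default."""
    counts = memory.get(context_features(tokens, i, left, right))
    return counts.most_common(1)[0][0] if counts else default


# Toy data (assumed, not from the paper): "zu" is a preposition before a
# noun but an infinitival particle before a verb; the left context is identical.
TOY_CORPUS = [
    [("Er", "PPER"), ("hat", "VAFIN"), ("zu", "APPR"), ("Hause", "NN")],
    [("Er", "PPER"), ("hat", "VAFIN"), ("zu", "PTKZU"), ("arbeiten", "VVINF")],
]

left_only = train(TOY_CORPUS, left=1, right=0)
left_right = train(TOY_CORPUS, left=1, right=1)

# With left context only, both tags are stored under the same window.
ambiguous = left_only[context_features(["Er", "hat", "zu", "Hause"], 2, 1, 0)]

# With one word of right context, the two readings fall into separate windows.
tag_verb = predict(left_right, ["Er", "hat", "zu", "arbeiten"], 2)
tag_noun = predict(left_right, ["Er", "hat", "zu", "Hause"], 2)
```

In this toy setting the left-only window for "zu" holds two competing tags, while the window with one word of right context separates them. A right-to-left tagger could be approximated in the same sketch by reversing each sentence before training and prediction, although the actual MBT setup may differ in its feature set and classifier.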