
Evaluating POS tagging under sub-optimal conditions : or: does meticulousness pay?

In this paper, we investigate the role of sub-optimality in training data for part-of-speech tagging. In particular, we examine to what extent the size of the training corpus and certain types of errors in it affect the performance of the tagger. We distinguish four types of errors: If a word is assigned a wrong tag, this tag can belong to the ambiguity class of the word (i.e. to the set of possible tags for that word) or not; furthermore, the major syntactic category (e.g. "N" or "V") can be correctly assigned (e.g. if a finite verb is classified as an infinitive) or not (e.g. if a verb is classified as a noun). We empirically explore the decrease of performance that each of these error types causes for different sizes of the training set. Our results show that those types of errors that are easier to eliminate have a particularly negative effect on the performance. Thus, it is worthwhile concentrating on the elimination of these types of errors, especially if the training corpus is large.
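
The four error types named in the abstract come from two yes/no questions about a wrong tag: does it belong to the word's ambiguity class, and does it preserve the major syntactic category? The following is a minimal sketch of that two-way classification, not the authors' implementation. The toy lexicon, the tag names (VFIN, VINF, NN), and the convention that the first character of a tag names its major category are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy lexicon: each word maps to its ambiguity class,
# i.e. the set of tags the word can possibly take. Not from the paper.
LEXICON: Dict[str, Set[str]] = {
    "walk": {"VFIN", "VINF", "NN"},
    "sleeps": {"VFIN"},
}


def major_category(tag: str) -> str:
    """Assumed convention: the first character of a tag is its major category."""
    return tag[0]


def classify_error(
    word: str,
    gold_tag: str,
    assigned_tag: str,
    lexicon: Dict[str, Set[str]] = LEXICON,
) -> Tuple[bool, bool]:
    """Return (in_ambiguity_class, major_category_correct) for a wrong tag."""
    if gold_tag == assigned_tag:
        raise ValueError("the assigned tag equals the gold tag, so this is not an error")
    # Unknown words get an empty ambiguity class in this sketch.
    in_class = assigned_tag in lexicon.get(word, set())
    same_major = major_category(gold_tag) == major_category(assigned_tag)
    return in_class, same_major


if __name__ == "__main__":
    # A finite verb tagged as an infinitive: major category is still "V".
    print(classify_error("walk", "VFIN", "VINF"))
    # A verb tagged as a noun, with a tag outside the word's ambiguity class.
    print(classify_error("sleeps", "VFIN", "NN"))
```

The two booleans together identify one of the four error types, so counting the returned pairs over an annotated corpus would give a per-type error profile under the stated assumptions.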
Metadata
Authors: Sandra Kübler, Andreas Wagner
URN: urn:nbn:de:hebis:30-1110556
URL: http://cl.indiana.edu/~skuebler/papers/acidca.ps
Document type: Preprint
Language: English
Year of completion: 2000
Year of first publication: 2000
Publishing institution: Universitätsbibliothek Johann Christian Senckenberg
Release date: 21.10.2008
Free keyword / tag: speech tagging
Number of pages: 5
Note:
Published in: Proceedings of the International Conference on Artificial and Computational Intelligence for Decision, Control and Automation in Engineering and Industrial Applications (ACIDCA 2000), March 2000
Source: http://jones.ling.indiana.edu/~skuebler/papers/acidca.ps ; Proceedings of the International Conference on Artificial and Computational Intelligence for Decision, Control and Automation in Engineering and Industrial Applications, ACIDCA 2000 (Tunisia 2000).
HeBIS-PPN: 206762615
Institutes: No department specified / External
DDC classification: 4 Language / 40 Language / 400 Language
Collections: Linguistics
Linguistics classification: Computational linguistics
License (German): Deutsches Urheberrecht (German copyright law)