Sometimes less is more : Romanian word sense disambiguation revisited
Recent approaches to Word Sense Disambiguation (WSD) generally fall into two classes: (1) information-intensive approaches and (2) information-poor approaches. Our hypothesis is that for memory-based learning (MBL), a reduced amount of data is more beneficial than the full range of features used in the past. Our experiments show that MBL combined with a restricted set of features and a feature selection method that minimizes the feature set leads to competitive results, outperforming all systems that participated in the SENSEVAL-3 competition on the Romanian data. Thus, with this specific method, a tightly controlled feature set improves the accuracy of the classifier, reaching 74.0% in the fine-grained and 78.7% in the coarse-grained evaluation.
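
The idea in the abstract, a memory-based (nearest-neighbour) classifier whose feature set is deliberately kept small, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which selection procedure or distance metric was used, so greedy forward selection and a simple overlap distance stand in for them, and the toy data and function names are invented rather than taken from the paper or the SENSEVAL-3 data.

```python
from collections import Counter

def overlap_distance(a, b, features):
    """Number of selected feature positions where the two instances differ."""
    return sum(1 for i in features if a[i] != b[i])

def knn_predict(train, query, features, k=1):
    """Majority sense among the k stored instances closest to the query."""
    ranked = sorted(train, key=lambda inst: overlap_distance(inst[0], query, features))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(data, features, k=1):
    """Leave-one-out accuracy when only the given features are compared."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, features, k) == y
    return hits / len(data)

def forward_select(data, n_features, k=1):
    """Greedy forward selection (an assumed stand-in, not the paper's method):
    keep adding the single feature that raises leave-one-out accuracy most,
    and stop as soon as no candidate improves it."""
    selected, best = [], 0.0
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Prefer higher accuracy; break ties in favour of the lower index.
        acc, f = max(
            ((loo_accuracy(data, selected + [c], k), c) for c in candidates),
            key=lambda t: (t[0], -t[1]),
        )
        if acc <= best:
            break
        selected.append(f)
        best = acc
    return selected, best

# Invented toy instances: (context word, determiner, noise token) -> sense label.
data = [
    (("river", "the", "a"), "shore"),
    (("river", "a", "b"), "shore"),
    (("river", "the", "c"), "shore"),
    (("money", "the", "a"), "finance"),
    (("money", "a", "b"), "finance"),
    (("money", "the", "d"), "finance"),
]

selected, best = forward_select(data, n_features=3)
print("selected features:", selected, "leave-one-out accuracy:", best)
```

On this toy set the selection stops after the first feature, because no further feature improves the leave-one-out accuracy. Real experiments would of course use held-out evaluation data rather than six hand-made instances.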
Author: | Georgiana Dinu, Sandra Kübler |
---|---|
URN: | urn:nbn:de:hebis:30-1111249 |
URL: | http://cl.indiana.edu/~skuebler/papers/romwsd.pdf |
ISBN: | 978-954-91743-7-3 |
Editor(s): | Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Nicolas Nicolov, Nikolai Nikolov |
Document type: | Preprint |
Language: | English |
Year of completion: | 2007 |
Year of first publication: | 2007 |
Publishing institution: | Universitätsbibliothek Johann Christian Senckenberg |
Release date: | 3 November 2008 |
Free keyword / tag: | Romanian; Word Sense Disambiguation; memory-based learning |
Number of pages: | 5 |
Note: | Published in: Galia Angelova; Kalina Bontcheva; Ruslan Mitkov; Nicolas Nicolov; Nikolai Nikolov (eds.): International Conference Recent Advances in Natural Language Processing: proceedings, Shoumen: Incoma, 2007, pp. 173-177, ISBN: 978-954-91743-7-3 |
Source: | http://jones.ling.indiana.edu/~skuebler/papers/romwsd.pdf ; Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP 2007 - Borovets, Bulgaria, September 2007. |
HeBIS-PPN: | 206928882 |
Institutes: | Department not specified / External |
DDC classification: | 4 Language / 40 Language / 400 Language |
Collections: | Linguistics |
Linguistics classification: | Computational linguistics |
License (German): | German copyright law |