TY  - INPR
A1  - Kübler, Sandra
A1  - Mohamed, Emad
A2  - Calzolari, Nicoletta
A2  - Choukri, Khalid
A2  - Maegaard, Bente
A2  - Mariani, Joseph
A2  - Odijk, Jan
A2  - Piperidis, Stelios
A2  - Tapias, Daniel
T1  - Memory-based vocalization of Arabic
N2  - The problem of vocalization, or diacritization, is essential to many tasks in Arabic NLP. Arabic is generally written without the short vowels, which leads to one written form having several pronunciations, with each pronunciation carrying its own meaning(s). In the experiments reported here, we define vocalization as a classification problem in which we decide for each character in the unvocalized word whether it is followed by a short vowel. We investigate the importance of different types of context. Our results show that the combination of using memory-based learning with only a word-internal context leads to a word error rate of 6.64%. If a lexical context is added, the results deteriorate slowly.
KW  - Arabic
Y1  - 2008
UR  - http://publikationen.ub.uni-frankfurt.de/frontdoor/index/index/docId/9885
UR  - https://nbn-resolving.org/urn:nbn:de:hebis:30-1110645
UR  - http://cl.indiana.edu/~skuebler/papers/vocal.pdf
SN  - 2-9517408-4-0
N1  - Published in: Nicoletta Calzolari; Khalid Choukri; Bente Maegaard; Joseph Mariani; Jan Odijk; Stelios Piperidis; Daniel Tapias (eds.): Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC-2008), May 28-30, 2008, Marrakech, Morocco. Paris: ELRA, 2008, pp. 2322-2329, ISBN: 2-9517408-4-0
ER  - 