Machine-learning-based vs. manually designed approaches to anaphor resolution: the best of two worlds

In recent years, much effort has gone into the design of robust anaphor resolution algorithms. Many of these algorithms rely on manually designed antecedent filtering and preference strategies. Along a different line of research, corpus-based approaches have been investigated that employ machine-learning techniques to derive such strategies automatically. Because they reduce the knowledge-engineering effort required for designing and optimizing the strategies, the latter approaches are considered particularly attractive. However, since hand-coding robust antecedent filtering strategies, such as syntactic disjoint reference and agreement in person, number, and gender, is a one-time effort, the question arises whether these strategies should be derived automatically at all. This paper investigates what might be gained by combining the best of two worlds: designing the universally valid antecedent filtering strategies manually, once and for all, and deriving the (potentially genre-specific) antecedent selection strategies automatically by applying machine-learning techniques. An anaphor resolution system, ROSANA-ML, which follows this paradigm, is designed and implemented. A series of formal evaluations shows that ROSANA-ML, while exhibiting additional advantages, reaches a performance level comparable to that of its manually designed ancestor, ROSANA.
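
The division of labour described in the abstract can be illustrated with a minimal sketch. It is not the ROSANA-ML implementation: the `Mention` type, the function names, and the recency-based scorer are hypothetical stand-ins, the filter covers only agreement in person, number, and gender (syntactic disjoint reference is omitted), and the scoring function merely occupies the place where a learned antecedent selection model would go.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record; the field names are illustrative only."""
    text: str
    person: int
    number: str    # "sg" or "pl"
    gender: str    # "masc", "fem", or "neut"
    sentence: int  # index of the sentence containing the mention


def agrees(anaphor: Mention, candidate: Mention) -> bool:
    """Hand-coded filter: agreement in person, number, and gender."""
    return (anaphor.person == candidate.person
            and anaphor.number == candidate.number
            and anaphor.gender == candidate.gender)


def resolve(anaphor: Mention,
            candidates: List[Mention],
            score: Callable[[Mention, Mention], float]) -> Optional[Mention]:
    """Filter candidates with the manual rule, then pick the best-scoring one.

    `score` stands in for an automatically derived selection strategy.
    """
    admissible = [c for c in candidates if agrees(anaphor, c)]
    if not admissible:
        return None
    return max(admissible, key=lambda c: score(anaphor, c))


def recency_score(anaphor: Mention, candidate: Mention) -> float:
    """Placeholder scorer (assumption): prefer candidates in nearer sentences."""
    return -abs(anaphor.sentence - candidate.sentence)


if __name__ == "__main__":
    he = Mention("he", 3, "sg", "masc", 2)
    candidates = [
        Mention("Peter", 3, "sg", "masc", 0),
        Mention("John", 3, "sg", "masc", 1),
        Mention("Mary", 3, "sg", "fem", 2),
        Mention("the boys", 3, "pl", "masc", 2),
    ]
    best = resolve(he, candidates, recency_score)
    print(best.text if best else None)
```

In this arrangement the filter is written once and reused unchanged, while the scorer passed to `resolve` can be swapped for a model trained on a corpus of a particular genre.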

Metadata
Author:	Roland Stuckardt
URN:	urn:nbn:de:hebis:30-12948
Parent title (English):	Proc. 4th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC2002), University of Lisbon, Sept. 2002, 211-216
Document type:	Conference publication
Language:	English
Year of completion:	2002
Year of first publication:	2002
Publishing institution:	Universitätsbibliothek Johann Christian Senckenberg
Release date:	26.07.2005
GND keyword:	Text analysis; Linguistic data processing; Computational linguistics
Page count:	6
Source:	Published in: Proc. 4th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC2002), University of Lisbon, Sept. 2002, 211-216
HeBIS-PPN:	226532844
Institutes:	Computer Science and Mathematics / Computer Science
DDC classification:	0 Computer science, information and general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
License (German):	German copyright law

Open Access