Machine-learning-based vs. manually designed approaches to anaphor resolution: the best of two worlds

In recent years, much effort has gone into the design of robust anaphor resolution algorithms. Many of these algorithms rely on manually designed antecedent filtering and preference strategies. A different line of research investigates corpus-based approaches that employ machine-learning techniques to derive the strategies automatically. Because they reduce the knowledge-engineering effort needed to design and optimize the strategies, the latter approaches are considered particularly attractive. However, hand-coding robust antecedent filtering strategies, such as syntactic disjoint reference and agreement in person, number, and gender, is a one-time effort, which raises the question of whether these strategies should be derived automatically at all. This paper investigates what might be gained by combining the best of two worlds: designing the universally valid antecedent filtering strategies manually, once and for all, and deriving the (potentially genre-specific) antecedent selection strategies automatically by applying machine-learning techniques. An anaphor resolution system that follows this paradigm, ROSANA-ML, is designed and implemented. A series of formal evaluations shows that, while exhibiting additional advantages, ROSANA-ML reaches a performance level comparable to that of its manually designed ancestor ROSANA.
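The division of labor described in the abstract can be illustrated with a minimal sketch. It is not the ROSANA-ML implementation: the `Mention` fields, the function names, and the recency-based scorer are all hypothetical, and the scorer merely stands in for a preference strategy that would be learned from annotated corpus data. Only the agreement filter is shown; syntactic disjoint reference is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical, simplified representation of an anaphor or candidate."""
    text: str
    person: int    # 1, 2, or 3
    number: str    # "sg" or "pl"
    gender: str    # "m", "f", or "n"
    distance: int  # sentences between this mention and the anaphor


def agrees(anaphor: Mention, candidate: Mention) -> bool:
    """Hand-coded filter: agreement in person, number, and gender."""
    return (anaphor.person == candidate.person
            and anaphor.number == candidate.number
            and anaphor.gender == candidate.gender)


def resolve(anaphor: Mention,
            candidates: List[Mention],
            score: Callable[[Mention, Mention], float]) -> Optional[Mention]:
    """Filter candidates manually, then pick the one the scorer prefers."""
    admissible = [c for c in candidates if agrees(anaphor, c)]
    if not admissible:
        return None
    return max(admissible, key=lambda c: score(anaphor, c))


def recency_score(anaphor: Mention, candidate: Mention) -> float:
    """Stand-in for a learned preference model (assumption): prefer
    candidates that are closer to the anaphor."""
    return -float(candidate.distance)


she = Mention("she", 3, "sg", "f", 0)
candidates = [
    Mention("Mary", 3, "sg", "f", 2),
    Mention("John", 3, "sg", "m", 1),
    Mention("Ann", 3, "sg", "f", 1),
]
chosen = resolve(she, candidates, recency_score)
print(chosen.text if chosen else None)
```

In this sketch the filter stays fixed across genres, while the `score` argument is the part that could be swapped for a model trained on a particular corpus.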

Metadata
Author: Roland Stuckardt
URN: urn:nbn:de:hebis:30-12948
Title of parent work (English): Proc. 4th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC2002), University of Lisbon, Sept. 2002, 211-216
Document type: Conference publication
Language: English
Year of completion: 2002
Year of first publication: 2002
Publishing institution: Universitätsbibliothek Johann Christian Senckenberg
Release date: 26 July 2005
GND keywords: Text analysis; Linguistic data processing; Computational linguistics
Number of pages: 6
HeBIS-PPN: 226532844
Institutes: Computer Science and Mathematics / Computer Science
DDC classification: 0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
License (German): German copyright law