Design and enhanced evaluation of a robust anaphor resolution algorithm

  • Syntactic coindexing restrictions are by now known to be of central importance to practical anaphor resolution approaches. Since, in particular due to structural ambiguity, the assumption of the availability of a unique syntactic reading proves to be unrealistic, robust anaphor resolution relies on techniques to overcome this deficiency. This paper describes the ROSANA approach, which generalizes the verification of coindexing restrictions in order to make it applicable to the deficient syntactic descriptions that are provided by a robust state-of-the-art parser. By a formal evaluation on two corpora that differ with respect to text genre and domain, it is shown that ROSANA achieves high-quality robust coreference resolution. Moreover, by an in-depth analysis, it is proven that the robust implementation of syntactic disjoint reference is nearly optimal. The study reveals that, compared with approaches that rely on shallow preprocessing, the largely nonheuristic disjoint reference algorithmization opens up the possibility/or a slight improvement. Furthermore, it is shown that more significant gains are to be expected elsewhere, particularly from a text-genre-specific choice of preference strategies. The performance study of the ROSANA system crucially rests on an enhanced evaluation methodology for coreference resolution systems, the development of which constitutes the second major contribution o/the paper. As a supplement to the model-theoretic scoring scheme that was developed for the Message Understanding Conference (MUC) evaluations, additional evaluation measures are defined that, on one hand, support the developer of anaphor resolution systems, and, on the other hand, shed light on application aspects of pronoun interpretation.

Download full text files

Export metadata

Additional Services

Share in Twitter Search Google Scholar
Author:Roland StuckardtGND
Parent Title (English):Computational linguistics
Document Type:Article
Year of Completion:2002
Year of first Publication:2001
Publishing Institution:Universit├Ątsbibliothek Johann Christian Senckenberg
Release Date:2005/07/26
GND Keyword:Textanalyse ; Linguistische Datenverarbeitung; Computerlinguistik
Page Number:28
First Page:479
Last Page:506
Institutes:Informatik und Mathematik / Informatik
Dewey Decimal Classification:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
4 Sprache / 41 Linguistik / 410 Linguistik
Licence (German):License LogoDeutsches Urheberrecht