In recent years, much effort has gone into the design of robust anaphor resolution algorithms. Many algorithms are based on manually designed antecedent filtering and preference strategies. Along a different line of research, corpus-based approaches have been investigated that employ machine-learning techniques to derive strategies automatically. Because they reduce the knowledge-engineering effort for designing and optimizing the strategies, the latter approaches are considered particularly attractive. However, since the hand-coding of robust antecedent filtering strategies such as syntactic disjoint reference and agreement in person, number, and gender constitutes a once-for-all effort, the question arises whether these strategies should be derived automatically at all. In this paper, it is investigated what might be gained by combining the best of two worlds: designing the universally valid antecedent filtering strategies manually, in a once-for-all fashion, and deriving the (potentially genre-specific) antecedent selection strategies automatically by applying machine-learning techniques. An anaphor resolution system, ROSANA-ML, which follows this paradigm, is designed and implemented. Through a series of formal evaluations, it is shown that, while exhibiting additional advantages, ROSANA-ML reaches a performance level comparable to that of its manually designed ancestor ROSANA.
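The division of labour described in this abstract, with hand-coded filters followed by an exchangeable selection strategy, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ROSANA-ML implementation: the names `Mention`, `agrees`, `recency_score`, and `resolve` are hypothetical, the simple recency preference merely stands in for a learned selection strategy, and syntactic disjoint reference is omitted entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Mention:
    """A hypothetical, heavily simplified discourse mention."""
    text: str
    person: int      # 1, 2, or 3
    number: str      # "sg" or "pl"
    gender: str      # "m", "f", or "n"
    sentence: int    # index of the sentence containing the mention


def agrees(anaphor: Mention, candidate: Mention) -> bool:
    """Hand-coded filter: agreement in person, number, and gender."""
    return (
        anaphor.person == candidate.person
        and anaphor.number == candidate.number
        and anaphor.gender == candidate.gender
    )


def recency_score(anaphor: Mention, candidate: Mention) -> float:
    """Placeholder preference: nearer candidates score higher.

    In a learning-based setup, a trained scorer would be passed in instead.
    """
    return -float(anaphor.sentence - candidate.sentence)


def resolve(
    anaphor: Mention,
    candidates: List[Mention],
    score: Callable[[Mention, Mention], float] = recency_score,
) -> Optional[Mention]:
    """Filter candidates first, then select the best-scoring survivor."""
    admissible = [c for c in candidates if agrees(anaphor, c)]
    if not admissible:
        return None
    return max(admissible, key=lambda c: score(anaphor, c))


she = Mention("she", 3, "sg", "f", sentence=2)
candidates = [
    Mention("Mary", 3, "sg", "f", sentence=0),
    Mention("John", 3, "sg", "m", sentence=1),
    Mention("the reports", 3, "pl", "n", sentence=1),
]
antecedent = resolve(she, candidates)
```

Here "John" and "the reports" are closer to the pronoun but are removed by the agreement filter, so the selection step only ever ranks admissible candidates. Keeping `score` as a parameter reflects the idea that the selection strategy may be swapped per genre while the filter stays fixed.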
An anaphor resolution algorithm is presented which relies on a combination of strategies for narrowing down and selecting from antecedent sets for reflexive pronouns, nonreflexive pronouns, and common nouns. The work focuses on syntactic restrictions which are derived from Chomsky's Binding Theory. It is discussed how these constraints can be incorporated adequately in an anaphor resolution algorithm. Moreover, by showing that pragmatic inferences may be necessary, the limits of syntactic restrictions are elucidated.
Coreference-Based Summarization and Question Answering: a Case for High Precision Anaphor Resolution
(2003)
Approaches to Text Summarization and Question Answering are known to benefit from the availability of coreference information. Based on an analysis of its contributions, a more detailed look at coreference processing for these applications will be proposed: it should be considered as a task of anaphor resolution rather than coreference resolution. It will be further argued that high precision approaches to anaphor resolution optimally match the specific requirements. Three such approaches will be described and empirically evaluated, and the implications for Text Summarization and Question Answering will be discussed.
Syntactic coindexing restrictions are by now known to be of central importance to practical anaphor resolution approaches. Since, in particular due to structural ambiguity, the assumption of the availability of a unique syntactic reading proves to be unrealistic, robust anaphor resolution relies on techniques to overcome this deficiency.
This paper describes the ROSANA approach, which generalizes the verification of coindexing restrictions in order to make it applicable to the deficient syntactic descriptions that are provided by a robust state-of-the-art parser. By a formal evaluation on two corpora that differ with respect to text genre and domain, it is shown that ROSANA achieves high-quality robust coreference resolution. Moreover, by an in-depth analysis, it is proven that the robust implementation of syntactic disjoint reference is nearly optimal. The study reveals that, compared with approaches that rely on shallow preprocessing, the largely nonheuristic disjoint reference algorithmization opens up the possibility for a slight improvement. Furthermore, it is shown that more significant gains are to be expected elsewhere, particularly from a text-genre-specific choice of preference strategies.
The performance study of the ROSANA system crucially rests on an enhanced evaluation methodology for coreference resolution systems, the development of which constitutes the second major contribution of the paper. As a supplement to the model-theoretic scoring scheme that was developed for the Message Understanding Conference (MUC) evaluations, additional evaluation measures are defined that, on the one hand, support the developer of anaphor resolution systems, and, on the other hand, shed light on application aspects of pronoun interpretation.
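For orientation, the MUC model-theoretic scoring scheme that the abstract takes as its starting point can be sketched as below. This follows the link-based formulation as it is commonly described (recall sums, over the key chains S, the quantity |S| minus the number of parts into which the response splits S, divided by the sum of |S| - 1; precision swaps the roles of key and response). The function names and the set-based data representation are assumptions made for this sketch, and the additional measures defined in the paper are not reproduced here, since the abstract does not specify them.

```python
from typing import Hashable, List, Set

Chain = Set[Hashable]


def muc_recall(key: List[Chain], response: List[Chain]) -> float:
    """Link-based MUC recall of `response` with respect to `key`."""
    numerator = 0
    denominator = 0
    for chain in key:
        touched = set()   # indices of response chains overlapping this key chain
        unresolved = 0    # mentions absent from every response chain
        for mention in chain:
            index = next(
                (i for i, r in enumerate(response) if mention in r), None
            )
            if index is None:
                unresolved += 1   # counts as a singleton part
            else:
                touched.add(index)
        parts = len(touched) + unresolved
        numerator += len(chain) - parts
        denominator += len(chain) - 1
    return numerator / denominator if denominator else 0.0


def muc_precision(key: List[Chain], response: List[Chain]) -> float:
    """Precision is recall with key and response exchanged."""
    return muc_recall(response, key)


# One key chain of four mentions, split by the response into two chains.
key = [{"a", "b", "c", "d"}]
response = [{"a", "b"}, {"c", "d"}]
recall = muc_recall(key, response)        # (4 - 2) / (4 - 1)
precision = muc_precision(key, response)  # (1 + 1) / (1 + 1)
```

In this example the response misses one of the three links needed to connect the key chain, so recall is 2/3, while every link it does propose is consistent with the key, so precision is 1.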
As a method for analyzing the content of texts, which form the predominant source material of empirical studies, content analysis occupies a key position in the social sciences. A general distinction is drawn between quantitative, dictionary-based methods and qualitative, "hermeneutic" methods; according to the widely held doctrine, only quantitative, single-word-oriented content analysis can be carried out by computers. The author shows that the "quantitative-qualitative" dichotomy yields no suitable criterion for conclusively answering the question of the scope and limits of algorithmic content analysis. Drawing across disciplines on current developments in computational linguistics, artificial intelligence, and the cognitive sciences, it is demonstrated that computer-assisted text content analysis is not necessarily restricted to single-word analysis. For a central qualitative problem of classical dictionary-based content analysis, the referential interpretation of pronouns, an algorithmic solution is developed, implemented in software, and empirically evaluated under application conditions. The work thus demonstrates that the "fundamental limits" of computer-assisted content analysis postulated in the context of the "qualitative-quantitative" controversy do not hold, since they can be transcended by algorithmic means. This opens up entirely new perspectives for the use of computers in content analysis.