OPUS 4 | Linguistik

A testsuite for testing parser performance on complex German grammatical constructions [TePaCoC - a corpus for testing parser performance on complex German grammatical constructions] (2009)

Kübler, Sandra ; Rehbein, Ines ; Genabith, Josef van

Traditionally, parsers are evaluated against gold standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and the gold standard. A particular case in point is German, for which two treebanks (TiGer and TüBa-D/Z) are available with highly different annotation schemes for the acquisition of (e.g.) PCFG parsers. The differences between the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser evaluation difficult [7, 9, 12]. The resource (TEPACOC) presented in this paper takes a different approach to parser evaluation: instead of providing evaluation data in a single annotation scheme, TEPACOC uses comparable sentences and their annotations for 5 selected key grammatical phenomena (with 20 sentences per phenomenon) from both the TiGer and TüBa-D/Z resources. This provides a 2 × 100-sentence comparable testsuite which allows us to evaluate TiGer-trained parsers against the TiGer part of TEPACOC, and TüBa-D/Z-trained parsers against the TüBa-D/Z part of TEPACOC for key phenomena, instead of comparing them against a single (and potentially biased) gold standard. To overcome the problem of inconsistency in human evaluation and to bridge the gap between the two different annotation schemes, we provide an extensive error classification, which enables us to compare parser output across the two different treebanks. In the remaining part of the paper we present the testsuite and describe the grammatical phenomena covered in the data. We discuss the different annotation strategies used in the two treebanks to encode these phenomena and present our error classification of potential parser errors.

A unified representation for morphological, syntactic, semantic, and referential annotations (2004)

Hinrichs, Erhard ; Kübler, Sandra ; Naumann, Karin

This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an ongoing project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and anaphoric information. The paper discusses how this unified annotation scheme relates to other formats currently discussed in the literature, in particular the annotation graph model of Bird and Liberman (2001) and the pie-in-the-sky scheme for semantic annotation.

An Earley parsing algorithm for range concatenation grammars (2009)

Kallmeyer, Laura ; Maier, Wolfgang ; Parmentier, Yannick

We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation helps to considerably decrease the number of items in the chart.

An HPSG-to-CFG Approximation of Japanese (2000)

Kiefer, Bernd ; Krieger, Hans-Ulrich ; Siegel, Melanie

We present a simple approximation method for turning a Head-Driven Phrase Structure Grammar into a context-free grammar. The approximation method can be seen as the construction of the least fixpoint of a certain monotonic function. We discuss an experiment with a large HPSG for Japanese.

An integrated architecture for shallow and deep processing (2002)

Crysmann, Berthold ; Frank, Anette ; Kiefer, Bernd ; Müller, Stefan ; Neumann, Günter ; Piskorski, Jakub ; Schäfer, Ulrich ; Siegel, Melanie ; Uszkoreit, Hans ; Xu, Feiyu ; Becker, Markus ; Krieger, Hans-Ulrich

We present an architecture for the integration of shallow and deep NLP components which is aimed at flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe details of the integration methods, and show how information extraction and language checking applications for real-world German text benefit from a deep grammatical analysis.

An interesting couple: the semantic development of dyad morphemes (2003)

Evans, Nicholas

Most systematic discussion of dyad morphemes has focussed on Australian languages, owing to a combination of their relative prevalence there, and the development of a descriptive tradition that investigates them in some depth. In the course of researching this paper, however, I became aware of functionally and semantically similar morphemes in many other parts of the world, almost invariably described in isolation from any typological reference point. I have incorporated such data as far as I am aware of it, in the hope that a systematic study will encourage other investigators to identify, and investigate in detail, similar constructions in a range of languages. The current state of our research, however, as well as some interesting geographical skewings that I discuss below, such that outside Australia dyad constructions almost exclusively employ reciprocal morphology, means that most of this paper will focus on Australian languages.

Annotating honorifics denoting social ranking of referents (2005)

Nariyama, Shigeko ; Nakaiwa, Hiromi ; Siegel, Melanie

This paper proposes an annotation scheme that encodes honorifics (respectful words). Honorifics are used extensively in Japanese, reflecting the social relationship (e.g. social ranks and age) of the referents. This referential information is vital for resolving zero pronouns and improving machine translation outputs. Annotating honorifics is a complex task that involves identifying a predicate with honorifics, assigning ranks to referents of the predicate, calibrating the ranks, and connecting referents with their predicates.

Annotation compatibility working group report (2006)

This report explores the question of compatibility between annotation projects including translating annotation formalisms to each other or to common forms. Compatibility issues are crucial for systems that use the results of multiple annotation projects. We hope that this report will begin a concerted effort in the field to track the compatibility of annotation schemes for part of speech tagging, time annotation, treebanking, role labeling and other phenomena.

Antecedent selection techniques for high-recall coreference resolution (2007)

Versley, Yannick

We investigate methods to improve the recall in coreference resolution by also trying to resolve those definite descriptions where no earlier mention of the referent shares the same lexical head (coreferent bridging). The problem, which is notably harder than identifying coreference relations among mentions which have the same lexical head, has been tackled with several rather different approaches, and we attempt to provide a meaningful classification along with a quantitative comparison. Based on the different merits of the methods, we discuss possibilities to improve them and show how they can be effectively combined.

Argument structure in nominalizations : the case of the light verb construction in German (2007)

Wittenberg, Eva ; Piñango, Maria Mercedes

The predicate associated with the verb fails to express its full argument structure, while the predicate associated with the nominalization preserves its original argument structure.

141 search hits