Linguistik-Klassifikation
Refine
Year of publication
- 2004 (13) (remove)
Document Type
- Preprint (7)
- Conference Proceeding (4)
- Article (1)
- Working Paper (1)
Language
- English (13)
Has Fulltext
- yes (13)
Is part of the Bibliography
- no (13)
Keywords
- Computerlinguistik (4)
- Deutsch (2)
- Frage (2)
- Japanisch (2)
- Semantik (2)
- German (1)
- Grammaires d’Arbres Adjoints (1)
- Korean (1)
- Koreanisch (1)
- Maschinelle Übersetzung (1)
Institute
- Extern (13) (remove)
This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an on-going project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and anaphoric information. The paper discusses how this unified annotation scheme relates to other formats currently discussed in the literature, in particular the annotation graph model of Bird and Liberman (2001) and the pie-in-thesky scheme for semantic annotation.
This paper is concerned with the tagging of spatial expressions in German newspaper articles, assigning a meaning to the expression and classifying the usages of the spatial expression and linking the derived referent to an event description. In our system, we implemented the activation of concepts in a very simple fashion, a concept is activated once (with a cost depending on the item that activated it) and is left activated thereafter. As an example, a city also activates the nodes for the region and the country it is part of, so that cities from one country are chosen over cities from different countries. A test corpus of 12 German newspaper articles was tested regarding several disambiguation strategies. Disambiguation was carried out via a beam search to find an approximately cost-optimal solution for the conflict set of potential grounding candidates for the tagged spatial expression. Test showed that the disambiguation strategies improved accuracy significantly.
Transforming constituent-based annotation into dependency-based annotation has been shown to work for different treebanks and annotation schemes (e.g. Lin (1995) has transformed the Penn treebank, and Kübler and Telljohann (2002) the Tübinger Baumbank des Deutschen (TüBa-D/Z)). These ventures are usually triggered by the conflict between theory-neutral annotation, that targets most needs of a wider audience, and theory-specific annotation, that provides more fine-grained information for a smaller audience. As a compromise, it has been pointed out that treebanks can be designed to support more than one theory from the start (Nivre, 2003). We argue that information can also be added to an existing annotation scheme so that it supports additional theory-specific annotations. We also argue that such a transformation is useful for improving and extending the original annotation scheme with respect to both ambiguous annotation and annotation errors. We show this by analysing problems that arise when generating dependency information from the constituent-based TüBa-D/Z.
Japanese is often taken to be strictly head-final in its syntax. In our work on a broad-coverage, precision implemented HPSG for Japanese, we have found that while this is generally true, there are nonetheless a few minor exceptions to the broad trend. In this paper, we describe the grammar engineering project, present the exceptions we have found, and conclude that this kind of phenomenon motivates on the one hand the HPSG type hierarchical approach which allows for the statement of both broad generalizations and exceptions to those generalizations and on the other hand the usefulness of grammar engineering as a means of testing linguistic hypotheses.
Hybrid robust deep and shallow semantic processing for creativity support in document production
(2004)
The research performed in the DeepThought project (http://www.project-deepthought.net) aims at demonstrating the potential of deep linguistic processing if added to existing shallow methods that ensure robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. We use this approach to demonstrate the feasibility of three ambitious applications, one of which is a tool for creativity support in document production and collective brainstorming. This application is described in detail in this paper. Common to all three applications, and the basis for their development is a platform for integrated linguistic processing. This platform is based on a generic software architecture that combines multiple NLP components and on robust minimal recursive semantics (RMRS) as a uniform representation language.
While the sortal constraints associated with Japanese numeral classifiers are wellstudied, less attention has been paid to the details of their syntax. We describe an analysis implemented within a broadcoverage HPSG that handles an intricate set of numeral classifier construction types and compositionally relates each to an appropriate semantic representation, using Minimal Recursion Semantics.
LTAG semantics for questions
(2004)
This papers presents a compositional semantic analysis of interrogatives clauses in LTAG (Lexicalized Tree Adjoining Grammar) that captures the scopal properties of wh- and nonwh-quantificational elements. It is shown that the present approach derives the correct semantics for examples claimed to be problematic for LTAG semantic approaches based on the derivation tree. The paper further provides an LTAG semantics for embedded interrogatives.
This paper sets up a framework for LTAG (Lexicalized Tree Adjoining Grammar) semantics that brings together ideas from different recent approaches addressing some shortcomings of TAG semantics based on the derivation tree. Within this framework, several sample analyses are proposed, and it is shown that the framework allows to analyze data that have been claimed to be problematic for derivation tree based LTAG semantics approaches.
The purpose of this paper is to describe recent developments in the morphological, syntactic, and semantic annotation of the TüBa-D/Z treebank of German. The TüBa-D/Z annotation scheme is derived from the Verbmobil treebank of spoken German [4, 10], but has been extended along various dimensions to accommodate the characteristics of written texts. TüBa-D/Z uses as its data source the "die tageszeitung" (taz) newspaper corpus. The Verbmobil treebank annotation scheme distinguishes four levels of syntactic constituency: the lexical level, the phrasal level, the level of topological fields, and the clausal level. The primary ordering principle of a clause is the inventory of topological fields, which characterize the word order regularities among different clause types of German, and which are widely accepted among descriptive linguists of German [3, 6]. The TüBa-D/Z annotation relies on a context-free backbone (i.e. proper trees without crossing branches) of phrase structure combined with edge labels that specify the grammatical function of the phrase in question. The syntactic annotation scheme of the TüBa-D/Z is described in more detail in [12, 11]. TüBa-D/Z currently comprises approximately 15 000 sentences, with approximately 7 000 sentences being in the correction phase. The latter will be released along with an updated version of the existing treebank before the end of this year. The treebank is available in an XML format, in the NEGRA export format [1] and in the Penn treebank bracketing format. The XML format contains all types of information as described above, the NEGRA export format contains all sentenceinternal information while the Penn treebank format includes only those layers of information that can be expressed as pure tree structures. Over the course of the last year, more fine grained linguistic annotations have been added along the following dimensions: 1. the basic Stuttgart-Tübingen tagset, STTS, [9] labels have been enriched by relevant features of inflectional morphology, 2. named entity information has been encoded as part of the syntactic annotation, and 3. a set of anaphoric and coreference relations has been added to link referentially dependent noun phrases. In the following sections, we will describe each of these innovations in turn and will demonstrate how the additional annotations can be incorporated into one comprehensive annotation scheme.
The research performed in the DeepThought project aims at demonstrating the potential of deep linguistic processing if combined with shallow methods for robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. On the basis of this approach, the feasibility of three ambitious applications will be demonstrated, namely: precise information extraction for business intelligence; email response management for customer relationship management; creativity support for document production and collective brainstorming. Common to these applications, and the basis for their development is the XML-based, RMRS-enabled core architecture framework that will be described in detail in this paper. The framework is not limited to the applications envisaged in the DeepThought project, but can also be employed e.g. to generate and make use of XML standoff annotation of documents and linguistic corpora, and in general for a wide range of NLP-based applications and research purposes.
The purpose of this paper is to describe the TüBa-D/Z treebank of written German and to compare it to the independently developed TIGER treebank (Brants et al., 2002). Both treebanks, TIGER and TüBa-D/Z, use an annotation framework that is based on phrase structure grammar and that is enhanced by a level of predicate-argument structure. The comparison between the annotation schemes of the two treebanks focuses on the different treatments of free word order and discontinuous constituents in German as well as on differences in phrase-internal annotation.
Tree-local MCTAG with shared nodes : an analysis of word order variation in German and Korean
(2004)
Tree Adjoining Grammars (TAG) are known not to be powerful enough to deal with scrambling in free word order languages. The TAG-variants proposed so far in order to account for scrambling are not entirely satisfying. Therefore, an alternative extension of TAG is introduced based on the notion of node sharing. Considering data from German and Korean, it is shown that this TAG-extension can adequately analyse scrambling data, also in combination with extraposition and topicalization.