Linguistik
Refine
Year of publication
Document Type
- Conference Proceeding (106) (remove)
Language
- English (106) (remove)
Has Fulltext
- yes (106)
Is part of the Bibliography
- no (106)
Keywords
- Computerlinguistik (17)
- Informationsstruktur (16)
- Phonetik (12)
- Japanisch (9)
- Englisch (7)
- Grammatik (7)
- Maschinelle Übersetzung (6)
- Nungisch (6)
- Tibetobirmanische Sprachen (6)
- Deutsch (5)
Institute
This paper proposes an annotating scheme that encodes honorifics (respectful words). Honorifics are used extensively in Japanese, reflecting the social relationship (e.g. social ranks and age) of the referents. This referential information is vital for resolving zero
pronouns and improving machine translation outputs. Annotating honorifics is a complex task that involves identifying a predicate with honorifics, assigning ranks to referents of the
predicate, calibrating the ranks, and connecting referents with their predicates.
We present an architecture for the integration of shallow and deep NLP components which is aimed at flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe details of the integration methods, and show how information extraction and language checking applications for realworld German text benefit from a deep grammatical analysis.
Hybrid robust deep and shallow semantic processing for creativity support in document production
(2004)
The research performed in the DeepThought project (http://www.project-deepthought.net) aims at demonstrating the potential of deep linguistic processing if added to existing shallow methods that ensure robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. We use this approach to demonstrate the feasibility of three ambitious applications, one of which is a tool for creativity support in document production and collective brainstorming. This application is described in detail in this paper. Common to all three applications, and the basis for their development is a platform for integrated linguistic processing. This platform is based on a generic software architecture that combines multiple NLP components and on robust minimal recursive semantics (RMRS) as a uniform representation language.
In this paper we show an approach to the customization of GermaNet to the German HPSG grammar lexicon developed in the Verbmobil project. GermaNet has a broad coverage of the German base vocabulary and fine-grained semantic classification; while the HPSG grammar lexicon is comparatively small und has a coarse-grained semantic classification. In our approach, we have developed a mapping algorithm to relate the synsets in GermaNet with the semantic sorts in HPSG. The evaluation result shows that this approach is useful for the lexical extension of our deep grammar development to cope with real-world text understanding.
Particles fullfill several distinct central roles in the Japanese language. They can mark arguments as well as adjuncts, can be functional or have semantic functions. There is, however, no straightforward matching from particles to functions, as, e.g., 'ga' can mark the subject, the object or the adjunct of a sentence. Particles can cooccur. Verbal arguments that could be identified by particles can be eliminated in the Japanese sentence. And finally, in spoken language particles are often omitted. A proper treatment of particles is thus necessary to make an analysis of Japanese sentences possible. Our treatment is based on an empirical investigation of 800 dialogues. We set up a type hierarchy of particles motivated by their subcategorizational and modificational behaviour. This type hierarchy is part of the Japanese syntax in VERBMOBIL.
Preferences and defaults for definiteness and number in japanese to german machine translation
(1996)
A significant problem when translating Japanese dialogues into German is the missing information on number and definiteness in the Japanese analysis output. The integration of the search for such information into the transfer process provides an efficient solution. General transfer includes conditions to make it possible to consider external knowledge. Thereby, grammatical and lexical knowledge of the source language, knowledge of lexical restrictions on the target language, domain knowledge and discourse knowledge are accessible.
We present a solution for the representation of Japanese honorifical information in the HPSG framework. Basically, there are three dimensions of honorification. We show that a treatment is necessary that involves both the syntactic and the contextual level of information. The japanese grammar is part of a machine translation system.
The research performed in the DeepThought project aims at demonstrating the potential of deep linguistic processing if combined with shallow methods for robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. On the basis of this approach, the feasibility of three ambitious applications will be demonstrated, namely: precise information extraction for business intelligence; email response management for customer relationship management; creativity support for document production and collective brainstorming. Common to these applications, and the basis for their development is the XML-based, RMRS-enabled core architecture framework that will be described in detail in this paper. The framework is not limited to the applications envisaged in the DeepThought project, but can also be employed e.g. to generate and make use of XML standoff annotation of documents and linguistic corpora, and in general for a wide range of NLP-based applications and research purposes.
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontologybased information extraction from soccer web pages for automatic population of a knowledge base that can be used for domainspecific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.