We present an architecture for the integration of shallow and deep NLP components, aimed at the flexible combination of different language technologies for a range of current and future practical applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe the integration methods in detail and show how information extraction and language checking applications for real-world German text benefit from a deep grammatical analysis.
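The text chart idea, where each component adds its own annotation layer over one shared representation of the text, can be illustrated with a minimal sketch. The class names, layer names, and stand-off XML element names below are assumptions made for illustration, not the system's actual data structure or schema. Annotations simply point at character offsets, so layers produced by independent components can be related through span containment.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Annotation:
    layer: str   # hypothetical layer names, e.g. "token", "ne", "chunk"
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class TextChart:
    """Hypothetical shared store: each component adds a layer over the same text."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, layer: str, start: int, end: int, **attrs: str) -> None:
        if not 0 <= start < end <= len(self.text):
            raise ValueError(f"span {start}-{end} lies outside the text")
        self.annotations.append(Annotation(layer, start, end, dict(attrs)))

    def layer(self, name: str) -> List[Annotation]:
        """All annotations of one layer, in text order."""
        return sorted((a for a in self.annotations if a.layer == name),
                      key=lambda a: (a.start, a.end))

    def covered(self, outer: str, inner: str) -> List[Tuple[Annotation, List[Annotation]]]:
        """Pair each span on the outer layer with the inner-layer spans inside it."""
        inner_spans = self.layer(inner)
        return [(o, [i for i in inner_spans if o.start <= i.start and i.end <= o.end])
                for o in self.layer(outer)]

    def to_xml(self) -> str:
        """Serialise as stand-off XML: one element per annotation, linked by offsets."""
        root = ET.Element("textchart")
        ET.SubElement(root, "text").text = self.text
        for a in self.annotations:
            el = ET.SubElement(root, a.layer, start=str(a.start), end=str(a.end))
            for key, value in a.attrs.items():
                el.set(key, str(value))
        return ET.tostring(root, encoding="unicode")


chart = TextChart("Die Siemens AG baut Züge.")
for start, end in [(0, 3), (4, 11), (12, 14), (15, 19), (20, 24), (24, 25)]:
    chart.add("token", start, end)               # tokenizer layer
chart.add("ne", 4, 14, type="organization")      # named entity layer
chart.add("chunk", 0, 14, cat="NP")              # chunker layer
print(chart.to_xml())
```

Because layers share only offsets, a downstream component such as a deep parser could, for example, look up which tokens a named entity covers and treat them as one unit.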
We present work on the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC-7 standard, we have developed named entity recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency expressions, and time and date expressions. Subgrammars and gazetteers are shared across the grammars for the different languages wherever possible. Multilingual corpora from the business domain are used for grammar development and evaluation. We describe the annotation format, which covers named entities and other linguistic information. Finally, we present an evaluation tool that provides detailed statistics and diagnostics, allows partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
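The two evaluation features mentioned above, partial matching of annotations and user-defined mappings between label sets, can be sketched as follows. This is a minimal illustration, not the authors' tool: spans are assumed to be character offsets with a label, matching is greedy, and the label names and mapping are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end, label); end offset is exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if the two character spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def evaluate(gold: List[Span], predicted: List[Span], partial: bool = False,
             mapping: Optional[Dict[str, str]] = None) -> Dict[str, float]:
    """Precision, recall, and F1 with optional overlap matching and label mapping.

    Matching is greedy: each prediction takes the first unmatched gold span with
    the same (mapped) label that matches exactly or, if partial=True, overlaps.
    """
    mapping = mapping or {}
    pred = [(s, e, mapping.get(label, label)) for s, e, label in predicted]
    used = set()  # indices of gold spans already matched
    tp = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i in used or g[2] != p[2]:
                continue
            hit = overlaps(g, p) if partial else g[:2] == p[:2]
            if hit:
                used.add(i)
                tp += 1
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = [(0, 14, "organization"), (20, 30, "date")]
system = [(4, 14, "ORG"), (20, 30, "DATE")]  # hypothetical grammar output labels
to_gold = {"ORG": "organization", "DATE": "date"}
print(evaluate(gold, system, mapping=to_gold))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
print(evaluate(gold, system, partial=True, mapping=to_gold))
# {'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
```

Without the mapping, no label matches and all scores are 0.0, which is why a mapping between grammar output and annotation formats matters for comparing systems with different label inventories.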
We present a constraint-based syntax-semantics interface for the construction of RMRS (Robust Minimal Recursion Semantics) representations from shallow grammars. The architecture is designed to allow modular interfaces to existing shallow grammars of varying depth, ranging from chunk grammars to stochastic context-free grammars. We define modular semantics construction principles in a typed feature structure formalism that allow flexible adaptation to alternative grammars and different languages.
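A property of RMRS that makes it suitable for shallow grammars is that arguments beyond a predicate's characteristic variable can be stated as separate relations, so a shallow component may emit a predicate without its arguments and a deeper one may add them later. The sketch below illustrates only that idea in plain Python; it is not the paper's typed feature structure construction principles. The POS tags, the predicate naming scheme (`_dog_n`), and all helper names are hypothetical, and argument relations are keyed here by the predicate's label for simplicity (RMRS itself links them through separate anchors).

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Tuple


@dataclass
class EP:
    label: str  # handle, e.g. "h1"
    rel: str    # predicate name, e.g. "_dog_n" (hypothetical naming)
    arg0: str   # characteristic variable: x for nominals, e for events


@dataclass
class ArgRel:
    label: str  # handle of the predication this argument belongs to
    role: str   # e.g. "ARG1"
    value: str  # variable filling the role


@dataclass
class RMRS:
    eps: List[EP] = field(default_factory=list)
    args: List[ArgRel] = field(default_factory=list)


# Hypothetical table: POS tag -> (predicate suffix, variable sort).
SORT = {"NN": ("n", "x"), "VV": ("v", "e"), "ADJ": ("a", "e")}


def shallow_rmrs(tagged: List[Tuple[str, str]]) -> RMRS:
    """Emit one predication per content word and leave arguments unspecified."""
    handles, variables = count(1), count(1)
    rmrs = RMRS()
    for lemma, tag in tagged:
        if tag not in SORT:
            continue  # function words contribute nothing in this sketch
        suffix, sort = SORT[tag]
        rmrs.eps.append(EP(f"h{next(handles)}", f"_{lemma}_{suffix}",
                           f"{sort}{next(variables)}"))
    return rmrs


def add_arg(rmrs: RMRS, label: str, role: str, value: str) -> None:
    """Monotonically add an argument relation, e.g. from a deeper analysis."""
    if not any(ep.label == label for ep in rmrs.eps):
        raise KeyError(label)
    rmrs.args.append(ArgRel(label, role, value))


rmrs = shallow_rmrs([("the", "ART"), ("dog", "NN"), ("bark", "VV")])
add_arg(rmrs, "h2", "ARG1", "x1")  # "the dog" fills ARG1 of "bark"
```

The design point is monotonicity: the shallow output is never retracted, only extended, which is what allows grammars of different depth to contribute to one representation.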
In this paper, we report on a transformation scheme that turns a Categorial Grammar, more specifically a Combinatory Categorial Grammar (CCG; see Baldridge, 2002), into a derivation- and meaning-preserving typed feature structure (TFS) grammar.
We describe the main idea, which can be traced back at least to work by Karttunen (1986), Uszkoreit (1986), Bouma (1988), and Calder et al. (1988). We then show how a typed representation of complex categories can be extended by other constraints, such as modes, and indicate how the lambda semantics of combinators is mapped into a TFS representation, using unification to perform alpha-conversion and beta-reduction (Barendregt, 1984). We also present initial runtime measurements, which show that the PET system, originally developed for the HPSG grammar framework, outperforms the OpenCCG parser by a factor of 8–10 in run time and a factor of 4–5 in memory consumption.
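The central claim here, that unification can do the work of beta-reduction once categories and lambda terms are encoded as feature structures, can be illustrated with a small sketch. It assumes a simplified encoding chosen for this example: complex categories as dictionaries with `res`, `dir`, and `arg` features, lambda terms as `lam`/`body` pairs, closed feature structures (both sides must carry the same features), no occurs check, and a hypothetical three-word lexicon. It covers forward and backward application only and is not the transformation scheme, PET, or OpenCCG.

```python
from itertools import count

class Var:
    """Logic variable; default identity equality keeps every instance distinct."""
    _ids = count()
    def __init__(self, name):
        self.name = f"{name}{next(Var._ids)}"
    def __repr__(self):
        return f"?{self.name}"

def walk(t, s):
    """Follow bindings in substitution s until reaching an unbound term."""
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Return s extended so that a and b are equal, or None on failure."""
    a, b = walk(a, s), walk(b, s)
    if a is b or (not isinstance(a, (Var, dict, tuple)) and a == b):
        return s
    if isinstance(a, Var):
        return {**s, a: b}
    if isinstance(b, Var):
        return {**s, b: a}
    if isinstance(a, dict) and isinstance(b, dict) and a.keys() == b.keys():
        pairs = [(a[k], b[k]) for k in a]
    elif isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        pairs = list(zip(a, b))
    else:
        return None
    for x, y in pairs:
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def resolve(t, s):
    """Replace every bound variable in t by its value."""
    t = walk(t, s)
    if isinstance(t, dict):
        return {k: resolve(v, s) for k, v in t.items()}
    if isinstance(t, tuple):
        return tuple(resolve(x, s) for x in t)
    return t

def forward_application():
    """X/Y:f  Y:a  =>  X:f(a), stated as one feature structure."""
    X, Y, V, B = Var("X"), Var("Y"), Var("V"), Var("B")
    return {"left": {"cat": {"res": X, "dir": "/", "arg": Y}, "sem": {"lam": V, "body": B}},
            "right": {"cat": Y, "sem": V},
            "mother": {"cat": X, "sem": B}}

def backward_application():
    """Y:a  X\\Y:f  =>  X:f(a)."""
    X, Y, V, B = Var("X"), Var("Y"), Var("V"), Var("B")
    return {"left": {"cat": Y, "sem": V},
            "right": {"cat": {"res": X, "dir": "\\", "arg": Y}, "sem": {"lam": V, "body": B}},
            "mother": {"cat": X, "sem": B}}

def combine(rule, left, right):
    """Unify both daughters with a fresh rule instance. Binding V to the argument
    instantiates B, so unification performs the beta-reduction."""
    r = rule()
    s = unify(r["left"], left, {})
    s = unify(r["right"], right, s) if s is not None else None
    return resolve(r["mother"], s) if s is not None else None

def lexicon(word):
    """Hypothetical toy lexicon; fresh variables per lookup act as alpha-conversion."""
    if word in ("Fido", "Rex"):
        return {"cat": "np", "sem": word.lower()}
    iv = {"res": "s", "dir": "\\", "arg": "np"}
    if word == "barks":
        x = Var("x")
        return {"cat": iv, "sem": {"lam": x, "body": ("bark", x)}}
    if word == "sees":
        x, y = Var("x"), Var("y")
        return {"cat": {"res": iv, "dir": "/", "arg": "np"},
                "sem": {"lam": y, "body": {"lam": x, "body": ("see", x, y)}}}
    raise KeyError(word)

vp = combine(forward_application, lexicon("sees"), lexicon("Rex"))
print(combine(backward_application, lexicon("Fido"), vp))
# {'cat': 's', 'sem': ('see', 'fido', 'rex')}
```

A slash mode, as mentioned above, could be added as one more feature next to `dir`, so that a rule only unifies with categories carrying a compatible mode.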