-
An HPSG-to-CFG Approximation of Japanese
(2000)
-
Bernd Kiefer
Hans-Ulrich Krieger
Melanie Siegel
- We present a simple approximation method for turning a Head-Driven Phrase Structure Grammar into a context-free grammar. The approximation method can be seen as the construction of the least fixpoint of a certain monotonic function. We discuss an experiment with a large HPSG for Japanese.
-
An Integrated Architecture for Shallow and Deep Processing
(2002)
-
Berthold Crysmann
Anette Frank
Bernd Kiefer
Stefan Müller
Günter Neumann
Jakub Piskorski
Ulrich Schäfer
Melanie Siegel
Hans Uszkoreit
Feiyu Xu
Markus Becker
Hans-Ulrich Krieger
- We present an architecture for the integration of shallow and deep NLP components which is aimed at flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe details of the integration methods, and show how information extraction and language checking applications for real-world German text benefit from a deep grammatical analysis.
-
Annotating Honorifics Denoting Social Ranking of Referents
(2005)
-
Shigeko Nariyama
Hiromi Nakaiwa
Melanie Siegel
- This paper proposes an annotating scheme that encodes honorifics (respectful words). Honorifics are used extensively in Japanese, reflecting the social relationship (e.g. social ranks and age) of the referents. This referential information is vital for resolving zero pronouns and improving machine translation outputs. Annotating honorifics is a complex task that involves identifying a predicate with honorifics, assigning ranks to referents of the predicate, calibrating the ranks, and connecting referents with their predicates.
-
Computer-assisted transcription and analysis of speech
(2001)
-
Ursula Stephany
Conny Bast
Katrin Lehmann
- The two papers included in this volume have developed from work with the CHILDES tools and the Media Editor in two research projects: "Second language acquisition of German by Russian learners", sponsored by the Max Planck Institute for Psycholinguistics, Nijmegen, from 1998 to 1999 (directed by Ursula Stephany, University of Cologne, and Wolfgang Klein, Max Planck Institute for Psycholinguistics, Nijmegen), and "The age factor in the acquisition of German as a second language", sponsored by the German Science Foundation (DFG), Bonn, since 2000 (directed by Ursula Stephany, University of Cologne, and Christine Dimroth, Max Planck Institute for Psycholinguistics, Nijmegen). The CHILDES Project has been developed and is being continuously improved at Carnegie Mellon University, Pittsburgh, under the supervision of Brian MacWhinney. Having used the CHILDES tools for more than ten years to transcribe and analyze Greek child data, I had no doubt that I would also use them for research into the acquisition of German as a second language, and that I would analyze the large amount of spontaneous speech gathered from two Russian girls with the help of the CLAN programs. In the spring of 1997, Steven Gillis from the University of Antwerp (in collaboration with Gert Durieux) developed a lexicon-based automatic coding system based on the CLAN program MOR and suitable for coding languages with richer morphologies than English, such as Modern Greek. Coding huge amounts of data then became much quicker and more convenient, so I decided to adopt this system for German as well. The paper "Working with the CHILDES Tools" is based on two earlier manuscripts which have grown out of my research on Greek child language and the many CHILDES workshops taught in Germany, Greece, Portugal, and Brazil over the years. Its contents have now been adapted to the requirements of research into the acquisition of German as a second language and for use on Windows.
-
Corpora and evaluation tools for multilingual named entity grammar development
(2003)
-
Christian Bering
Witold Droźdźyński
Gregor Erbach
Clara Guasch
Petr Homola
Sabine Lehmann
Hong Li
Hans-Ulrich Krieger
Jakub Piskorski
Ulrich Schäfer
Atsuko Shimada
Melanie Siegel
Feiyu Xu
Dorothee Ziegler-Eisele
- We present an effort for the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible for the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
-
Customizing GermaNet for the Use in Deep Linguistic Processing
(2001)
-
Melanie Siegel
Feiyu Xu
Günter Neumann
- In this paper we show an approach to the customization of GermaNet to the German HPSG grammar lexicon developed in the Verbmobil project. GermaNet has a broad coverage of the German base vocabulary and a fine-grained semantic classification, while the HPSG grammar lexicon is comparatively small and has a coarse-grained semantic classification. In our approach, we have developed a mapping algorithm to relate the synsets in GermaNet with the semantic sorts in HPSG. The evaluation result shows that this approach is useful for the lexical extension of our deep grammar development to cope with real-world text understanding.
-
Definiteness and Number in Japanese to German Machine Translation
(1996)
-
Melanie Siegel
-
Definitheit und Numerus : Anforderungen an den Transfer Japanisch-Englisch
(1994)
-
Melanie Siegel
- The problem for transfer in machine translation from Japanese to English is that Japanese lacks the information about number and definiteness that is needed to choose English articles and noun marking. Although this problem is significant, the research literature hardly addresses it. [...] We base our investigation on experimentally collected data from an experiment on German-Japanese interpreted appointment-scheduling dialogues [...]. In this way, phenomena that are relevant to the VERBMOBIL domain can be identified. We consider our procedure to be in line with the 'sublanguage' approach [...].
-
Dialogue Acts in VERBMOBIL-2 : Second Edition
(1998)
-
Jan Alexandersson
Bianka Buschbeck-Wolf
Tsutomu Fujinami
Michael Kipp
Stefan Koch
Elisabeth Maier
Norbert Reithinger
Birte Schmitz
Melanie Siegel
- This report describes the dialogue phases and the second edition dialogue acts which are used in the VERBMOBIL 2 project [...]. While in the first project phase the scenario was restricted to appointment scheduling dialogues, it has been extended to travel planning in the second phase with appointment scheduling being only a part of the new scenario.
-
Die japanische Syntax im Verbmobil Forschungsprototypen
(1996)
-
Melanie Siegel
- The domain of VERBMOBIL is appointment-scheduling dialogues. For the syntax, this means first of all that it must be oriented toward spoken language. This includes zero anaphora, phrases that relate to the communication situation, and phrases that would be considered ill-formed in written language. Beyond that, there are some domain-specific syntactic peculiarities, such as the realization of time expressions.