OPUS 4 | Linguistik-Klassifikation

Dialogue acts in Verbmobil 2 (1998)

Alexandersson, Jan ; Buschbeck-Wolf, Bianka ; Fujinami, Tsutomu ; Kipp, Michael ; Koch, Stefan ; Maier, Elisabeth ; Reithinger, Norbert ; Schmitz, Birte ; Siegel, Melanie

This report describes the dialogue phases and the second edition dialogue acts which are used in the VERBMOBIL 2 project [...]. While in the first project phase the scenario was restricted to appointment scheduling dialogues, it has been extended to travel planning in the second phase with appointment scheduling being only a part of the new scenario.

Open-Source Machine Translation with DELPH-IN (2005)

Bond, Francis ; Copestake, Ann ; Flickinger, Dan ; Oepen, Stephan ; Siegel, Melanie

The Deep Linguistic Processing with HPSG Initiative (DELPH-IN) provides the infrastructure needed to produce open-source semantic transfer-based machine translation systems. We have made available a prototype Japanese-English machine translation system built from existing resources, including parsers, generators, bidirectional grammars and a transfer engine.

Implementing the syntax of Japanese numeral classifiers (2005)

Bender, Emily M. ; Siegel, Melanie

While the sortal constraints associated with Japanese numeral classifiers are well-studied, less attention has been paid to the details of their syntax. We describe an analysis implemented within a broad-coverage HPSG that handles an intricate set of numeral classifier construction types and compositionally relates each to an appropriate semantic representation, using Minimal Recursion Semantics.

Implementing the syntax of Japanese numeral classifiers (2004)

Bender, Emily M. ; Siegel, Melanie

While the sortal constraints associated with Japanese numeral classifiers are well-studied, less attention has been paid to the details of their syntax. We describe an analysis implemented within a broad-coverage HPSG that handles an intricate set of numeral classifier construction types and compositionally relates each to an appropriate semantic representation, using Minimal Recursion Semantics.

Corpora and evaluation tools for multilingual named entity grammar development (2003)

Bering, Christian ; Droźdźyński, Witold ; Erbach, Gregor ; Guasch, Clara ; Homola, Petr ; Lehmann, Sabine ; Li, Hong ; Krieger, Hans-Ulrich ; Piskorski, Jakub ; Schäfer, Ulrich ; Shimada, Atsuko ; Siegel, Melanie ; Xu, Feiyu ; Ziegler-Eisele, Dorothee

We present an effort for the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible for the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.

A brief introduction to the CHILDES project : with special reference to Greek: CHAT transcription, linkage, grammatical coding and CLAN analysis (2010)

Stephany, Ursula

Speech transcription using MED (2001)

Lehmann, Katrin

MED (Media EDitor) is a program designed to facilitate the transcription of digitized sound files into text files. It was written by Hans Drexler and Daan Broeder, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands. [...] The aim of MED is to facilitate the transcription of sound into text using a single program. It works on the principle of the coexistence and interaction of two basic elements, the waveform display window and the text window. [...] This means that you no longer need to use both a sound editor and a word processor at the same time in order to transcribe digitized speech files. Instead, you can directly type the sound you hear (and see) via MED into the text window. Furthermore, you can directly link sound portions of the waveform display window to text portions of the text window, so that you can easily locate and listen to the original source of your transcription once the links have been set. In this function the waveform display window and the text window virtually interact with each other.

A polynomial-time parsing algorithm for TT-MCTAG (2009)

Kallmeyer, Laura ; Satta, Giorgio

This paper investigates the class of Tree-Tuple MCTAG with Shared Nodes, TT-MCTAG for short, an extension of Tree Adjoining Grammars that has been proposed for natural language processing, in particular for dealing with discontinuities and word order variation in languages such as German. It has been shown that the universal recognition problem for this formalism is NP-hard, but so far it was not known whether the class of languages generated by TT-MCTAG is included in PTIME. We provide a positive answer to this question, using a new characterization of TT-MCTAG.

A declarative characterization of different types of multicomponent tree adjoining grammars (2009)

Kallmeyer, Laura

Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG, however, is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. That is, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) the MCTAG licenses. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, they allow a more systematic comparison of different types of MCTAG, and, furthermore, they can be exploited for parsing.

Vagueness and referential ambiguity in a large-scale annotated corpus (2009)

Versley, Yannick

In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serves to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail. We then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement, and argue that a deeper understanding of the phenomena involved allows us to tackle problematic cases in a more principled fashion than would be possible using only pre-theoretic intuitions.
