Linguistik-Klassifikation: Computerlinguistik / Computational linguistics
11 search hits
-
O trabalho de tradutor como fonte para constituição de base de dados
(2005)
-
Bibiana Teixeira de Almeida
-
How do treebank annotation schemes influence parsing results? : or how not to compare apples and oranges
(2005)
-
Sandra Kübler
- In the last decade, the Penn treebank has become the standard data set for evaluating parsers. The fact that most parsers are solely evaluated on this specific data set leaves the question unanswered how much these results depend on the annotation scheme of the treebank. In this paper, we will investigate the influence which different decisions in the annotation schemes of treebanks have on parsing. The investigation uses the comparison of similar treebanks of German, NEGRA and TüBa-D/Z, which are subsequently modified to allow a comparison of the differences. The results show that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality while a flat clause structure has a positive influence.
-
A descriptive characterization of multicomponent tree adjoining grammars
(2005)
-
Laura Kallmeyer
- Multicomponent Tree Adjoining Grammars (MCTAG) is a formalism that has been shown to be useful for many natural language applications. The definition of MCTAG however is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. I.e., this way of characterizing MCTAG does not allow to abstract away from the concrete order of derivation. Therefore, in this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees the MCTAG licences.
-
Scope and situation binding in LTAG using semantic unification
(2005)
-
Maribel Romero
Laura Kallmeyer
- This paper develops a framework for TAG (Tree Adjoining Grammar) semantics that brings together ideas from different recent approaches.Then, within this framework, an analysis of scope is proposed that accounts for the different scopal properties of quantifiers, adverbs, raising verbs and attitude verbs. Finally, including situation variables in the semantics, different situation binding possibilities are derived for different types of quantificational elements.
-
Treebank profiling of spoken and written German
(2005)
-
Erhard W. Hinrichs
Sandra Kübler
- This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper ´die tageszeitung´(taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.
-
Parser evaluation across text types
(2005)
-
Yannick Versley
- When a statistical parser is trained on one treebank, one usually tests it on another portion of the same treebank, partly due to the fact that a comparable annotation format is needed for testing. But the user of a parser may not be interested in parsing sentences from the same newspaper all over, or even wants syntactic annotations for a slightly different text type. Gildea (2001) for instance found that a parser trained on the WSJ portion of the Penn Treebank performs less well on the Brown corpus (the subset that is available in the PTB bracketing format) than a parser that has been trained only on the Brown corpus, although the latter one has only half as many sentences as the former. Additionally, a parser trained on both the WSJ and Brown corpora performs less well on the Brown corpus than on the WSJ one. This leads us to the following questions that we would like to address in this paper: - Is there a difference in usefulness of techniques that are used to improve parser performance between the same-corpus and the different-corpus case? - Are different types of parsers (rule-based and statistical) equally sensitive to corpus variation? To achieve this, we compared the quality of the parses of a hand-crafted constraint-based parser and a statistical PCFG-based parser that was trained on a treebank of German newspaper text.
-
Tagging kausaler Relationen
(2005)
-
Yannick Versley
- In dieser Diplomarbeit geht es um kausale Beziehungen zwischen Ereignissen und Erklärungsbeziehungen zwischen Ereignissen, bei denen kausale Relationen eine wichtige Rolle spielen. Nachdem zeitliche Relationen einerseits ihrer einfacheren Formalisierbarkeit und andererseits ihrer gut sichtbaren Rolle in der Grammatik (Tempus und Aspekt, zeitliche Konjunktionen) wegen in jüngerer Zeit stärker im Mittelpunkt des Interesses standen, soll hier argumentiert werden, dass kausale Beziehungen und die Erklärungen, die sie ermöglichen, eine wichtigere Rolle im Kohärenzgefüge des Textes spielen. Im Gegensatz zu “tiefen” Verfahren, die auf einer detaillierten semantischen Repr¨asentation des Textes aufsetzen und infolgedessen für unrestringierten Text m. E. nicht geeignet sind, wird hier untersucht, wie man dieses Ziel erreichen kann, ohne sich auf eine aufwändig konstruierte Wissensbasis verlassen zu müssen.
-
Die Modellierung zeitlicher Strukturen im Schweizerdeutschen
(2005)
-
Beat Siebenhaar
- Die Prosodie der Mundarten wurde schon früh als auffälliges und distinktes Merkmal wahrgenommen und in mehreren Arbeiten zur Grammatik des Schweizerdeutschen mittels Musiknoten festgehalten (u. a. J. Vetsch 1910, E. Wipf 1910, K. Schmid 1915, W. Clauss 1927, A. Weber 1948), wobei schon A. Weber (1948, S. 53) anmerkt, "dass sich der musikalische Gang der Rede nicht ohne Gewaltsamkeit mit der üblichen Notenschrift darstellen lässt". Da also eine adäquate Kodierung, eine theoretische Grundlage und die notwendigen phonetischen Instrumente zur Intonationsforschung fehlten, wurden diese ersten Ansätze nicht aus- und weitergeführt. Erst in der Mitte des 20. Jahrhunderts brachte die technische Entwicklung Instrumente zur Messung der Prosodie hervor, die nun durch die Popularisierung der entsprechenden Computerprogramme im Übergang zum 21. Jahrhundert für die linguistische Forschung intensiv und breit genutzt werden können.
-
Implementing the Syntax of Japanese Numeral Classifiers
(2005)
-
Emily M. Bender
Melanie Siegel
- While the sortal constraints associated with Japanese numeral classifiers are well-studied, less attention has been paid to the details of their syntax. We describe an analysis implemented within a broad-coverage HPSG that handles an intricate set of numeral classifier construction types and compositionally relates each to an appropriate semantic representation, using Minimal Recursion Semantics.
-
Open-Source Machine Translation with DELPH-IN
(2005)
-
Francis Bond
Ann Copestake
Dan Flickinger
Stephan Oepen
Melanie Siegel
- The Deep Linguistic Processing with HPSG Initiative (DELH-IN) provides the infrastructure needed to produce open-source semantic transfer-based machine translation systems. We have made available a prototype Japanese-English machine translation system built from existing resources include parsers, generators, bidirectional grammars and a transfer engine.