OPUS 4 | Linguistik

Single prosodic phrase sentences (2008)

This paper reports a series of production and perception experiments on the prosody and well-formedness of a special type of sentence, called Wide Focus Partial Fronting (WFPF), which consists of a single prosodic phrase and a unique, sentence-initial accented argument. The results help decide between competing models of German prosody. The absence of a pitch height difference on the sentence accent supports a relative model of prosody, in which accents are scaled relative to one another, and argues against models in which pitch accents are scaled in absolute terms. The results also favor a model in which syntax, but not information structure, influences prosodic phrasing. Finally, the perception experiments show that the prosodic structure of sentences with marked word order must be presented to participants when eliciting grammaticality judgments: presenting written material alone is not enough and distorts the results.

Focus asymmetries in Bura (2008)

Hartmann, Katharina ; Jacob, Peggy ; Zimmermann, Malte

This article presents the central aspects of the focus system of Bura (Chadic), which exhibits a number of asymmetries. Grammatical focus marking is obligatory only with focused subjects, where focus is marked by the particle án following the subject. Focused subjects remain in situ, and the complement of án is a regular VP. With non-subject foci, án appears in a cleft structure between the fronted focus constituent and a relative clause. We present a semantically unified analysis of focus marking in Bura that treats the particle as a focus-marking copula in T, which takes a property-denoting expression (the background) and an individual-denoting expression (the focus) as arguments. The article also investigates the realization of predicate and polarity focus, which are almost never marked. The upshot is that Bura shares many characteristic traits of focus marking with other Chadic languages but crucially differs from them in marking focus on subjects and on non-subject constituents in structurally different ways.

The syntax of existential sentences in Serbian (2008)

Hartmann, Jutta M. ; Milicevic, Nataša

Freeze (1992) argued, on the basis of data from several languages, that there is a close relationship between existential sentences (which state the existence of an entity) and locative sentences (which state the location of an entity). He proposes that both are derived from the same base structure and that their surface differences are instead due to distinct information structures. This paper argues against this position with data from Serbian existentials, which show clear syntactic differences from locatives. The close relationship between existential and locative sentences that Freeze (1992) observes is thus conceptual, but not (necessarily) part of the syntax of the language. To account for the data, we propose that existential sentences originate in a different syntactic predication structure from locative ones. As we show, the existential meaning arises from the interaction of this predication structure with the structure and meaning of the noun phrase.

Proceedings of the LREC workshop on partial parsing : between chunk parsing and deep parsing (2008)

Kübler, Sandra ; Piskorski, Jakub ; Przepiorkowski, Adam

The PaGe 2008 shared task on parsing German (2008)

Kübler, Sandra

The ACL 2008 Workshop on Parsing German features a shared task on parsing German. The goal of the shared task was to find reasons for the radically different behavior of parsers on the different treebanks and on constituent versus dependency representations. In this paper, we describe the task and the data sets and provide an overview of the test results together with a first analysis.

POS tagging for German : how important is the right context? (2008)

Ivanova, Steliana ; Kübler, Sandra

Part-of-speech tagging is generally performed with Markov models based on bigrams or trigrams. While Markov models concentrate strongly on the left context of a word, many languages require the inclusion of right context for correct disambiguation. We show for German that the best results are achieved by combining left and right context. If only left context is available, reversing the direction of analysis and tagging from right to left improves the results. In a version of MBT (Daelemans et al., 1996) with default parameter settings, including the right context improved POS tagging accuracy from 94.00% to 96.08%, corroborating our hypothesis. The version with optimized parameters reaches 96.73%.
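To make the left/right-context contrast concrete, here is a toy sketch. It is not MBT and does not reproduce the paper's feature set: it assumes a plain 1-nearest-neighbour classifier with an overlap metric (the simplest kind of memory-based learner), context windows made of words only, and an invented two-sentence training corpus. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

PAD = "<pad>"  # filler for window positions outside the sentence

Window = Tuple[str, ...]
Memory = List[Tuple[Window, str]]


def window(words: Sequence[str], i: int, left: int, right: int) -> Window:
    """Word i plus `left` preceding and `right` following words, padded at sentence edges."""
    return tuple(
        words[j] if 0 <= j < len(words) else PAD
        for j in range(i - left, i + right + 1)
    )


def build_memory(corpus, left: int, right: int) -> Memory:
    """Store every training token as (context window, gold tag)."""
    return [
        (window(words, i, left, right), tag)
        for words, tags in corpus
        for i, tag in enumerate(tags)
    ]


def overlap_distance(a: Window, b: Window) -> int:
    """Overlap metric: number of feature positions whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def tag_token(memory: Memory, words: Sequence[str], i: int, left: int, right: int) -> str:
    """1-nearest-neighbour tagging; min() breaks ties in favour of the earliest stored instance."""
    query = window(words, i, left, right)
    return min(memory, key=lambda item: overlap_distance(item[0], query))[1]


# Invented toy data (STTS tags): "die" as article vs. relative pronoun.
corpus = [
    (["die", "Frau", "lacht"], ["ART", "NN", "VVFIN"]),
    (["Frauen", "die", "lachen"], ["NN", "PRELS", "VVFIN"]),
]
sentence = ["Kinder", "die", "lachen"]

left_only = tag_token(build_memory(corpus, 1, 0), sentence, 1, 1, 0)
left_right = tag_token(build_memory(corpus, 1, 1), sentence, 1, 1, 1)
print(left_only, left_right)  # ART PRELS
```

With one word of left context only, the article instance and the relative-pronoun instance are equally close (distance 1), so the choice falls to an arbitrary tie-break; adding one word of right context ("lachen") makes the relative-pronoun instance the unique nearest neighbour.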

Memory-based vocalization of Arabic (2008)

Kübler, Sandra ; Mohamed, Emad

The problem of vocalization, or diacritization, is essential to many tasks in Arabic NLP. Arabic is generally written without short vowels, so one written form can have several pronunciations, each carrying its own meaning(s). In the experiments reported here, we define vocalization as a classification problem: for each character in the unvocalized word, we decide whether it is followed by a short vowel. We investigate the importance of different types of context. Our results show that memory-based learning with only word-internal context yields a word error rate of 6.64%. If lexical context is added, the results deteriorate slowly.
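The per-character classification setup can be sketched as follows, under assumptions the abstract does not state: each label is the string of Arabic diacritic marks (U+064B to U+0652, covering tanween, the short vowels, shadda and sukun) that follows a letter, and the features are a fixed-width window of letters taken from inside the word only. The function names are hypothetical. The sketch only builds training instances; it does not implement the learner or reproduce the paper's exact features.

```python
from typing import List, Tuple

# Arabic diacritics assumed here: tanween, short vowels, shadda, sukun (U+064B..U+0652).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}
PAD = "_"  # filler for window positions outside the word


def split_word(vocalized: str) -> List[Tuple[str, str]]:
    """Pair each base letter with the diacritic string that follows it ('' if none)."""
    pairs: List[Tuple[str, str]] = []
    for ch in vocalized:
        if ch in DIACRITICS:
            if pairs:  # attach to the preceding letter; a stray leading mark is dropped
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def instances(vocalized: str, width: int = 2) -> List[Tuple[Tuple[str, ...], str]]:
    """One instance per letter: word-internal letter window -> following diacritics."""
    pairs = split_word(vocalized)
    letters = [base for base, _ in pairs]
    out = []
    for i, (_, label) in enumerate(pairs):
        feats = tuple(
            letters[j] if 0 <= j < len(letters) else PAD
            for j in range(i - width, i + width + 1)
        )
        out.append((feats, label))
    return out


kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # k-a-t-a-b-a, fully vocalized
for feats, label in instances(kataba, width=1):
    print(feats, repr(label))
```

Instances of this shape, one per letter of the unvocalized word, are what a memory-based (k-nearest-neighbour) classifier would store; adding lexical context would mean extending each feature tuple with material from neighbouring words.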

How to compare treebanks (2008)

Kübler, Sandra ; Maier, Wolfgang ; Rehbein, Ines ; Versley, Yannick

Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question of how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare the results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new test suite for evaluating parsers on complex German grammatical constructions. The test suite provides a well-thought-out error classification, which enables us to compare the output of parsers trained on treebanks with different encoding schemes and yields interesting insights into the impact of treebank annotation schemes on specific constructions such as PP attachment or non-constituent coordination.
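The core of EVALB-style scoring is PARSEVAL labeled bracket matching. The following minimal sketch covers only that scoring idea, assuming constituents are given as (label, start, end) spans. The real EVALB program applies further conventions (for example, which labels or tokens to ignore) that are not reproduced here, and the Leaf-Ancestor and dependency-based metrics are not covered. Function and variable names are hypothetical, and the example trees are invented.

```python
from collections import Counter
from typing import Iterable, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end exclusive


def labeled_bracket_scores(gold: Iterable[Span], parsed: Iterable[Span]):
    """Labeled precision, recall and F1 over constituent spans.

    Spans are compared as multisets, so each gold bracket can match at most
    one identical parsed bracket.
    """
    g, t = Counter(gold), Counter(parsed)
    matched = sum((g & t).values())  # multiset intersection
    precision = matched / sum(t.values()) if t else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
parsed = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
print(labeled_bracket_scores(gold, parsed))  # (0.75, 0.75, 0.75)
```

Because the score only counts matching brackets, two treebanks that encode the same construction with different numbers or kinds of nodes can yield different scores for equally good parses, which is why the abstract pairs EVALB with other metrics and a construction-based test suite.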

TuLiPA : towards a multi-formalism parsing environment for grammar engineering (2008)

Kallmeyer, Laura ; Lichte, Timm ; Maier, Wolfgang ; Parmentier, Yannick ; Dellert, Johannes ; Evang, Kilian

In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to parsing several mildly context-sensitive formalisms. The environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and computes not only syntactic structures but also the corresponding semantic representations. It is used for the development of a tree-based grammar for German.

On the relation between multicomponent tree adjoining grammars with tree tuples (TT-MCTAG) and range concatenation grammars (RCG) (2008)

Kallmeyer, Laura ; Parmentier, Yannick

This paper investigates the relation between TT-MCTAG, a formalism used in computational linguistics, and RCG. RCGs are known to describe exactly the class PTIME; simple RCGs have even been shown to be equivalent to linear context-free rewriting systems, i.e., to be mildly context-sensitive. TT-MCTAG has been proposed to model free word order languages. In general, it is NP-complete. In this paper, we put an additional limitation on the derivations licensed in TT-MCTAG and show that TT-MCTAG with this limitation can be transformed into equivalent simple RCGs. This result is interesting for theoretical reasons (it shows that TT-MCTAG in this limited form is mildly context-sensitive) and also for practical ones: we use the proposed transformation from TT-MCTAG to RCG in a parser that we have implemented.
