OPUS 4 | Linguistics

How do treebank annotation schemes influence parsing results? : or how not to compare apples and oranges (2005)

In the last decade, the Penn Treebank has become the standard data set for evaluating parsers. Because most parsers are evaluated solely on this specific data set, the question of how much these results depend on the treebank's annotation scheme remains unanswered. In this paper, we investigate the influence that different decisions in treebank annotation schemes have on parsing. The investigation compares two similar treebanks of German, NEGRA and TüBa-D/Z, which are subsequently modified to make their differences comparable. The results show that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality, while a flat clause structure has a positive influence.

Scope and situation binding in LTAG using semantic unification (2005)

Romero, Maribel ; Kallmeyer, Laura

This paper develops a framework for TAG (Tree Adjoining Grammar) semantics that brings together ideas from different recent approaches. Then, within this framework, an analysis of scope is proposed that accounts for the different scopal properties of quantifiers, adverbs, raising verbs and attitude verbs. Finally, by including situation variables in the semantics, different situation binding possibilities are derived for different types of quantificational elements.

Treebank profiling of spoken and written German (2005)

Hinrichs, Erhard ; Kübler, Sandra

This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z, a treebank of newspaper articles published in the German daily newspaper "die tageszeitung" (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.

A descriptive characterization of multicomponent tree adjoining grammars (2005)

Kallmeyer, Laura

Multicomponent Tree Adjoining Grammar (MCTAG) is a formalism that has been shown to be useful for many natural language applications. However, the definition of MCTAG is problematic because it refers to the derivation process itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. In other words, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. Therefore, in this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees the MCTAG licenses.

Parser evaluation across text types (2005)

Versley, Yannick

When a statistical parser is trained on one treebank, it is usually tested on another portion of the same treebank, partly because testing requires a comparable annotation format. But a parser's user may not want to parse sentences from the same newspaper over and over, and may even want syntactic annotations for a slightly different text type. Gildea (2001), for instance, found that a parser trained on the WSJ portion of the Penn Treebank performs less well on the Brown corpus (the subset that is available in the PTB bracketing format) than a parser trained only on the Brown corpus, although the latter training set has only half as many sentences as the former. Additionally, a parser trained on both the WSJ and Brown corpora performs less well on the Brown corpus than on the WSJ corpus. This leads us to the following questions, which we would like to address in this paper:
- Is there a difference in the usefulness of techniques used to improve parser performance between the same-corpus and the different-corpus case?
- Are different types of parsers (rule-based and statistical) equally sensitive to corpus variation?
To address these questions, we compared the quality of the parses of a hand-crafted constraint-based parser and a statistical PCFG-based parser that was trained on a treebank of German newspaper text.

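Was ist ein sprachlicher Fehler? : Anmerkungen zu populärer Sprachkritik am Beispiel der Kolumnensammlung von Bastian Sick (2005) [What is a linguistic error? Remarks on popular language criticism, using Bastian Sick's collection of columns as an example]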

Schneider, Jan Georg

Where does the newly awakened interest in linguistic correctness come from? Where does the pronounced linguistic insecurity come from that makes even many highly educated people wish to be instructed by language guardians about what is most intimately their own, namely their mother tongue? Although answers to these questions ultimately remain speculative, I venture the thesis that one cause is the spelling reform. A large part of the population still does not accept it, and overall it has led neither to simplification nor to greater uniformity; on the other hand, it set in motion public reflection and discussion about linguistic correctness. In any case, this insecurity is a fact that linguists should not ignore.

Zur Normativität von Sprachregeln : ist Sprechen regelgeleitetes Handeln? (2005) [On the normativity of language rules: is speaking rule-guided action?]

Schneider, Jan Georg

Starting point: the critique of the "two-worlds model". The fundamental linguistic distinction between "language" (Sprache) and "speaking" (Sprechen) has again been increasingly discussed and criticized in recent debates on the mediality of language. Can this school-founding pair of concepts, practically set in stone in linguistics, still be meaningfully maintained at all? Or must it at least be redefined, perhaps even abandoned entirely? In particular, has the distinction between linguistic competence and performance, which goes back to Chomsky, not reduced itself to absurdity, now that linguistic cognitivism of the Chomskyan kind has completely lost sight of language as a living phenomenon, as a medium of human communication? Does not even the seemingly harmless linguistic distinction between a rule of language and its application lead to a misleading and inappropriate reification of language? ...

Unity in diversity : integrating differing linguistic data in TUSNELDA (2005)

Wagner, Andreas

This paper describes the creation and preparation of TUSNELDA, a collection of corpus data built for linguistic research. This collection contains a number of linguistically annotated corpora that differ in various respects, such as language, text sorts/data types, encoded annotation levels, and the linguistic theories underlying the annotation. The paper focuses on this variation on the one hand and on how these heterogeneous data are integrated into one resource on the other.

Zwischen Syntagmatik und Paradigmatik : grammatische Eigennamenmarker und ihre Typologie (2005) [Between syntagmatics and paradigmatics: grammatical proper-name markers and their typology]

Nübling, Damaris

A strikingly simple procedure for distinguishing in principle between proper names (EN) and common nouns (APP), often cited in the onomastic literature and actually realized in Fijian, goes back to Hockett (1960).

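Von "in die" über "in'n" und "ins" bis "im" : die Klitisierung von Präposition und Artikel als "Grammatikalisierungsbaustelle" (2005) [From "in die" via "in'n" and "ins" to "im": the cliticization of preposition and article as a "grammaticalization construction site"]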

Nübling, Damaris

German preposition-article enclisis offers, like hardly any other grammaticalization, insights into the micro-domain of grammaticalization processes: no clear, "goal-oriented" conditions can be described here, which is probably why it has so far received so little attention in grammaticalization research. It became clear that, with respect to the morphologization of the article, which is regarded here as central, the entire spectrum is covered, from non-fusibility all the way to obligatorily fusing special clitics (which stand just short of inflectional affixes). Diachronically, there has been a clear overall rightward drift on the grammaticalization scale; for the genitive article, however, degrammaticalization in the form of so-called retraction (following Haspelmath 2004) has taken place, which here consists in the demorphologization (resyntacticization) of a clitic. No "relexicalization" in the sense of a lexical enrichment of an already grammaticalized element takes place (see Haspelmath 1999). Middle High German and Early New High German written records point to richer inventories of fused forms, but diachronic studies are needed here. Likewise, the transition zone between simple and special clitics is internally graded and far more complex than presented here. This, too, calls for detailed analyses of which of the article functions listed in Section 2.2 most strongly require preposition-article fusion. Some evidence points to the most strongly desemanticized (expletive) article, e.g. before proper names.
In order to determine the influence of writing and standardization on grammaticalization processes, two dialects were examined: Ruhr German (Ruhrdeutsch), which meets the expectation of considerably more advanced conditions, and Alemannic, which has developed other phenomena, such as proclisis of the article to the noun, zero realization of clitic article forms, and the categorial restructuring of the four nominal categories marked on the article. Including further dialects and, above all, spoken colloquial German ("Umgangssprache") could shed further light on the rationale of this grammaticalization. Should inflecting prepositions be the goal of this change, this would have far-reaching consequences for grammar writing.
