OPUS 4 | Linguistics

Parser evaluation across text types (2005)

When a statistical parser is trained on one treebank, it is usually tested on another portion of the same treebank, partly because testing requires a comparable annotation format. But the user of a parser may not be interested in parsing more sentences from the same newspaper, or may even want syntactic annotations for a slightly different text type. Gildea (2001), for instance, found that a parser trained on the WSJ portion of the Penn Treebank performs less well on the Brown corpus (the subset that is available in the PTB bracketing format) than a parser trained only on the Brown corpus, although the latter has only half as many sentences as the former. Additionally, a parser trained on both the WSJ and Brown corpora performs less well on the Brown corpus than on the WSJ corpus. This leads to the following questions, which we address in this paper:
- Is there a difference in the usefulness of techniques for improving parser performance between the same-corpus and the different-corpus case?
- Are different types of parsers (rule-based and statistical) equally sensitive to corpus variation?
To answer them, we compared the quality of the parses of a hand-crafted constraint-based parser and a statistical PCFG-based parser that was trained on a treebank of German newspaper text.

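Was ist ein sprachlicher Fehler? : Anmerkungen zu populärer Sprachkritik am Beispiel der Kolumnensammlung von Bastian Sick [What is a linguistic error? Remarks on popular language criticism, using Bastian Sick's collection of columns as an example] (2005)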

Schneider, Jan Georg

Where does the newly awakened interest in linguistic correctness come from? Where does the pronounced linguistic insecurity come from that leads even many highly educated people to wish to be instructed by language guardians about what is most their own, namely their native language? Although answers to these questions ultimately remain speculative, I venture the thesis that one cause is the spelling reform, which a large part of the population still has not accepted and which, on the whole, has led neither to simplification nor to greater uniformity, but which did set in motion public reflection and discussion about linguistic correctness. In any case, the insecurity is a fact that linguists should not ignore.

Zur Normativität von Sprachregeln : ist Sprechen regelgeleitetes Handeln? [On the normativity of language rules: is speaking rule-governed action?] (2005)

Schneider, Jan Georg

Starting point: the critique of the "two-worlds model". The fundamental linguistic distinction between "language" (Sprache) and "speaking" (Sprechen) has again been increasingly discussed and criticized in recent debates on the mediality of language. Can this school-forming conceptual pair, virtually ironclad in linguistics, still be meaningfully maintained at all? Or must it at least be redefined, perhaps even rejected altogether? In particular, has the distinction between linguistic competence and performance, which goes back to Chomsky, not reduced itself to absurdity now that linguistic cognitivism of Chomskyan provenance has completely lost sight of language as a living phenomenon, as a medium of human communication? Does not even the seemingly harmless linguistic differentiation between a language rule and its application lead to a misleading and inappropriate reification of language? ...

Focus accent, word length and position as cues to L1 and L2 word recognition (2005)

Sennema, Anke ; Vijver, Ruben van de ; Carroll, Susanne E. ; Zimmer-Stahl, Anne

The present study examines native and nonnative perceptual processing of semantic information conveyed by prosodic prominence. Five groups of German learners of English each listened to one of five experimental conditions. Three conditions differed in the place of the focus accent in the sentence, and two conditions used spliced stimuli. The experimental condition was presented first in the learners’ L1 (German) and then in a similar set in the L2 (English). The effect of the accent condition and of the length and position of the target in the sentence was evaluated in a probe recognition task. In both the L1 and L2 tasks there was no significant effect of any of the five focus conditions. Target position and target word length had an effect in the L1 task. Word length did not affect accuracy rates in the L2 task. For probe recognition in the L2, word length and the position of the target interacted with the focus condition.

Multiple hierarchies : new aspects of an old solution (2005)

Witt, Andreas

In this paper, we present the Multiple Annotation approach, which solves two problems: the problem of annotating overlapping structures, and the problem that occurs when documents should be annotated according to different, possibly heterogeneous tag sets. This approach has many advantages: it is based on XML, the modeling of alternative annotations is possible, each level can be viewed separately, and new levels can be added at any time. The files can be regarded as an interrelated unit, with the text serving as the implicit link. Two representations of the information contained in the multiple files (one in Prolog and one in XML) are described. These representations serve as a base for several applications.
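The idea of separate annotation files linked only by their shared text can be illustrated with a minimal sketch. It is not the representation described in the paper: the layer contents, the tag names (`doc`, `s`, `page`), and the helper functions `spans` and `crossing` are hypothetical, and character offsets are used here as one possible way to make the implicit link explicit.

```python
import xml.etree.ElementTree as ET

def spans(xml_text):
    """Return (plain text, [(tag, start, end), ...]) for one annotation layer.

    Offsets are character positions in the plain text, so elements from
    different layers can be compared even though they live in separate files.
    """
    root = ET.fromstring(xml_text)
    out = []
    pos = 0

    def walk(el):
        nonlocal pos
        start = pos
        pos += len(el.text or "")
        for child in el:
            walk(child)
            pos += len(child.tail or "")
        out.append((el.tag, start, pos))

    walk(root)
    return "".join(root.itertext()), out

def crossing(layer_a, layer_b):
    """List pairs of spans that overlap without one containing the other."""
    pairs = []
    for a in layer_a:
        for b in layer_b:
            if a[1] < b[1] < a[2] < b[2] or b[1] < a[1] < b[2] < a[2]:
                pairs.append((a, b))
    return pairs

# Two hypothetical layers over the same text: sentences and pages.
SENTENCES = "<doc><s>I came.</s> <s>I saw it all.</s></doc>"
PAGES = "<doc><page>I came. I saw</page><page> it all.</page></doc>"

text_a, sentence_spans = spans(SENTENCES)
text_b, page_spans = spans(PAGES)
assert text_a == text_b  # the shared text is the implicit link
overlaps = crossing(sentence_spans, page_spans)
```

Each layer is well-formed XML on its own; the second sentence and the first page overlap only when the two layers are compared by position, which a single XML hierarchy could not express directly.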

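Diskurspragmatische Faktoren für Topikalität und Verbstellung in der ahd. Tatianübersetzung (9. Jh.) [Discourse-pragmatic factors for topicality and verb placement in the Old High German Tatian translation (9th century)] (2005)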

Hinterhölzl, Roland ; Petrova, Svetlana ; Solf, Michael

The paper presents work in progress on the interaction between information structure and word order in Old High German based on data from the Tatian translation (9th century). The examination of the position of the finite verb in correspondence with the pragmatic status of discourse referents reveals an overall tendency for verb-initial order in thetic/all-focus sentences, whereas in categorical / topic-comment sentences verb-second placement with an initial topic constituent is preferred. This conclusion provides support for the hypothesis stated in Donhauser & Hinterhölzl (2003) that the finite verb form in Early Germanic serves to distinguish the information-structural domains of Topic and Focus. Finally, the investigation sheds light on the process of language change that led to the overall spread of verb-second in main clauses of modern German.

Out-of-focus encoding in Gur and Kwa (2005)

Fiedler, Ines ; Schwarz, Anne

This paper investigates the structural properties of morphosyntactically marked focus constructions, focussing on the often neglected non-focal sentence part in African tone languages. Based on new empirical evidence from five Gur and Kwa languages, we claim that these focus expressions have to be analysed as biclausal constructions even though they do not represent clefts containing restrictive relative clauses. First, we relativize the partly overgeneralized assumptions about structural correspondences between the out-of-focus part and relative clauses, and second, we show that our data do in fact support the hypothesis of a clause coordinating pattern as present in clause sequences in narration. It is argued that we deal with a non-accidental, systematic feature and that grammaticalization may conceal such basic narrative structures.

The semantics of ellipsis (2005)

Elbourne, Paul

There are four phenomena that are particularly troublesome for theories of ellipsis: the existence of sloppy readings when the relevant pronouns cannot possibly be bound; an ellipsis being resolved in such a way that an ellipsis site in the antecedent is not understood in the way it was there; an ellipsis site drawing material from two or more separate antecedents; and ellipsis with no linguistic antecedent. These cases are accounted for by means of a new theory that involves copying syntactically incomplete antecedent material and an analysis of silent VPs and NPs that makes them into higher order definite descriptions that can be bound into.

Question-answer test and givenness : some question marks (2005)

Kasimir, Elke

In order to investigate the empirical properties of focus, it is necessary to diagnose focus (or: “what is focused”) in particular linguistic examples. It is often taken for granted that the application of one single diagnostic tool, the so-called question-answer test, which roughly says that whatever a question asks for is focused in the answer, is a fool-proof test for focus. This paper investigates one example class where such uncritical belief in the question-answer test has led to the assumption of rather complex focus projection rules: in these examples, pitch accent placement has been claimed to depend on whether certain parts of the focused constituents are given. It is demonstrated that such focus projection rules are unnecessarily complex and in turn require the assumption of unnecessarily complicated meaning rules, not to speak of the difficulty of giving a precise semantic/pragmatic definition of the allegedly involved givenness property. For the sake of the argument, an alternative analysis is put forward which relies solely on alternative sets, following Mats Rooth's work, and avoids any recourse to givenness. As it turns out, this alternative analysis is not only simpler but also makes the better predictions in a critical case.

Refining queries on a treebank with XSLT filters. Approaching the universal quantifier (2005)

Smith, George

This paper discusses the use of XSLT stylesheets as a filtering mechanism for refining the results of user queries on treebanks. The discussion is within the context of the TIGER treebank, the associated search engine and query language, but the general ideas can apply to any search engine for XML-encoded treebanks. It will be shown that important classes of linguistic phenomena can be accessed by applying relatively simple XSLT templates to the output of a query, effectively simulating the universal quantifier for a subset of the query language.
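The filtering idea can be sketched in a few lines. The sketch below uses Python rather than XSLT, and the XML shape (`matches`, `match`, `t` with `word` and `pos` attributes) is a simplified, hypothetical stand-in for query output, not the TIGER format. It only illustrates the general pattern: the query returns candidate hits (an existential match), and a post-filter keeps those hits in which every part satisfies a condition.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified query output: each <match> is one hit with its tokens.
SAMPLE = """
<matches>
  <match id="m1"><t word="der" pos="ART"/><t word="Hund" pos="NN"/></match>
  <match id="m2"><t word="alte" pos="ADJA"/><t word="Hunde" pos="NN"/></match>
  <match id="m3"><t word="Hunde" pos="NN"/><t word="Katzen" pos="NN"/></match>
</matches>
"""

def filter_all(xml_text, predicate):
    """Keep the ids of matches whose tokens ALL satisfy the predicate."""
    root = ET.fromstring(xml_text.strip())
    return [
        m.get("id")
        for m in root.findall("match")
        if all(predicate(t) for t in m.findall("t"))
    ]

# "Every token is a noun" and "no token is an article" (i.e. all tokens are non-articles).
only_nouns = filter_all(SAMPLE, lambda t: t.get("pos") == "NN")
no_article = filter_all(SAMPLE, lambda t: t.get("pos") != "ART")
```

The second call shows why such a filter is useful: "contains no article" is a universally quantified condition over the tokens of a hit, which a purely existential query cannot state on its own.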

EXMARaLDA und Datenbank "Mehrsprachigkeit" - Konzepte und praktische Erfahrungen [EXMARaLDA and the database "Multilingualism": concepts and practical experience] (2005)

Schmidt, Thomas

This paper is about the database 'Mehrsprachigkeit' (Multilingualism) and the system EXMARaLDA, which are being developed at SFB 538 'Mehrsprachigkeit' at the University of Hamburg. Since their conceptual and technical details have already been presented in detail elsewhere (e.g. Schmidt 2004), the focus here is, on the one hand, on those aspects that, in keeping with the theme of the workshop, concern more general questions of handling computer-processable, heterogeneous linguistic data collections. On the other hand, an attempt is made to derive from the practical experience of what is now four years of project work some insights that could be of interest for further work in this field beyond the specific project context.

Exploring lexical patterns in text : lexical cohesion analysis with WordNet (2005)

Teich, Elke ; Fankhauser, Peter

We present a system for the linguistic exploration and analysis of lexical cohesion in English texts. Using an electronic thesaurus-like resource, Princeton WordNet, and the Brown Corpus of English, we have implemented a process of annotating text with lexical chains and a graphical user interface for inspection of the annotated text. We describe the system and report on some sample linguistic analyses carried out using the combined thesaurus-corpus resource.

Structuring information through gesture and intonation (2005)

Jannedy, Stefanie ; Mendoza-Denton, Norma

Face-to-face communication is multimodal. In unscripted spoken discourse we can observe the interaction of several “semiotic layers”, modalities of information such as syntax, discourse structure, gesture, and intonation. We explore the role of gesture and intonation in structuring and aligning information in spoken discourse through a study of the co-occurrence of pitch accents and gestural apices. Metaphorical spatialization through gesture also plays a role in conveying the contextual relationships between the speaker, the government and other external forces in a naturally-occurring political speech setting.

Stop bashing givenness : a note on Elke Kasimir's "questions-answers test and givenness" (2005)

Weskott, Thomas

Elke Kasimir's paper (in this volume) argues against employing the notion of Givenness in the explanation of accent assignment. I will claim that the arguments against Givenness put forward by Kasimir are inconclusive because they beg the question of the role of Givenness. It is concluded that, more generally, arguments against Givenness as a diagnostic for information structural partitions should not be accepted offhand, since the notion of Givenness of discourse referents is (a) theoretically simple, (b) readily observable and quantifiable, and (c) bears cognitive significance.

VP-fronting in Czech and Polish : a case study in corpus-oriented grammar research (2005)

Meyer, Roland

Fronting of a non-finite VP across a finite main verb - akin to German "VP-topicalization" - can also be found in Czech and Polish. The paper discusses evidence from large corpora for this process and some of its properties, both syntactic and information-structural. Based on this case, criteria for more user-friendly searching and retrieval of corpus data in syntactic research are being developed.

Unity in diversity : integrating differing linguistic data in TUSNELDA (2005)

Wagner, Andreas

This paper describes the creation and preparation of TUSNELDA, a collection of corpus data built for linguistic research. This collection contains a number of linguistically annotated corpora which differ in various aspects such as language, text types / data types, encoded annotation levels, and the linguistic theories underlying the annotation. The paper focuses on this variation on the one hand and on the way these heterogeneous data are integrated into one resource on the other.

Über Trampelpfade, sichtbare Hände und Sprachwandelprozesse [On beaten paths, visible hands, and processes of language change] (2005)

Kabatek, Johannes

A discourse-based account of Spanish ser/estar (2005)

Maienborn, Claudia

The study offers a discourse-based account of the Spanish copula forms ser and estar, which are generally considered to be lexical exponents of the stage-level/individual-level contrast. It argues against the popular view that the distinction between SLPs and ILPs rests on a fundamental cognitive division of the world that is reflected in the grammar. As it happens, conceptual oppositions like “temporary vs. permanent” or “arbitrary vs. essential” provide only a preference for the interpretation of estar and ser. In addition, the evidence for an SLP/ILP impact on the grammar turns out to be far less conclusive than is currently assumed. The study argues against event-based accounts of the ser/estar contrast in particular, showing that ser and estar pattern alike in failing all of the standard eventuality tests. The discourse-based account proposed instead assumes that ser and estar both display the same lexical semantics (which is identical to the semantics of English be, German sein, etc.); estar differs from ser only in presupposing a relation to a specific discourse situation. By using estar a speaker restricts his or her claim to a specific discourse situation, whereas by using ser, the speaker makes no such restriction. The preference for interpreting estar predications as denoting temporary properties and ser predications as denoting permanent properties follows from economy principles driving the pragmatic legitimation of estar's discourse dependence. The analysis proposed in this paper can also account for the observation that ser predications do not give rise to thetic judgements. The proposal is couched in terms of the framework of DRT.

Das Zustandspassiv : grammatische Einordnung – Bildungsbeschränkungen – Interpretationsspielraum [The stative passive: grammatical classification, formation restrictions, scope of interpretation] (2005)

Maienborn, Claudia

Eventualities and different things : a reply (2005)

Maienborn, Claudia

“Comments are very welcome!” This basic attitude and the many ways of implementing it contribute immensely to the fascination of engaging in scientific research. I am grateful to Theoretical Linguistics for providing a public platform for this kind of scholarly exchange and I thank all commentators for their thoughtful, stimulating, and often challenging contributions to my target article. My response will address two main issues that are raised by the commentaries. The first issue is shaped by a cluster of questions relating to ontology. The second issue concerns questions of methodology pertaining in particular to the problem of judging data.
