OPUS 4 | Linguistik

Developing a TT-MCTAG for German with an RCG-based parser (2008)

Kallmeyer, Laura ; Lichte, Timm ; Maier, Wolfgang ; Parmentier, Yannick ; Dellert, Johannes

Developing linguistic resources, in particular grammars, is known to be a complex task in itself, because of (among other things) redundancy and consistency issues. Furthermore, some languages prove hard to describe because of specific characteristics, e.g. the free word order in German. In this context, we present (i) a framework for describing tree-based grammars, and (ii) an actual fragment of a core multicomponent tree-adjoining grammar with tree tuples (TT-MCTAG) for German developed using this framework. The framework combines a metagrammar compiler and a parser based on range concatenation grammar (RCG) to check the consistency and the correctness of the grammar, respectively. The German grammar being developed within this framework already deals with a wide range of scrambling and extraction phenomena.

Die deutschen Sprachinseln auf den Jurahöhen der französischsprachigen Schweiz (2004)

Siebenhaar, Beat

Die dialektale Verankerung regionaler Chats in der deutschsprachigen Schweiz (2005)

Siebenhaar, Beat

In regional Swiss chat rooms, dialect is the unmarked variety, with shares of around 80% to 90%. Chats thus offer insight into the individually shaped written rendering of the Swiss dialects, which varies from region to region and remains far removed from standardizing tendencies. Because of this distance from norms, the chat data make it possible to trace in broad outline a linguistic geography like the one recorded in the Sprachatlas der deutschen Schweiz (SDS, 1962–1997). This paper traces reflexes of that geographic distribution in the spelling of the inflected forms of «haben» ("to have"). While the analysis broadly confirms this structure, it also reveals systematic deviations which, once the barrier of written rendering is taken into account, may point to language change; these indications must, however, be checked against authentic spoken-language data.

Die Modellierung zeitlicher Strukturen im Schweizerdeutschen (2005)

Siebenhaar, Beat

The prosody of the dialects was recognized early on as a conspicuous and distinctive feature and was recorded in musical notation in several works on the grammar of Swiss German (among others J. Vetsch 1910, E. Wipf 1910, K. Schmid 1915, W. Clauss 1927, A. Weber 1948), although A. Weber (1948, p. 53) already notes "that the musical course of speech cannot be represented in conventional musical notation without doing violence to it". Because an adequate coding scheme, a theoretical foundation, and the phonetic instruments needed for intonation research were lacking, these first approaches were not elaborated or continued. Only in the mid-20th century did technical developments produce instruments for measuring prosody, which, with the popularization of the corresponding computer programs around the turn of the 21st century, can now be used intensively and widely in linguistic research.

Die Sprachen der Städte (2008)

Siebenhaar, Beat

The early linguistic maps, for which Georg Wenker collected written translations into dialect at more than 40,000 school locations in the German Empire at the end of the 19th century, document the special status of many cities in the linguistic landscape. For example, Berlin and its immediate surroundings show linguistic forms that are otherwise found only further south or in the written language.

Disagreement dissected : vagueness as a source of ambiguity in nominal (co-)reference (2006)

Versley, Yannick

Using a qualitative analysis of disagreements from a referentially annotated newspaper corpus, we show that, in coreference annotation, vague referents are prone to greater disagreement. We show how potentially problematic cases can be dealt with in a way that is practical even for larger-scale annotation, considering a real-world example from newspaper text.

Dulong texts : seven fully analyzed narrative and procedural texts (2001)

LaPolla, Randy J.

Dulong is a Tibeto-Burman language spoken in Gongshan Dulong and Nu Autonomous County in Yunnan, China, by members of the Dulong nationality (pop.: 6,000) and by part of the Nu nationality (roughly 6,000 people).

Evaluating POS tagging under sub-optimal conditions : or: does meticulousness pay? (2000)

Kübler, Sandra ; Wagner, Andreas

In this paper, we investigate the role of sub-optimality in training data for part-of-speech tagging. In particular, we examine to what extent the size of the training corpus and certain types of errors in it affect the performance of the tagger. We distinguish four types of errors: if a word is assigned a wrong tag, this tag can belong to the ambiguity class of the word (i.e. to the set of possible tags for that word) or not; furthermore, the major syntactic category (e.g. "N" or "V") can be correctly assigned (e.g. if a finite verb is classified as an infinitive) or not (e.g. if a verb is classified as a noun). We empirically explore the decrease in performance that each of these error types causes for different sizes of the training set. Our results show that those types of errors that are easier to eliminate have a particularly negative effect on performance. Thus, it is worthwhile to concentrate on eliminating these types of errors, especially if the training corpus is large.

Eventualities and different things : a reply (2005)

Maienborn, Claudia

“Comments are very welcome!” This basic attitude and the many ways of implementing it contribute immensely to the fascination of engaging in scientific research. I am grateful to Theoretical Linguistics for providing a public platform for this kind of scholarly exchange and I thank all commentators for their thoughtful, stimulating, and often challenging contributions to my target article. My response will address two main issues that are raised by the commentaries. The first issue is shaped by a cluster of questions relating to ontology. The second issue concerns questions of methodology pertaining in particular to the problem of judging data.

Factoring predicate argument and scope semantics : underspecified semantics with LTAG (1999)

Kallmeyer, Laura ; Joshi, Aravind K.

This paper proposes a compositional semantics for lexicalized tree adjoining grammars (LTAG). Tree-local multicomponent derivations allow the semantic contribution of a lexical item to be separated into one component contributing to the predicate argument structure and a second component contributing to scope semantics. Based on this idea, a syntax-semantics interface is presented where the compositional semantics depends only on the derivation structure. It is shown that the derivation structure allows an appropriate amount of underspecification. This is illustrated by investigating underspecified representations for quantifier scope ambiguities and related phenomena such as adjunct scope and island constraints.
