OPUS 4 | Linguistik

Decorrelation and shallow semantic patterns for distributional clustering of nouns and verbs (2009)

Distributional approximations to lexical semantics are very useful not only in helping the creation of lexical semantic resources (Kilgariff et al., 2004; Snow et al., 2006), but also when directly applied in tasks that can benefit from large-coverage semantic knowledge such as coreference resolution (Poesio et al., 1998; Gasperin and Vieira, 2004; Versley, 2007), word sense disambiguation (Mc- Carthy et al., 2004) or semantical role labeling (Gordon and Swanson, 2007). We present a model that is built from Webbased corpora using both shallow patterns for grammatical and semantic relations and a window-based approach, using singular value decomposition to decorrelate the feature space which is otherwise too heavily influenced by the skewed topic distribution of Web corpora.

Parsing coordinations (2009)

Kübler, Sandra ; Hinrichs, Erhard ; Maier, Wolfgang ; Klett, Eva

The present paper is concerned with statistical parsing of constituent structures in German. The paper presents four experiments that aim at improving parsing performance of coordinate structure: 1) reranking the n-best parses of a PCFG parser, 2) enriching the input to a PCFG parser by gold scopes for any conjunct, 3) reranking the parser output for all possible scopes for conjuncts that are permissible with regard to clause structure. Experiment 4 reranks a combination of parses from experiments 1 and 3. The experiments presented show that n- best parsing combined with reranking improves results by a large margin. Providing the parser with different scope possibilities and reranking the resulting parses results in an increase in F-score from 69.76 for the baseline to 74.69. While the F-score is similar to the one of the first experiment (n-best parsing and reranking), the first experiment results in higher recall (75.48% vs. 73.69%) and the third one in higher precision (75.43% vs. 73.26%). Combining the two methods results in the best result with an F-score of 76.69.

Morphological word structure in English and Swedish : the evidence from prosody (2005)

Raffelsiefen, Renate

Trubetzkoy's recognition of a delimitative function of phonology, serving to signal boundaries between morphological units, is expressed in terms of alignment constraints in Optimality Theory, where the relevant constraints require specific morphological boundaries to coincide with phonological structure (Trubetzkoy 1936, 1939, McCarthy & Prince 1993). The approach pursued in the present article is to investigate the distribution of phonological boundary signals to gain insight into the criteria underlying morphological analysis. The evidence from English and Swedish suggests that necessary and sufficient conditions for word-internal morphological analysis concern the recognizability of head constituents, which include the rightmost members of compounds and head affixes. The claim is that the stability of word-internal boundary effects in historical perspective cannot in general be sufficiently explained in terms of memorization and imitation of phonological word form. Rather, these effects indicate a morphological parsing mechanism based on the recognition of word-internal head constituents. Head affixes can be shown to contrast systematically with modifying affixes with respect to syntactic function, semantic content, and prosodic properties. That is, head affixes, which cannot be omitted, often lack inherent meaning and have relatively unmarked boundaries, which can be obscured entirely under specific phonological conditions. By contrast, modifying affixes, which can be omitted, consistently have inherent meaning and have stronger boundaries, which resist prosodic fusion in all phonological contexts. While these correlations are hardly specific to English and Swedish it remains to be investigated to which extent they hold cross-linguistically. The observation that some of the constituents identified on the basis of prosodic evidence lack inherent meaning raises the issue of compositionality. I will argue that certain systematic aspects of word meaning cannot be captured with reference to the syntagmatic level, but require reference to the paradigmatic level instead. The assumption is then that there are two dimensions of morphological analysis: syntagmatic analysis, which centers on the criteria for decomposing words in terms of labelled constituents, and paradigmatic analysis, which centers on the criteria for establishing relations among (whole) words in the mental lexicon. While meaning is intrinsically connected with paradigmatic analysis (e.g. base relations, oppositeness) it is not essential to syntagmatic analysis.

"JazzmusikerInnen – weder Asketen noch Müsli-Fifis" : feministische Sprachkritik in der Schweiz ; ein Überblick (1998)

Peyer, Ann ; Wyss, Eva Lia

Mit Erstaunen stellen LinguistInnen aus Deutschland, Österreich und der Schweiz immer wieder fest, dass sich in der "kleinen" Schweiz der geschlechtergerechte Sprachgebrauch in Öffentlichkeit und Alltag weit stärker durchgesetzt hat als in den anderen deutschsprachigen Ländern. Diese Einschätzung gilt es hier zu überprüfen und, falls sie zutrifft, zu belegen. Ausserdem werden - als erster Schritt fur weitere Untersuchungen - Thesen formuliert, die Erklärungen liefern, worauf diese Entwicklung zurückgeführt werden kann. Mit diesem Artikel geben wir anband von ausgewählten, konkreten Beispielen einen Einblick in die Situation, wie sie sich zur Zeit in der Schweiz präsentiert. Wir konzentrieren uns - unter sprachsoziologischer Perspektive - auf eine erste Bestandesaufnahme mit dem Blick auf die Diskussion in den Medien, die Institutionalisierung und die Einstellungen, die die spezifische sprachliche Situation in der Deutschschweiz prägen. Einen Rahmen fur unsere Untersuchung bilden die Überlegungen von Schräpel (SCHRÄPEL 1986), die die Auseinandersetzung um nichtsexistische Sprache als ein besonderes Sprachwandelphänomen untersucht. Sprachwandel im Vollzug ist einerseits einfacher zu erfassen als einer, der weiter zurückliegt, andererseits erschwert die Fülle des greifbaren Materials auch den Durchblick und das klare Erkennen von Tendenzen. Aus diesem Grund werten wir unser Datenmaterial nicht quantitativ aus, sondern konzentrieren uns darauf, für verschiedene Aspekte typische Beispiele zu geben und so den Stand der öffentlichen Diskussion und die Breite der vertretenen Meinungen darzustellen. Es wäre verlockend, das hier vorliegende Material auch allgemeinerer Form unter der Thematik "Sprachkritik" oder "Einstellungen" zu analysieren. Dies ist jedoch nicht im Zentrum unserer Fragestellung, weshalb wir bei einigen Beispielen auf entsprechende Untersuchungen (z.B. BLAUBERGS 1980, SCHOENTHAL 1989) verweisen.

Intimität und Geschlecht : zur Syntax und Pragmatik der Anrede im Liebesbrief des 20. Jahrhunderts (2000)

Wyss, Eva Lia

Die Trennung der Lebenswelt in Privatsphäre und Öffentlichkeit käme der Verortung von Intimität entgegen. Es scheint aber, als ob Intimität nicht einem klar abgegrenzten Bereich zugeordnet werden kann, sondern nunmehr als relationale Kategorie zu fassen ist. Gerade der historische Vergleich (Vgl. CORBIN 1992) erlaubt weder einheitlich räumliche oder körperliche noch ästhetische Kriterien zur Abgrenzung von Intimität. ...

Unbeständigkeit und Wachsamkeit : erkenntnistheoretische Konzepte in der österreichischen und ungarischen Kultur der Jahrhundertwende (2002)

Teller, Katalin

Das ausgehende 19. und beginnende 20. Jahrhundert setzt sich von den erkenntnistheoretischen Konzepten der vorangegangenen Zeit deutlich ab:Während – stark vereinfacht – die Philosophie bis dahin die Möglichkeit der Erkenntnis entweder in der subjektiven oder objektiven Dimension zu finden glaubte,wobei die Funktion der Sprache im Erkenntnisprozess kaum hinterfragt wurde, wird zur Jahrhundertwende eine Tendenz deutlich, die einerseits die Adäquatheit der sprachlichen Vermittlung entweder in Frage stellt oder zumindest thematisiert, andererseits die tradierten Erkenntnismodi neu reflektiert oder ihnen sogar den Rücken kehrt.

Unity in diversity : integrating differing linguistic data in TUSNELDA (2005)

Wagner, Andreas

This paper describes the creation and preparation of TUSNELDA, a collection of corpus data built for linguistic research. This collection contains a number of linguistically annotated corpora which differ in various aspects such as language, text sorts / data types, encoded annotation levels, and linguistic theories underlying the annotation. The paper focuses on this variation on the one hand and the way how these heterogeneous data are integrated into one resource on the other hand.

Using the web to resolve coreferent bridging in German newspaper text (2007)

Versley, Yannick

We adopt Markert and Nissim (2005)’s approach of using the World Wide Web to resolve cases of coreferent bridging for German and discuss the strength and weaknesses of this approach. As the general approach of using surface patterns to get information on ontological relations between lexical items has only been tried on English, it is also interesting to see whether the approach works for German as well as it does for English and what differences between these languages need to be accounted for. We also present a novel approach for combining several patterns that yields an ensemble that outperforms the best-performing single patterns in terms of both precision and recall.

Parser evaluation across text types (2005)

Versley, Yannick

When a statistical parser is trained on one treebank, one usually tests it on another portion of the same treebank, partly due to the fact that a comparable annotation format is needed for testing. But the user of a parser may not be interested in parsing sentences from the same newspaper all over, or even wants syntactic annotations for a slightly different text type. Gildea (2001) for instance found that a parser trained on the WSJ portion of the Penn Treebank performs less well on the Brown corpus (the subset that is available in the PTB bracketing format) than a parser that has been trained only on the Brown corpus, although the latter one has only half as many sentences as the former. Additionally, a parser trained on both the WSJ and Brown corpora performs less well on the Brown corpus than on the WSJ one. This leads us to the following questions that we would like to address in this paper: - Is there a difference in usefulness of techniques that are used to improve parser performance between the same-corpus and the different-corpus case? - Are different types of parsers (rule-based and statistical) equally sensitive to corpus variation? To achieve this, we compared the quality of the parses of a hand-crafted constraint-based parser and a statistical PCFG-based parser that was trained on a treebank of German newspaper text.

From surface dependencies towards deeper semantic representations [Semantic representations] (2006)

Versley, Yannick ; Zinsmeister, Heike

In the past, a divide could be seen between ’deep’ parsers on the one hand, which construct a semantic representation out of their input, but usually have significant coverage problems, and more robust parsers on the other hand, which are usually based on a (statistical) model derived from a treebank and have larger coverage, but leave the problem of semantic interpretation to the user. More recently, approaches have emerged that combine the robustness of datadriven (statistical) models with more detailed linguistic interpretation such that the output could be used for deeper semantic analysis. Cahill et al. (2002) use a PCFG-based parsing model in combination with a set of principles and heuristics to derive functional (f-)structures of Lexical-Functional Grammar (LFG). They show that the derived functional structures have a better quality than those generated by a parser based on a state-of-the-art hand-crafted LFG grammar. Advocates of Dependency Grammar usually point out that dependencies already are a semantically meaningful representation (cf. Menzel, 2003). However, parsers based on dependency grammar normally create underspecified representations with respect to certain phenomena such as coordination, apposition and control structures. In these areas they are too "shallow" to be directly used for semantic interpretation. In this paper, we adopt a similar approach to Cahill et al. (2002) using a dependency-based analysis to derive functional structure, and demonstrate the feasibility of this approach using German data. A major focus of our discussion is on the treatment of coordination and other potentially underspecified structures of the dependency data input. F-structure is one of the two core levels of syntactic representation in LFG (Bresnan, 2001). Independently of surface order, it encodes abstract syntactic functions that constitute predicate argument structure and other dependency relations such as subject, predicate, adjunct, but also further semantic information such as the semantic type of an adjunct (e.g. directional). Normally f-structure is captured as a recursive attribute value matrix, which is isomorphic to a directed graph representation. Figure 5 depicts an example target f-structure. As mentioned earlier, these deeper-level dependency relations can be used to construct logical forms as in the approaches of van Genabith and Crouch (1996), who construct underspecified discourse representations (UDRSs), and Spreyer and Frank (2005), who have robust minimal recursion semantics (RMRS) as their target representation. We therefore think that f-structures are a suitable target representation for automatic syntactic analysis in a larger pipeline of mapping text to interpretation. In this paper, we report on the conversion from dependency structures to fstructure. Firstly, we evaluate the f-structure conversion in isolation, starting from hand-corrected dependencies based on the TüBa-D/Z treebank and Versley (2005)´s conversion. Secondly, we start from tokenized text to evaluate the combined process of automatic parsing (using Foth and Menzel (2006)´s parser) and f-structure conversion. As a test set, we randomly selected 100 sentences from TüBa-D/Z which we annotated using a scheme very close to that of the TiGer Dependency Bank (Forst et al., 2004). In the next section, we sketch dependency analysis, the underlying theory of our input representations, and introduce four different representations of coordination. We also describe Weighted Constraint Dependency Grammar (WCDG), the dependency parsing formalism that we use in our experiments. Section 3 characterises the conversion of dependencies to f-structures. Our evaluation is presented in section 4, and finally, section 5 summarises our results and gives an overview of problems remaining to be solved.

Open Access

Linguistik

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Institute

438 search hits