Human readers have the ability to infer knowledge from text, even if that particular information is not explicitly stated. In this thesis, we address the phenomena of text-level implicit information and outline novel automated methods for its recovery.
This work focuses on two types of unexpressed content: content that arises between sentences (implicit discourse relations) and content that arises within sentences (implicit semantic roles).
Traditional approaches mostly rely on costly rich linguistic features, e.g., sentiment or frame-based lexicons, and require heuristics or manual feature engineering.
As an improvement, we propose a collection of generic resource-lean methods, implemented in the form of statistical background knowledge or by means of neural architectures.
Our models are largely language-independent and produce state-of-the-art performance, e.g., in the classification of Chinese implicit discourse relations, or the detection of locally covert predicative arguments in free texts.
In novel experiments, we quantitatively demonstrate that both types of implicit information are mutually dependent insofar as, for instance, some implicit roles directly correlate with implicit discourse relations of similar properties.
We show that implicit information processing further benefits downstream applications and demonstrate its applicability to the higher-level task of narrative story understanding.
In the conclusion of the dissertation, we argue for the need for implicit information processing in order to realize the goal of true natural language understanding.
Languages in general have various possibilities to express one and the same propositional content. One of these possibilities is grammatical variation. This thesis is concerned with the variation of the linear word order in a clause and the effects triggered by word order alternations. Although sharing the same propositional content, different word order variants can carry different functions; word order variation can be used to achieve certain stylistic effects. The dissertation looks at functional and stylistic preferences of English regarding variation from the canonical word order in (1).
(1) [Ernie]S [sits]V [on the table]O. (SVO)
The variation under consideration is locative inversion (LOCI), exemplified in (2).
(2) On the table sits Ernie.
As any variation from the canonical word order is said to strongly depend on the grammatical system of the language a sentence is realized in, the perspective is extended to the word order equivalent of the sentence above in German (3). The goal is to highlight possible differences/similarities between English and German with respect to one specific word order variant in a declarative main clause.
(3) Auf dem Tisch liegt ein Brief.
On the table lies a letter
‘On the table lies a letter’.
As the variation from the canonical word order is not expected to be coincidental in both languages, the features that favor the pattern under consideration are examined. This is done through a statistical analysis by employing two comparable corpora, the BNC for English and the TÜPP D/Z for German. The central questions for the thesis therefore are: What are the functions of the inverted constructions in English and German, what features favor their use in the respective languages, and how are they realized syntactically?
One finding is that German uses the syntactic pattern PP-V-NP for reasons very similar to those for which English uses it. There seems to be a general tendency to order shorter constituents before longer ones. The syntactic pattern under consideration fulfills similar discourse functions in both languages. Both languages show similar preferences: they are driven by similar factors when deciding whether to stay with the canonical order or to prepose (respectively invert) the canonically postverbal PP.
This dissertation deals with the lexical, morphological, syntactic, and semantic properties of (VP) idioms and their behavior in combination with restrictive relative clauses, raising, constituent fronting, wh-movement, VP-ellipsis, pronominalization, the progressive form, verb placement, passivization, conjunction modification, and the N-after-N construction. It provides empirical evidence towards a combinatorial analysis of both semantically non-decomposable idioms (SNDIs) and semantically decomposable idioms (SDIs) and contributes to the (formal) formulation of such an account.
The Introduction (Chapter 1) first motivates why idioms are an exciting and challenging phenomenon and then gives a definition of the term idiom, a classification of idioms, and an overview of the wide spectrum of idiom analyses found in the linguistic literature.
Chapter 2, “Idioms as evidence for the proper analysis of relative clauses”, shows that the Modification Analysis beats the other two major analyses of restrictive relative clauses (RRCs), namely Raising and Matching, as (i) the latter two lead to a loss of numerous empirical generalizations in syntax and morphology, and (ii) contrary to the assumption in the literature, idioms in RRCs can, in fact, be licensed without literal syntactic movement of the RRC-head, which makes modification fully compatible with idiom reconstruction effects.
Chapter 3, “How frozen are frozen idioms?”, presents new empirical observations on the lexical, morphological, and syntactic flexibility of kick the bucket and shows that this idiom is not completely frozen with respect to its NP complement, the progressive form, and, in some contexts, even passivization. The chapter concludes that analyses of kick the bucket as a single lexical entry should be replaced by analyses of this and other SNDIs with a syntactically regular shape as consisting of individual word-level lexical entries that combine according to the standard rules of syntax.
This idea is taken up in Chapter 4, “The syntactic flexibility of semantically non-decomposable idioms”, which – based on the differences between English and German with regard to verb placement, constituent fronting, and passivization as well as a short outlook on Estonian and French – spells out a combinatorial analysis of SNDIs and augments it with a semantic analysis formulated in Lexical Resource Semantics, according to which some idiom parts make identical semantic contributions to the overall meaning of the idiom. The analysis further suggests that the syntactic flexibility of idioms is due to the semantic and pragmatic constraints on the involved constructions, rather than the syntactic encoding of the idioms.
Chapter 5, “Modification of literal meanings in semantically non-decomposable idioms”, reviews Ernst’s (1981) classical three types of idiom modification (internal, external, and conjunction) and then closely investigates the most challenging type, namely conjunction modification, in SNDIs. Based on naturally occurring examples of four SNDIs (two English, two German), it sketches an analysis in terms of two or more conjoined independent propositions, each of which can be the result of figurative reinterpretation. One of the propositions contains the idiomatic meaning; in (one of) the other(s), the meaning of the modifier applies to the literal meaning of the idiom’s noun.
Chapter 6, “Semantically decomposable idioms in the N-after-N construction”, offers a formal syntactic and semantic account of SDIs like pull strings in the N-after-N construction, as in Kim pulled string after string to get Alex into a good college. While the idiom contributes the type of entity at stake (‘string’ in the case of pull strings), N-after-N contributes that there are several instantiations of that type of entity and that these are subject to temporal or spatial succession. The chapter first summarizes the empirical properties of N-after-N, then provides an account of N-after-N in Head-driven Phrase Structure Grammar (HPSG), presents an updated version of the account of SDIs suggested in Chapter 2 within HPSG, and combines it with the HPSG account of N-after-N.
This thesis investigates the structure of research articles in the field of Computational Linguistics with the goal of establishing that a set of distinctive linguistic features is associated with each section type. The empirical results of the study are derived from the quantitative and qualitative evaluation of research articles from the ACL Anthology Corpus. More than 20,000 articles were analyzed for the purpose of retrieving the target section types and extracting the predefined set of linguistic features from them. Approximately 1,100 articles were found to contain all of the following five section types: abstract, introduction, related work, discussion, and conclusion. These were chosen for the purpose of comparing the frequency of occurrence of the linguistic features across the section types. Making use of frameworks for Natural Language Processing, the Stanford CoreNLP Module, and the Python library spaCy, as well as scripts created by the author, the frequency scores of the features were retrieved and analyzed with state-of-the-art statistical techniques.
The results show that each section type possesses an individual profile of linguistic features which are associated with it more or less strongly. These section-feature associations are shown to be derivable from the hypothesized purpose of each section type.
Overall, the findings reported in this thesis provide insights into the writing strategies that authors employ so that the overall goal of the research paper is achieved.
The results of the thesis can be applied in new state-of-the-art applications that assist academic writing and its evaluation by providing the user with more sophisticated, empirically based feedback on the relationship between linguistic mechanisms and text type. In addition, identifying text-type-specific linguistic characteristics (a text-feature mapping) can contribute to the development of more robust language-based models for disinformation detection.
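The per-section feature profiles described in this abstract can be illustrated with a minimal sketch. It is not the author's pipeline: the thesis reports using Stanford CoreNLP, spaCy, and custom scripts, whereas this example uses only the Python standard library, a naive regex tokenizer, and three hypothetical feature lexicons (`first_person`, `modal`, `hedge`) that are assumptions made for illustration, since the abstract does not list the actual feature set.

```python
import re
from collections import Counter

# Hypothetical feature lexicons (assumption): the thesis's real feature
# set is not given in the abstract.
FEATURES = {
    "first_person": {"i", "we", "our", "us", "my"},
    "modal": {"can", "could", "may", "might", "must", "should", "would"},
    "hedge": {"possibly", "perhaps", "likely", "approximately", "suggest", "suggests"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def feature_profile(text, per=1000):
    """Return the frequency of each feature per `per` tokens of `text`."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in FEATURES}
    counts = Counter()
    for tok in tokens:
        for name, lexicon in FEATURES.items():
            if tok in lexicon:
                counts[name] += 1
    return {name: per * counts[name] / len(tokens) for name in FEATURES}


def section_profiles(article):
    """Map each section type of an article to its feature profile."""
    return {section: feature_profile(text) for section, text in article.items()}


# Toy input (invented for illustration, not corpus data).
toy_article = {
    "abstract": "We propose a parser.",
    "conclusion": "Results suggest the parser may generalize.",
}
profiles = section_profiles(toy_article)
```

Normalizing by section length (here, occurrences per 1,000 tokens) is what makes the profiles comparable across section types of very different sizes, such as a short abstract and a long discussion.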
The present study is concerned with the syntactic flexibility of English idioms. It is argued that two aspects must be considered when explaining the syntactic behavior of idioms. First, the idiom in question must be decomposable, meaning that the individual parts must have some independent meaning. Second, pragmatic factors and speakers' motivations must be taken into account. This corpus-based study and its results support a speaker-based grammar model. Furthermore, some syntactic constructions can be generally ruled out for idioms.
Syntactic and semantic aspects of supplementary relative clauses in English and Sōrānī Kurdish
(2020)
In this thesis, I examine and analyse supplementary relative clauses (SRCs), also known as non-restrictive relative clauses. SRCs have received considerably less attention in the study of relative clauses than integrated, or restrictive, relative clauses (IRCs). The (surface) syntactic structure of the two types of relative clauses (RCs) is largely identical. Therefore, it is not straightforward to determine where to locate the difference in the interpretation between IRCs and SRCs. To address this question, I focus on two types of English SRCs: determiner-which RCs, and SRCs introduced by that. Determiner-which RCs can only be interpreted as SRCs. Previous HPSG approaches built on the generalisation that that RCs cannot be SRCs. Hence there is no HPSG analysis for relative that in SRCs. In this thesis I show that both constructions are acceptable to native speakers of American English and provide both structures with an HPSG analysis. I extend my discussion beyond English by looking at relative clauses in Sorānī Kurdish. I argue that RCs in Sorānī Kurdish share essential properties with English bare RCs and that RCs, though Sorānī Kurdish has no equivalent of wh-RCs. I also provide Sorānī Kurdish with an HPSG analysis.
Nominal modification in language production: Extraposition of prepositional phrases in German
(2019)
In my dissertation, I investigate the phenomenon of extraposition of PP out of NP in German in language production. Four production experiments, using the method of production from memory, and three experiments testing the acceptability of extraposition were conducted. In extraposition, a constituent is realized in a position to the right of what would be considered the canonical position. A special case is extraposition out of a nominal phrase (NP), in which a constituent is moved out of NP to the end of the utterance. The example in (1a) illustrates the canonical version, in which a prepositional phrase (PP) is adjacent to its head noun. In (1b) the PP is extraposed out of NP to the right edge of the sentence.
(1) a. Gestern hat eine Frau mit einer lauten, schrillen Stimme angerufen.
b. Gestern hat eine Frau angerufen mit einer lauten, schrillen Stimme.
There are two main aspects to consider: the length of the extraposed constituent (the PP), and the length of the intervening material. Experiment 1 investigated the influence of constituent length on extraposition. The hypothesis is that longer and more complex constituents are harder to produce and are therefore produced towards the end of the utterance. In the experiment, PPs of three different lengths (2-3, 5-6, 9-11 words) had to be reproduced in either adjacent or extraposed position. As to the length of the intervening material, the hypothesis is that sentences with more intervening material between head noun and extraposed PP will tend to be reproduced with the PP in adjacent position to the head noun. In order to test this hypothesis, the length of the intervening material (1, 2 and 4 words) was manipulated in Experiment 2. The same material was used in an acceptability experiment, using the method of magnitude estimation (Experiment 5).
Previous studies found that extraposition is preferred over verbal material only, thus Experiment 3 investigated the influence of different lengths of purely verbal intervening material. Experiment 4 was concerned with the differences between PP and RC extraposition in production.
Experiments 6 and 7 used Likert scales to assess the acceptability of extraposition. Experiment 6 investigated whether the acceptability of extraposition is influenced by the definiteness status of the NP out of which the PP is extraposed and whether a soft constraint for definiteness can be found for PP extraposition in German. Experiment 7 asked whether the inner structure of the extraposed constituent (PP only vs. PP+RC) influences its acceptability. An extraposed PP that includes an RC should be "heavier" than a PP without an RC, since the number of phrasal nodes is higher. If indeed heavier constituents are realized at the end of an utterance, the acceptability of an extraposed PP that includes an RC should be higher than that of an extraposed PP without one.
The results of the production experiments show that sentences are mostly reproduced in their original linear sequence, which suggests that extraposed position is just as canonical as adjacent position, especially when extraposition takes place over verbal material only. With regard to constituent length, long PPs are shortened less often in extraposed position, supporting the hypothesis that longer and more complex constituents tend to be produced at the end of the utterance. Recency effects were found for intervening material, as participants dropped intervening material rather than changing the syntactic position of constituents. The length and type of the intervening material are important with respect to how much intervening material is acceptable. Verb clusters were not shortened in sentences with extraposed PPs; however, one third of adverbs and one half of PP adverbials including a lexical NP were shortened to "verb only". Extraposed PPs are more often reproduced in adjacent position than adjacent PPs are reproduced in extraposed position. However, the position of RCs is more often changed from adjacent to extraposed than from extraposed to adjacent.
While producing extraposed PPs seems not to be any more difficult than producing adjacent ones, adjacent constituents are consistently rated higher than extraposed constituents in grammaticality judgment tasks. This is in line with findings of Konieczny (2000) on German RC extraposition. The number of phrasal nodes, as suggested by Rickford et al. (1995), did not have an influence on the acceptability of extraposition, while the length of the constituent, measured in words, seems to play a role. Definiteness had no effect on adjacent PPs, but when the PP was extraposed, sentences with an indefinite antecedent were rated higher than sentences with a definite antecedent. This suggests that there is a "soft constraint" for definiteness with regard to PP extraposition out of NP in German.
This dissertation provides a comprehensive account of the grammar of relative clause extraposition in English. Based on a systematic review and evaluation of the empirical generalizations and theoretical approaches provided in the literature on generative grammar, it is shown that none of the previous theories is able to account for all the relevant facts. Among the most problematic data are the Principle C and scope effects of relative clause extraposition, cases with obligatory relative clauses, and relative clauses with elliptical NPs as antecedents.
I propose a new analysis of relative clause extraposition within the constraint-based, monostratal grammatical framework of Head-driven Phrase Structure Grammar (HPSG), enhanced with the semantic theory of Lexical Resource Semantics (LRS). Crucially, it is a general analysis of relative clause attachment, since both canonical and extraposed relative clauses are licensed by the same syntactic and semantic constraints. The basic assumption is that a relative clause can be adjoined to any phrase that contains a suitable antecedent of the relative pronoun. The semantic information that licenses the relative clause is introduced by the determiner of the antecedent NP. The techniques of underspecified semantics and the standard semantic representation language used by LRS make it possible to formulate constraints which yield the correct intersective interpretation of the relative clause (arbitrarily distant from its antecedent NP) and at the same time link the scope of the antecedent NP to the adjunction site of the relative clause.
In combination with the revised HPSG binding theory developed in this dissertation, the proposed analysis is able to capture the major properties of relative clause attachment within a unified and internally consistent monostratal constraint-based grammatical framework.