Linguistik
Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG, however, is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. In other words, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) that the MCTAG licenses. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, allow a more systematic comparison of different types of MCTAG, and can furthermore be exploited for parsing.
This paper investigates the class of Tree-Tuple MCTAG with Shared Nodes, TT-MCTAG for short, an extension of Tree Adjoining Grammars that has been proposed for natural language processing, in particular for dealing with discontinuities and word order variation in languages such as German. It has been shown that the universal recognition problem for this formalism is NP-hard, but it has so far been unknown whether the class of languages generated by TT-MCTAG is included in PTIME. We provide a positive answer to this question, using a new characterization of TT-MCTAG.
Traditionally, parsers are evaluated against gold standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and the gold standard. A particular case in point is German, for which two treebanks (TiGer and TüBa-D/Z) are available with highly different annotation schemes for the acquisition of (e.g.) PCFG parsers. The differences between the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser evaluation difficult [7, 9, 12]. The resource (TEPACOC) presented in this paper takes a different approach to parser evaluation: instead of providing evaluation data in a single annotation scheme, TEPACOC uses comparable sentences and their annotations for 5 selected key grammatical phenomena (with 20 sentences per phenomenon) from both the TiGer and TüBa-D/Z resources. This provides a comparable testsuite of 2 times 100 sentences, which allows us to evaluate TiGer-trained parsers against the TiGer part of TEPACOC, and TüBa-D/Z-trained parsers against the TüBa-D/Z part of TEPACOC for key phenomena, instead of comparing them against a single (and potentially biased) gold standard. To overcome the problem of inconsistency in human evaluation and to bridge the gap between the two different annotation schemes, we provide an extensive error classification, which enables us to compare parser output across the two different treebanks. In the remaining part of the paper we present the testsuite and describe the grammatical phenomena covered in the data. We discuss the different annotation strategies used in the two treebanks to encode these phenomena and present our error classification of potential parser errors.
All's well that ends well
(2009)
A few years ago, Jasanoff adopted the central tenet of my accentological theory, viz. that the Balto-Slavic acute was a stød or glottal stop, not a rising tone (cf. Kortlandt 1975, 1977, 2004, Jasanoff 2004a). Of course, nobody will believe Jasanoff’s claim that he arrived at the same result independently thirty years after I published it and ten years after we discussed it when he came to Leiden to visit us. Though at the time he haughtily dismissed “the tangle of secondary hypotheses and “laws” that clutter the ground in the field of Balto-Slavic accentology” (Jasanoff 2004b: 171), he has now recognized the importance of Pedersen’s law, Hirt’s law, Winter’s law, Meillet’s law, Dolobko’s law, Dybo’s law and Stang’s law and largely accepted my relative chronology of these accent laws, including the loss of the acute shortly before Stang’s law (cf. Jasanoff 2008). He has also accepted my split of Pedersen’s law into a Balto-Slavic and a Slavic phase (to which a Lithuanian phase must be added), my thesis that the tonal contours of Baltic and Slavic languages are post-Balto-Slavic innovations (cf. Jasanoff 2008: 344, fn. 10), and the rise of a tonal distinction on non-acute initial syllables before Dybo’s law which I discussed at some length in my review (1978) of Garde’s monograph (1976). This is great progress.
We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation helps to considerably decrease the number of items in the chart.
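As a rough illustration of what a CYK parser looks like in the deductive parsing framework, the following sketch works on a much simpler formalism than RCG: a context-free grammar in Chomsky normal form. It is not the algorithm of the paper and contains no range boundary constraint propagation. The function name `cyk_deduce`, the toy grammar, and the item format `(A, i, j)` (nonterminal `A` spans `words[i:j]`) are all assumptions made for this example.

```python
def cyk_deduce(words, lexical, binary, start="S"):
    """Agenda-driven CYK recognition for a CNF context-free grammar.

    Items are triples (A, i, j): nonterminal A derives words[i:j].
    Returns (recognized, chart).
    """
    chart = set()
    agenda = []

    # Axioms: one item per lexical rule A -> w that matches an input word.
    for i, word in enumerate(words):
        for lhs in lexical.get(word, ()):
            agenda.append((lhs, i, i + 1))

    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        b, i, j = item
        # Combine the new item with every adjacent item already in the chart.
        for c, k, l in list(chart):
            if k == j:  # new item is the left child: A -> B C
                for lhs in binary.get((b, c), ()):
                    agenda.append((lhs, i, l))
            if l == i:  # new item is the right child: A -> C B
                for lhs in binary.get((c, b), ()):
                    agenda.append((lhs, k, j))

    # Goal item: the start symbol spanning the whole input.
    return (start, 0, len(words)) in chart, chart


# Hypothetical toy grammar, invented for this sketch.
LEXICAL = {"she": ["NP"], "eats": ["V"], "fish": ["NP"]}
BINARY = {("NP", "VP"): ["S"], ("V", "NP"): ["VP"]}

recognized, chart = cyk_deduce("she eats fish".split(), LEXICAL, BINARY)
print(recognized, sorted(chart))
```

The size of `chart` is the quantity that the abstract reports as being reduced, there for RCG items rather than for the context-free items used here.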
To some, the relation between bidirectional optimality theory and game theory seems obvious: strong bidirectional optimality corresponds to Nash equilibrium in a strategic game (Dekker and van Rooij 2000). But in the domain of pragmatics this formally sound parallel is conceptually inadequate: the sequence of utterance and its interpretation cannot be modelled reasonably as a strategic game, because this would mean that speakers choose formulations independently of a meaning that they want to express, and that hearers choose an interpretation irrespective of an utterance that they have observed. Clearly, the sequence of utterance and interpretation requires a dynamic game model. One such model, and one that is widely studied and of manageable complexity, is a signaling game. This paper is therefore concerned with an epistemic interpretation of bidirectional optimality, both strong and weak, in terms of beliefs and strategies of players in a signaling game. In particular, I suggest that strong optimality may be regarded as a process of internal self-monitoring and that weak optimality corresponds to an iterated process of such self-monitoring. This latter process can be derived by assuming that agents act rationally to (possibly partial) beliefs in a self-monitoring opponent.
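The two notions of bidirectional optimality mentioned above can be made concrete with a small sketch. It assumes the usual definitions: a form-meaning pair is strongly optimal if no alternative form for the same meaning and no alternative meaning for the same form is strictly better, and weakly optimal if it is not beaten in this way by another weakly optimal pair. The numeric costs, the two forms, the two meanings, and the function names are invented for illustration and do not come from the paper; the sketch also does not model the signaling game or the players' beliefs.

```python
from itertools import product


def strongly_optimal(forms, meanings, cost):
    """Pairs (f, m) that no pair sharing f or m strictly beats."""
    result = set()
    for f, m in product(forms, meanings):
        better_form = any(cost(f2, m) < cost(f, m) for f2 in forms)
        better_meaning = any(cost(f, m2) < cost(f, m) for m2 in meanings)
        if not better_form and not better_meaning:
            result.add((f, m))
    return result


def weakly_optimal(forms, meanings, cost):
    """Pairs (f, m) not beaten by a *weakly optimal* pair sharing f or m.

    Pairs are visited in order of increasing cost, so every strictly
    cheaper competitor has already been accepted or rejected.
    """
    chosen = set()
    for f, m in sorted(product(forms, meanings), key=lambda p: cost(*p)):
        blocked = any(
            (f2 == f or m2 == m) and cost(f2, m2) < cost(f, m)
            for f2, m2 in chosen
        )
        if not blocked:
            chosen.add((f, m))
    return chosen


# Hypothetical toy costs: lower is better; marked items cost more.
FORM_COST = {"kill": 0, "cause to die": 1}
MEANING_COST = {"direct": 0, "indirect": 1}
FORMS = list(FORM_COST)
MEANINGS = list(MEANING_COST)


def cost(form, meaning):
    return FORM_COST[form] + MEANING_COST[meaning]


print(sorted(strongly_optimal(FORMS, MEANINGS, cost)))
print(sorted(weakly_optimal(FORMS, MEANINGS, cost)))
```

With these toy costs, only the unmarked pair ("kill", "direct") is strongly optimal, while the weak notion additionally admits the marked pair ("cause to die", "indirect"), since its only competitors are themselves blocked.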
There is every reason to welcome the revised edition (2009) of Thomas Olander’s dissertation (2006), which I have criticized elsewhere (2006). The book is very well written and the author has a broad command of the scholarly literature. I have not found any mistakes in Olander’s rendering of other people’s views. This makes the book especially useful as an introduction to the subject. It must be hoped that the easy access to a complex set of problems which this book offers will have a stimulating effect on the study of Balto-Slavic accentology.
Language contact has become a major focus of inquiry in historical and typological linguistics in the last twenty years, spurred in large part by the publication of Thomason & Kaufman (1988), which tried to make sense of a large amount of language contact data. They argued that there was a direct relationship between the degree or intensity of language contact and the amount and type of influence the contact would have on one or more of the languages involved. Essentially, the greater the degree of bilingualism, the greater the degree of contact influence (see also Thomason 2001); if the contact and bilingualism were minimal, then there might just be a few loanwords adapted to the borrowing language's phonology and grammatical system, but if the contact and bilingualism were of a greater degree, there would be influence on the grammar and phonology of the affected language. As more linguists came to take language contact seriously, they came to realize how common language contact phenomena are.
Many linguists in China and the West have talked about Chinese as a topic-comment language, that is, a language in which the structure of the clause takes the form of a topic, about which something is to be said, and a comment, which is what is said about the topic, rather than being a language with a subject-predicate structure like that of English. Y. R. Chao (1968), for example, said that all Chinese clauses have topic-comment structure and there are no exceptions.
Distributional approximations to lexical semantics are very useful not only in helping the creation of lexical semantic resources (Kilgarriff et al., 2004; Snow et al., 2006), but also when directly applied in tasks that can benefit from large-coverage semantic knowledge such as coreference resolution (Poesio et al., 1998; Gasperin and Vieira, 2004; Versley, 2007), word sense disambiguation (McCarthy et al., 2004) or semantic role labeling (Gordon and Swanson, 2007). We present a model that is built from Web-based corpora using both shallow patterns for grammatical and semantic relations and a window-based approach, using singular value decomposition to decorrelate the feature space, which is otherwise too heavily influenced by the skewed topic distribution of Web corpora.