The STAR Collaboration at the Relativistic Heavy Ion Collider presents measurements of 𝐽/𝜓→𝑒+𝑒− at midrapidity and high transverse momentum (𝑝𝑇>5 GeV/𝑐) in 𝑝+𝑝 and central Cu+Cu collisions at √𝑠𝑁𝑁=200 GeV. The inclusive 𝐽/𝜓 production cross section for Cu+Cu collisions is found to be consistent at high 𝑝𝑇 with the binary collision-scaled cross section for 𝑝+𝑝 collisions. At a confidence level of 97%, this is in contrast to a suppression of 𝐽/𝜓 production observed at lower 𝑝𝑇. Azimuthal correlations of 𝐽/𝜓 with charged hadrons in 𝑝+𝑝 collisions provide an estimate of the contribution of 𝐵-hadron decays to 𝐽/𝜓 production of 13%±5%.
The aim of this paper is to address two main counterarguments raised in Landau (2007) against the movement analysis of Control, and especially against the phenomenon of Backward Control. The paper shows that unlike the situation described in Tsez (Polinsky & Potsdam 2002), Landau's objections do not hold for Greek and Romanian, where all obligatory control verbs exhibit Backward Control. Our results thus provide stronger empirical support for a theoretical approach to Control in terms of Movement, as defended in Hornstein (1999 and subsequent work).
In the recent literature the phenomenon of long distance agreement has become the focus of several studies as it seems to violate certain locality conditions which require that agreeing elements in general stand in clause-mate relationships. In particular, it involves a verb agreeing with a constituent which is located in the verb's clausal complement and hence poses a challenge for theories that assume a strictly local relationship for agreement. In this paper we present empirical evidence from Greek and Romanian for the reality of long distance agreement. Specifically, we focus on raising constructions in these two languages and we show that they do not involve movement but rather instantiate long distance agreement. We further argue that subjunctives allowing long distance agreement lack both a CP layer and semantic Tense. However, since the embedded verb also bears phi-features, these constructions pose a further problem for assumptions that view the presence of phi-features as evidence for the presence of a C layer. Finally, we raise the question of the common properties that these languages have that lead to the presence of long distance agreement.
We show that loanword adaptation can be understood entirely in terms of phonological and phonetic comprehension and production mechanisms in the first language. We provide explicit accounts of several loanword adaptation phenomena (in Korean) in terms of an Optimality-Theoretic grammar model with the same three levels of representation that are needed to describe L1 phonology: the underlying form, the phonological surface form, and the auditory-phonetic form. The model is bidirectional, i.e., the same constraints and rankings are used by the listener and by the speaker. These constraints and rankings are the same for L1 processing and loanword adaptation.
Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG, however, is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. That is, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) the MCTAG licenses. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, allow a more systematic comparison of different types of MCTAG, and, furthermore, can be exploited for parsing.
We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation helps to considerably decrease the number of items in the chart.
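For readers unfamiliar with chart-based parsing, the following is a minimal sketch of a CYK recognizer. It is deliberately simplified: it handles a context-free grammar in Chomsky normal form, not Range Concatenation Grammar, and it does not implement the range boundary constraint propagation described in the abstract. The function name `cyk_recognize`, the rule encoding, and the toy grammar are assumptions made for illustration only.

```python
def cyk_recognize(words, lexical_rules, binary_rules, start="S"):
    """Return True if `words` is derivable from `start`.

    lexical_rules: list of (lhs, terminal) pairs, e.g. ("NP", "she")
    binary_rules:  list of (lhs, (left, right)) pairs, e.g. ("S", ("NP", "VP"))
    Both encodings are hypothetical, chosen for this sketch.
    """
    n = len(words)
    if n == 0:
        return False

    # chart[(i, j)] holds every nonterminal that derives words[i:j].
    chart = {}

    # Spans of width 1 are filled from the lexical rules.
    for i, word in enumerate(words):
        chart[(i, i + 1)] = {lhs for lhs, terminal in lexical_rules if terminal == word}

    # Wider spans are built bottom-up from two adjacent smaller spans.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):  # split point between the two sub-spans
                for lhs, (left, right) in binary_rules:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        cell.add(lhs)
            chart[(i, j)] = cell

    return start in chart[(0, n)]


# Toy grammar (illustrative only).
LEXICAL = [("NP", "she"), ("NP", "fish"), ("V", "eats")]
BINARY = [("S", ("NP", "VP")), ("VP", ("V", "NP"))]

accepted = cyk_recognize(["she", "eats", "fish"], LEXICAL, BINARY)
rejected = cyk_recognize(["eats", "she", "fish"], LEXICAL, BINARY)
```

Each chart entry plays the role of an item in the deductive view of parsing: an item for a span is derived once items for its two sub-spans exist. The number of such items is the quantity the abstract reports as being reduced.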
This paper investigates the class of Tree-Tuple MCTAG with Shared Nodes, TT-MCTAG for short, an extension of Tree Adjoining Grammars that has been proposed for natural language processing, in particular for dealing with discontinuities and word order variation in languages such as German. It has been shown that the universal recognition problem for this formalism is NP-hard, but so far it was not known whether the class of languages generated by TT-MCTAG is included in PTIME. We provide a positive answer to this question, using a new characterization of TT-MCTAG.
Parsing coordinations
(2009)
The present paper is concerned with statistical parsing of constituent structures in German. The paper presents four experiments that aim at improving parsing performance on coordinate structures: 1) reranking the n-best parses of a PCFG parser, 2) enriching the input to a PCFG parser with gold scopes for each conjunct, 3) reranking the parser output for all possible scopes for conjuncts that are permissible with regard to clause structure. Experiment 4 reranks a combination of parses from experiments 1 and 3. The experiments presented show that n-best parsing combined with reranking improves results by a large margin. Providing the parser with different scope possibilities and reranking the resulting parses results in an increase in F-score from 69.76 for the baseline to 74.69. While the F-score is similar to that of the first experiment (n-best parsing and reranking), the first experiment results in higher recall (75.48% vs. 73.69%) and the third one in higher precision (75.43% vs. 73.26%). Combining the two methods yields the best result, with an F-score of 76.69.
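The relationship between the precision, recall, and F-score figures quoted above can be sketched as follows, assuming the standard balanced F-score (the harmonic mean of precision and recall). The function name `f_score` and the `beta` parameter are illustrative and not taken from the paper. Applied to the rounded percentages in the abstract, the harmonic mean gives roughly 74.35 for experiment 1 and 74.55 for experiment 3. That is close to the reported 74.69 but not identical, so the paper's figures were presumably not computed from these rounded values in exactly this way.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    With beta = 1.0 this is the balanced F1 score. Inputs may be
    percentages or fractions, as long as both use the same scale.
    """
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Rounded precision/recall percentages as quoted in the abstract.
f_exp1 = f_score(precision=73.26, recall=75.48)  # n-best parsing and reranking
f_exp3 = f_score(precision=75.43, recall=73.69)  # scope possibilities and reranking

print(f"experiment 1: {f_exp1:.2f}")
print(f"experiment 3: {f_exp3:.2f}")
```

The example also shows why the two experiments can have similar F-scores while differing in which of precision or recall is higher: the harmonic mean is symmetric in its two arguments.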
Traditionally, parsers are evaluated against gold standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and the gold standard. A particular case in point is German, for which two treebanks (TiGer and TüBa-D/Z) are available with highly different annotation schemes for the acquisition of (e.g.) PCFG parsers. The differences between the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser evaluation difficult [7, 9, 12]. The resource (TEPACOC) presented in this paper takes a different approach to parser evaluation: instead of providing evaluation data in a single annotation scheme, TEPACOC uses comparable sentences and their annotations for 5 selected key grammatical phenomena (with 20 sentences per phenomenon) from both the TiGer and TüBa-D/Z resources. This provides a comparable test suite of 2 × 100 sentences, which allows us to evaluate TiGer-trained parsers against the TiGer part of TEPACOC, and TüBa-D/Z-trained parsers against the TüBa-D/Z part of TEPACOC for key phenomena, instead of comparing them against a single (and potentially biased) gold standard. To overcome the problem of inconsistency in human evaluation and to bridge the gap between the two different annotation schemes, we provide an extensive error classification, which enables us to compare parser output across the two different treebanks. In the remaining part of the paper we present the test suite and describe the grammatical phenomena covered in the data. We discuss the different annotation strategies used in the two treebanks to encode these phenomena and present our error classification of potential parser errors.
In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serves to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail. We then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement, and argue that a deeper understanding of the phenomena involved allows us to tackle problematic cases in a more principled fashion than would be possible using only pre-theoretic intuitions.
Distributional approximations to lexical semantics are very useful not only in helping the creation of lexical semantic resources (Kilgarriff et al., 2004; Snow et al., 2006), but also when directly applied in tasks that can benefit from large-coverage semantic knowledge such as coreference resolution (Poesio et al., 1998; Gasperin and Vieira, 2004; Versley, 2007), word sense disambiguation (McCarthy et al., 2004) or semantic role labeling (Gordon and Swanson, 2007). We present a model that is built from Web-based corpora using both shallow patterns for grammatical and semantic relations and a window-based approach, using singular value decomposition to decorrelate the feature space, which is otherwise too heavily influenced by the skewed topic distribution of Web corpora.