Refine
Year of publication
- 2007 (7) (remove)
Document Type
- Preprint (7) (remove)
Language
- English (7)
Has Fulltext
- yes (7)
Is part of the Bibliography
- no (7) (remove)
Keywords
- Syntaktische Analyse (2)
- Computerlinguistik (1)
- Datenstruktur (1)
- Formalismes syntaxiques (1)
- Hilfsverb (1)
- Kongress (1)
- Lexicalized Tree Adjoining Grammar (1)
- Mittelenglisch (1)
- Multicomponent Tree Adjoining Grammar (1)
- Perfekt (1)
Institute
- Extern (7) (remove)
Recent approaches to Word Sense Disambiguation (WSD) generally fall into two classes: (1) information-intensive approaches and (2) information-poor approaches. Our hypothesis is that for memory-based learning (MBL), a reduced amount of data is more beneficial than the full range of features used in the past. Our experiments show that MBL combined with a restricted set of features and a feature selection method that minimizes the feature set leads to competitive results, outperforming all systems that participated in the SENSEVAL-3 competition on the Romanian data. Thus, with this specific method, a tightly controlled feature set improves the accuracy of the classifier, reaching 74.0% in the fine-grained and 78.7% in the coarse-grained evaluation.
Prepositional phrase (PP) attachment is one of the major sources for errors in traditional statistical parsers. The reason for that lies in the type of information necessary for resolving structural ambiguities. For parsing, it is assumed that distributional information of parts-of-speech and phrases is sufficient for disambiguation. For PP attachment, in contrast, lexical information is needed. The problem of PP attachment has sparked much interest ever since Hindle and Rooth (1993) formulated the problem in a way that can be easily handled by machine learning approaches: In their approach, PP attachment is reduced to the decision between noun and verb attachment; and the relevant information is reduced to the two possible attachment sites (the noun and the verb) and the preposition of the PP. Brill and Resnik (1994) extended the feature set to the now standard 4-tupel also containing the noun inside the PP. Among many publications on the problem of PP attachment, Volk (2001; 2002) describes the only system for German. He uses a combination of supervised and unsupervised methods. The supervised method is based on the back-off model by Collins and Brooks (1995), the unsupervised part consists of heuristics such as ”If there is a support verb construction present, choose verb attachment”. Volk trains his back-off model on the Negra treebank (Skut et al., 1998) and extracts frequencies for the heuristics from the ”Computerzeitung”. The latter also serves as test data set. Consequently, it is difficult to compare Volk’s results to other results for German, including the results presented here, since not only he uses a combination of supervised and unsupervised learning, but he also performs domain adaptation. Most of the researchers working on PP attachment seem to be satisfied with a PP attachment system; we have found hardly any work on integrating the results of such approaches into actual parsers. The only exceptions are Mehl et al. (1998) and Foth and Menzel (2006), both working with German data. Mehl et al. report a slight improvement of PP attachment from 475 correct PPs out of 681 PPs for the original parser to 481 PPs. Foth and Menzel report an improvement of overall accuracy from 90.7% to 92.2%. Both integrate statistical attachment preferences into a parser. First, we will investigate whether dependency parsing, which generally uses lexical information, shows the same performance on PP attachment as an independent PP attachment classifier does. Then we will investigate an approach that allows the integration of PP attachment information into the output of a parser without having to modify the parser: The results of an independent PP attachment classifier are integrated into the parse of a dependency parser for German in a postprocessing step.
This paper presents an LTAG analysis of reflexives like himself and reciprocals like each other. These items need to find a c-commanding antecedent from which they retrieve (part of) their own denotation and with which they syntactically agree. The relation between anaphoric item and antecendent must satisfy the following important locality conditions (Chomsky (1981)).
In this paper, we will argue for a novel analysis of the auxiliary alternation in Early English, its development and subsequent loss which has broader consequences for the way that auxiliary selection is looked at cross-linguistically. We will present evidence that the choice of auxiliaries accompanying past participles in Early English differed in several significant respects from that in the familiar modern European languages. Specifically, while the construction with have became a full-fledged perfect by some time in the ME period, that with be was actually a stative resultative, which it remained until it was lost. We will show that this accounts for some otherwise surprising restrictions on the distribution of BE in Early English and allows a better understanding of the spread of HAVE through late ME and EModE. Perhaps more importantly, the Early English facts also provide insight into the genesis of the kind of auxiliary selection found in German, Dutch and Italian. Our analysis of them furthermore suggests a promising strategy for explaining cross-linguistic variation in auxiliary selection in terms of variation in the syntactico-semantic structure of the perfect. In this introductory section, we will first provide some background on the historical situation we will be discussing, then we will lay out the main claims for which we will be arguing in the paper.
In this paper, we introduce an extension of the XMG system (eXtensibleMeta-Grammar) in order to allow for the description of Multi-Component Tree Adjoining Grammars. In particular, we introduce the XMG formalism and its implementation, and show how the latter makes it possible to extend the system relatively easily to different target formalisms, thus opening the way towards multi-formalism.
We adopt Markert and Nissim (2005)’s approach of using the World Wide Web to resolve cases of coreferent bridging for German and discuss the strength and weaknesses of this approach. As the general approach of using surface patterns to get information on ontological relations between lexical items has only been tried on English, it is also interesting to see whether the approach works for German as well as it does for English and what differences between these languages need to be accounted for. We also present a novel approach for combining several patterns that yields an ensemble that outperforms the best-performing single patterns in terms of both precision and recall.
Multicomponent Tree Adjoining Grammars (MCTAG) is a formalism that has been shown to be useful for many natural language applications. The definition of MCTAG however is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. This way of characterizing MCTAG does not allow to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) the MCTAG licences. This definition gives a better understanding of the formalism, it allows a more systematic comparison of different types of MCTAG, and, furthermore, it can be exploited for parsing.