The significance of John McDowell's philosophical program, which already brings about a revolutionary reorientation in theoretical philosophy, can be fully appreciated only when its consequences for practical philosophy are also taken into view. Admittedly, Geist und Welt (Mind and World) takes dilemmas of epistemology as its primary starting point. But McDowell's proposal to abandon the equation of external nature with the meaning-free realm of natural laws in favor of a conception of reasons in the world opens up so novel a perspective on the nature of moral judgments that it almost seems as if McDowell's theoretical program had been designed with this gain for practical philosophy in mind.
Impurism is an ancient worldview and an old poetics. I presented both in detail in my 2007 book Illustrierte Poetik des Impurismus. Since I do not want to repeat myself, I cannot present the extensive findings on the topic again here. On the other hand, the reader of this sequel should not enter the subject entirely unprepared. I will therefore assemble a few bare facts here as a reminder, but I must urgently refer to the illustrative foundations in that book; otherwise the enormity of the whole discovery, presented here in all brevity, may scare off many a willing reader. ...
It is easy for us philologists to talk. We watch as others, most of whom do not belong to our guild, carry the immense abundance of written material from its respective original language into every possible language, and we behave as interested spectators. We have every reason to be pleased about this: without this cross-border exchange of goods and ideas, the field on which we graze would remain narrower and more fragmented than it ought to be, both according to the authors' intentions and in the nature of the matter. We can (provided we have the necessary overview) praise what the translators have achieved: the equivalents they have discovered or invented, and the power, suppleness, and variety of modulation that they have first activated in their target languages, through thousands of convincing finds or through the whole tone and style of their translations. If we trust ourselves to do so, we can meddle in their craft and translate individual passages or entire works ourselves. We can criticize them where the translations they present seem too dull to us, or where they "fall short of the original" in substance or style more than necessary; we can suggest improvements. When we quote translations and find it necessary to modify them, we move in a gray area between respect for the translator, delight in still further potential recognized in the text, and the urge to convey as much as possible of "everything" we have read out of the original to our listeners or readers in our own language.
Römische Bildnisse: Bibliography, unabridged, with the author's supplementary literature references
(2010)
Original version of the bibliography of the following work; in the publisher's edition it was shortened by numerous literature references: Götz Lahusen: Römische Bildnisse : Auftraggeber, Funktionen, Standorte. - Mainz : von Zabern, 2010. - Licensed by WBG (Wiss. Buchges.) Darmstadt. - ISBN: 978-3-8053-3738-0. Pp. : EUR 49.90
In its collections, the Deutsches Literaturarchiv Marbach (DLA) maps the network of literary life in all its facets. At the center of its source-oriented collecting and cataloguing stands the author. Literature is documented from the genesis of a work through its various editions and its reception in literary criticism to its dramatic adaptation for radio, film, the stage, and music. Since 2008 the DLA has also included Internet sources such as literary journals, net literature, and weblogs in its scope, responding to the growing importance of the Internet as a publication forum. Collecting, cataloguing, and archiving form a necessary unity; the very transience of web-based resources makes it necessary to secure their availability for the long term. This new collection, "Literatur im Netz", therefore needs to rest on several pillars.
We present several parsing algorithms for Range Concatenation Grammar (RCG), including a new Earley-style algorithm, within the deductive parsing paradigm. Our work is motivated by the recent interest in this type of grammar and fills a gap in the existing literature.
Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG, however, is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. In other words, this way of characterizing MCTAG does not allow us to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) that the MCTAG licenses. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, they allow a more systematic comparison of different types of MCTAG, and, furthermore, they can be exploited for parsing.
This paper investigates the class of Tree-Tuple MCTAG with Shared Nodes, TT-MCTAG for short, an extension of Tree Adjoining Grammars that has been proposed for natural language processing, in particular for dealing with discontinuities and word order variation in languages such as German. It has been shown that the universal recognition problem for this formalism is NP-hard, but so far it was not known whether the class of languages generated by TT-MCTAG is included in PTIME. We provide a positive answer to this question, using a new characterization of TT-MCTAG.
We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation helps to considerably decrease the number of items in the chart.
In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serves to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail. We then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement, and argue that a deeper understanding of the phenomena involved allows us to tackle problematic cases in a more principled fashion than would be possible using only pre-theoretic intuitions.
Traditionally, parsers are evaluated against gold standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and the gold standard. A particular case in point is German, for which two treebanks (TiGer and TüBa-D/Z) are available with highly different annotation schemes for the acquisition of (e.g.) PCFG parsers. The differences between the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser evaluation difficult [7, 9, 12]. The resource (TEPACOC) presented in this paper takes a different approach to parser evaluation: instead of providing evaluation data in a single annotation scheme, TEPACOC uses comparable sentences and their annotations for 5 selected key grammatical phenomena (with 20 sentences per phenomenon) from both the TiGer and TüBa-D/Z resources. This provides a comparable test suite of 2 × 100 sentences, which allows us to evaluate TiGer-trained parsers against the TiGer part of TEPACOC, and TüBa-D/Z-trained parsers against the TüBa-D/Z part of TEPACOC for key phenomena, instead of comparing them against a single (and potentially biased) gold standard. To overcome the problem of inconsistency in human evaluation and to bridge the gap between the two different annotation schemes, we provide an extensive error classification, which enables us to compare parser output across the two different treebanks. In the remaining part of the paper we present the test suite and describe the grammatical phenomena covered in the data. We discuss the different annotation strategies used in the two treebanks to encode these phenomena and present our error classification of potential parser errors.
Parsing coordinations
(2009)
The present paper is concerned with statistical parsing of constituent structures in German. It presents four experiments that aim at improving parsing performance on coordinate structures: 1) reranking the n-best parses of a PCFG parser, 2) enriching the input to a PCFG parser with gold scopes for each conjunct, 3) reranking the parser output for all possible conjunct scopes that are permissible with regard to clause structure, and 4) reranking a combination of parses from experiments 1 and 3. The experiments show that n-best parsing combined with reranking improves results by a large margin. Providing the parser with different scope possibilities and reranking the resulting parses increases the F-score from 69.76 for the baseline to 74.69. While this F-score is similar to that of the first experiment (n-best parsing and reranking), the first experiment yields higher recall (75.48% vs. 73.69%) and the third higher precision (75.43% vs. 73.26%). Combining the two methods gives the best result, with an F-score of 76.69.
Distributional approximations to lexical semantics are very useful not only in helping the creation of lexical semantic resources (Kilgarriff et al., 2004; Snow et al., 2006), but also when directly applied in tasks that can benefit from large-coverage semantic knowledge such as coreference resolution (Poesio et al., 1998; Gasperin and Vieira, 2004; Versley, 2007), word sense disambiguation (McCarthy et al., 2004) or semantic role labeling (Gordon and Swanson, 2007). We present a model that is built from Web-based corpora using both shallow patterns for grammatical and semantic relations and a window-based approach, using singular value decomposition to decorrelate the feature space, which is otherwise too heavily influenced by the skewed topic distribution of Web corpora.
This paper investigates the relation between TT-MCTAG, a formalism used in computational linguistics, and RCG. RCGs are known to describe exactly the class PTIME; simple RCGs have even been shown to be equivalent to linear context-free rewriting systems, i.e., to be mildly context-sensitive. TT-MCTAG has been proposed to model free word order languages. In general, it is NP-complete. In this paper, we put an additional limitation on the derivations licensed in TT-MCTAG. We show that TT-MCTAG with this additional limitation can be transformed into equivalent simple RCGs. This result is interesting for theoretical reasons (since it shows that TT-MCTAG in this limited form is mildly context-sensitive) and also for practical reasons: we use the proposed transformation from TT-MCTAG to RCG in an actual parser that we have implemented.
The ACL 2008 Workshop on Parsing German features a shared task on parsing German. The goal of the shared task was to find reasons for the radically different behavior of parsers on the different treebanks and between constituent and dependency representations. In this paper, we describe the task and the data sets. In addition, we provide an overview of the test results and a first analysis.
The problem of vocalization, or diacritization, is essential to many tasks in Arabic NLP. Arabic is generally written without the short vowels, which leads to one written form having several pronunciations with each pronunciation carrying its own meaning(s). In the experiments reported here, we define vocalization as a classification problem in which we decide for each character in the unvocalized word whether it is followed by a short vowel. We investigate the importance of different types of context. Our results show that the combination of using memory-based learning with only a word internal context leads to a word error rate of 6.64%. If a lexical context is added, the results deteriorate slowly.
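The classification setup described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the Latin-letter toy words, the `SHORT_VOWELS` set, the window size, and all function names are assumptions made for illustration, and the nearest-neighbour lookup with a simple overlap count only stands in for memory-based learning in general. Each character of the unvocalized word becomes one instance whose features are the surrounding characters (word-internal context only) and whose label is the short vowel that follows, or "-" for none.

```python
from collections import Counter

SHORT_VOWELS = set("aiu")  # assumption: toy Latin transliteration, not Arabic script
PAD = "_"                  # padding symbol for positions outside the word

def make_instances(vocalized, window=2):
    """Split a vocalized word into (character window, following-vowel label) pairs."""
    letters, labels = [], []
    for ch in vocalized:
        if ch in SHORT_VOWELS and letters:
            labels[-1] = ch        # the vowel is the label of the preceding letter
        else:
            letters.append(ch)
            labels.append("-")     # "-" means: no short vowel follows
    padded = [PAD] * window + letters + [PAD] * window
    return [(tuple(padded[i:i + 2 * window + 1]), label)
            for i, label in enumerate(labels)]

def overlap(a, b):
    """Number of feature positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b))

def classify(features, memory, k=1):
    """Majority label among the k stored instances most similar to `features`."""
    nearest = sorted(memory, key=lambda inst: overlap(features, inst[0]),
                     reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def vocalize(word, memory, window=2):
    """Predict a label for every character of an unvocalized word."""
    padded = [PAD] * window + list(word) + [PAD] * window
    out = []
    for i, ch in enumerate(word):
        label = classify(tuple(padded[i:i + 2 * window + 1]), memory)
        out.append(ch if label == "-" else ch + label)
    return "".join(out)

def word_error_rate(predicted, gold):
    """Fraction of words whose predicted vocalization differs from the gold form."""
    return sum(p != g for p, g in zip(predicted, gold)) / len(gold)

# Toy "training corpus" (hypothetical words, chosen only to exercise the code).
memory = [inst for w in ["kataba", "darasa", "madrasa"] for inst in make_instances(w)]
print(vocalize("mdrs", memory))
```

A lexical context, as mentioned in the abstract, would correspond to adding neighbouring words to the feature tuple; the sketch leaves that out.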
How to compare treebanks
(2008)
Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question of how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new test suite for the evaluation of parsers on complex German grammatical constructions. The test suite provides a carefully designed error classification, which enables us to compare parser output for parsers trained on treebanks with different encoding schemes and provides insights into the impact of treebank annotation schemes on specific constructions such as PP attachment or non-constituent coordination.
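To make the bracket-based part of such an evaluation concrete, here is a simplified sketch of labeled bracketing precision, recall, and F-score, the kind of measure that EVALB reports. It is an illustration under assumptions, not EVALB itself: trees are nested tuples of the hypothetical form `(label, child, ...)` with plain strings as tokens, part-of-speech nodes are omitted, and none of EVALB's parameter-file options are modelled. The two example trees differ only in where the PP is attached.

```python
from collections import Counter

def brackets(tree, start=0):
    """Collect (label, start, end) spans of a nested-tuple tree.

    Returns the list of spans and the position after the last token."""
    if isinstance(tree, str):          # a leaf token covers exactly one position
        return [], start + 1
    label, *children = tree
    spans, pos = [], start
    for child in children:
        child_spans, pos = brackets(child, pos)
        spans.extend(child_spans)
    spans.append((label, start, pos))
    return spans, pos

def labeled_prf(gold_tree, test_tree):
    """Labeled bracketing precision, recall, and F-score for one sentence."""
    gold = Counter(brackets(gold_tree)[0])
    test = Counter(brackets(test_tree)[0])
    matched = sum((gold & test).values())   # multiset intersection of brackets
    precision = matched / sum(test.values())
    recall = matched / sum(gold.values())
    f_score = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f_score

# Gold: the PP attaches to the VP. Parser output: the PP attaches to the object NP.
gold = ("S", ("NP", "sie"),
        ("VP", "sah", ("NP", "den", "Mann"),
         ("PP", "mit", ("NP", "dem", "Fernglas"))))
parsed = ("S", ("NP", "sie"),
          ("VP", "sah", ("NP", "den", "Mann",
                         ("PP", "mit", ("NP", "dem", "Fernglas")))))
print(labeled_prf(gold, parsed))
```

In this toy case a single attachment error changes only one of six brackets, which shows why aggregate bracket scores alone say little about specific constructions.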
Part-of-Speech tagging is generally performed by Markov models, based on bigram or trigram models. While Markov models have a strong concentration on the left context of a word, many languages require the inclusion of right context for correct disambiguation. We show for German that the best results are reached by a combination of left and right context. If only left context is available, then changing the direction of analysis and going from right to left improves the results. In a version of MBT (Daelemans et al., 1996) with default parameter settings, the inclusion of the right context improved POS tagging accuracy from 94.00% to 96.08%, thus corroborating our hypothesis. The version with optimized parameters reaches 96.73%.
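The difference between left context, right context, and a reversed direction of analysis can be sketched as a feature-extraction step. This is an illustrative assumption about how such windows might be built, not the feature set of MBT or of the experiments reported here; `context_features` and its parameters are hypothetical names, and word forms stand in for whatever information a real tagger would use.

```python
def context_features(tokens, i, left=2, right=0, pad="_"):
    """Return the window of `left` preceding items, the focus item, and
    `right` following items around position i, padded at sentence edges."""
    padded = [pad] * left + list(tokens) + [pad] * right
    j = i + left                       # index of the focus item in `padded`
    return tuple(padded[j - left:j + right + 1])

def reversed_left_context(tokens, i, left=2, pad="_"):
    """Left-context-only features for a right-to-left pass: reverse the
    sentence, then look 'left' of the mirrored focus position."""
    mirrored = list(reversed(tokens))
    return context_features(mirrored, len(tokens) - 1 - i, left=left, pad=pad)

sentence = ["der", "Mann", "schläft"]
print(context_features(sentence, 1, left=2, right=0))  # left context only
print(context_features(sentence, 1, left=1, right=1))  # left and right context
print(reversed_left_context(sentence, 1))              # right-to-left analysis
```

With only a left window available, the reversed pass lets the classifier see what follows the focus word, which is the intuition behind changing the direction of analysis.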
This article presents linguistic features of and educational approaches to a new variety of German that has emerged in multi-ethnic urban areas in Germany: Kiezdeutsch (‘Hood German’). From a linguistic point of view, Kiezdeutsch is very interesting, as it is a multi-ethnolect that combines features of a youth language with those of a contact language. We present examples that illustrate the grammatical productivity and innovative potential of this variety. From an educational perspective, Kiezdeutsch also has high potential in several respects: school projects can help enrich intercultural communication and weaken derogatory attitudes. In grammar lessons, Kiezdeutsch can be a means to enhance linguistic competence by having the adolescents analyse their own language. Keywords: German, Kiezdeutsch, multi-ethnolect, migrants’ language, language change, educational proposals
This article studies the relation between multicomponent tree adjoining grammars with tree tuples (TT-MCTAG), a formalism used in computational linguistics, and range concatenation grammars (RCG). RCGs are known to describe exactly the class PTIME; moreover, "simple" RCGs have been shown to be equivalent to linear context-free rewriting systems (LCFRS), in other words, they are mildly context-sensitive. TT-MCTAG has been proposed to model free word order languages. In general, these languages are NP-complete. In this article, we define an additional constraint on the derivations licensed by the TT-MCTAG formalism. We then show how this restricted form of TT-MCTAG can be converted into an equivalent simple RCG. The result is interesting for theoretical reasons (since it shows that the restricted form of TT-MCTAG is mildly context-sensitive), but also for practical reasons (the transformation proposed here has been used to implement a parser for TT-MCTAG).