OPUS 4 | Linguistik

From surface dependencies towards deeper semantic representations [Semantic representations] (2006)

In the past, a divide could be seen between "deep" parsers on the one hand, which construct a semantic representation out of their input but usually have significant coverage problems, and more robust parsers on the other hand, which are usually based on a (statistical) model derived from a treebank and have larger coverage but leave the problem of semantic interpretation to the user. More recently, approaches have emerged that combine the robustness of data-driven (statistical) models with more detailed linguistic interpretation, such that the output can be used for deeper semantic analysis. Cahill et al. (2002) use a PCFG-based parsing model in combination with a set of principles and heuristics to derive functional (f-)structures of Lexical-Functional Grammar (LFG). They show that the derived functional structures are of higher quality than those generated by a parser based on a state-of-the-art hand-crafted LFG grammar. Advocates of Dependency Grammar usually point out that dependencies already are a semantically meaningful representation (cf. Menzel, 2003). However, parsers based on dependency grammar normally create representations that are underspecified with respect to certain phenomena such as coordination, apposition and control structures. In these areas they are too "shallow" to be used directly for semantic interpretation. In this paper, we adopt an approach similar to that of Cahill et al. (2002), using a dependency-based analysis to derive functional structure, and demonstrate the feasibility of this approach using German data. A major focus of our discussion is the treatment of coordination and other potentially underspecified structures in the dependency input. F-structure is one of the two core levels of syntactic representation in LFG (Bresnan, 2001).
Independently of surface order, it encodes abstract syntactic functions that constitute predicate-argument structure and other dependency relations such as subject, predicate and adjunct, but also further semantic information such as the semantic type of an adjunct (e.g. directional). Normally, f-structure is captured as a recursive attribute-value matrix, which is isomorphic to a directed graph representation. Figure 5 depicts an example target f-structure. As mentioned earlier, these deeper-level dependency relations can be used to construct logical forms, as in the approaches of van Genabith and Crouch (1996), who construct underspecified discourse representations (UDRSs), and Spreyer and Frank (2005), who have robust minimal recursion semantics (RMRS) as their target representation. We therefore think that f-structures are a suitable target representation for automatic syntactic analysis in a larger pipeline that maps text to interpretation. In this paper, we report on the conversion from dependency structures to f-structure. Firstly, we evaluate the f-structure conversion in isolation, starting from hand-corrected dependencies based on the TüBa-D/Z treebank and the conversion of Versley (2005). Secondly, we start from tokenized text to evaluate the combined process of automatic parsing (using the parser of Foth and Menzel (2006)) and f-structure conversion. As a test set, we randomly selected 100 sentences from TüBa-D/Z, which we annotated using a scheme very close to that of the TiGer Dependency Bank (Forst et al., 2004). In the next section, we sketch dependency analysis, the underlying theory of our input representations, and introduce four different representations of coordination. We also describe Weighted Constraint Dependency Grammar (WCDG), the dependency parsing formalism that we use in our experiments. Section 3 characterises the conversion of dependencies to f-structures.
Our evaluation is presented in section 4, and finally, section 5 summarises our results and gives an overview of problems remaining to be solved.

A constraint-based approach to noun phrase coreference resolution in German newspaper text (2006)

Versley, Yannick

In this paper, we investigate the usefulness of a wide range of features in the resolution of nominal coreference, both as hard constraints (i.e. completely removing elements from the list of possible candidates) and as soft constraints (where an accumulation of soft-constraint violations makes it less likely that a candidate is chosen as the antecedent). We present a state-of-the-art system based on such constraints and on weights estimated with a maximum entropy model, using lexical information to resolve cases of coreferent bridging.

Disagreement dissected : vagueness as a source of ambiguity in nominal (co-)reference (2006)

Versley, Yannick

Using a qualitative analysis of disagreements from a referentially annotated newspaper corpus, we show that, in coreference annotation, vague referents are prone to greater disagreement. We show how potentially problematic cases can be dealt with in a way that is practical even for larger-scale annotation, considering a real-world example from newspaper text.

JACY - a grammar for annotating syntax, semantics and pragmatics of written and spoken Japanese for NLP application purposes (2006)

Siegel, Melanie

In this text, we describe the development of a broad coverage grammar for Japanese that has been built for and used in different application contexts. The grammar is based on work done in the Verbmobil project (Siegel 2000) on machine translation of spoken dialogues in the domain of travel planning. The second application for JACY was the automatic email response task. Grammar development was described in Oepen et al. (2002a). Third, it was applied to the task of understanding material on mobile phones available on the internet, while embedded in the project DeepThought (Callmeier et al. 2004, Uszkoreit et al. 2004). Currently, it is being used for treebanking and ontology extraction from dictionary definition sentences by the Japanese company NTT (Bond et al. 2004).

Language and mediality : on the medial status of "everyday language" (2006)

Schneider, Jan Georg

The medium of (oral) language is mostly disregarded (or overlooked) in contemporary media theories. This "ignoring of language" in media studies is often accompanied by an inadequate transport model of communication, and it converges with an "ignoring of mediality" in mentalistic theories of language. In the present article it will be argued that this misleading opposition of language and media can only be overcome if one already regards oral language, not just written language, as a medium of the human mind. In my argumentation I fall back on Wittgenstein’s conception of language games to try to show how Wittgenstein’s ideas can help us to clear up the problem of the mediality of language and also to show to what extent the mentalistic conception of Chomskyan provenance cannot be adequate to the phenomenon of language.

New perspectives on Müller, Meyer, Schmidt : computer-based surname geography and the German Surname Atlas Project (2006)

Nübling, Damaris ; Kunze, Konrad

To understand the specific structures and features of German surnames, the most important facts about their emergence and history should be outlined and, at the same time, compared with Swedish surnames, because there are considerable differences (for further details cf. Nübling 1997a, b). First of all, surnames in Germany emerged rather early, with the first instances occurring in the 11th century in southern Germany; by the 16th century surnames were common all over Germany. Differences are related to geography (from south to north), social class (from the upper to the lower classes) and urban versus rural areas.

Auf Umwegen zum Passivauxiliar : die Grammatikalisierungspfade von GEBEN, WERDEN, KOMMEN und BLEIBEN im Luxemburgischen, Deutschen und Schwedischen (2006)

Nübling, Damaris

The synchronic and diachronic investigation of four passive auxiliaries in Standard German and in German dialects, in Swedish, and in Luxembourgish provides clear evidence that full verbs do not grammaticalize directly into passive auxiliaries; rather, this path runs via the inchoative copula. Inchoative copulas are grammaticalized (and thus reduced) to such an extent that they can turn into auxiliaries of the eventive passive (Vorgangspassiv) by way of reanalysis: they first combine with (predicative) nouns, then with adjectives, and finally with participial verbs. Already at the copula stage they have shed their dative and accusative objects, where they had any (intransitivization). After being decoupled from the agent, the subject has entered into a new coupling with the patient. The former action perspective has thereby been reversed into an event perspective. The following figure documents these steps: .... Owing to their source semantics, the grammaticalization path has proved less problematic for New High German werden, Bavarian/Alemannic kommen and Swedish bli, in contrast to Luxembourgish ginn 'give', which has undergone the strongest reductions in every respect and has travelled a particularly long, winding and "stony" road. The verb 'give' can certainly not be regarded as an ideal candidate for passive grammaticalization. Only in this way can it be explained why this grammaticalization has so far not been observed in other languages of the world.

Zur Entstehung und Struktur ungebändigter Allomorphie : Pluralbildungsverfahren im Luxemburgischen (2006)

Nübling, Damaris

From a pan-Germanic perspective, Luxembourgish exhibits an extraordinary degree of plural allomorphy or, following H. Girnth (2000), of heterogrammy (Heterogrammie). The overriding principle appears to be the clear marking of the category 'plural' directly on or in the noun. The morphological complexity concerns several dimensions: first, the multitude of pluralization principles, which range from additive through modulatory and zero processes to subtractive techniques; second, the multitude of concretely manifested allomorphs. Finally, the maximal expansion of the pure umlaut type, even in monosyllables, deserves emphasis. Even loanwords can still form their plural today by pure vowel alternation, and this even on syllables bearing secondary stress. From a diachronic perspective, pure vowel alternation constitutes an important endpoint of a development that has been moving in this direction for centuries. From a synchronic perspective, it is by now mistaken to speak of umlaut, as one still does for the German plural system, since the vowel alternation has long since become arbitrary to a degree that approaches ablaut. In sum, one gains the impression that Luxembourgish exploits almost any phonological change, for instance with regard to subtractive plural formation, or, with regard to umlaut, even lets it become productive via morphologization. The present study raises several questions that should be the subject of further research. First, precise quantitative surveys would have to be carried out to determine the use and distribution of the individual procedures. The productivity of the rules would also need to be investigated. Furthermore, it remains unclear exactly which rules govern the distribution of the allomorphs. Take, for example, English with its three plural allomorphs [ɪz], [z] and [s]: their distribution is governed purely phonologically, by the final sound of the noun. If it ends in a sibilant, syllabic [ɪz] follows (horse-s ['horsɪz]); if it ends in a voiced sound, voiced [z] follows (dog-s); and after a voiceless sound, voiceless [s] follows (cat-s). German, which has a total of nine concrete plural allomorphs, hardly allows the plural to be inferred from the singular form, as the following three monosyllabic rhyming words of the same gender demonstrate: der Hund - die Hunde, der Grund - die Gründe, der Mund - die Münder. Prosodic criteria such as stress position, syllabic criteria (number of syllables), phonological criteria (final sound) and morphological criteria, including gender membership, do not always lead to the goal: for many nouns the plural must be learned along with the word (see above), i.e. it is part of the lexicon. As far as Luxembourgish is concerned, the governing mechanism appears to be more complex, but this is only a conjecture based on spot checks that would need to be substantiated.
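
The purely phonological distribution of the three English plural allomorphs described in this abstract can be illustrated with a minimal sketch. The function name `plural_allomorph` and the two sound inventories are assumptions made for illustration only; they are simplified and do not come from the study itself. The noun's final sound is passed in directly as an IPA symbol rather than derived from spelling.

```python
# Illustrative sketch only: the sound sets below are simplified assumptions,
# not an inventory taken from the abstract.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "θ"}


def plural_allomorph(final_sound: str) -> str:
    """Return the plural allomorph selected by the noun's final sound."""
    if final_sound in SIBILANTS:
        return "ɪz"  # syllabic allomorph after sibilants (horse-s)
    if final_sound in VOICELESS_NON_SIBILANTS:
        return "s"   # voiceless allomorph after voiceless sounds (cat-s)
    return "z"       # voiced allomorph after all remaining (voiced) sounds (dog-s)


# The three examples from the abstract, with their final sounds given by hand.
for noun, final in [("horse", "s"), ("dog", "g"), ("cat", "t")]:
    print(noun, "->", plural_allomorph(final))
```

A single lookup on the final sound suffices for English, which is exactly the contrast the abstract draws with German, where the plural often cannot be predicted from the singular form and has to be stored in the lexicon.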

Annotation compatibility working group report (2006)

This report explores the question of compatibility between annotation projects including translating annotation formalisms to each other or to common forms. Compatibility issues are crucial for systems that use the results of multiple annotation projects. We hope that this report will begin a concerted effort in the field to track the compatibility of annotation schemes for part of speech tagging, time annotation, treebanking, role labeling and other phenomena.

Auxiliary selection and counterfactuality in the history of English and Germanic (2006)

McFadden, Thomas ; Alexiadou, Artemis

The retreat of BE as perfect auxiliary in the history of English is examined. Corpus data are presented showing that the initial advance of HAVE was most closely connected to a restriction against BE in past counterfactuals. Other factors which have been reported to favor the spread of HAVE are either dependent on the counterfactual effect, or significantly weaker in comparison. It is argued that the effect can be traced to the semantics of the BE perfect, which denoted resultativity rather than anteriority proper. Related data from other older Germanic and Romance languages are presented, and finally implications for existing theories of auxiliary selection stemming from the findings presented are discussed.

23 search hits