OPUS 4 | Linguistik

From surface dependencies towards deeper semantic representations [Semantic representations] (2006)

In the past, a divide could be seen between ’deep’ parsers on the one hand, which construct a semantic representation out of their input, but usually have significant coverage problems, and more robust parsers on the other hand, which are usually based on a (statistical) model derived from a treebank and have larger coverage, but leave the problem of semantic interpretation to the user. More recently, approaches have emerged that combine the robustness of datadriven (statistical) models with more detailed linguistic interpretation such that the output could be used for deeper semantic analysis. Cahill et al. (2002) use a PCFG-based parsing model in combination with a set of principles and heuristics to derive functional (f-)structures of Lexical-Functional Grammar (LFG). They show that the derived functional structures have a better quality than those generated by a parser based on a state-of-the-art hand-crafted LFG grammar. Advocates of Dependency Grammar usually point out that dependencies already are a semantically meaningful representation (cf. Menzel, 2003). However, parsers based on dependency grammar normally create underspecified representations with respect to certain phenomena such as coordination, apposition and control structures. In these areas they are too "shallow" to be directly used for semantic interpretation. In this paper, we adopt a similar approach to Cahill et al. (2002) using a dependency-based analysis to derive functional structure, and demonstrate the feasibility of this approach using German data. A major focus of our discussion is on the treatment of coordination and other potentially underspecified structures of the dependency data input. F-structure is one of the two core levels of syntactic representation in LFG (Bresnan, 2001). Independently of surface order, it encodes abstract syntactic functions that constitute predicate argument structure and other dependency relations such as subject, predicate, adjunct, but also further semantic information such as the semantic type of an adjunct (e.g. directional). Normally f-structure is captured as a recursive attribute value matrix, which is isomorphic to a directed graph representation. Figure 5 depicts an example target f-structure. As mentioned earlier, these deeper-level dependency relations can be used to construct logical forms as in the approaches of van Genabith and Crouch (1996), who construct underspecified discourse representations (UDRSs), and Spreyer and Frank (2005), who have robust minimal recursion semantics (RMRS) as their target representation. We therefore think that f-structures are a suitable target representation for automatic syntactic analysis in a larger pipeline of mapping text to interpretation. In this paper, we report on the conversion from dependency structures to fstructure. Firstly, we evaluate the f-structure conversion in isolation, starting from hand-corrected dependencies based on the TüBa-D/Z treebank and Versley (2005)´s conversion. Secondly, we start from tokenized text to evaluate the combined process of automatic parsing (using Foth and Menzel (2006)´s parser) and f-structure conversion. As a test set, we randomly selected 100 sentences from TüBa-D/Z which we annotated using a scheme very close to that of the TiGer Dependency Bank (Forst et al., 2004). In the next section, we sketch dependency analysis, the underlying theory of our input representations, and introduce four different representations of coordination. We also describe Weighted Constraint Dependency Grammar (WCDG), the dependency parsing formalism that we use in our experiments. Section 3 characterises the conversion of dependencies to f-structures. Our evaluation is presented in section 4, and finally, section 5 summarises our results and gives an overview of problems remaining to be solved.

A constraint-based approach to noun phrase coreference resolution in German newspaper text (2006)

Versley, Yannick

In this paper, we investigate the usefulness of a wide range of features for their usefulness in the resolution of nominal coreference, both as hard constraints (i.e. completely removing elements from the list of possible candidates) as well as soft constraints (where a cumulation of violations of soft constraints will make it less likely that a candidate is chosen as the antecedent). We present a state of the art system based on such constraints and weights estimated with a maximum entropy model, using lexical information to resolve cases of coreferent bridging.

Disagreement dissected : vagueness as a source of ambiguity in nominal (co-)reference (2006)

Versley, Yannick

Using a qualitative analysis of disagreements from a referentially annotated newspaper corpus, we show that, in coreference annotation, vague referents are prone to greater disagreement. We show how potentially problematic cases can be dealt with in a way that is practical even for larger-scale annotation, considering a real-world example from newspaper text.

JACY - a grammar for annotating syntax, semantics and pragmatics of written and spoken japanese for NLP application purposes (2006)

Siegel, Melanie

In this text, we describe the development of a broad coverage grammar for Japanese that has been built for and used in different application contexts. The grammar is based on work done in the Verbmobil project (Siegel 2000) on machine translation of spoken dialogues in the domain of travel planning. The second application for JACY was the automatic email response task. Grammar development was described in Oepen et al. (2002a). Third, it was applied to the task of understanding material on mobile phones available on the internet, while embedded in the project DeepThought (Callmeier et al. 2004, Uszkoreit et al. 2004). Currently, it is being used for treebanking and ontology extraction from dictionary definition sentences by the Japanese company NTT (Bond et al. 2004).

Language and mediality : on the medial status of "everyday language" (2006)

Schneider, Jan Georg

The medium of (oral) language is mostly disregarded (or overlooked) in contemporary media theories. This "ignoring of language" in media studies is often accompanied by an inadequate transport model of communication, and it converges with an "ignoring of mediality" in mentalistic theories of language. In the present article it will be argued that this misleading opposition of language and media can only be overcome if one already regards oral language, not just written language, as a medium of the human mind. In my argumentation I fall back on Wittgenstein’s conception of language games to try to show how Wittgenstein’s ideas can help us to clear up the problem of the mediality of language and also to show to what extent the mentalistic conception of Chomskyan provenance cannot be adequate to the phenomenon of language.

New perspectives on Müller, Meyer, Schmidt : computer-based surname geography and the German Surname Atlas Project (2006)

Nübling, Damaris ; Kunze, Konrad

In order to understand the specific structures and features of the German surnames the most important facts about their emergence and history should be outlined and, at the same time, be compared with the Swedish surnames because there are considerable differences (for further details cf. Nubling 1997 a, b). First of all, surnames in Germany emerged rather early, with the first instances occurring in the 11th century in southern Germany; by the 16th century surnames were common all over Germany. Differences are related to geography (from south to north), social class (from the upper to the lower classes) und urban versus rural areas.

Auf Umwegen zum Passivauxiliar : die Grammatikalisierungspfade von GEBEN, WERDEN, KOMMEN und BLEIBEN im Luxemburgischen, Deutschen und Schwedischen (2006)

Nübling, Damaris

Die synchrone wie diachrone Untersuchung von vier Passivauxiliaren in der deutschen Standardsprache und in deutschen Dialekten, im Schwedischen und im Luxemburgischen liefert deutliche Evidenz dafür, dass Vollverben nicht direkt zu Passivauxiliaren grammatikalisieren, sondern dass dieser Pfad über die Inchoativkopula verläuft. Inchoativkopulas sind soweit grammatikalisiert (und damit reduziert), dass sie über den Weg einer Reanalyse zu Vorgangspassivauxiliaren mutieren können: Erst verbinden sie sich mit (prädikativen) Substantiven, dann mit Adjektiven und schließlich partizipialen Verben. Bereits im Kopulastadium haben sie sich (sofern vorhanden gewesen) ihres Dativ- und Akkusativobjekts entledigt (Intransitivierung). Das Subjekt ist nach seiner Entkoppelung mit dem Agens eine neue Koppelung mit dem Patiens eingegangen. Damit hat die einstige Handlungsperspektive eine Umkehr zur Geschehensperspektive erfahren. Diese Schritte dokumentiert die folgende Figur: .... Als weniger problematisch hat sich, bedingt durch die Ausgangssemantik, der Grammatikalisierungspfad bei nhd. werden, bair.lalem. kommen und schwed. bli erwiesen im Gegensatz zu lux. ginn 'geben', das in jeder Hinsicht die stärksten Reduktionen erfahren hat und einen besonders langen, verschlungenen und "steinigen" Weg absolviert hat. Mit Sicherheit kann geben nicht als Idealkandidat für Passivgrarnmatikalisierungen gelten. Nur so lässt sich erklären, weshalb diese Grarnmatikalisierung in anderen Sprachen der Welt bisher nicht beobachtet wurde.

Zur Entstehung und Struktur ungebändigter Allomorphie : Pluralbildungsverfahren im Luxemburgischen (2006)

Nübling, Damaris

Aus gesamtgermanistischer Perspektive verfügt das Luxemburgische über ein außergewöhnliches Maß an Pluralallomorphie bzw., nach H. GIRNTH (2000), an Heterograffimie. Oberstes Prinzip dabei scheint die deutliche Markierung der Kategorie 'Plural' direkt ani bzw. im Substantiv zu sein. Die morphologische Komplexität betrifft mehrere Dimensionen: Zum einen ist es die Vielzahl an Pluralisierungsprinzipien, die von additiven über modulatorische und Nullprozesse bis hin zu subtraktiven Techniken reichen, zum zweiten die Vielzahl an konkret sich manifestierender Allomorphie. Schließlich ist der maximale . Ausbau des reinen Umlauttyps auch bei Einsilblern hervorzuheben. Selbst Fremdwörter können noch heute ihren Plural mit reinem Vokalwechsel bilden, und dies auch auf nebenbetonten Silben. Aus diachroner Perspektive bildet. der reine Vokalwechsel einen wichtigen Endpunkt einer sich seit Jahrhunderten in diese Richtung vollziehenden Entwicklung. Aus synchroner Perspektive ist es mittlerweile verfehlt, noch - wie etwa beim deutschen Pluralsystem - von Umlaut zu sprechen, da längst eine Arbitrarisierung .des Vokalwechsels stattgefunden hat, die fast ablautähnliche Züge erreicht hat. Zusammenfassend gelangt man zu dem Eindruck, dass sich das Luxemburgische - etwa im Hinblick auf die subtraktive Pluralbildung - fast jedweden phonologischen Wandel zu Nutze macht bzw. - im Hinblick auf den Umlaut über die Morphologisierung sogar produktiv werden lässt. Aus der vorliegenden Untersuchung ergeben sich mehrere Fragestellungen, die Gegenstand weiterer Untersuchungen sein sollten. Zuerst wären genaue quantitative Erhebungen vorzunehmen, um die Nutzung und Verteilung der einzelnen Verfahren zu ermitteln. Auch die Produktivität der Regeln müsste untersucht werden. Des Weiteren ist noch ungeklärt, welche Regeln es genau sind, die die Distribution der Allomorphe steuern. Nimmt man z.B. das Englische mit seinen drei Pluralallomorphen [IZ], [z] und [s], so ist deren Verteilung rein phonologisch - nach dem Auslaut des Substantivs - gesteuert: Endet es auf einen Sibilanten, folgt silbisches [IZ] (horse-s ['horsIz]), endet es auf einen stimmhaften Laut, folgt stimmhaftes [z] (dog-s), und auf einen stimmlosen folgt stimmloses [s] (cat-s). Das Deutsche, das insgesamt neun konkrete Pluralallomorphe "besitzt, erlaubt auf grund der Singularform kaum Erschließbarkeit des Plurals, wie die folgenden drei einsilbigen Reimwörter gleichen Genus demonstrieren: der Hund - die Hunde, der Grund - die Gründe, der Mund - die Münder. Prosodische Kriterien wie die AkzentsteIle, syllabische (Silbenzahl), phonologische (Auslaut) und morphologische Kriterien " einschließlich der Genuszugehörigkeit fuhren nicht immer zum Ziel: Bei vielen Substantiven muss der Plural - siehe oben - mitgelernt werden, d.h. er ist Bestandteil des Lexikons. Was das Luxemburgische betrifft, so scheint das Steuerungsinstrumentarium komplexer zu sein, doch ist dies nur eine durch Stichproben gewonnene Vermutung, die zu fundieren wäre.

Annotation compatibility working group report (2006)

This report explores the question of compatibility between annotation projects including translating annotation formalisms to each other or to common forms. Compatibility issues are crucial for systems that use the results of multiple annotation projects. We hope that this report will begin a concerted effort in the field to track the compatibility of annotation schemes for part of speech tagging, time annotation, treebanking, role labeling and other phenomena.

Auxiliary selection and counterfactuality in the history of English and Germanic (2006)

McFadden, Thomas ; Alexiadou, Artemis

The retreat of BE as perfect auxiliary in the history of English is examined. Corpus data are presented showing that the initial advance of HAVE was most closely connected to a restriction against BE in past counterfactuals. Other factors which have been reported to favor the spread of HAVE are either dependent on the counterfactual effect, or significantly weaker in comparison. It is argued that the effect can be traced to the semantics of the BE perfect, which denoted resultativity rather than anteriority proper. Related data from other older Germanic and Romance languages are presented, and finally implications for existing theories of auxiliary selection stemming from the findings presented are discussed.

Why is German dependency parsing more reliable than constituent parsing? (2006)

Kübler, Sandra ; Prokic, Jelena

In recent years, research in parsing has extended in several new directions. One of these directions is concerned with parsing languages other than English. Treebanks have become available for many European languages, but also for Arabic, Chinese, or Japanese. However, it was shown that parsing results on these treebanks depend on the types of treebank annotations used. Another direction in parsing research is the development of dependency parsers. Dependency parsing profits from the non-hierarchical nature of dependency relations, thus lexical information can be included in the parsing process in a much more natural way. Especially machine learning based approaches are very successful (cf. e.g.). The results achieved by these dependency parsers are very competitive although comparisons are difficult because of the differences in annotation. For English, the Penn Treebank has been converted to dependencies. For this version, Nivre et al. report an accuracy rate of 86.3%, as compared to an F-score of 92.1 for Charniaks parser. The Penn Chinese Treebank is also available in a constituent and a dependency representations. The best results reported for parsing experiments with this treebank give an F-score of 81.8 for the constituent version and 79.8% accuracy for the dependency version. The general trend in comparisons between constituent and dependency parsers is that the dependency parser performs slightly worse than the constituent parser. The only exception occurs for German, where F-scores for constituent plus grammatical function parses range between 51.4 and 75.3, depending on the treebank, NEGRA or TüBa-D/Z. The dependency parser based on a converted version of Tüba-D/Z, in contrast, reached an accuracy of 83.4%, i.e. 12 percent points better than the best constituent analysis including grammatical functions.

Is it really that difficult to parse German? (2006)

Kübler, Sandra ; Hinrichs, Erhard ; Maier, Wolfgang

This paper presents a comparative study of probabilistic treebank parsing of German, using the Negra and TüBa-D/Z treebanks. Experiments with the Stanford parser, which uses a factored PCFG and dependency model, show that, contrary to previous claims for other parsers, lexicalization of PCFG models boosts parsing performance for both treebanks. The experiments also show that there is a big difference in parsing performance, when trained on the Negra and on the TüBa-D/Z treebanks. Parser performance for the models trained on TüBa-D/Z are comparable to parsing results for English with the Stanford parser, when trained on the Penn treebank. This comparison at least suggests that German is not harder to parse than its West-Germanic neighbor language English.

Towards case-based parsing : are chunks reliable indicators for syntax trees? (2006)

Kübler, Sandra

This paper presents an approach to the question whether it is possible to construct a parser based on ideas from case-based reasoning. Such a parser would employ a partial analysis of the input sentence to select a (nearly) complete syntax tree and then adapt this tree to the input sentence. The experiments performed on German data from the Tüba-D/Z treebank and the KaRoPars partial parser show that a wide range of levels of generality can be reached, depending on which types of information are used to determine the similarity between input sentence and training sentences. The results are such that it is possible to construct a case-based parser. The optimal setting out of those presented here need to be determined empirically.

Quantifier scope in German : an MCTAG analysis (2006)

Kallmeyer, Laura ; Romero, Maribel

Relative quantifier scope in German depends, in contrast to English, very much on word order. The scope possibilities of a quantifier are determined by its surface position, its base position and the type of the quantifier. In this paper we propose a multicomponent analysis for German quantifiers computing the scope of the quantifier, in particular its minimal nuclear scope, depending on the syntactic configuration it occurs in.

Constraint-based computational semantics : a comparison between LTAG and LRS (2006)

Kallmeyer, Laura ; Richter, Frank

This paper compares two approaches to computational semantics, namely semantic unification in Lexicalized Tree Adjoining Grammars (LTAG) and Lexical Resource Semantics (LRS) in HPSG. There are striking similarities between the frameworks that make them comparable in many respects. We will exemplify the differences and similarities by looking at several phenomena. We will show, first of all, that many intuitions about the mechanisms of semantic computations can be implemented in similar ways in both frameworks. Secondly, we will identify some aspects in which the frameworks intrinsically differ due to more general differences between the approaches to formal grammar adopted by LTAG and HPSG.

Comparing lexicalized grammar formalisms in an empirically adequate way : the notion of generative attachment capacity (2006)

Kallmeyer, Laura

The work presented here addresses the question of how to determine whether a grammar formalism is powerful enough to describe natural languages. The expressive power of a formalism can be characterized in terms of i) the string languages it generates (weak generative capacity (WGC)) or ii) the tree languages it generates (strong generative capacity (SGC)). The notion of WGC is not enough to determine whether a formalism is adequate for natural languages. We argue that even SGC is problematic since the sets of trees a grammar formalism for natural languages should be able to generate is difficult to determine. The concrete syntactic structures assumed for natural languages depend very much on theoretical stipulations and empirical evidence for syntactic structures is rather hard to obtain. Therefore, for lexicalized formalisms, we propose to consider the ability to generate certain strings together with specific predicate argument dependencies as a criterion for adequacy for natural languages.

What linguists always wanted to know about german and did not know how to estimate (2006)

Hinrichs, Erhard ; Kübler, Sandra

This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper die tageszeitung´(taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.

"Downgeloaded" und "geforwardet" - Sprechverhalten in morphologischen Zweifelsfällen am Beispiel des Sprachgebrauch im Internet (2006)

Hausmann, Dagmar

Eine unübersehbare Menge neuer Anglizismen findet über Fach- und Gruppensprachen Eingang in die deutsche Alltagssprache, in der ein Teil von ihnen inzwischen seinen festen Platz hat. […] Insbesondere in den Bereichen der Lautung und der Schreibung bleibt bei den neueren Entlehnungen oberflächlich eine große Nähe zu gebersprachlichen Strukturen erhalten. Diese Entwicklung wird von einigen Fachleuten und Politikern […] als Indiz für eine schleichende ‚Kolonialisierung’ der deutschen Sprache durch das Englische herangezogen. [...] Dieser Einschätzung widersprechen zahlreiche Organe […] und Autoren […] ausdrücklich. […] Im Kontext dieser Auseinandersetzung ist die vorliegende Arbeit verortet. Ihr Ziel ist es zu zeigen, daß die Sprecher des Deutschen Anglizismen sehr wohl phonologisch, graphematisch und morphologisch in die deutsche Sprache integrieren. Untersuchungsgegenstand sind mehrgliedrige Verben, die aus dem Englischen entlehnt wurden und überwiegend in Fach- und Gruppensprachen und/oder in informellem, vorwiegend mündlichem Text auftreten. Für das Problemfeld der verbalen Wortbildung wird dargelegt, daß morphologische Integration nicht unsystematisch erfolgt, sondern sich an den Flexionsmustern deutscher komplexer Verben orientiert. Der Integrationsgrad der einzelnen Lexeme ist dynamisch und sprecherabhängig.

The superstable marker as an indicator of categorial weakness? (2006)

Dammel, Antje ; Nübling, Damaris

In this article we examine and "exapt" Wurzel's concept of superstable markers in an innovative manner. We develop an extended view of superstability through a critical discussion of Wurzel's original definition and the status of marker-superstability versus allomorphy in Natural Morphology: As we understand it, superstability is - above and beyond a step towards uniformity - mainly a symptom for the weakening of the category affected (cf. 1.,2. and 4.). This view is exemplified in four short case studies on superstability in different grammatical categories of four Germanic languages: genitive case in Mainland Scandinavian and English (3.1), plural formation in Dutch (3.2), second person singular ending -st in German (3.3), and ablaut generalisation in Luxembourgish (3.4).

Verb derivation in modern Greek inside alternation classes (2006)

Charitōnidēs, Charitōn Ch.

In this paper I present five alternations of the verb system of Modern Greek, which are recurrently mapped on the syntactic frame NPi__NP. The actual claim is that only the participation in alternations and/or the allocation to an alternation variant can reliably determine the relation between a verb derivative and its base. In the second part, the conceptual structures and semantic/situational fields of a large number of “-ízo” derivatives appearing inside alternation classes are presented. The restricted character of the conceptual and situational preferences inside alternations classes suggests the dominant character of the alternations component.

Generating and visualizing a soccer knowledge base (2006)

Buitelaar, Paul ; Eigner, Thomas ; Gulrajani, Greg ; Schutz, Alexander ; Siegel, Melanie ; Weber, Nicolas ; Cimiano, Philipp ; Ladwig, Günter ; Mantel, Matthias ; Zhu, Honggang

This demo abstract describes the SmartWeb Ontology-based Information Extraction System (SOBIE). A key feature of SOBIE is that all information is extracted and stored with respect to the SmartWeb ontology. In this way, other components of the systems, which use the same ontology, can access this information in a straightforward way. We will show how information extracted by SOBIE is visualized within its original context, thus enhancing the browsing experience of the end user.

Ontology-based Information Extraction with SOBA (2006)

Buitelaar, Paul ; Cimiano, Philipp ; Racioppa, Stefania ; Siegel, Melanie

In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontologybased information extraction from soccer web pages for automatic population of a knowledge base that can be used for domainspecific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.

Algumas notas gramaticais sobre Imarenje (2006)

Prefácio (...) O Objectivo A finalidade deste livrinho é de publicar dados de uma língua minoritária para contribuir ao património cultural da nação moçambicana, de que Imarenje faz parte. O próximo passo será a implem-entação de mais correcções e modificações necessárias, seja nos detalhes ortográficos, seja na escolha de exemplos e frases. Neste sentido faço um apelo a todos que se interessam pelo desenvolvi-mento das línguas nacionais, em particular aos falantes de Imarenje: Façam comentários, contribuam para que futuras edições deste livrinho possam ser mais ricas! Oliver Kröger Editor da série Monografias Linguísticas Moçambicanas Nampula, Outubro de 2006

Open Access

Linguistik

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Institute

23 search hits