Linguistik-Klassifikation
Document Type
- Preprint (53)
- Conference Proceeding (35)
- Article (13)
- Book (9)
- Part of a Book (9)
- Working Paper (4)
- Review (2)
- Diploma Thesis (1)
Language
- English (99)
- German (22)
- Portuguese (4)
- French (1)
Has Fulltext
- yes (126)
Is part of the Bibliography
- no (126)
Keywords
- Computerlinguistik (38)
- Japanisch (18)
- Deutsch (16)
- Maschinelle Übersetzung (12)
- Syntaktische Analyse (10)
- Multicomponent Tree Adjoining Grammar (8)
- Semantik (6)
- Grammatik (5)
- Lexicalized Tree Adjoining Grammar (5)
- Korpus <Linguistik> (4)
Institute
- External (90)
- Universitätsbibliothek (1)
Linguistisches Impact-Assessment: Maschinelle Prognose mit Realitätsabgleich im Projekt TextTransfer
(2024)
Empirical approaches are increasingly finding their way into the methodology of humanities research. The language sciences rely more and more on research data and language models to produce a digital picture of natural languages. On this basis it becomes possible to detect semantic patterns in texts automatically, following user-specific distant-reading queries. Ever since such models, for instance in search engines and web-based translation or conversation tools, became able to reproduce linguistic information in meaningful contexts, the implications of so-called artificial intelligence (AI) have become a topic of society-wide debate. Many linguists are therefore keen to open up their findings to new fields of application beyond their immediate disciplinary environment and to contribute to a well-founded debate. This observation stands against the insight that research results from all disciplines, although archived, often go unused in this broad discourse because large and complex volumes of data cannot be interpreted in a targeted way. No demonstrable impact materialises. At this interface, the TextTransfer project, funded by the German Federal Ministry of Education and Research (BMBF), is developing an approach that uses distant reading to infer the kind and likelihood of a societal, economic, or political impact of text-bound research knowledge. To this end, TextTransfer is building a machine learning procedure grounded in empirical knowledge about the impact outcomes of research projects. The verifiability of the learned results is an essential building block of this knowledge gain. This article presents a first approach within the project to training a language model in a supervised learning procedure with reliable training data, in order to achieve the highest possible precision in impact assessment.
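Read as a supervised text-classification task, the set-up sketched in the abstract can be illustrated as follows; the report snippets, labels, and model choice below are illustrative assumptions, not the project's actual pipeline.

```python
# Minimal sketch of a supervised impact classifier; the toy data and
# label set are hypothetical stand-ins for a curated training corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical project-report excerpts paired with verified outcomes.
texts = [
    "patent application filed based on project results",
    "spin-off company founded to commercialise the method",
    "findings cited in a policy briefing for parliament",
    "results archived without further dissemination",
    "final report filed; no follow-up activities recorded",
    "project ended with internal documentation only",
]
labels = ["impact", "impact", "impact", "none", "none", "none"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Verifiability of the learned results: predictions on new reports can
# be checked against the known outcomes of those projects.
print(model.predict(["a patent application resulted from the project"]))
```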
Computational grammars can be adapted to detect ungrammatical sentences, effectively transforming them into error detection (or correction) systems. In this paper we provide a theoretical account of how to adapt implemented HPSG grammars for grammatical error detection. We discuss how a single ungrammatical input can be reconstructed in multiple ways and, in turn, be used to provide specific, high-quality feedback to language learners. We then exemplify this with a few of the most common error classes made by learners of Mandarin Chinese. We conclude with some notes on adapting and implementing the methods described here in ZHONG, an open-source HPSG grammar for Mandarin Chinese.
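The mal-rule idea behind such error detection can be illustrated with a toy context-free grammar (a strong simplification; ZHONG itself is an HPSG grammar, and the error class shown is invented for the example): rules licensing known learner errors are marked, and any parse that uses one yields targeted feedback rather than a bare parse failure.

```python
# Toy mal-rule demo. Nonterminals prefixed MAL_ license known learner
# errors; if a parse uses one, we emit targeted feedback instead of
# merely reporting "no parse".
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP | MAL_S
MAL_S -> NP MAL_VP
NP -> 'she' | 'books'
VP -> V NP
MAL_VP -> NP V
V -> 'reads'
""")

# Feedback keyed to the mal-rule that fired (hypothetical message).
FEEDBACK = {"MAL_VP": "In English the verb precedes its object (V-O order)."}

parser = nltk.ChartParser(grammar)

def diagnose(tokens):
    for tree in parser.parse(tokens):
        mal = [t.label() for t in tree.subtrees()
               if t.label().startswith("MAL_")]
        if not mal:
            return "grammatical"
        return "; ".join(FEEDBACK.get(m, m) for m in mal)
    return "no parse"

print(diagnose("she reads books".split()))  # grammatical
print(diagnose("she books reads".split()))  # feedback on V-O order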
The Free Linguistic Environment (FLE) project focuses on the development of an open and free library of natural language processing functions and a grammar engineering platform for Lexical Functional Grammar (LFG) and related grammar frameworks. In its present state the codebase of FLE contains the essential elements for LFG parsing. It uses finite-state-based morphological analyzers and syntactic unification parsers to generate parse trees and related functional representations for input sentences based on a grammar. It can process a variety of grammar formalisms, which can be used independently or serve as backbones for the LFG parser. Among the supported formalisms are Context-Free Grammars (CFG), Probabilistic Context-Free Grammars (PCFG), and all formal grammar components of the XLE grammar formalism. The current implementation of the LFG parser can use a PCFG backbone to model probabilistic c-structures. It also includes f-structure representations that allow for the specification or calculation of probabilities for complete f-structure representations, as well as for sub-paths in f-structure trees. Given these design features, FLE enables various forms of probabilistic modeling of c-structures and f-structures for input or output sentences that go beyond the capabilities of other technologies based on the LFG framework.
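The PCFG-backbone idea can be illustrated independently of FLE: a probabilistic parser scores each candidate tree, and the most probable one is returned with its probability, which is the quantity a probabilistic c-structure model optimises. The toy grammar below is an assumption made for illustration, not part of FLE.

```python
# Toy PCFG standing in for a c-structure backbone (illustrative only).
import nltk

pcfg = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> 'time' [0.6] | 'flies' [0.4]
VP -> V NP [0.7] | V [0.3]
V -> 'flies' [0.5] | 'time' [0.5]
""")

# The Viterbi parser returns the single most probable parse tree.
parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse("time flies".split()):
    print(tree, tree.prob())
```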
In this paper I use the formal framework of minimalist grammars to implement a version of the traditional approach to ellipsis as 'deletion under syntactic (derivational) identity'. In conjunction with canonical analyses of voice phenomena, this immediately allows for voice mismatches in verb phrase ellipsis, but not in sluicing. The approach is naturally implemented in a parser by threading a state that encodes a set of possible antecedent derivation contexts through the derivation tree. Similarities between ellipsis and pronominal resolution are easily stated in these terms. In the context of this implementation, two approaches to ellipsis in the transformational community are naturally seen as equivalent descriptions at different levels: the LF-copying approach to ellipsis resolution is best seen as a description of the parser, whereas the phonological deletion approach is best seen as a description of the underlying relation between form and meaning.
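A schematic sketch of the state-threading idea follows, under strong simplifications: derivation trees are plain (label, children) tuples, and label identity stands in for derivational identity; both choices are assumptions for illustration.

```python
# Threading a set of candidate antecedent contexts through a derivation
# tree, left to right; an ELLIPSIS node is resolved against the state.

def resolve(node, state):
    """Return (resolved_node, new_state); `state` lists the derivation
    contexts seen so far that could antecede an ellipsis site."""
    label, children = node
    if label == "ELLIPSIS":
        antecedents = [d for d in state if d[0] == "VP"]
        if not antecedents:
            raise ValueError("unresolvable ellipsis: no antecedent in state")
        return antecedents[-1], state   # take the most recent VP derivation
    resolved_children = []
    for child in children:
        resolved_child, state = resolve(child, state)
        resolved_children.append(resolved_child)
    resolved = (label, resolved_children)
    return resolved, state + [resolved]

# "Kim left, and Sandy did [VP-ellipsis] too."
vp = ("VP", [("V", [("leave", [])])])
tree = ("S", [("S", [("NP", [("Kim", [])]), vp]),
              ("S", [("NP", [("Sandy", [])]), ("ELLIPSIS", [])])])
resolved, _ = resolve(tree, [])
print(resolved)   # ellipsis site filled by the threaded VP derivation
```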
In this paper, we report on a transformation scheme that turns a Categorial Grammar, more specifically, a Combinatory Categorial Grammar (CCG; see Baldridge, 2002) into a derivation- and meaning-preserving typed feature structure (TFS) grammar.
We describe the main idea, which can be traced back at least to work by Karttunen (1986), Uszkoreit (1986), Bouma (1988), and Calder et al. (1988). We then show how a typed representation of complex categories can be extended with other constraints, such as modes, and indicate how the lambda semantics of combinators is mapped into a TFS representation, using unification to perform alpha-conversion and beta-reduction (Barendregt, 1984). We also present first findings from runtime measurements, showing that the PET system, originally developed for the HPSG grammar framework, outperforms the OpenCCG parser by a factor of 8–10 in the time domain and a factor of 4–5 in the space domain.
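How unification can stand in for beta-reduction is easiest to see on a small feature-structure example. The encoding below uses NLTK feature structures purely for illustration; it is not the paper's actual TFS grammar.

```python
# Unification emulating functional application (beta-reduction):
# the functor's argument slot is reentrant with a slot inside its
# semantic result, so filling one fills the other.
from nltk.featstruct import FeatStruct

# Semantics of a functor like "sleeps": the ARG slot is shared (?x)
# with the AGENT role of the resulting predication.
functor = FeatStruct("[ARG=?x, RESULT=[PRED='sleep', AGENT=?x]]")

# Semantics contributed by the syntactic argument "Kim".
argument = FeatStruct("[ARG=[PRED='Kim']]")

# Unifying binds ?x: the effect of beta-reducing (\x.sleep(x))(Kim),
# with alpha-conversion handled by the variable machinery rather than
# by explicit renaming.
applied = functor.unify(argument)
print(applied["RESULT"])   # PRED 'sleep' with AGENT [PRED='Kim']
```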
We consider two alternatives for memory management in typed-feature-structure-based parsers by identifying structural properties of grammar signatures that may predict the consequences of choosing either alternative. We define these properties, summarize the results of a number of experiments on artificially constructed signatures with respect to the relative rank of their asymptotic parse-time cost, and experimentally consider how they impact memory management.
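The following sketch shows the kind of structural properties of a signature one might compute; both the toy signature and the chosen metrics are illustrative assumptions, not the paper's experimental set-up.

```python
# Measuring structural properties of a toy type signature, represented
# as a map from each type to its immediate supertypes.
signature = {
    "top": [],
    "sign": ["top"], "head": ["top"],
    "noun": ["head"], "verb": ["head"],
    "word": ["sign"], "phrase": ["sign"],
}

def depth(t):
    """Length of the longest supertype chain from t up to the root."""
    supers = signature[t]
    return 0 if not supers else 1 + max(depth(s) for s in supers)

n_types = len(signature)
max_depth = max(depth(t) for t in signature)
multi_inherit = sum(1 for s in signature.values() if len(s) > 1)
print(n_types, max_depth, multi_inherit)
```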
The process of turning a hand-written HPSG theory into a working computational grammar requires complex considerations. Two leading platforms are available for implementing HPSG grammars: the LKB and TRALE. These platforms are based on different approaches, distinct in their underlying logics and implementation details. This paper adopts the perspective of a computational linguist whose goal is to implement an HPSG theory. It focuses on ten dimensions relevant to HPSG grammar implementation and examines, compares, and evaluates the means that the two approaches provide for implementing them. The paper concludes that the approaches occupy opposite positions on two axes: faithfulness to the hand-written theory and computational accessibility. The choice between them depends largely on the grammar writer's preferences regarding those properties.
We present a novel well-formedness condition for underspecified semantic representations which requires that every correct MRS representation be a net. We argue that (almost) all correct MRS representations are indeed nets, and we apply this condition to identify a set of eleven rules in the English Resource Grammar (ERG) with bugs in their semantics component. We thus demonstrate that the net test is useful in grammar debugging.
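A much-simplified stand-in for the net test can convey the flavour of such a check: treat the underspecified representation as a graph of elementary predications and holes, and test connectedness. The actual net condition (hypernormal connectedness of fragments) is stronger, and the example graph is invented for illustration.

```python
# Simplified structural check on an underspecified representation:
# a graph that is not even connected could never be a net.
from collections import defaultdict

# Toy dominance edges for "every dog barks" (illustrative node names).
edges = [("every_q", "h_restr"), ("h_restr", "dog"),
         ("every_q", "h_body"), ("h_body", "bark")]

adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def connected(adj):
    nodes = list(adj)
    seen, stack = set(), [nodes[0]]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return len(seen) == len(nodes)

print(connected(adj))   # True for this graph
```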
Over the past fifty years, sign languages have been recognised as genuine languages with their own syntax and distinctive phonology. For sign languages, phonetic description characterises the manual and non-manual aspects of signing. The non-manual aspects relate to facial expression and upper-torso position. The manual components characterise hand shape, orientation and position, and hand/arm movement in three-dimensional space around the signer's body. These phonetic characterisations can be notated as HamNoSys descriptions of signs, which have an executable interpretation that can drive an avatar.
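The phonetic components just described suggest a straightforward data structure; the field names and values below are informal glosses assumed for illustration, not actual HamNoSys symbols.

```python
# Illustrative representation of a sign's manual and non-manual
# components, mirroring the phonetic description above.
from dataclasses import dataclass

@dataclass
class ManualComponent:
    handshape: str    # e.g. "flat", "fist"
    orientation: str  # palm and finger orientation
    location: str     # position relative to the signer's body
    movement: str     # hand/arm path through signing space

@dataclass
class Sign:
    gloss: str
    manual: ManualComponent
    facial_expression: str = "neutral"   # non-manual component
    torso: str = "upright"               # non-manual component

sign = Sign("BOOK", ManualComponent("flat", "palms-up", "chest", "open-apart"))
print(sign)
```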
The HPSG sign language generation component of a text-to-sign-language system prototype is described. Emphasis is placed on the assimilation of sign language (SL) morphological features to generate signs that respect positional agreement in signing space.
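A minimal sketch of positional agreement follows, assuming that discourse referents are assigned loci in signing space and that an agreeing verb's movement runs from the subject's locus to the object's locus; the locus names and assignment scheme are illustrative, not the prototype's mechanism.

```python
# Assign loci in signing space and realise an agreeing verb's movement
# from the subject's locus to the object's locus.
loci = {}
AVAILABLE = ["left", "right", "centre-left", "centre-right"]

def assign_locus(referent):
    """Give each new referent the next free locus; reuse known loci."""
    if referent not in loci:
        loci[referent] = AVAILABLE[len(loci) % len(AVAILABLE)]
    return loci[referent]

def agreeing_verb(verb, subject, obj):
    return {"sign": verb,
            "movement": (assign_locus(subject), assign_locus(obj))}

print(agreeing_verb("GIVE", "MOTHER", "CHILD"))
# {'sign': 'GIVE', 'movement': ('left', 'right')}
```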