The Free Linguistic Environment (FLE) project focuses on the development of an open and free library of natural language processing functions and a grammar engineering platform for Lexical Functional Grammar (LFG) and related grammar frameworks. In its present state, the code base of FLE contains the basic essential elements for LFG parsing. It uses finite-state-based morphological analyzers and syntactic unification parsers to generate parse trees and related functional representations for input sentences based on a grammar. It can process a variety of grammar formalisms, which can be used independently or serve as backbones for the LFG parser. Among the supported formalisms are Context-Free Grammars (CFG), Probabilistic Context-Free Grammars (PCFG), and all formal grammar components of the XLE grammar formalism. The current implementation of the LFG parser includes the possibility of using a PCFG backbone to model probabilistic c-structures. It also includes f-structure representations that allow for the specification or calculation of probabilities for complete f-structure representations, as well as for sub-paths in f-structure trees. Given these design features, FLE enables various forms of probabilistic modeling of c-structures and f-structures for input or output sentences that go beyond the capabilities of other technologies based on the LFG framework.
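The PCFG-backbone idea can be made concrete with a small example. The following is a minimal sketch, not FLE's actual API: a toy PCFG whose rule probabilities score a given c-structure, with all rules and the tree invented for illustration.

```python
# Minimal sketch (not FLE's actual API): scoring a c-structure with a
# PCFG backbone as the product of its rule probabilities.
from math import log, exp

# Hypothetical toy PCFG: (lhs, rhs) -> probability
pcfg = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
}

def tree_logprob(tree):
    """tree = (label, children); leaves are plain strings."""
    label, children = tree
    if all(isinstance(c, str) for c in children):
        return 0.0  # lexical expansions are scored elsewhere in a full system
    rhs = tuple(c[0] for c in children)
    lp = log(pcfg[(label, rhs)])
    return lp + sum(tree_logprob(c) for c in children)

c_structure = ("S", [("NP", [("N", ["storms"])]),
                     ("VP", [("V", ["passed"])])])
print(exp(tree_logprob(c_structure)))  # 0.4 * 0.3 * 1.0 = 0.12
```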
In this paper I use the formal framework of minimalist grammars to implement a version of the traditional approach to ellipsis as 'deletion under syntactic (derivational) identity', which, in conjunction with canonical analyses of voice phenomena, immediately allows for voice mismatches in verb phrase ellipsis, but not in sluicing. This approach to ellipsis is naturally implemented in a parser by means of threading a state encoding a set of possible antecedent derivation contexts through the derivation tree. Similarities between ellipsis and pronominal resolution are easily stated in these terms. In the context of this implementation, two approaches to ellipsis in the transformational community are naturally seen as equivalent descriptions at different levels: the LF-copying approach to ellipsis resolution is best seen as a description of the parser, whereas the phonological deletion approach is a description of the underlying relation between form and meaning.
In this paper, we report on a transformation scheme that turns a Categorial Grammar, more specifically, a Combinatory Categorial Grammar (CCG; see Baldridge, 2002) into a derivation- and meaning-preserving typed feature structure (TFS) grammar.
We describe the main idea, which can be traced back at least to work by Karttunen (1986), Uszkoreit (1986), Bouma (1988), and Calder et al. (1988). We then show how a typed representation of complex categories can be extended by other constraints, such as modes, and indicate how the lambda semantics of combinators is mapped into a TFS representation, using unification to perform alpha-conversion and beta-reduction (Barendregt, 1984). We also present initial findings from runtime measurements, showing that the PET system, originally developed for the HPSG grammar framework, outperforms the OpenCCG parser by a factor of 8–10 in the time domain and a factor of 4–5 in the space domain.
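The core idea, a complex CCG category encoded as a feature structure with unification doing the combinatory work, can be sketched in a few lines. This is a minimal illustration under assumed encodings (the feature names RESULT/DIR/ARG are invented, and plain dicts stand in for TFSs), not the paper's actual transformation scheme.

```python
# Minimal sketch: the category (S\NP)/NP as a nested feature structure,
# with dict unification standing in for the TFS operations.

def unify(a, b):
    """Unify two feature structures (dicts or atoms); None on failure."""
    if a is None or b is None:
        return None
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for k, v in b.items():
            out[k] = unify(out[k], v) if k in out else v
            if out[k] is None:
                return None
        return out
    return a if a == b else None

NP = {"CAT": "np"}
S = {"CAT": "s"}
# (S\NP)/NP: look right for an NP, then left for an NP.
transitive = {"RESULT": {"RESULT": S, "DIR": "left", "ARG": NP},
              "DIR": "right", "ARG": NP}

def apply_forward(functor, arg):
    """Forward application X/Y + Y => X, via unification of ARG and Y."""
    if functor.get("DIR") != "right" or unify(functor["ARG"], arg) is None:
        return None
    return functor["RESULT"]

print(apply_forward(transitive, NP))  # the S\NP category
```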
We consider two alternatives for memory management in typed-feature-structure-based parsers by identifying structural properties of grammar signatures that may be of some predictive value in determining the consequences of those alternatives. We define these properties, summarize the results of a number of experiments on artificially constructed signatures with respect to the relative rank of their asymptotic cost at parse-time, and experimentally consider how they impact memory management.
The process of turning a hand-written HPSG theory into a working computational grammar requires complex considerations. Two leading platforms are available for implementing HPSG grammars: The LKB and TRALE. These platforms are based on different approaches, distinct in their underlying logics and implementation details. This paper adopts the perspective of a computational linguist whose goal is to implement an HPSG theory. It focuses on ten different dimensions, relevant to HPSG grammar implementation, and examines, compares, and evaluates the different means which the two approaches provide for implementing them. The paper concludes that the approaches occupy opposite positions on two axes: faithfulness to the hand-written theory and computational accessibility. The choice between them depends largely on the grammar writer's preferences regarding those properties.
We present a novel well-formedness condition for underspecified semantic representations which requires that every correct MRS representation must be a net. We argue that (almost) all correct MRS representations are indeed nets, and apply this condition to identify a set of eleven rules in the English Resource Grammar (ERG) with bugs in their semantics component. Thus we demonstrate that the net test is useful in grammar debugging.
During the past fifty years, sign languages have been recognised as genuine languages with their own syntax and distinctive phonology. In the case of sign languages, phonetic description characterises the manual and non-manual aspects of signing. The latter relate to facial expression and upper torso position. The manual components characterise hand shape, orientation and position, and hand/arm movement in three-dimensional space around the signer's body. These phonetic characterisations can be notated as HamNoSys descriptions of signs, which have an executable interpretation that can drive an avatar.
The HPSG sign language generation component of a text-to-sign-language system prototype is described. Emphasis is placed on the assimilation of SL morphological features to generate signs that respect positional agreement in signing space.
The project WBLUX (Wortbildung des moselfränkisch-luxemburgischen Raumes) at the University of Luxembourg aims at investigating Luxembourgish word formation across different text types and genres. Achieving this goal requires the compilation of an annotated corpus. This article gives an example of the benefits of using a corpus annotated with parts of speech, lemmata and word-formation affixes in analysing the productivity of selected word-formation affixes of Luxembourgish. It then describes, from a technical point of view, how such a corpus can be built: the choice of corpus format, of a database platform, and the design of the programs needed for annotating word formation itself. The article also suggests new corpus-linguistic approaches to word-formation research, such as analyzing the usage of word-formation bases in the entire corpus or performing context analysis in order to determine the semantic functions of each suffix.
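One standard way to measure affix productivity from such an annotated corpus is Baayen's hapax-based measure P = n1/N. The sketch below assumes that measure and invented toy annotations; the article's own method may differ.

```python
# Minimal sketch: hapax-based productivity P = n1 / N for a suffix,
# where n1 = hapax legomena carrying the suffix and N = all tokens
# carrying it. Toy (invented) annotated tokens: (word form, suffix).
from collections import Counter

tokens = [("heemecht", None), ("kandheet", "-heet"), ("fraiheet", "-heet"),
          ("fraiheet", "-heet"), ("fraiheet", "-heet"), ("gesondheet", "-heet")]

def productivity(tokens, suffix):
    counts = Counter(w for w, s in tokens if s == suffix)
    N = sum(counts.values())                        # tokens with the suffix
    n1 = sum(1 for c in counts.values() if c == 1)  # hapax legomena
    return n1 / N if N else 0.0

print(productivity(tokens, "-heet"))  # 2 hapaxes / 5 tokens = 0.4
```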
In this paper, I revisit the arguments against the use of fuzzy logic in linguistics (or, more generally, against a truth-functional account of vagueness). In part, this is an exercise in explaining to fuzzy logicians why linguists have shown little interest in their research paradigm. But the paper contains more than this interdisciplinary service effort: this seems an opportune time to revisit the arguments against fuzzy logic in linguistics, since three recent developments affect them. First, the formal apparatus of fuzzy logic has been made more general since the 1970s, specifically by Hajek [6], and this may make it possible to define operators in a way that makes fuzzy logic more suitable for linguistic purposes. Secondly, recent research in philosophy has examined variations of fuzzy logic ([18, 19]). Since the goals of linguistic semantics sometimes seem closer to those of some branches of philosophy of language than to the goals of mathematical logic, fuzzy logic work in philosophy may mark the right time to reexamine fuzzy logic from a linguistic perspective as well. Finally, the reasoning used to exclude fuzzy logic from linguistics has been tied to the intuition that 'p and not p' is a contradiction. However, this intuition seems dubious, especially when p contains a vague predicate. For instance, one can easily think of circumstances where 'What I did was smart and not smart.' or 'Bea is both tall and not tall.' do not sound like senseless contradictions. In fact, some recent experimental work that I describe below has shown that contradictions of classical logic are not always felt to be contradictory by speakers. So it is important to see to what extent the argument against fuzzy logic depends on a specific stance on the semantics of contradictions. In sum, then, there are three good reasons to take another look at fuzzy logic for linguistic purposes.
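The point about 'p and not p' can be made concrete: whether the conjunction of a vague predicate and its negation comes out as a contradiction depends on the chosen t-norm. A minimal illustration with standard definitions and a toy truth value:

```python
# Minimal sketch: why 'p and not p' need not get value 0 in fuzzy logic.
# With t(p) = 0.5 ("Bea is borderline tall"), the Goedel t-norm gives the
# conjunction 0.5, while the Lukasiewicz t-norm gives 0, which matches
# the classical intuition. (Illustration only; cf. Hajek [6].)
def neg(a):            # standard fuzzy negation
    return 1.0 - a

def and_godel(a, b):   # Goedel (minimum) t-norm
    return min(a, b)

def and_lukasiewicz(a, b):
    return max(0.0, a + b - 1.0)

t = 0.5  # borderline case of a vague predicate
print(and_godel(t, neg(t)))        # 0.5: not fully contradictory
print(and_lukasiewicz(t, neg(t)))  # 0.0: behaves classically
```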
This contribution is based on the research project DICONALE, whose goal is the creation of a conceptually oriented, bilingual dictionary with online access for verbal lexemes of German and Spanish. The aim of this contribution is to present the most relevant properties of the planned dictionary, exemplified by two verb lexemes from the conceptual field of COGNITION. In addition to describing the paradigmatic sense relations of the field elements to one another, particular emphasis is placed on the syntagmatic content and expression structures and on the contrastive analysis. The contribution attempts, on the one hand, to offer an overview of the most important characteristics of the dictionary and, on the other, to demonstrate the relevance of such criteria for present-day contrastive German-Spanish lexicography.
The following contribution deals with the development of a semantic dictionary of German for machine language processing systems within the "Compreno" project at the Russian IT company ABBYY. A brief overview of other electronic resources for German is given, and their differences from the project dictionary are analyzed. Using several examples, current problems of computational lexicography (word-sense discrimination, compound analysis, etc.) and their possible solutions with respect to the project dictionary are considered.
This paper discusses an attempt to write a computer program that would properly model the phonological development of Chinese from Middle Chinese to Modern Peking Mandarin, using the rules in Chen 1976. Several problems are encountered, the most significant being that the rules cannot apply in the same order for all lexical items. The significance of this in terms of the implementation of sound change is briefly discussed.
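The ordering problem can be illustrated with a toy rule cascade. The sketch below uses invented rules, not Chen's actual 1976 set, and shows sound changes applying in a fixed order to a lexical item.

```python
# Minimal sketch (toy rules, not Chen's 1976 set): applying ordered
# sound changes to a lexical item.
import re

# Hypothetical diachronic rules as (pattern, replacement), applied in order.
RULES = [
    (r"k(?=i)", "tɕ"),   # palatalization of k before i
    (r"p$", ""),         # loss of final -p
]

def derive(form, rules):
    for pattern, repl in rules:
        form = re.sub(pattern, repl, form)
    return form

print(derive("kip", RULES))  # 'tɕi': both rules apply in the given order
# If some other lexical item required the opposite ordering of these two
# rules, no single global order would derive all items correctly. That is
# the kind of problem the paper reports for Chen's rules.
```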
This special issue of the ZAS Papers in Linguistics contains a collection of papers from the French-German Thematic Summer School on "Cognitive and physical models of speech production, and speech perception and of their interaction".
Organized by Susanne Fuchs (ZAS Berlin), Jonathan Harrington (IPdS Kiel), Pascal Perrier (ICP Grenoble) and Bernd Pompino-Marschall (HUB and ZAS Berlin), and funded by the German-French University in Saarbrücken, this summer school was held from September 19 to 24, 2004, on the Baltic Sea coast at the Heimvolkshochschule Lubmin (Germany), with 45 participants from Germany, France, Great Britain, Italy and Canada. The scientific program of the summer school, reprinted at the end of this volume, included 11 keynote presentations by invited speakers, 21 oral presentations and a poster session (8 presentations). The names and addresses of all participants are also given in the back matter of this volume.
All participants were offered the opportunity to publish an extended version of their presentation in the ZAS Papers in Linguistics. All submitted papers underwent review and editing by external experts and the organizers of the summer school. As is typical of a summer school, the papers present works in progress, works at a more advanced stage, or tutorials. They are ordered alphabetically by first author's name, which fortunately means that this special issue opens with the paper that won the award for best pre-doctoral presentation: "A study of the McGurk effect in 4 and 5-year-old French Canadian children" by Sophie Dupont, Jérôme Aubin and Lucie Ménard.
The author presents MASSY, the MODULAR AUDIOVISUAL SPEECH SYNTHESIZER. The system combines two approaches to visual speech synthesis. Two control models are implemented: a data-based di-viseme model and a rule-based dominance model, both of which produce control commands in a parameterized articulation space. Analogously, two visualization methods are implemented: an image-based (video-realistic) face model and a 3D synthetic head. Both face models can be driven by both the data-based and the rule-based articulation model.
The high-level visual speech synthesis generates a sequence of control commands for the visible articulation. For every virtual articulator (articulation parameter), the 3D synthetic face model defines a set of displacement vectors for the vertices of the 3D objects of the head. The vertices of the 3D synthetic head are then moved by linear combinations of these displacement vectors to visualize articulation movements. For the image-based video synthesis, a single reference image is deformed to fit the facial properties derived from the control commands. Facial feature points and facial displacements have to be defined for the reference image. The algorithm can also use an image database with appropriately annotated facial properties; an example database was built automatically from video recordings. Both the 3D synthetic face and the image-based face generate visual speech that is capable of increasing the intelligibility of audible speech.
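The displacement-vector scheme lends itself to a compact sketch. The following uses toy numbers and assumed array shapes, not MASSY's actual parameter set: vertex positions are the rest pose plus a weighted sum of per-parameter displacement fields.

```python
# Minimal sketch of the displacement-vector scheme: vertex positions as
# the rest pose plus a linear combination of per-parameter displacements.
import numpy as np

rest = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0]])            # 2 vertices, xyz (toy mesh)

# One displacement field per articulation parameter (invented values).
displacements = {
    "jaw_open":  np.array([[0.0, -0.3, 0.0], [0.0, -0.1, 0.0]]),
    "lip_round": np.array([[0.1,  0.0, 0.0], [0.0,  0.0, 0.1]]),
}

def pose(weights):
    """Blend articulation parameters linearly into vertex positions."""
    out = rest.copy()
    for name, w in weights.items():
        out += w * displacements[name]
    return out

print(pose({"jaw_open": 0.8, "lip_round": 0.5}))
```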
Other well-known image-based audiovisual speech synthesis systems, such as MIKETALK and VIDEO REWRITE, concatenate pre-recorded single images or video sequences, respectively. Parametric talking heads like BALDI control a parametric face with a parametric articulation model. The presented system demonstrates the compatibility of parametric and data-based approaches to visual speech synthesis.
The goal of our current project is to build a system that can learn to imitate a version of a spoken utterance using an articulatory speech synthesiser. The approach is informed and inspired by knowledge of early infant speech development. Thus we expect our system to reproduce and exploit the utility of infant behaviours such as listening, vocal play, babbling and word imitation. We expect our system to develop a relationship between the sound-making capabilities of its vocal tract and the phonetic/phonological structure of imitated utterances. At the heart of our approach is the learning of an inverse model that relates acoustic and motor representations of speech. The acoustic-to-auditory mapping uses an auditory filter bank and a self-organizing phase of learning. The inverse model from auditory representations to vocal tract control parameters is estimated during a babbling phase, in which the vocal tract is essentially driven in a random manner, much like the babbling phase of speech acquisition in infants. The complete system can be used to imitate simple utterances through a direct mapping from sound to control parameters. Our initial results show that this procedure works well for sounds generated by its own voice. Further work is needed to build a phonological control level and to achieve better performance with real speech.
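The babbling-based inverse model can be sketched as follows; the articulatory synthesiser and auditory front end are stubbed with a toy function, so this shows only the shape of the approach, not the actual system.

```python
# Minimal sketch of the babbling idea: drive a (toy) forward model with
# random motor commands, store (sound, motor) pairs, then invert by
# nearest-neighbour lookup.
import numpy as np

rng = np.random.default_rng(0)

def forward(motor):
    """Stub articulatory synthesiser: motor params -> 'acoustic' features."""
    return np.tanh(motor @ np.array([[0.9, 0.2], [-0.3, 1.1]]))

# Babbling phase: random exploration of the motor space.
motor_cmds = rng.uniform(-1, 1, size=(500, 2))
sounds = forward(motor_cmds)

def imitate(target_sound):
    """Inverse model: pick the motor command whose sound is closest."""
    i = np.argmin(np.linalg.norm(sounds - target_sound, axis=1))
    return motor_cmds[i]

target = forward(np.array([[0.4, -0.2]]))[0]  # an utterance in its own voice
print(imitate(target))                        # close to [0.4, -0.2]
```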
This contribution first addresses the question of what advantages electronic dictionaries have over traditional printed dictionaries. It then presents three online machine translation services (Babelfish, Google Translate, Bing Translator). Sample texts are translated with these services, and the quality of the respective translations is assessed. Finally, the contribution discusses the consequences that the possibilities of machine translation can be expected to have for German studies abroad. It turns out that machine translation services may well have serious effects on the philological disciplines in the future.
For some time now, it has been observable that the toolkit of a learner of German as a foreign language (DaF) [...] no longer includes grammars and dictionaries in the classical sense. At all levels and in all usage situations, looking things up in print works is being replaced by consulting the most diverse materials freely accessible via the Internet. [...] It thus seems that, in the DaF field especially, print reference works will soon be a relic of other times. But just as with the use of print dictionaries, the newly emerging online consultation techniques (Engelberg/Lemnitzer 2009, 111) require that DaF learners have sufficient information and training to select, for their particular usage situation, the best-suited consultation system and the most adequate search option within it. [...] This applies equally to print and online resources, although with Internet dictionaries in particular the risk of losing one's orientation during a search ("lost in hyperspace") can be heightened (cf. Haß/Schmitz 2010, 4). It is therefore the task of teachers to provide the corresponding orientation and assistance. Unfortunately, it must be noted that in the DaF field the necessary lexicographic competence is not sufficiently taught, not least because DaF teachers themselves often lack lexicographic training. The aim of this contribution is therefore to present, in broad outline, some freely accessible Internet dictionaries (IWB) for German and to comment on their usefulness in different usage situations in the DaF field, in order to make it easier for DaF learners and teachers to choose, according to their respective needs, from what has become a rather unwieldy range of offerings. Following the criteria proposed by Engelberg/Lemnitzer (⁴2009, 220ff.) and Storrer (2010) and Kemmer's (2010) evaluation grid for assessing online dictionaries, various current Internet dictionaries of contemporary German are assessed. For dictionary typology I follow the proposals of Engelberg/Lemnitzer (⁴2009), but restrict the scope here to bilingual IWB, specialized monolingual DaF IWB, and some modularized general-language dictionary portals in which various IWB are linked with one another.
Scholars in the fields of Education, Applied Linguistics, and Language Teacher Education today insist on the great importance of including Information and Communication Technologies (ICTs) in initial teacher education, as well as on the need to foster the development of future teachers' critical-reflective thinking. Taking as theoretical underpinnings studies on the characteristics of the information society, virtual environments, and teacher education, this paper aims to discuss the possibilities offered by the Moodle learning platform in the initial education of teachers of German. To this end, we present different ways of using virtual environments and the tools available in them, which have proven to be of great value in the education of student teachers, both in the German language and during their first teaching practice. These experiences point to the inestimable value of virtual environments in accompanying student teachers in the process of learning the language and in their first experiences with teaching.
In order to confront the difficult competitive conditions of international markets, small and medium-sized enterprises need not only modern information technology and a commercial presence in the multimedia, graphics-intensive part of the Internet, but also a web presence adapted to their customers. With this in mind, this contribution is devoted to the economic necessity of a contrastive hypertext grammar. In recent years, thanks to the growing importance of the Internet as a trading platform, a grammatical subdiscipline has emerged that could make a considerable contribution to the business optimization of small and medium-sized enterprises: contrastive hypertext grammar. Here we pursue the question of how one might proceed in a contrastive hypertext-grammatical study.
The purpose of this article is to report on the work carried out during the research project "O trabalho de tradutor como fonte para a constituição de base de dados" (The translator's work as a source for the constitution of a database). Through the restoration, organization and digitization of the personal glossary and part of the books containing the translations made by the late public translator Gustavo Lohnefink, this research project intends to construct a digital database of German-Portuguese technical terms, which could then be used by other translators. In order to achieve this purpose, a specific methodology had to be developed, which can also serve as a starting point for the treatment and recovery of other similarly organized data collections.
This paper investigates the dynamics of text-image interplay as exemplified by various text types used in second language teaching and translation didactics. Based on examples of texts from the fields of science, technology, literature and language teaching, the authors attempt to assess both successful and unsuccessful instances of the application of iconic resources in text production. Some didactic consequences are discussed.
The two papers included in this volume have developed from work with the CHILDES tools and the Media Editor in two research projects: "Second language acquisition of German by Russian learners", sponsored by the Max Planck Institute for Psycholinguistics, Nijmegen, from 1998 to 1999 (directed by Ursula Stephany, University of Cologne, and Wolfgang Klein, Max Planck Institute for Psycholinguistics, Nijmegen), and "The age factor in the acquisition of German as a second language", sponsored by the German Science Foundation (DFG), Bonn, since 2000 (directed by Ursula Stephany, University of Cologne, and Christine Dimroth, Max Planck Institute for Psycholinguistics, Nijmegen). The CHILDES Project has been developed and is continuously being improved at Carnegie Mellon University, Pittsburgh, under the supervision of Brian MacWhinney. Having used the CHILDES tools for more than ten years for transcribing and analyzing Greek child data, it was beyond question that I would also use them for research into the acquisition of German as a second language and analyze the large amount of spontaneous speech gathered from two Russian girls with the help of the CLAN programs. In the spring of 1997, Steven Gillis of the University of Antwerp (in collaboration with Gert Durieux) developed a lexicon-based automatic coding system, based on the CLAN program MOR, suitable for coding languages with richer morphologies than English, such as Modern Greek. Coding huge amounts of data then became much quicker and more comfortable, so I decided to adopt this system for German as well. The paper "Working with the CHILDES Tools" is based on two earlier manuscripts which grew out of my research on Greek child language and the many CHILDES workshops taught in Germany, Greece, Portugal, and Brazil over the years. Its contents have now been adapted to the requirements of research into the acquisition of German as a second language and to use on Windows.
In this text, we describe the development of a broad coverage grammar for Japanese that has been built for and used in different application contexts. The grammar is based on work done in the Verbmobil project (Siegel 2000) on machine translation of spoken dialogues in the domain of travel planning. The second application for JACY was the automatic email response task. Grammar development was described in Oepen et al. (2002a). Third, it was applied to the task of understanding material on mobile phones available on the internet, while embedded in the project DeepThought (Callmeier et al. 2004, Uszkoreit et al. 2004). Currently, it is being used for treebanking and ontology extraction from dictionary definition sentences by the Japanese company NTT (Bond et al. 2004).
The problem for transfer in Japanese-to-English machine translation is the missing information on number and definiteness in Japanese, which is needed for the choice of English articles and noun marking. Although this problem is significant, the research literature hardly addresses it. [...] We base our investigation on experimentally collected data from an experiment on German-Japanese interpreted appointment-scheduling dialogues [...]. In this way, phenomena can be identified that are relevant for the VERBMOBIL domain. We see our procedure as being in line with the 'sublanguage' approach [...].
One of the significant problems in Japanese-to-German machine translation is the missing information on number and definiteness in the Japanese analysis output. An efficient solution to this problem is to integrate the search for the relevant information into the transfer process. Transfer rules are combined with preference rules and default rules. This makes information about lexical restrictions of the target language, about the domain, and about the discourse accessible.
Japanese is often taken to be strictly head-final in its syntax. In our work on a broad-coverage, precision-implemented HPSG for Japanese, we have found that while this is generally true, there are nonetheless a few minor exceptions to the broad trend. In this paper, we describe the grammar engineering project, present the exceptions we have found, and conclude that this kind of phenomenon motivates, on the one hand, the HPSG type-hierarchical approach, which allows for the statement of both broad generalizations and exceptions to those generalizations, and demonstrates, on the other, the usefulness of grammar engineering as a means of testing linguistic hypotheses.
We present a broad coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages.
Based on a detailed case study of parallel grammar development distributed across two sites, we review some of the requirements for regression testing in grammar engineering, summarize our approach to systematic competence and performance profiling, and discuss our experience with grammar development for a commercial application. If possible, the workshop presentation will be organized around a software demonstration.
This paper proposes an annotation scheme that encodes honorifics (respectful words). Honorifics are used extensively in Japanese, reflecting the social relationship (e.g. social rank and age) of the referents. This referential information is vital for resolving zero pronouns and improving machine translation output. Annotating honorifics is a complex task that involves identifying a predicate with honorifics, assigning ranks to the referents of the predicate, calibrating the ranks, and connecting referents with their predicates.
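A data structure for such annotations might look like the following sketch; the class and attribute names are invented for illustration and are not the paper's actual tag set.

```python
# Minimal sketch (hypothetical scheme): annotating a predicate bearing
# honorifics, ranking its referents, and linking referents to predicates.
from dataclasses import dataclass, field

@dataclass
class Referent:
    ident: str
    rank: int = 0          # calibrated social rank, higher = more exalted

@dataclass
class PredicateAnnotation:
    predicate: str         # e.g. 'irassharu' (honorific 'to come/be')
    honorific_type: str    # e.g. 'subject-honorific'
    arguments: dict = field(default_factory=dict)  # role -> Referent

teacher = Referent("sensei", rank=2)
speaker = Referent("speaker", rank=0)

ann = PredicateAnnotation("irassharu", "subject-honorific",
                          {"SUBJ": teacher})
# A zero subject of a subject-honorific predicate must outrank the
# speaker, which narrows the candidate set during resolution:
candidates = [r for r in (teacher, speaker) if r.rank > speaker.rank]
print([r.ident for r in candidates])   # ['sensei']
```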
Some requirements for a VERBMOBIL system capable of processing Japanese dialogue input are explored. Based on a pilot study in the VERBMOBIL domain, dialogues between two participants and a professional Japanese interpreter have been analyzed with respect to a very typical and frequent feature: zero pronouns. Zero pronouns in Japanese texts or dialogues, like overt pronouns in English texts or dialogues, are an important element of discourse coherence. As for translation, this difference in the use of pronouns is a case of translation mismatch: information not explicitly expressed in the source language is needed in the target language. (Verb argument positions, normally obligatory in English, are rather frequently omitted in Japanese. Furthermore, verbs in Japanese are not marked with respect to features necessary for pronoun selection in English.)
We present an architecture for the integration of shallow and deep NLP components which is aimed at the flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe details of the integration methods, and show how information extraction and language checking applications for real-world German text benefit from a deep grammatical analysis.
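The text chart idea, one shared raw text with standoff layers contributed by different components, can be sketched as follows. Names and layout here are assumptions for illustration, not the actual XML-based data structure.

```python
# Minimal sketch of a shared multi-layer annotation structure: each
# component adds a standoff layer over the same raw text, addressed by
# character spans.
text = "Siemens AG eröffnet ein Werk in Dresden."

chart = {"text": text, "layers": {}}

def add_layer(chart, name, annotations):
    """annotations: list of (start, end, label) standoff tuples."""
    chart["layers"][name] = annotations

add_layer(chart, "ner", [(0, 10, "ORG"), (32, 39, "LOC")])
add_layer(chart, "chunks", [(0, 10, "NP"), (20, 28, "NP")])

# A deep component can consult the shallow layers over the same spans:
for start, end, label in chart["layers"]["ner"]:
    print(label, text[start:end])   # ORG 'Siemens AG', LOC 'Dresden'
```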
Hybrid robust deep and shallow semantic processing for creativity support in document production
(2004)
The research performed in the DeepThought project (http://www.project-deepthought.net) aims at demonstrating the potential of deep linguistic processing if added to existing shallow methods that ensure robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. We use this approach to demonstrate the feasibility of three ambitious applications, one of which is a tool for creativity support in document production and collective brainstorming. This application is described in detail in this paper. Common to all three applications, and the basis for their development is a platform for integrated linguistic processing. This platform is based on a generic software architecture that combines multiple NLP components and on robust minimal recursive semantics (RMRS) as a uniform representation language.
In this paper we show an approach to customizing GermaNet to the German HPSG grammar lexicon developed in the Verbmobil project. GermaNet has broad coverage of the German base vocabulary and a fine-grained semantic classification, while the HPSG grammar lexicon is comparatively small and has a coarse-grained semantic classification. In our approach, we have developed a mapping algorithm that relates the synsets in GermaNet to the semantic sorts in HPSG. The evaluation results show that this approach is useful for the lexical extension of our deep grammar development to cope with real-world text understanding.
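One simple way to realize such a mapping, sketched here under assumed data (the project's actual algorithm may differ), is to climb GermaNet's hypernym links until a synset with a known HPSG semantic sort is reached.

```python
# Minimal sketch: relate fine-grained GermaNet synsets to coarse HPSG
# semantic sorts by walking up hypernym links (toy data, invented names).
hypernym = {"Dackel": "Hund", "Hund": "Tier", "Tier": "Lebewesen"}
sort_of = {"Tier": "animate", "Lebewesen": "animate"}

def map_to_sort(synset):
    """Climb hypernyms until a synset with a semantic sort is found."""
    while synset is not None:
        if synset in sort_of:
            return sort_of[synset]
        synset = hypernym.get(synset)
    return "unknown"

print(map_to_sort("Dackel"))   # 'animate'
```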
Particles fulfill several distinct central roles in the Japanese language. They can mark arguments as well as adjuncts, and can be purely functional or have semantic functions. There is, however, no straightforward mapping from particles to functions: 'ga', for example, can mark the subject, the object or an adjunct of a sentence. Particles can co-occur. Verbal arguments that could be identified by particles can be omitted in Japanese sentences. Finally, in spoken language particles are often dropped altogether. A proper treatment of particles is thus necessary to make an analysis of Japanese sentences possible. Our treatment is based on an empirical investigation of 800 dialogues. We set up a type hierarchy of particles motivated by their subcategorizational and modificational behaviour. This type hierarchy is part of the Japanese syntax in VERBMOBIL.
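The type-hierarchy idea can be sketched as follows, with invented type names: one surface form such as 'ga' is associated with several particle types, and the selecting head's requirements then disambiguate.

```python
# Minimal sketch (invented class names): a type hierarchy of Japanese
# particles distinguishing case-marking from modifying uses, so that one
# surface form like 'ga' can inhabit several subtypes.
class Particle: ...
class CaseParticle(Particle): ...          # marks verbal arguments
class ModifyingParticle(Particle): ...     # marks adjuncts

class GaSubject(CaseParticle):
    role = "SUBJ"

class GaObject(CaseParticle):
    role = "OBJ"        # e.g. with potential/stative predicates

class GaAdjunct(ModifyingParticle):
    role = "ADJ"

# Lexical lookup maps the surface form to all of its particle types;
# unification with the selecting verb's valence then disambiguates.
lexicon = {"ga": [GaSubject, GaObject, GaAdjunct]}
print([t.__name__ for t in lexicon["ga"]])
```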
Sprachtechnologie für übersetzungsgerechtes Schreiben am Beispiel Deutsch, Englisch, Japanisch
(2009)
We [...] have set ourselves the task of finding ways in which linguistically based software can support the process of writing technical documentation. On the one hand, we have in view the difficulties that Japanese and German authors (and other non-native speakers of English) face when writing English texts. Japanese authors in particular struggle, because they must express highly complex ideas in a language that is, from an informational standpoint, very different from their mother tongue. On the other hand, we examine technical documentation written by authors in their native language. Although the foreign-language component is absent here, there is still considerable potential for improvement. The goal here is to write documents that are comprehensible, consistent, and translation-oriented. The fundamental approach in developing linguistically based software is that good linguistic software is based on data and oriented toward the concrete goals of better documentation.
Preferences and defaults for definiteness and number in Japanese to German machine translation
(1996)
A significant problem when translating Japanese dialogues into German is the missing information on number and definiteness in the Japanese analysis output. Integrating the search for such information into the transfer process provides an efficient solution. General transfer rules are augmented with conditions that make it possible to consult external knowledge. Thereby grammatical and lexical knowledge of the source language, knowledge of lexical restrictions of the target language, domain knowledge and discourse knowledge become accessible.
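A sketch of how such preferences and defaults might be cascaded during transfer, with the rule contents invented for illustration:

```python
# Minimal sketch (invented rule set): choosing German definiteness for a
# Japanese noun by consulting discourse and domain cues in order, with a
# default as fallback.
def choose_article(noun, context):
    # Preference 1 (discourse): previously mentioned entities are definite.
    if noun in context.get("mentioned", set()):
        return "definite"
    # Preference 2 (domain): in scheduling dialogues, unique calendar
    # objects ('the 13th', 'the afternoon') are definite.
    if noun in context.get("unique_in_domain", set()):
        return "definite"
    # Default: indefinite.
    return "indefinite"

ctx = {"mentioned": {"Termin"}, "unique_in_domain": {"Nachmittag"}}
for n in ["Termin", "Nachmittag", "Vorschlag"]:
    print(n, "->", choose_article(n, ctx))
```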
In an utterance, zero pronouns from several [...] groups can occur. The [...] groups can be related to the levels of a layered dialogue model; conversely, they can indicate which information should be available in a dialogue model. This will have to be examined more closely in future work. In what follows, the aforementioned types of zero pronouns are presented in more detail, and procedures for finding their referents are described.
Developing an in-house standard "at the drawing board" rarely leads to satisfactory results. With automatic checking, one quickly finds that rules "thought up" in the abstract do not withstand systematic application. When implementing such guidelines, one finds that they are often formulated too vaguely, e.g. "phrase instructions concisely and precisely". How, then, can a standard be developed that fits a company, its industry and its target groups, and that can be implemented for automatic checking? Language technology helps efficiently in the development of in-house guidelines. Through data analysis, sentence clustering and parametrization, a text-specific in-house standard emerges. But does that resolve the tension between creativity and standardization?
The domain in VERBMOBIL is appointment-scheduling dialogues. For the syntax, this means first of all that it must be oriented toward spoken language. This includes zero anaphora, phrases that refer to the communication situation, and phrases that would be considered ill-formed in written language. Beyond that, there are some domain-specific syntactic peculiarities, such as the realization of time expressions.
A comprehensive investigation of Japanese particles has been lacking up to now; general claims were made without a comprehensive analysis having been carried out. [...] We offer a lexicalist treatment of the problem. Instead of assuming different phrase structure rules, we state a type hierarchy of Japanese particles. This makes possible a uniform treatment of phrase structure as well as a differentiation of subcategorization patterns.
We present a solution for the representation of Japanese honorific information in the HPSG framework. Basically, there are three dimensions of honorification. We show that a treatment is necessary that involves both the syntactic and the contextual level of information. The Japanese grammar is part of a machine translation system.
The research performed in the DeepThought project aims at demonstrating the potential of deep linguistic processing if combined with shallow methods for robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. On the basis of this approach, the feasibility of three ambitious applications will be demonstrated, namely: precise information extraction for business intelligence; email response management for customer relationship management; creativity support for document production and collective brainstorming. Common to these applications, and the basis for their development is the XML-based, RMRS-enabled core architecture framework that will be described in detail in this paper. The framework is not limited to the applications envisaged in the DeepThought project, but can also be employed e.g. to generate and make use of XML standoff annotation of documents and linguistic corpora, and in general for a wide range of NLP-based applications and research purposes.
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for the automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, the knowledge base and the information extraction component. The originality of SOBA lies in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.
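The linking step, interpreting newly extracted information against entities already in the knowledge base, might be sketched as follows; the ontology fragment and names are invented, not SOBA's actual schema.

```python
# Minimal sketch (invented ontology fragment): newly extracted facts are
# interpreted against the knowledge base, reusing an existing entity when
# one matches rather than creating a duplicate.
kb = {"players": {"Ballack": {"type": "Player", "team": None}}}

def link(extraction):
    """Reuse an existing KB entity or create a new one, then update it."""
    name = extraction["player"]
    entity = kb["players"].setdefault(name, {"type": "Player", "team": None})
    entity["team"] = extraction["team"]
    return entity

# From a match report table:
link({"player": "Ballack", "team": "FC Bayern"})
# From an image caption referring to the same person:
print(kb["players"]["Ballack"])   # linked to one entity, not duplicated
```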
This demo abstract describes the SmartWeb Ontology-based Information Extraction System (SOBIE). A key feature of SOBIE is that all information is extracted and stored with respect to the SmartWeb ontology. In this way, other components of the systems, which use the same ontology, can access this information in a straightforward way. We will show how information extracted by SOBIE is visualized within its original context, thus enhancing the browsing experience of the end user.
Dialogue acts in Verbmobil 2
(1998)
This report describes the dialogue phases and the second edition dialogue acts which are used in the VERBMOBIL 2 project [...]. While in the first project phase the scenario was restricted to appointment scheduling dialogues, it has been extended to travel planning in the second phase with appointment scheduling being only a part of the new scenario.