Linguistik-Klassifikation
Some requirements for a VERBMOBIL system capable of processing Japanese dialogue input have been explored. Based on a pilot study in the VERBMOBIL domain, dialogues between 2 participants and a professional Japanese interpreter have been analyzed with respect to a very typical and frequent feature: zero pronouns. Zero pronouns in Japanese texts or dialogues as well as overt pronouns in English texts or dialogues are an important element of discourse coherence. As to translation, this difference in the use of pronouns is a case of translation mismatch: information not explicitly expressed in the source language is needed in the target language. (Verb argument positions, normally obligatory in English, are rather frequently omitted in Japanese. Furthermore, verbs in Japanese are not marked with respect to features necessary for pronoun selection in English.)
The Child Language Data Exchange System (CHILDES) consists of Codes for the Human Analysis of Transcripts (CHAT), Computerized Language Analysis (CLAN), and a database. There is also an online manual which includes the CHILDES bibliography, the database, and the CHAT conventions as well as the CLAN instructions. The first three parts of this paper concern the CHAT format of transcription, grammatical coding, and analyzing transcripts by using the CLAN programs. The fourth part shows examples of transcribed and coded data.
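As a rough illustration of the CHAT transcription format mentioned above, the following minimal sketch separates header lines (starting with `@`), main speaker tiers (starting with `*`), and dependent tiers such as grammatical coding (starting with `%`). The function name `parse_chat`, the `Utterance` class, and the sample transcript are assumptions made for illustration only; they are not part of CLAN, and the sketch ignores continuation lines and many other details of the CHAT conventions.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """One main tier plus the dependent tiers attached to it (hypothetical helper)."""
    speaker: str
    text: str
    tiers: dict = field(default_factory=dict)


def parse_chat(lines):
    """Split CHAT-style lines into headers and utterances (simplified sketch)."""
    headers = []
    utterances = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("@"):
            # Header line, e.g. @Begin or @Participants
            headers.append(line)
        elif line.startswith("*"):
            # Main tier: *SPEAKER:<tab>utterance
            speaker, _, text = line[1:].partition(":")
            utterances.append(Utterance(speaker.strip(), text.strip()))
        elif line.startswith("%") and utterances:
            # Dependent tier: attach to the most recent utterance
            name, _, value = line[1:].partition(":")
            utterances[-1].tiers[name.strip()] = value.strip()
    return headers, utterances


# Invented sample transcript, for illustration only
sample = [
    "@Begin",
    "@Participants:\tCHI Target_Child, MOT Mother",
    "*MOT:\twhat is that ?",
    "*CHI:\tdoggie .",
    "%mor:\tn|doggie .",
    "@End",
]
headers, utterances = parse_chat(sample)
```

Attaching each dependent tier to the preceding main tier keeps the coding next to the utterance it describes, which is the relationship the format itself expresses.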
Particles fulfill several distinct central roles in the Japanese language. They can mark arguments as well as adjuncts, and they can be purely functional or carry semantic content. There is, however, no straightforward mapping from particles to functions, as, e.g., 'ga' can mark the subject, the object or the adjunct of a sentence. Particles can co-occur. Verbal arguments that could be identified by particles can be omitted from the Japanese sentence. And finally, in spoken language particles are often omitted. A proper treatment of particles is thus necessary to make an analysis of Japanese sentences possible. Our treatment is based on an empirical investigation of 800 dialogues. We set up a type hierarchy of particles motivated by their subcategorizational and modificational behaviour. This type hierarchy is part of the Japanese syntax in VERBMOBIL.
The research performed in the DeepThought project aims at demonstrating the potential of deep linguistic processing when combined with shallow methods for robustness. Classical information retrieval is extended by high-precision concept indexing and relation detection. On the basis of this approach, the feasibility of three ambitious applications will be demonstrated, namely: precise information extraction for business intelligence; email response management for customer relationship management; creativity support for document production and collective brainstorming. Common to these applications, and the basis for their development, is the XML-based, RMRS-enabled core architecture framework that will be described in detail in this paper. The framework is not limited to the applications envisaged in the DeepThought project, but can also be employed, e.g., to generate and make use of XML standoff annotation of documents and linguistic corpora, and in general for a wide range of NLP-based applications and research purposes.
MED (Media EDitor) is a program designed to facilitate the transcription of digitized soundfiles into textfiles. It was written by Hans Drexler and Daan Broeder, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands. [...] The aim of MED is to facilitate the transcription of sound into text using a single program. It works on the principle of the coexistence and interaction of two basic elements, the waveform display window and the text window. [...] This means that you no longer need to use both a sound editor and a word processor at the same time in order to transcribe digitized speech files. Instead, you can directly type the sound you hear (and see) via MED into the text window. Furthermore, you can directly link sound portions of the waveform display window to text portions of the text window, so that you can easily locate and listen to the original source of your transcription once the links have been set. In this function the waveform display window and the text window virtually interact with each other.
This special issue of the ZAS Papers in Linguistics contains a collection of papers of the French-German Thematic Summerschool on "Cognitive and physical models of speech production, and speech perception and of their interaction".
Organized by Susanne Fuchs (ZAS Berlin), Jonathan Harrington (IPdS Kiel), Pascal Perrier (ICP Grenoble) and Bernd Pompino-Marschall (HUB and ZAS Berlin) and funded by the German-French University in Saarbrücken, this summerschool was held from September 19th to 24th, 2004, on the coast of the Baltic Sea at the Heimvolkshochschule Lubmin (Germany) with 45 participants from Germany, France, Great Britain, Italy and Canada. The scientific program of this summerschool, which is reprinted at the end of this volume, included 11 keynote presentations by invited speakers, 21 oral presentations and a poster session (8 presentations). The names and addresses of all participants are also given in the back matter of this volume.
All participants were offered the opportunity to publish an extended version of their presentation in the ZAS Papers in Linguistics. All submitted papers underwent review and editing by external experts and the organizers of the summerschool. As is usual for a summerschool, the papers present works in progress, works at a more advanced stage, or tutorials. They are ordered alphabetically by their first author's name, which fortunately means that this special issue starts with the paper that won the award for best pre-doctoral presentation, i.e. Sophie Dupont, Jérôme Aubin and Lucie Ménard with "A study of the McGurk effect in 4 and 5-year-old French Canadian children".
Preferences and defaults for definiteness and number in Japanese to German machine translation
(1996)
A significant problem when translating Japanese dialogues into German is the missing information on number and definiteness in the Japanese analysis output. The integration of the search for such information into the transfer process provides an efficient solution. General transfer includes conditions to make it possible to consider external knowledge. Thereby, grammatical and lexical knowledge of the source language, knowledge of lexical restrictions on the target language, domain knowledge and discourse knowledge are accessible.
Based on a detailed case study of parallel grammar development distributed across two sites, we review some of the requirements for regression testing in grammar engineering, summarize our approach to systematic competence and performance profiling, and discuss our experience with grammar development for a commercial application. If possible, the workshop presentation will be organized around a software demonstration.
The Deep Linguistic Processing with HPSG Initiative (DELPH-IN) provides the infrastructure needed to produce open-source semantic transfer-based machine translation systems. We have made available a prototype Japanese-English machine translation system built from existing resources, including parsers, generators, bidirectional grammars and a transfer engine.
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA lies in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.
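The loop described in the last sentence, storing extracted facts and then consulting the store to link new extractions to known entities, might be sketched as follows. This is not SOBA's implementation: the `KnowledgeBase` class, the `integrate` method, the name-normalization rule, and the placeholder player data are all assumptions chosen to keep the example small.

```python
class KnowledgeBase:
    """Toy entity store keyed by (type, normalized name); hypothetical, not SOBA's design."""

    def __init__(self):
        self.entities = {}

    @staticmethod
    def _key(entity_type, name):
        # Assumed linking rule: case-insensitive, whitespace-normalized name match
        return (entity_type, " ".join(name.lower().split()))

    def integrate(self, entity_type, name, **attributes):
        """Link to an existing entity if one matches; otherwise create a new one."""
        key = self._key(entity_type, name)
        entity = self.entities.setdefault(key, {"type": entity_type, "name": name})
        for attr, value in attributes.items():
            # Keep the first value seen; later sources only fill in gaps
            entity.setdefault(attr, value)
        return entity


kb = KnowledgeBase()
# Placeholder facts, as if extracted from a table and from an image caption
kb.integrate("Player", "Example Player", team="Example FC")
linked = kb.integrate("Player", "example  player", shirt_number=7)
```

Here the second extraction resolves to the entity created by the first, so facts from different source types accumulate on one record rather than producing duplicates.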