Linguistik
In this paper we present an approach to customizing GermaNet for the German HPSG grammar lexicon developed in the Verbmobil project. GermaNet has broad coverage of the German base vocabulary and a fine-grained semantic classification, while the HPSG grammar lexicon is comparatively small and has a coarse-grained semantic classification. In our approach, we have developed a mapping algorithm that relates the synsets in GermaNet to the semantic sorts in HPSG. The evaluation shows that this approach is useful for the lexical extension of our deep grammar development to cope with real-world text understanding.
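The abstract does not spell out the mapping algorithm, so the following is only a minimal sketch of one plausible strategy, not the authors' method: walk up the hypernym chain from a fine-grained synset until a synset with a manually assigned coarse-grained sort is reached. The function name, the single-hypernym simplification, and all toy data (synset names and sort labels) are assumptions made for illustration; they are not taken from GermaNet or the Verbmobil lexicon.

```python
def map_synset_to_sort(synset, hypernym_of, anchor_sorts):
    """Return the sort of the nearest hypernym that has a sort assigned.

    synset       -- name of the starting synset (hypothetical identifier)
    hypernym_of  -- dict: synset -> its single hypernym (simplifying assumption)
    anchor_sorts -- dict: synset -> coarse-grained sort, assigned by hand
    """
    seen = set()
    current = synset
    # Climb the hierarchy; 'seen' guards against accidental cycles in the data.
    while current is not None and current not in seen:
        if current in anchor_sorts:
            return anchor_sorts[current]
        seen.add(current)
        current = hypernym_of.get(current)
    return None  # no anchored hypernym found: the word stays unmapped


# Invented toy hierarchy and sort labels, for illustration only.
hypernym_of = {"Dackel": "Hund", "Hund": "Tier", "Tier": "Lebewesen"}
anchor_sorts = {"Tier": "animate", "Termin": "temporal"}

print(map_synset_to_sort("Dackel", hypernym_of, anchor_sorts))
print(map_synset_to_sort("Tisch", hypernym_of, anchor_sorts))
```

In this sketch, a word outside the anchored part of the hierarchy is left unmapped instead of being given a guessed sort.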
Particles fulfill several distinct central roles in the Japanese language. They can mark arguments as well as adjuncts, and can be purely functional or carry semantic functions. There is, however, no straightforward mapping from particles to functions, as, e.g., 'ga' can mark the subject, the object or the adjunct of a sentence. Particles can co-occur. Verbal arguments that could be identified by particles can be omitted from the Japanese sentence. And finally, in spoken language particles are often omitted. A proper treatment of particles is thus necessary to make an analysis of Japanese sentences possible. Our treatment is based on an empirical investigation of 800 dialogues. We set up a type hierarchy of particles motivated by their subcategorizational and modificational behaviour. This type hierarchy is part of the Japanese syntax in VERBMOBIL.
Sprachtechnologie für übersetzungsgerechtes Schreiben am Beispiel Deutsch, Englisch, Japanisch
(2009)
We [...] have set ourselves the task of finding ways in which linguistically based software can support the process of writing technical documentation. On the one hand, we focus on the difficulties that Japanese and German authors (and other non-native speakers of English) have when writing English texts. Japanese authors in particular struggle, because they must express highly complex ideas in a language that, from an informational standpoint, differs greatly from their native language. On the other hand, we examine technical documentation written by authors in their native language. Although the foreign-language component is absent here, there is still considerable potential for improvement. The goal here is to write documents that are comprehensible, consistent, and suitable for translation. The fundamental approach in developing linguistically based software is that good linguistic software is based on data and is oriented toward the concrete goals of better documentation.
Preferences and defaults for definiteness and number in Japanese to German machine translation
(1996)
A significant problem in translating Japanese dialogues into German is that the Japanese analysis output lacks information on number and definiteness. Integrating the search for such information into the transfer process provides an efficient solution. General transfer includes conditions that make it possible to consider external knowledge. In this way, grammatical and lexical knowledge of the source language, knowledge of lexical restrictions in the target language, domain knowledge, and discourse knowledge become accessible.
Zero pronouns from several [...] groups can occur in a single utterance. The [...] groups can be related to the levels of a layered dialogue model; conversely, they can indicate which information should be available in a dialogue model. This will need to be examined more closely in future work. In the following, the types of zero pronouns mentioned are described in more detail, and procedures for finding their referents are given.
Developing an individual standard "at the drawing board" rarely leads to satisfactory results. In automatic checking, one quickly finds that the "made-up" rules do not hold up under systematic application. When implementing such guidelines, one finds that they are often not formulated concretely enough, e.g. "formulate instructions concisely and precisely". But how can a standard be developed that fits a company, its industry, and its target groups, and that can be implemented for automatic checking? Language technology helps efficiently in developing individual guidelines. Data analysis, sentence clustering, and parametrization yield a text-specific individual standard. But does this resolve the opposition between creativity and standardization?
The domain in VERBMOBIL is appointment negotiation dialogues. For the syntax, this means first of all that it must be oriented toward spoken language. This includes zero anaphora, phrases that relate to the communicative situation, and phrases that would be considered ill-formed in written language. Beyond that, there are some domain-specific syntactic peculiarities, such as the realization of time expressions.
A comprehensive investigation of Japanese particles has been missing up to now. General implications were set up without a comprehensive analysis having been carried out. [...] We offer a lexicalist treatment of the problem. Instead of assuming different phrase structure rules, we state a type hierarchy of Japanese particles. This makes possible a uniform treatment of phrase structure as well as a differentiation of subcategorization patterns.
We present a solution for the representation of Japanese honorific information in the HPSG framework. Basically, there are three dimensions of honorification. We show that a treatment is necessary that involves both the syntactic and the contextual level of information. The Japanese grammar is part of a machine translation system.
The research performed in the DeepThought project aims at demonstrating the potential of deep linguistic processing when combined with shallow methods for robustness. Classical information retrieval is extended by high-precision concept indexing and relation detection. On the basis of this approach, the feasibility of three ambitious applications will be demonstrated, namely: precise information extraction for business intelligence; email response management for customer relationship management; creativity support for document production and collective brainstorming. Common to these applications, and the basis for their development, is the XML-based, RMRS-enabled core architecture framework that will be described in detail in this paper. The framework is not limited to the applications envisaged in the DeepThought project, but can also be employed, e.g., to generate and make use of XML standoff annotation of documents and linguistic corpora, and in general for a wide range of NLP-based applications and research purposes.
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA lies in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.
This demo abstract describes the SmartWeb Ontology-based Information Extraction System (SOBIE). A key feature of SOBIE is that all information is extracted and stored with respect to the SmartWeb ontology. In this way, other components of the system that use the same ontology can access this information in a straightforward way. We will show how information extracted by SOBIE is visualized within its original context, thus enhancing the browsing experience of the end user.
Dialogue acts in Verbmobil 2
(1998)
This report describes the dialogue phases and the second edition dialogue acts which are used in the VERBMOBIL 2 project [...]. While in the first project phase the scenario was restricted to appointment scheduling dialogues, it has been extended to travel planning in the second phase with appointment scheduling being only a part of the new scenario.
The Deep Linguistic Processing with HPSG Initiative (DELPH-IN) provides the infrastructure needed to produce open-source semantic transfer-based machine translation systems. We have made available a prototype Japanese-English machine translation system built from existing resources, including parsers, generators, bidirectional grammars and a transfer engine.
While the sortal constraints associated with Japanese numeral classifiers are well-studied, less attention has been paid to the details of their syntax. We describe an analysis implemented within a broad-coverage HPSG that handles an intricate set of numeral classifier construction types and compositionally relates each to an appropriate semantic representation, using Minimal Recursion Semantics.
Standardization is the most important approach to improving quality and reducing costs in technical documentation. There are a number of standardization approaches: modularization, information structures, terminology, language structures. Nevertheless, these levels are usually described separately from one another. We investigate how standardizations in the information model, in terminology, and in linguistic structures are linked and interact with one another.
The translation process for technical documentation is increasingly supported by machine translation (MT). We first look at the source texts and create automatically checkable rules with which these texts can be edited so that they yield optimal results in MT. These rules are based on research results on translatability, on research results on translation mismatches in MT, and on experiments.
We present an effort for the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible for the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
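The abstract mentions an evaluation tool that allows for partial matching of annotations but gives no details. As a rough illustration of what partial matching can mean, the sketch below counts a predicted entity as an exact match when its character span and label equal a gold annotation, and as a partial match when the label agrees and the spans merely overlap. The `Span` class, the `score` function, the greedy matching strategy, and the sample offsets are all assumptions for this example; they do not describe the actual tool or the SProUT output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type, e.g. "person"


def overlap(a, b):
    """Number of characters shared by two spans (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def score(gold, predicted):
    """Greedily match each prediction to the same-label gold span it overlaps most."""
    exact = partial = 0
    unmatched = list(gold)
    for p in predicted:
        candidates = [g for g in unmatched if g.label == p.label]
        best = max(candidates, key=lambda g: overlap(g, p), default=None)
        if best is None or overlap(best, p) == 0:
            continue  # spurious prediction: nothing in gold to match
        unmatched.remove(best)  # each gold span is matched at most once
        if (best.start, best.end) == (p.start, p.end):
            exact += 1
        else:
            partial += 1
    return {
        "exact": exact,
        "partial": partial,
        "missed": len(unmatched),
        "spurious": len(predicted) - exact - partial,
    }


# Invented sample annotations, for illustration only.
gold = [Span(0, 10, "person"), Span(20, 30, "organization"), Span(40, 50, "location")]
pred = [Span(0, 10, "person"), Span(22, 30, "organization"), Span(60, 65, "date")]
print(score(gold, pred))
```

Reporting exact and partial matches separately keeps boundary errors distinguishable from type errors and outright misses, which is one way such a tool could support detailed diagnostics.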
Pokazatelji brojivosti
(2007)
The paper analyzes the second complete published translation of the Holy Scripture into Croatian, Škarić's Sveto pismo Staroga i Novoga uvita (Vienna, 1858–1861); it describes its linguistic features and determines its place in the long Croatian tradition of Bible translation as well as its influence on the process of standardization of the Croatian language.
This paper attempts to give an overview of the numerous and varied reflexes of the saint's name Juraj in the Croatian anthroponymic system, with particular emphasis on the area of Zažablje (the area between the small river Mislina, east of Metković, and the western borders of the former Republic of Dubrovnik, today the municipality of Dubrovačko primorje, and the area from Hrasno in the north to Neum in the south) and Popovo (southwestern Herzegovina). On the basis of selected literature and the author's field research, some extralinguistic (chiefly historical and sociolinguistic) facts that have caused this situation are also presented.
The paper analyzes the syntactic function of participles in the Croatian language of the 15th/16th century, because at around that time very major linguistic changes were taking place in the syntactic structure of (Old) Croatian, resulting from the "departicipialization" of participles, i.e. the transformation of inherited participial forms into verbal adverbs.
The paper analyzes the role of one type of referential expression, anaphoric expressions, in the discursive shaping of a selected media-scientific event (the "resurrection" of the bacterium Deinococcus radiodurans). It proposes a transversal analysis of anaphoric expressions based on a modular approach to the complexity of discourse organization and on a dynamic conception of anaphoric reference, understood as a segment of a broader process of conceptually structuring the discourse world and aligning the mental representations of the participants in the interaction.
The subject of this paper is the Kajkavisms in the Tkonski zbornik (Tkon Miscellany), a Glagolitic manuscript written at the beginning of the 16th century on the Frankopan estates. It was established that Kajkavisms are present in the manuscript at all levels: phonological, morphological, lexical, and syntactic. Kajkavisms are most numerous at the lexical level, and they can be divided into two groups: 1. a shared Čakavian-Kajkavian layer, e.g. betegь, gdo, nigdar, hiniti, hud, kaštigati, lotar, etc.; 2. a Kajkavian layer, e.g. fajtati, gorup, nekoteri, pokrivača, škoda, špotati, tanac, etc. The first category of lexemes is interpolated in almost all parts of CTk, while the second is most frequent in Cvet od kreposti and Muka. The Tkonski zbornik preserves enormous lexical wealth, and a comparison of individual lexemes with those in Croatian Glagolitic missals and breviaries led to the conclusion that some of them are attested earlier as well, e.g. betegь, kaštigati, praviti, gorup, tanac, etc. This confirms the continuity of Croatian Glagolitic literature. The interpolation of Kajkavisms is not uniform across all parts of the miscellany; Kajkavian interventions are most frequent in Cvet od kreposti (f. 67–85) and in Muka Spasitelja našega (f. 109–161). On the basis of the research carried out, it can be concluded that the Tkonski zbornik is a manuscript composed of different parts that did not originate in the same period or in the same place. Since Kajkavisms are found at all levels in particular parts (Cvet od kreposti and Muka), it can be assumed that these parts originated in a northern area, i.e. closer to the Kajkavian area.
The paper presents so-called false pairs (false friends), lexemes in Croatian and Romanian that, because of their formal correspondence, lead to mistranslation. The features that have led to such phenomena are given. With regard to origin, these are most often lexemes inherited from Latin or later Romance loans, and of course Slavic ones, of which Romanian has a non-negligible number. The selected lexemes are arranged in a table that makes them easier to compare and to recognize.
Speech acts are most easily recognized and delimited in dialogue, so dramatic texts are very well suited to analyzing and questioning speech act theory. Krleža's drama U agoniji can be approached as a corpus exemplifying the constative and the performative understanding of language. In this drama the conflict is indeed built on opposing understandings of language, and this is also made verbally explicit, so the drama unfolds on a kind of metalinguistic level where the main characters "quarrel" because they speak different languages. Speech acts in the drama, especially compliments, are also analyzed from the perspective of feminist linguistics.
The paper deals with the ways of forming adjectives, adverbs, prepositions, pronouns, and conjunctions, using examples from Tadijanović's work "Svašta po malo". Particular attention is drawn to formation types that are unusual because of the meaning of the derived word, to the formation of unusual derived words according to already existing models, to different approaches and interpretations in determining formation types, and to the relationship between motivated and unmotivated words from the standpoint of historical and contemporary word formation. The analyzed examples are compared with attestations from the "Rječnik hrvatskoga ili srpskoga jezika JAZU".
The article describes the morphonology of the verbal stem in the present-tense paradigm, based on Croatian Church Slavonic (hereafter: HCS) verbs with a basic stem in -i- in which this final -i- is preceded by a dental sonorant: r, l, n (i.e. of the type tvori-ti, moli-ti, brani-ti). All verbal lexemes of this type from the card index of the Rječnik crkvenoslavenskoga jezika hrvatske redakcije were taken into account: 110 li-verbs, 127 ni-verbs, and 83 ri-verbs, together with their present-tense forms. The method of description is a comparison of this fragment of HCS grammar with the Old Church Slavonic situation as well as with the situation in Old Croatian (Čakavian) dialects. In Old Church Slavonic, the stem of these verbs is truncated in the present-tense paradigm (i.e. the suffix -i- is truncated) and appears in two variants: palatal (in the 1st person singular) and hard (in all other forms). Thus in the present tense we find the stem alternations r ~ ŕ, l ~ ĺ, and n ~ ń. In HCS texts, ri-verbs are morphonologically the most innovative. Since the phoneme ŕ was depalatalized in Croatian, the Old Church Slavonic morphonological model is not preserved in ri-verbs. The HCS material does not show the Old Church Slavonic alternation r ~ ŕ, i.e. ri-verbs show no stem variation in the present tense (the truncated stem ends in a non-palatal consonant in all forms). In li-verbs and ni-verbs the Old Church Slavonic morphonological model is preserved. However, rare deviations from this model are nevertheless attested in the texts. Namely, despite the existence of a graphic means of marking the palatality of the phonemes ĺ and ń before the grammatical morpheme of the 1st person singular -u (i.e. the use of the letter ű after l, n), some scribes in rare cases omitted the marking of palatality, i.e. wrote the grapheme u (molu, branu). The author proposes various possible explanations for this error and determines the extent to which the phenomenon is limited to particular HCS texts.