Linguistik
This paper examines 232 family nicknames in Pučišća on the island of Brač. Family nicknames, an additional means of identification that arose as early as the pre-surname period and later became ever more common because of the large number of bearers of individual surnames, are a distinctive feature of the Croatian islands that has not yet been sufficiently studied. In Pučišća, family nicknames have been recorded since the end of the 16th century, and their motivation makes it possible to partially reconstruct the stock of personal names (the relationship between Croatian folk names and Croatian and more recent Romance adaptations of Christian names), physical appearance (especially bodily defects), character traits (mostly unconventional ones), and the origin and everyday life of the inhabitants of Pučišća. The stock of family nicknames is considerably more open to foreign-language systems (chiefly Romance) and reflects a millennium-long Croatian-Romance symbiosis on the eastern coast of the Adriatic Sea.
This paper seeks to give an overview of the many and varied reflexes of the saint's name Ivan in the Croatian anthroponymic stock, with particular emphasis on southern Dalmatia (including the Bay of Kotor) and Lower Herzegovina. The introductory part presents reflexes of the Hebrew male personal name Jehochánán in various Indo-European and non-Indo-European languages; the paper then explains the origin of the Croatian saint's name Ivan and its reflexes in the Croatian anthroponymic stock, with special emphasis on similarities to and differences from the anthroponymic stocks of closely related South Slavic languages.
This paper attempts to give an overview of the many and varied reflexes of the saint's name Juraj in the Croatian anthroponymic system, with particular emphasis on Zažablje (the area between the small river Mislina, east of Metković, and the western borders of the former Republic of Dubrovnik, today the municipality of Dubrovačko primorje, and from Hrasno in the north to Neum in the south) and Popovo (southwestern Herzegovina). On the basis of selected literature and the author's field research, it also sets out some extralinguistic (chiefly historical and sociolinguistic) facts that account for this situation.
On the basis of field research, this paper examines the toponymy of Dubljani, a now almost entirely abandoned village in Popovo in eastern Herzegovina. The most frequent toponyms in the local toponymy are those of anthroponymic origin, which shed light on the former and present property and legal organisation of medieval Hum; the toponym Satùlija (‘Sanctus Elias’) recalls ancient Romance-Croatian contacts, and the toponym Sačìvišće illustrates the highly complex dialect picture of eastern Herzegovina.
In the past, a divide could be seen between "deep" parsers on the one hand, which construct a semantic representation out of their input, but usually have significant coverage problems, and more robust parsers on the other hand, which are usually based on a (statistical) model derived from a treebank and have larger coverage, but leave the problem of semantic interpretation to the user. More recently, approaches have emerged that combine the robustness of data-driven (statistical) models with more detailed linguistic interpretation such that the output could be used for deeper semantic analysis. Cahill et al. (2002) use a PCFG-based parsing model in combination with a set of principles and heuristics to derive functional (f-)structures of Lexical-Functional Grammar (LFG). They show that the derived functional structures have a better quality than those generated by a parser based on a state-of-the-art hand-crafted LFG grammar. Advocates of Dependency Grammar usually point out that dependencies already are a semantically meaningful representation (cf. Menzel, 2003). However, parsers based on dependency grammar normally create underspecified representations with respect to certain phenomena such as coordination, apposition and control structures. In these areas they are too "shallow" to be directly used for semantic interpretation. In this paper, we adopt an approach similar to that of Cahill et al. (2002), using a dependency-based analysis to derive functional structure, and demonstrate the feasibility of this approach using German data. A major focus of our discussion is on the treatment of coordination and other potentially underspecified structures of the dependency data input. F-structure is one of the two core levels of syntactic representation in LFG (Bresnan, 2001).
Independently of surface order, it encodes abstract syntactic functions that constitute predicate argument structure and other dependency relations such as subject, predicate, adjunct, but also further semantic information such as the semantic type of an adjunct (e.g. directional). Normally f-structure is captured as a recursive attribute value matrix, which is isomorphic to a directed graph representation. Figure 5 depicts an example target f-structure. As mentioned earlier, these deeper-level dependency relations can be used to construct logical forms as in the approaches of van Genabith and Crouch (1996), who construct underspecified discourse representations (UDRSs), and Spreyer and Frank (2005), who have robust minimal recursion semantics (RMRS) as their target representation. We therefore think that f-structures are a suitable target representation for automatic syntactic analysis in a larger pipeline of mapping text to interpretation. In this paper, we report on the conversion from dependency structures to f-structures. Firstly, we evaluate the f-structure conversion in isolation, starting from hand-corrected dependencies based on the TüBa-D/Z treebank and Versley's (2005) conversion. Secondly, we start from tokenized text to evaluate the combined process of automatic parsing (using the parser of Foth and Menzel (2006)) and f-structure conversion. As a test set, we randomly selected 100 sentences from TüBa-D/Z which we annotated using a scheme very close to that of the TiGer Dependency Bank (Forst et al., 2004). In the next section, we sketch dependency analysis, the underlying theory of our input representations, and introduce four different representations of coordination. We also describe Weighted Constraint Dependency Grammar (WCDG), the dependency parsing formalism that we use in our experiments. Section 3 characterises the conversion of dependencies to f-structures.
Our evaluation is presented in section 4, and finally, section 5 summarises our results and gives an overview of problems remaining to be solved.
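The abstract's description of f-structure as a recursive attribute value matrix that is isomorphic to a directed graph can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the example sentence ("Maria schläft"), the attribute names, the node labels `f0`, `f1`, and the helper `avm_to_edges` are hypothetical and are not taken from the paper or from its conversion procedure.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical f-structure for "Maria schläft", written as a nested
# attribute-value matrix (AVM). Attribute names are illustrative only.
example_fs: Dict[str, Any] = {
    "PRED": "schlafen<SUBJ>",
    "TENSE": "pres",
    "SUBJ": {"PRED": "Maria", "NUM": "sg"},
}

Edge = Tuple[str, str, str]  # (source node, attribute label, target node or atomic value)


def avm_to_edges(fs: Dict[str, Any], root: str = "f0") -> List[Edge]:
    """Flatten a nested AVM into labelled edges of a directed graph."""
    edges: List[Edge] = []
    next_id = 1

    def walk(sub: Dict[str, Any], node: str) -> None:
        nonlocal next_id
        for attr, value in sub.items():
            if isinstance(value, dict):
                # An embedded AVM becomes a new graph node.
                child = f"f{next_id}"
                next_id += 1
                edges.append((node, attr, child))
                walk(value, child)
            else:
                # An atomic value becomes a leaf.
                edges.append((node, attr, str(value)))

    walk(fs, root)
    return edges


edges = avm_to_edges(example_fs)
for edge in edges:
    print(edge)
```

This sketch only covers tree-shaped structures. A nested dictionary cannot express structure sharing, where two attributes point to the same sub-structure (as in control constructions), so a fuller representation would need explicit node identities.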
In this paper, we investigate the usefulness of a wide range of features in the resolution of nominal coreference, both as hard constraints (i.e. completely removing elements from the list of possible candidates) and as soft constraints (where an accumulation of violations of soft constraints makes it less likely that a candidate is chosen as the antecedent). We present a state-of-the-art system based on such constraints and weights estimated with a maximum entropy model, using lexical information to resolve cases of coreferent bridging.
When a statistical parser is trained on one treebank, one usually tests it on another portion of the same treebank, partly due to the fact that a comparable annotation format is needed for testing. But the user of a parser may not be interested in parsing sentences from the same newspaper over and over, or may even want syntactic annotations for a slightly different text type. Gildea (2001) for instance found that a parser trained on the WSJ portion of the Penn Treebank performs less well on the Brown corpus (the subset that is available in the PTB bracketing format) than a parser that has been trained only on the Brown corpus, although the latter one has only half as many sentences as the former. Additionally, a parser trained on both the WSJ and Brown corpora performs less well on the Brown corpus than on the WSJ one. This leads us to the following questions that we would like to address in this paper: - Is there a difference in usefulness of techniques that are used to improve parser performance between the same-corpus and the different-corpus case? - Are different types of parsers (rule-based and statistical) equally sensitive to corpus variation? To achieve this, we compared the quality of the parses of a hand-crafted constraint-based parser and a statistical PCFG-based parser that was trained on a treebank of German newspaper text.
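The distinction between hard and soft constraints can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the abstract: the features (`same_number`, `same_head`, `is_recent`), the example mentions, and the penalty values are all hypothetical, and the penalties are set by hand here, whereas the abstract describes weights estimated with a maximum entropy model.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Mention = Dict[str, Any]
Check = Callable[[Mention, Mention], bool]


def choose_antecedent(
    mention: Mention,
    candidates: List[Mention],
    hard: List[Check],
    soft: List[Tuple[Check, float]],
) -> Optional[Mention]:
    """Pick the candidate with the lowest accumulated soft-constraint penalty."""
    best: Optional[Mention] = None
    best_cost: Optional[float] = None
    for cand in candidates:
        # Hard constraints remove a candidate from the list entirely.
        if not all(check(mention, cand) for check in hard):
            continue
        # Soft constraints only add a penalty for each violation.
        cost = sum(penalty for check, penalty in soft if not check(mention, cand))
        if best_cost is None or cost < best_cost:
            best, best_cost = cand, cost
    return best


# Hypothetical features; hand-set penalties stand in for learned weights.
def same_number(m: Mention, c: Mention) -> bool:
    return m["number"] == c["number"]


def same_head(m: Mention, c: Mention) -> bool:
    return m["head"] == c["head"]


def is_recent(m: Mention, c: Mention) -> bool:
    return m["sentence"] - c["sentence"] <= 2


mention = {"head": "Stadt", "number": "sg", "sentence": 5}
candidates = [
    {"head": "Städte", "number": "pl", "sentence": 4},
    {"head": "Stadt", "number": "sg", "sentence": 1},
    {"head": "Metropole", "number": "sg", "sentence": 4},
]

chosen = choose_antecedent(
    mention,
    candidates,
    hard=[same_number],
    soft=[(same_head, 2.0), (is_recent, 1.0)],
)
print(chosen)
```

In this toy setting the plural candidate is filtered out by the hard constraint, and the two remaining candidates are ranked by the penalties they accumulate. A candidate with a different lexical head, as in coreferent bridging, is penalised but still remains eligible.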
We investigate methods to improve the recall in coreference resolution by also trying to resolve those definite descriptions where no earlier mention of the referent shares the same lexical head (coreferent bridging). The problem, which is notably harder than identifying coreference relations among mentions which have the same lexical head, has been tackled with several rather different approaches, and we attempt to provide a meaningful classification along with a quantitative comparison. Based on the different merits of the methods, we discuss possibilities to improve them and show how they can be effectively combined.
Using a qualitative analysis of disagreements from a referentially annotated newspaper corpus, we show that, in coreference annotation, vague referents are prone to greater disagreement. We show how potentially problematic cases can be dealt with in a way that is practical even for larger-scale annotation, considering a real-world example from newspaper text.
In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serves to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail, and we then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement and argue that a deeper understanding of the phenomena involved makes it possible to tackle problematic cases in a more principled fashion than would be possible using only pre-theoretic intuitions.
Distributional approximations to lexical semantics are very useful not only in helping the creation of lexical semantic resources (Kilgarriff et al., 2004; Snow et al., 2006), but also when directly applied in tasks that can benefit from large-coverage semantic knowledge such as coreference resolution (Poesio et al., 1998; Gasperin and Vieira, 2004; Versley, 2007), word sense disambiguation (McCarthy et al., 2004) or semantic role labeling (Gordon and Swanson, 2007). We present a model that is built from Web-based corpora using both shallow patterns for grammatical and semantic relations and a window-based approach, using singular value decomposition to decorrelate the feature space, which is otherwise too heavily influenced by the skewed topic distribution of Web corpora.
We adopt Markert and Nissim’s (2005) approach of using the World Wide Web to resolve cases of coreferent bridging for German and discuss the strengths and weaknesses of this approach. As the general approach of using surface patterns to get information on ontological relations between lexical items has only been tried on English, it is also interesting to see whether the approach works for German as well as it does for English and what differences between these languages need to be accounted for. We also present a novel approach for combining several patterns that yields an ensemble that outperforms the best-performing single patterns in terms of both precision and recall.
This paper aims to determine and classify, by syntactic criteria, the functions of reflexivity (the reflexive pronoun kendi) in Turkish, in contrast to German.
Reflexivity in Turkish can be expressed by synthetic elements such as affixes, but also by an analytical element, the reflexive pronoun kendi. In German, it is formed with the reflexive pronoun sich. The reflexive pronoun sich in German is used in both anaphoric and lexical functions, which can be distinguished from each other by certain criteria.
Hybrid robust deep and shallow semantic processing for creativity support in document production
(2004)
The research performed in the DeepThought project (http://www.project-deepthought.net) aims at demonstrating the potential of deep linguistic processing if added to existing shallow methods that ensure robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. We use this approach to demonstrate the feasibility of three ambitious applications, one of which is a tool for creativity support in document production and collective brainstorming. This application is described in detail in this paper. Common to all three applications, and the basis for their development, is a platform for integrated linguistic processing. This platform is based on a generic software architecture that combines multiple NLP components and on Robust Minimal Recursion Semantics (RMRS) as a uniform representation language.
The first part presents and discusses the small body of research literature on descriptivity itself and on closely related topics. From this, a definition of the term is then to be derived that is broad enough to cover its customary use by authors who employ it without treating it theoretically, yet stays within clearly defined and comprehensible limits. It should also become clear that descriptivity is a phenomenon found in principle in all languages, but that the frequency of descriptive expressions can differ greatly from language to language. To this end, I will draw on data from selected languages and present a quantitative analysis of the extent to which different languages make use of descriptive formations. The second main part of the work addresses the following question: if every language makes use of descriptive designations to some degree, what mechanisms of language change can shift a language's position on this scale in one direction or the other?
When, as in the case of the Institute of Applied Linguistics and Translatology at Leipzig University, a Germanistische Institutspartnerschaft (GIP, Germanic Institute Partnership) of more than ten years with no fewer than two Russian partners, the translation faculties of the Linguistic Universities of Moscow and Pyatigorsk, comes to an end, it is natural to ask what this long-term GIP cooperation has brought both sides in terms of measurable scholarly, methodological, and curricular results, and of "gains" in the promotion of young academics and the exchange of lecturers and students. The balance, which we set out in anniversary volume 52 of the Dokumente & Materialien of the Deutscher Akademischer Austausch Dienst (German Academic Exchange Service), is thoroughly respectable and justifies not only the funds expended but also the continuous work, the sustained commitment, and the manifold initiatives of the many participants on both sides.
Transforming constituent-based annotation into dependency-based annotation has been shown to work for different treebanks and annotation schemes (e.g. Lin (1995) has transformed the Penn treebank, and Kübler and Telljohann (2002) the Tübinger Baumbank des Deutschen (TüBa-D/Z)). These ventures are usually triggered by the conflict between theory-neutral annotation, which targets most needs of a wider audience, and theory-specific annotation, which provides more fine-grained information for a smaller audience. As a compromise, it has been pointed out that treebanks can be designed to support more than one theory from the start (Nivre, 2003). We argue that information can also be added to an existing annotation scheme so that it supports additional theory-specific annotations. We also argue that such a transformation is useful for improving and extending the original annotation scheme with respect to both ambiguous annotation and annotation errors. We show this by analysing problems that arise when generating dependency information from the constituent-based TüBa-D/Z.
Deutsch im Kreis Schanfigg
(2012)
Following Kessler, this work takes Schanfigg to mean "Schanfigg in the wider sense", i.e. the villages of the political district (Kreis) of Schanfigg [...]. Since dialects, unlike standard languages, are non-standardised varieties, each local grammar shows an enormous variety of forms in both phonology and morphology. Demonstrating this was one of the aims of the study: using Prague School phonology and the morphology based on it, the study set out to show how wide the allophonic and allomorphic range is on which speakers unconsciously draw in conversation. This can be shown particularly well in the verbal morphology of the irregular verbs (short verbs). A further aim of the study was to describe the position of the local dialects of Schanfigg and of their totality, the Schanfigg diasystem, among the neighbouring dialects. Ideally, the dialects of the Prättigau, the Churwalden valley, and Chur and the Chur Rhine valley would have been drawn on. Unfortunately, since no studies exist on the Prättigau and the Churwalden valley, the Schanfigg data were compared with those of the city of Chur (cf. Eckhardt 1991) and of German in the district of Imboden (cf. Toth and Ebneter 1996).
Neugriechische Wortbildung
(1988)
The aim of this work is to give an overview of the Modern Greek (MG) word-formation system and, at the same time, to treat as far as possible the most important problems connected with delimiting the various word-formation processes from one another in MG. The work is divided into three main parts: the first part (chapters 2 and 3) is devoted to general problems concerning the delimitation of word formation from inflection and the most important aspects of word structure in MG. The other two parts (chapters 4 and 5) discuss the word-formation processes of derivation and compounding in the nominal and verbal domains. A detailed account of prefixation in MG is not possible within the scope of this work; however, the problems connected with distinguishing prefixed formations from compounds are briefly discussed in chapter 5.1. Special types of word formation such as acronymy, clipping, and "blending" are not treated.
The purpose of this paper is to describe the TüBa-D/Z treebank of written German and to compare it to the independently developed TIGER treebank (Brants et al., 2002). Both treebanks, TIGER and TüBa-D/Z, use an annotation framework that is based on phrase structure grammar and that is enhanced by a level of predicate-argument structure. The comparison between the annotation schemes of the two treebanks focuses on the different treatments of free word order and discontinuous constituents in German as well as on differences in phrase-internal annotation.
The late 19th and early 20th centuries break markedly with the epistemological concepts of the preceding period. Whereas, to simplify greatly, philosophy had until then believed that the possibility of knowledge was to be found in either the subjective or the objective dimension, and had scarcely questioned the function of language in the process of cognition, a tendency becomes apparent at the turn of the century that, on the one hand, either questions or at least thematises the adequacy of linguistic mediation and, on the other, reflects anew on the traditional modes of cognition or even turns its back on them.
This article deals with loanwords of Venetian origin in the Northern Čakavian dialect of Boljun in northeastern Istria. The aim was to provide an etymological treatment of adjectives and nouns from the semantic domain of character traits that were not included in Skok's Etimologijski rječnik or in Vinja's Jadranske etimologije. The source material was excerpted from Ivan Francetić's manuscript Rječnik boljunskih govora and verified in the field; through etymological and lexical analysis it was related to Istro-Venetian, Venetian, Triestine, and Italian (etymologia proxima) and to the Latin or other etymon (etymologia remota), and, at the synchronic and diatopic level, to dictionary attestations in the other Čakavian dialects of Istria, Kvarner, and Dalmatia.
Polish family names [were subject] to only slight official control until well into the 19th century [...]. This situation favoured the gradual build-up of onymic allomorphy from the […] inflectional and derivational morphemes that were originally used to form designations of origin, patronymics, and bynames. The secondary use of these inflectional and word-formation morphemes as onymic suffixes advanced the […] dissociation process of family names. The growing productivity of these onymic morphs, which continues to this day, secured them the top position among markers of proper-name status in the Polish family-name system. Today the onymic allomorphs -ska, -ski, -icz, -ak are the most important means of marking a word as belonging to the onomasticon. […] This contribution presents the paths of emergence and spread of the three most productive groups of Polish onymic suffixes. It also takes into account the extralinguistic factors that enabled the increase in productivity through successive expansion of the combinatorial possibilities of the individual suffixes. It is shown that the original selectional restrictions between bases and suffixes (toponyms + -ska suffixes, appellatives and adjectives + k-containing suffixes, first names + -icz suffixes) were abandoned in the course of their spread and consolidation. The onymic allomorphs are freely combinable today and can be used to form a new name when a name is changed.
Das hethitische Phonem /xw/
(2014)
In the Hittite phonological system there was a labialized velar fricative /xw/ beside the plain velar fricative /x/ parallel to the opposition between the velar stops /kw/ and /k/. The frequent syllable /xwa/ was spelled either hu-(u) or hu-wa. Evidence from the frequency of words with initial hu in the lexicon, from spelling variations and from ablaut alternations is presented to demonstrate the existence of /xw/. It is suggested that Hittite /xw/ regularly corresponds to the reflexes of *w in the non-Anatolian Indo-European languages.
To reach even language users unfamiliar with the use of grammars, the Institut für Deutsche Sprache in Mannheim (Germany) looked for a new way to handle grammatical problems. Instead of confronting users with abstractions, frequent difficulties of German grammar are introduced in the form of exemplary questions such as "Which form should be used or preferred: Anfang dieses Jahres or Anfang diesen Jahres?" Looking through the long list of such questions, even laypeople may find solutions to grammatical problems they might not be able to formulate as such.
The number of German loanwords in Croatian is smaller than might be expected, given that Croatia's centuries-long political and cultural ties with the Habsburg state brought German and Croatian into direct contact. The reason is a language policy that deliberately resisted the strong influence of German on Croatian, giving preference to Croatian words in the standard language. In the substandard language, however, a larger number of German loanwords have survived, even though Croatian equivalents exist for them. This paper re-examines the relationship between a German loanword and its native replacement, i.e. the extent to which the Croatian equivalent is a successful substitute for the German loanword, and what that success depends on.
The early acquisition of Greek compounds by two monolingual Greek girls aged between 1;8 and 3;0 years is studied in a usage-based theoretical framework. Special importance is attached to the morphological structure of Greek compound types occurring in child speech and child-directed speech. Greek nominal compound formation does not consist in the mere juxtaposition of words or roots, but involves stems as well as a compound marker. Major questions addressed are the transparency of compounds and productive nominal compound formation. Evidence for productivity of nominal compound formation has been found with only one of the two girls. In contrast to other languages, neoclassical nominal compounds by far exceed endocentric subordinative ones tokenwise in Greek child speech and child-directed speech providing evidence of entrenchment rather than productivity.
In a cross-linguistic comparison it is shown that, in spite of the fact that both Standard Modern Greek and German are rich in nominal compounds, their number is much more limited in Greek than in German child speech. An explanation for this apparent paradox is provided by an onomasiological approach to lexical typology based on a sample list of nominal compounds occurring in German child language and their Greek translational equivalents. It has been found that while use of nominal compounds is common in colloquial German including child-centered situations, it is more typical of Greek formal than colloquial registers.
Children […] growing up with highly inflected languages such as Modern Greek will frequently hear different grammatical forms of a given lexeme used in different grammatical and semantic-pragmatic contexts. In spite of the fact that the Greek noun is not as highly inflected as the verb, acquisition of nominal inflection of this inflecting-fusional language is quite complex, comprising the three categories of case, number, and gender. As is usual in this type of language, the formation of case-number forms obeys different patterns that apply to largely arbitrary classes of nominal lexemes partially based on gender. Further, frequency of the occurrence of the three gender classes and case-number forms of nouns greatly differs in spoken Greek, regarding both the types and tokens. […] [A] child learning an inflecting-fusional language like Greek must construct different inflectional patterns depending not only on parts of speech but also on subclasses within a given part of speech, such as gender classes of nouns and inflectional classes within or (exceptionally) across genders. It is therefore to be expected that the early development of case and number distinctions will apply to specific nouns and subclasses of nouns rather than the totality of Greek nouns. The two main theoretical approaches of morphological development that will be discussed in the present paper are the usage-based approach and the pre- and protomorphology approach.
The two papers included in this volume have developed from work with the CHILDES tools and the Media Editor in two research projects: "Second language acquisition of German by Russian learners", sponsored by the Max Planck Institute for Psycholinguistics, Nijmegen, from 1998 to 1999 (directed by Ursula Stephany, University of Cologne, and Wolfgang Klein, Max Planck Institute for Psycholinguistics, Nijmegen) and "The age factor in the acquisition of German as a second language", sponsored by the German Science Foundation (DFG), Bonn, since 2000 (directed by Ursula Stephany, University of Cologne, and Christine Dimroth, Max Planck Institute for Psycholinguistics, Nijmegen). The CHILDES Project has been developed and is being continuously improved at Carnegie Mellon University, Pittsburgh, under the supervision of Brian MacWhinney. Having used the CHILDES tools for more than ten years for transcribing and analyzing Greek child data, I had no doubt that I would also use them for research into the acquisition of German as a second language and analyze the large amount of spontaneous speech gathered from two Russian girls with the help of the CLAN programs. In the spring of 1997, Steven Gillis from the University of Antwerp (in collaboration with Gert Durieux) developed a lexicon-based automatic coding system based on the CLAN program MOR and suitable for coding languages with richer morphologies than English, such as Modern Greek. Coding huge amounts of data then became much quicker and more comfortable, so I decided to adopt this system for German as well. The paper "Working with the CHILDES Tools" is based on two earlier manuscripts which have grown out of my research on Greek child language and the many CHILDES workshops taught in Germany, Greece, Portugal, and Brazil over the years. Its contents have now been adapted to the requirements of research into the acquisition of German as a second language and for use on Windows.
In this paper we show an approach to the customization of GermaNet to the German HPSG grammar lexicon developed in the Verbmobil project. GermaNet has a broad coverage of the German base vocabulary and a fine-grained semantic classification, while the HPSG grammar lexicon is comparatively small and has a coarse-grained semantic classification. In our approach, we have developed a mapping algorithm to relate the synsets in GermaNet with the semantic sorts in HPSG. The evaluation result shows that this approach is useful for the lexical extension of our deep grammar development to cope with real-world text understanding.
Null pronouns from several [...] groups can occur in a single utterance. The [...] groups can be related to the levels of a layered dialogue model; conversely, they can indicate which information should be available in a dialogue model. This will need to be examined more closely in future work. In what follows, the types of null pronouns mentioned are described in more detail, and procedures for finding their referents are given.
Developing an individual standard "on the drawing board" rarely leads to satisfactory results. In automatic checking, it quickly becomes apparent that the "made-up" rules do not withstand systematic application. When such guidelines are implemented, it turns out that they are often not formulated concretely enough, for example "formulate instructions concisely and precisely". But how can a standard be developed that suits a company, its industry, and its target groups and that can be implemented for automatic checking? Language technology helps efficiently in developing individual guidelines. Data analysis, sentence clustering, and parameterisation yield a text-specific individual standard. But does this resolve the opposition between creativity and standardisation?
Japanese is often taken to be strictly head-final in its syntax. In our work on a broad-coverage, precision implemented HPSG for Japanese, we have found that while this is generally true, there are nonetheless a few minor exceptions to the broad trend. In this paper, we describe the grammar engineering project, present the exceptions we have found, and conclude that this kind of phenomenon motivates on the one hand the HPSG type hierarchical approach which allows for the statement of both broad generalizations and exceptions to those generalizations and on the other hand the usefulness of grammar engineering as a means of testing linguistic hypotheses.
We present a broad coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages.
In this text, we describe the development of a broad coverage grammar for Japanese that has been built for and used in different application contexts. The grammar is based on work done in the Verbmobil project (Siegel 2000) on machine translation of spoken dialogues in the domain of travel planning. The second application for JACY was the automatic email response task. Grammar development was described in Oepen et al. (2002a). Third, it was applied to the task of understanding material on mobile phones available on the internet, while embedded in the project DeepThought (Callmeier et al. 2004, Uszkoreit et al. 2004). Currently, it is being used for treebanking and ontology extraction from dictionary definition sentences by the Japanese company NTT (Bond et al. 2004).
The problem of transfer in machine translation from Japanese to English is the missing information on number and definiteness in Japanese, which is needed for choosing English articles and for noun marking. Although this problem is significant, the research literature hardly addresses it. [...] We base our investigation on data collected in an experiment on German-Japanese interpreted appointment-scheduling dialogues [...]. In this way, phenomena that are relevant to the VERBMOBIL domain can be identified. We see our procedure as consistent with the 'sublanguage' approach [...].
One of the significant problems in machine translation from Japanese into German is the missing information on number and definiteness in the Japanese analysis output. An efficient solution to this problem is to integrate the search for the relevant information into transfer. Transfer rules are combined with preference rules and default rules. This makes information about lexical restrictions of the target language, about the domain, and about the discourse accessible.
The domain of VERBMOBIL is appointment-scheduling dialogues. For the syntax, this means first of all that it must be oriented toward spoken language. This includes null anaphora, phrases that relate to the communication situation, and phrases that would be considered ill-formed in written language. Beyond that, there are some domain-specific syntactic peculiarities, such as the realization of time expressions.
We present a solution for the representation of Japanese honorific information in the HPSG framework. Basically, there are three dimensions of honorification. We show that a treatment is necessary that involves both the syntactic and the contextual level of information. The Japanese grammar is part of a machine translation system.
Preferences and defaults for definiteness and number in Japanese to German machine translation
(1996)
A significant problem when translating Japanese dialogues into German is the missing information on number and definiteness in the Japanese analysis output. The integration of the search for such information into the transfer process provides an efficient solution. General transfer includes conditions to make it possible to consider external knowledge. Thereby, grammatical and lexical knowledge of the source language, knowledge of lexical restrictions on the target language, domain knowledge and discourse knowledge are accessible.
A comprehensive investigation of Japanese particles has been missing up to now. General implications were drawn without a comprehensive analysis having been carried out. [...] We offer a lexicalist treatment of the problem. Instead of assuming different phrase structure rules, we state a type hierarchy of Japanese particles. This makes possible a uniform treatment of phrase structure as well as a differentiation of subcategorization patterns.
Particles fulfill several distinct central roles in the Japanese language. They can mark arguments as well as adjuncts, and can be purely functional or have semantic functions. There is, however, no straightforward mapping from particles to functions, as, e.g., 'ga' can mark the subject, the object or the adjunct of a sentence. Particles can co-occur. Verbal arguments that could be identified by particles can be left out of the Japanese sentence. And finally, in spoken language particles are often omitted. A proper treatment of particles is thus necessary to make an analysis of Japanese sentences possible. Our treatment is based on an empirical investigation of 800 dialogues. We set up a type hierarchy of particles motivated by their subcategorizational and modificational behaviour. This type hierarchy is part of the Japanese syntax in VERBMOBIL.
Sprachtechnologie für übersetzungsgerechtes Schreiben am Beispiel Deutsch, Englisch, Japanisch
(2009)
We [...] have set ourselves the task of finding ways in which linguistically based software can support the process of writing technical documentation. On the one hand, we consider the difficulties that Japanese and German authors (and other non-native speakers of English) have when writing English texts. Japanese authors in particular struggle because they have to express highly complex ideas in a language that differs greatly from their native language from an informational standpoint. On the other hand, we examine technical documentation written by authors in their native language. Although the foreign-language component is absent here, there is still considerable potential for improvement. The goal here is to write documents that are comprehensible, consistent, and suitable for translation. The fundamental approach in developing linguistically based software is that good linguistic software is based on data and is oriented toward the concrete goals of better documentation.
The translation process for technical documentation is increasingly supported by machine translation (MT). We first look at the source texts and create automatically checkable rules with which these texts can be edited so that they yield optimal MT results. These rules are based on research findings on translatability, on research findings on translation mismatches in MT, and on experiments.
The following account aims, on the one hand, to show by means of examples how the Swiss German dialects and Standard German can diverge in pronunciation, morphology, sentence structure, and vocabulary, and, on the other hand, always also to point out commonalities. Certain features of dialectal language structure are often hastily taken to be peculiarities of the dialect, even though the same features are also found in spoken Standard German. In many cases, then, the differences are not between dialect and standard language but between spoken and written language. [complete revision for a second edition]
Given their functions and topics, selection posters can be used in language teaching. The present study therefore aims to investigate the didactic potential of selection posters in German language teaching. It seeks to show that selection posters can serve as course materials in German language teaching and can be used in line with needs and interests. Accordingly, alternative ways and approaches are made concrete for use in courses. The study concludes that selection posters offer a wide range of uses in German language teaching with regard to local culture, vocabulary knowledge, linguistic study, visualization, authenticity, topicality, and spoken and written work.