OPUS 4 | Linguistik

Atlas kao arhiv vrijednih podataka (Općeslavenski lingvistički atlas: Fonetsko-gramatička serija, Tom 4a, Refleksi *ъ i *ь, Zagreb, 2006.) (2007)

Matas Ivanković, Ivana

Hrvatske koordinativne složenice (2010)

Marković, Ivan

Descriptions in contemporary Croatian grammars might lead one to conclude that Croatian either has no coordinative compounds or has so few that they need no description. The article recalls that older grammars did discuss them, and argues that their present-day number and varied realizations (nominal, adjectival, adverbial, with the linking elements -o- and -0-) fully merit a grammatical description. It shows which properties allow such compounds to be regarded as words rather than as word combinations (phrases). Using the language of Anka Žagar as an example, it shows that the model of coordinative compounds, as a potential, can also take on linguistically creative variants within poetry.

Tri nehrvatske tvorbe : infiksacija, reduplikacija, fuzija (2009)

Marković, Ivan

The word-formation processes of Croatian are based on the concatenation of morphemes. The paper describes three formation processes that are absent from the autochthonous, inherited Croatian lexicon: one that is likewise based on morphemic segmentation (infixation) and two that rest on different foundations (reduplication and lexical fusion). The paper has three aims: i) to point out certain inconsistencies in existing descriptions of Croatian morphology; ii) to describe individual borrowed and native Croatian lexemes and constructions in which these three processes can be said to apply; iii) to predict whether, and to what extent, non-autochthonous formation processes can be imported from foreign languages, today primarily (solely) English.

Hrvatski posvojni pridjev kao antecedent relativnoj zamjenici (2008)

Marković, Ivan

The paper examines the ability of the Croatian possessive adjective to serve as the antecedent of a relative pronoun, an ability that is increasingly being lost in the Slavic languages, where the genitive takes the place of the possessive adjective in this function. Attested examples show that this possibility (still) exists in written Croatian. A survey of native speakers nevertheless shows that only a minority of contemporary speakers judge such constructions acceptable. The typologically unusual properties of relative clauses with a possessive adjective as antecedent are analysed, in particular the fact that in them the possessive adjective behaves like a case form of the noun rather than its derivative. Keywords: possessive adjective, antecedent, relative clause, genitive, Slavic languages

Latentno posuđivanje u hrvatskome i drugim jezicima : posljedice i otpori (2009)

Margić, Branka Drljača

Although calques activate a language's own expressive resources, they too are the target of purist reactions. The aim of the paper is to analyse the latent influence of English on various linguistic levels as a phenomenon present in Croatian and in other European languages. The examples show that this is a widespread phenomenon arising from literal and careless translation, ignorance of the norm of one's own language, and a fashionable following of the English linguistic norm.

Mjesni govor Kacane (2009)

Mandić, David ; Pliško, Lina

The article presents the alietet, alteritet, and areal linguistic features of the local dialect of Kacana, which territorially belongs to the Town of Vodnjan. According to the research results, this idiom belongs to the Southwestern Istrian, or Štakavian-Čakavian, dialect. The linguistic features of Kacana are identical to those of neighbouring Orbanići and of the other local dialects of the Marčana area studied so far, as well as to those of the southern subgroup of the Barban local dialects, which leads to the conclusion that the branch of dialects with these features extends further westward.

Dijalekti u Gorskom kotaru (2010)

Malnar, Marija

All of our dialect groups (Kajkavian, Štokavian, and Čakavian) are spoken in Gorski kotar, yet few dialectologists study them. The paper gives an overview of the basic phonological and morphological characteristics recorded in previous research in the area. Alongside attestations of the features examined, a phonological transcription of one Gorski kotar idiom is appended.

Filološki i kulturološki događaj (Šimun Kožičić Benja: Knjižice od žitija rimskih arhijerejov i cesarov, Rijeka, 1531. – Knjiga 1: Pretisak; Knjiga 2: Latinička transkripcija glagoljskoga teksta, Priredila Anica Nazor, Sveučilišna knjižnica Rijeka, Rijeka, 2007.) (2007)

Malić, Dragica

Mediengestützter Deutschunterricht im türkischen universitären Bereich : eine Bestandsaufnahme (2010)

Maden, Sevinç Sakarya ; Çelik, Sevil

A trend of steadily increasing multimedia use has arisen in all strata of society. Rather than relying on printed course books alone, publishing houses encourage the use of multimedia, whether tied to a course book or independent of one, and language learners prefer to learn with multimedia. Supporting courses in this manner is therefore encouraged. This study examines the scope and limits of computer-aided teaching of German as a foreign language, which has recently been flourishing in Turkish university education. The study was carried out in the preparatory classes of departments that offer four-year programmes. It presents the results of a survey on the use of course-book-dependent and course-book-independent multimedia in Turkish university courses. Evidence on the competence of German teachers and learners in using multimedia is given and visualized in graphics. Problems of multimedia-aided German courses and proposed solutions are presented.

Enatthembo : an appraisal of linguistic and socio-linguistic factors (2010)

Lyndon, Christopher ; Lyndon, Ada

Govorničke i stilske figure u poeziji i putopisima fra Ivana Franje Jukića (2010)

Lukenda, Marko

The paper offers an analysis of the stylistic and rhetorical figures in the poetry and travel writings of Fra Ivan Franjo Jukić, a committed Franciscan writer and fighter for the political independence of Bosnia. The author establishes that Jukić brings elements of vernacular speech into his literary expression, which is especially noticeable in his use of folk phrases and collocations. On the other hand, his choice of so-called bookish figures reveals the influence of the Franciscan tradition, in particular the language of the older Franciscan chronicles.

Stvarnost jezika, moć medija i uloga jezikoslovca (Lana Hudeček, Milica Mihaljević: Jezik medija, publicistički funkcionalni stil, Hrvatska sveučilišna naklada, Zagreb, 2009.) (2009)

Lukenda, Marko

Uzduž i poprijeko po Mikaljinu Blagu : (Darija Gabrić-Bagarić: Na ishodištu hrvatske leksikografije. Trojezični rječnik Blago jezika slovinskoga Jakova Mikalje, 1649./1651.) (2010)

Lovrić Jović, Ivana

Novo ruho stare gramatike (Jakov Mikalja: Gramatika tali(j)anska ukratko ili kratak nauk za učiti latinski jezik. Pretisak. Transkripcija, studija i popratni tekstovi: Darija Gabrić-Bagarić; i Marijana Horvat, Institut za hrvatski jezik i jezikoslovlje, Zagreb, 2008.) (2008)

Lovrić Jović, Ivana

Morfološka svojstva jezika hrvatskih dubrovačkih oporuka iz 17. i 18. stoljeća (2008)

Lovrić Jović, Ivana

The paper arose from the need to describe the vernacular speech of Dubrovnik in the 17th and 18th centuries. In the morphological analysis it is important to bear in mind that the period and area described coincide with the beginning of the formation of today's standard language. The analysis becomes meaningful when compared with the results of linguistic studies of the periods that preceded and followed it, up to the present day.

Determinierer im Erwerb des Deutschen als Zweitsprache : eine Fallstudie (2007)

Loll, Annegret

The target-like use of the article as a grammaticalized means of NP determination in German poses a major difficulty in second language acquisition, especially for learners of German whose native language has no articles. This study investigates NP determination on the basis of a spontaneous speech corpus that provides acquisition data from an eight-year-old Russian learner of German in an early and a late phase of acquisition. The aim of the investigation is to gain insights into the course of development, transfer phenomena, and in particular the reference-semantic and phonological determinants of article choice.

Pavao Ritter Vitezović kao leksikograf : (Pavao Ritter Vitezović: Lexicon Latino-Illyricum, svezak drugi; Hrvatsko-latinski rječnik, svezak treći) (2010)

Lisac, Josip

Niederdeutsch : Der Dialekt als identitätsstiftendes Moment (2010)

Lippmann, Enrico

This article addresses one function of dialects, showing their importance in controlling everyday language. Using the example of Low German, a vernacular spoken in Northern Germany, the identity function is shown and explained. First, the understanding of biography is set out, followed by an overview of research on biographical studies in linguistics, especially in dialectology and Low German philology. The main part is an exemplary analysis of an interview with a dialect speaker. The aim of the article is to show in detail the identity function of dialects and what qualitative methods can contribute to linguistic research.

Factorizing complementation in a TT-MCTAG for German (2008)

Lichte, Timm ; Kallmeyer, Laura

TT-MCTAG lets one abstract away from the relative order of co-complements in the final derived tree, which is more appropriate than classic TAG when dealing with flexible word order in German. In this paper, we present the analyses for sentential complements, i.e., wh-extraction, that-complementation and bridging, and we work out the crucial differences between these and respective accounts in XTAG (for English) and V-TAG (for German).

Dobar savjet zlata vrijedi : (Lana Hudeček, Milica Mihaljević, Luka Vukojević: Jezični savjeti, Institut za hrvatski jezik i jezikoslovlje, Zagreb, 2010.) (2010)

Lewis, Kristian

Od munje i munjka do muštuluka (Артур Рафаэлович Багдасаров: Новый хорватско-русский словарь, Воентехиниздат, Москва, 2007.) (2007)

Lewis, Kristian

On the perspectivization of a recipient role - cross-linguistic results from a speech production experiment on GET-passives in German, Dutch and Luxembourgish (2009)

Lenz, Alexandra N.

The focus of this paper is the perspectivization of thematic roles generally and the recipient role specifically. Whereas perspective is defined here as the representation of something for someone from a given position (Sandig 1996: 37), perspectivization refers to the verbalization of a situation in the speech generation process (Storrer 1996: 233). In a prototypical act of giving, for example, the focus of perception (the attention of the external observer) may be on the person who gives (agent), the transferred object (patient) or the person who receives the transferred object (recipient). The languages of the world provide differing linguistic means to perspectivize such an act of giving, or better: to perspectivize the participants of such an action. In this article, the linguistic means of three selected continental West Germanic languages –German, Dutch and Luxembourgish– will be taken into consideration, with an emphasis on the perspectivization of the recipient role.

Speech transcription using MED (2001)

Lehmann, Katrin

MED (Media EDitor) is a program designed to facilitate the transcription of digitized soundfiles into textfiles. It was written by Hans Drexler and Daan Broeder, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands. [...] The aim of MED is to facilitate the transcription of sound into text using a single program. It works on the principle of the coexistence and interaction of two basic elements, the waveform display window and the text window. [...] This means that you no longer need to use both a sound editor and a word processor at the same time in order to transcribe digitized speech files. Instead, you can directly type the sound you hear (and see) via MED into the text window. Furthermore, you can directly link sound portions of the waveform display window to text portions of the text window, so that you can easily locate and listen to the original source of your transcription once the links have been set. In this function the waveform display window and the text window virtually interact with each other.

Eine kritisch-kontrastive Darstellung der Kognitiven Linguistik (2010)

Kınsız, Mustafa ; Demir, Kemal

This article attempts a brief introduction to the cognitive sciences. The emphasis is on cognitive linguistics, which divides into two positions; as part of the cognitive sciences it is presented in terms of its linguistic function and is the core matter of the article, which criticizes the lack of diagnostics in these positions. The two positions, neurolinguistics and cognitive linguistics, are presented in detail, and both are questioned as to their scientific validity. Cognitive linguistics is understood as a field of cognitive science. Cognitive science seeks through its research to imitate the human brain; from this field has also arisen artificial intelligence research, in which brain researchers and their colleagues from computer technology aim to develop artificial intelligence. The contribution of the linguistic component guides cognitive linguistics in its research.

Evaluating POS tagging under sub-optimal conditions : or: does meticulousness pay? (2000)

Kübler, Sandra ; Wagner, Andreas

In this paper, we investigate the role of sub-optimality in training data for part-of-speech tagging. In particular, we examine to what extent the size of the training corpus and certain types of errors in it affect the performance of the tagger. We distinguish four types of errors: If a word is assigned a wrong tag, this tag can belong to the ambiguity class of the word (i.e. to the set of possible tags for that word) or not; furthermore, the major syntactic category (e.g. "N" or "V") can be correctly assigned (e.g. if a finite verb is classified as an infinitive) or not (e.g. if a verb is classified as a noun). We empirically explore the decrease of performance that each of these error types causes for different sizes of the training set. Our results show that those types of errors that are easier to eliminate have a particularly negative effect on the performance. Thus, it is worthwhile concentrating on the elimination of these types of errors, especially if the training corpus is large.
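The four error types distinguished in this abstract follow from two yes/no questions about a wrong tag: is it inside the word's ambiguity class, and does it keep the major syntactic category? A minimal Python sketch of that cross-classification, assuming an illustrative tag-to-category table and hypothetical names (`MAJOR_CATEGORY`, `classify_tag_error`) that are not taken from the paper:

```python
from typing import Dict, Set

# Illustrative (assumed) mapping from fine-grained tags to major categories.
MAJOR_CATEGORY: Dict[str, str] = {
    "VVFIN": "V",  # finite verb
    "VVINF": "V",  # infinitive
    "NN": "N",     # common noun
    "NE": "N",     # proper noun
}


def classify_tag_error(gold: str, assigned: str, ambiguity_class: Set[str]) -> str:
    """Return 'correct' or one of the four error types described above."""
    if gold == assigned:
        return "correct"
    # Question 1: is the wrong tag one of the possible tags for this word?
    in_class = assigned in ambiguity_class
    # Question 2: is the major syntactic category still right?
    # Tags missing from the table are treated as their own category.
    same_major = MAJOR_CATEGORY.get(gold, gold) == MAJOR_CATEGORY.get(assigned, assigned)
    return "{}-class/{}-category".format(
        "in" if in_class else "outside",
        "same" if same_major else "different",
    )


# A finite verb tagged as an infinitive: within the class, major category kept.
print(classify_tag_error("VVFIN", "VVINF", {"VVFIN", "VVINF"}))
# A verb tagged as a noun: outside the class, major category lost.
print(classify_tag_error("VVFIN", "NN", {"VVFIN", "VVINF"}))
```

Counting these labels over a corrupted training corpus would give the per-type error rates whose effect the paper measures; the corruption procedure itself is not specified in the abstract and is not sketched here.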

Towards a dependency-oriented evaluation for partial parsing (2002)

Kübler, Sandra ; Telljohann, Heike

Quantitative evaluation of parsers has traditionally centered around the PARSEVAL measures of crossing brackets, (labeled) precision, and (labeled) recall. However, it is well known that these measures do not give an accurate picture of the quality of the parser's output. Furthermore, we will show that they are especially unsuited for partial parsers. In recent years, research has concentrated on dependency-based evaluation measures. We will show in this paper that such a dependency-based evaluation scheme is particularly suitable for partial parsers. TüBa-D, the treebank used here for evaluation, contains all the necessary dependency information so that the conversion of trees into a dependency structure does not have to rely on heuristics. Therefore, the dependency representations are not only reliable, they are also linguistically motivated and can be used for linguistic purposes.
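One way to picture a dependency-based evaluation of a partial parser is to compare sets of labeled dependencies, so that a parser that leaves material unattached loses recall but not precision. The following Python sketch rests on that assumption; the triple format, the labels and the function name `evaluate_dependencies` are hypothetical and are not the measure defined in the paper:

```python
from typing import Set, Tuple

# A dependency is (head index, dependent index, label); index 0 is the root.
Dependency = Tuple[int, int, str]


def evaluate_dependencies(
    gold: Set[Dependency], predicted: Set[Dependency]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over labeled dependency triples."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 2, "ROOT"), (2, 1, "SUBJ"), (2, 3, "OBJ")}
# A partial parse that attaches the subject but leaves the object unattached.
partial = {(0, 2, "ROOT"), (2, 1, "SUBJ")}
precision, recall, f1 = evaluate_dependencies(gold, partial)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

Because the partial parse is scored only on the dependencies it actually proposes, its precision stays at 1.0 while recall reflects the missing attachment.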

A testsuite for testing parser performance on complex German grammatical constructions [TePaCoC - a corpus for testing parser performance on complex German grammatical constructions] (2009)

Kübler, Sandra ; Rehbein, Ines ; Genabith, Josef van

Traditionally, parsers are evaluated against gold standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and the gold standard. A particular case in point is German, for which two treebanks (TiGer and TüBa-D/Z) are available with highly different annotation schemes for the acquisition of (e.g.) PCFG parsers. The differences between the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser evaluation difficult [7, 9, 12]. The resource (TEPACOC) presented in this paper takes a different approach to parser evaluation: instead of providing evaluation data in a single annotation scheme, TEPACOC uses comparable sentences and their annotations for 5 selected key grammatical phenomena (with 20 sentences per phenomenon) from both TiGer and TüBa-D/Z resources. This provides a 2 times 100 sentence comparable testsuite which allows us to evaluate TiGer-trained parsers against the TiGer part of TEPACOC, and TüBa-D/Z-trained parsers against the TüBa-D/Z part of TEPACOC for key phenomena, instead of comparing them against a single (and potentially biased) gold standard. To overcome the problem of inconsistency in human evaluation and to bridge the gap between the two different annotation schemes, we provide an extensive error classification, which enables us to compare parser output across the two different treebanks. In the remaining part of the paper we present the testsuite and describe the grammatical phenomena covered in the data. We discuss the different annotation strategies used in the two treebanks to encode these phenomena and present our error classification of potential parser errors.

Why is German dependency parsing more reliable than constituent parsing? (2006)

Kübler, Sandra ; Prokic, Jelena

In recent years, research in parsing has extended in several new directions. One of these directions is concerned with parsing languages other than English. Treebanks have become available for many European languages, but also for Arabic, Chinese, or Japanese. However, it was shown that parsing results on these treebanks depend on the types of treebank annotations used. Another direction in parsing research is the development of dependency parsers. Dependency parsing profits from the non-hierarchical nature of dependency relations, thus lexical information can be included in the parsing process in a much more natural way. Machine-learning-based approaches in particular are very successful (cf. e.g.). The results achieved by these dependency parsers are very competitive although comparisons are difficult because of the differences in annotation. For English, the Penn Treebank has been converted to dependencies. For this version, Nivre et al. report an accuracy rate of 86.3%, as compared to an F-score of 92.1 for Charniak's parser. The Penn Chinese Treebank is also available in a constituent and a dependency representation. The best results reported for parsing experiments with this treebank give an F-score of 81.8 for the constituent version and 79.8% accuracy for the dependency version. The general trend in comparisons between constituent and dependency parsers is that the dependency parser performs slightly worse than the constituent parser. The only exception occurs for German, where F-scores for constituent plus grammatical function parses range between 51.4 and 75.3, depending on the treebank, NEGRA or TüBa-D/Z. The dependency parser based on a converted version of TüBa-D/Z, in contrast, reached an accuracy of 83.4%, i.e. 12 percentage points better than the best constituent analysis including grammatical functions.

Memory-based vocalization of Arabic (2008)

Kübler, Sandra ; Mohamed, Emad

The problem of vocalization, or diacritization, is essential to many tasks in Arabic NLP. Arabic is generally written without the short vowels, which leads to one written form having several pronunciations with each pronunciation carrying its own meaning(s). In the experiments reported here, we define vocalization as a classification problem in which we decide for each character in the unvocalized word whether it is followed by a short vowel. We investigate the importance of different types of context. Our results show that the combination of using memory-based learning with only a word internal context leads to a word error rate of 6.64%. If a lexical context is added, the results deteriorate slowly.
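Casting vocalization as per-character classification means turning each letter of the unvocalized word into one training instance whose features are its word-internal neighbours. A minimal Python sketch of that instance extraction, assuming a Latin transliteration, only the three short vowels a, i, u, and a hypothetical function name `make_instances`; the paper's actual feature set and label inventory are not given in the abstract:

```python
from typing import List, Tuple

SHORT_VOWELS = set("aiu")  # assumption: transliterated short vowels only
PAD = "_"                  # padding symbol for positions outside the word

Instance = Tuple[Tuple[str, ...], str]


def make_instances(vocalized: str, window: int = 2) -> List[Instance]:
    """Split a vocalized word into (context window, following vowel) pairs."""
    letters: List[str] = []
    labels: List[str] = []
    for ch in vocalized:
        if ch in SHORT_VOWELS and letters:
            labels[-1] = ch      # the vowel is the label of the preceding letter
        else:
            letters.append(ch)
            labels.append("-")   # "-" means: no short vowel follows
    padded = [PAD] * window + letters + [PAD] * window
    instances: List[Instance] = []
    for i, label in enumerate(labels):
        # Word-internal context: the letter itself plus `window` neighbours per side.
        features = tuple(padded[i : i + 2 * window + 1])
        instances.append((features, label))
    return instances


for features, label in make_instances("kutub", window=1):
    print(features, "->", label)
```

Instances of this shape could then be stored and matched by a memory-based (nearest-neighbour) learner, which predicts the label of the most similar stored contexts for each letter of a new unvocalized word.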

How to compare treebanks (2008)

Kübler, Sandra ; Maier, Wolfgang ; Rehbein, Ines ; Versley, Yannick

Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new testsuite for the evaluation of parsers on complex German grammatical constructions. The testsuite provides a well thought-out error classification, which enables us to compare parser output for parsers trained on treebanks with different encoding schemes and provides interesting insights into the impact of treebank annotation schemes on specific constructions like PP attachment or non-constituent coordination.

Combining dependency parsing with PP attachment (2007)

Kübler, Sandra ; Ivanova, Steliana ; Klett, Eva

Prepositional phrase (PP) attachment is one of the major sources of errors in traditional statistical parsers. The reason for that lies in the type of information necessary for resolving structural ambiguities. For parsing, it is assumed that distributional information of parts-of-speech and phrases is sufficient for disambiguation. For PP attachment, in contrast, lexical information is needed. The problem of PP attachment has sparked much interest ever since Hindle and Rooth (1993) formulated the problem in a way that can be easily handled by machine learning approaches: In their approach, PP attachment is reduced to the decision between noun and verb attachment; and the relevant information is reduced to the two possible attachment sites (the noun and the verb) and the preposition of the PP. Brill and Resnik (1994) extended the feature set to the now standard 4-tuple also containing the noun inside the PP. Among many publications on the problem of PP attachment, Volk (2001; 2002) describes the only system for German. He uses a combination of supervised and unsupervised methods. The supervised method is based on the back-off model by Collins and Brooks (1995), the unsupervised part consists of heuristics such as "If there is a support verb construction present, choose verb attachment". Volk trains his back-off model on the Negra treebank (Skut et al., 1998) and extracts frequencies for the heuristics from the "Computerzeitung". The latter also serves as test data set. Consequently, it is difficult to compare Volk's results to other results for German, including the results presented here, since he not only uses a combination of supervised and unsupervised learning but also performs domain adaptation. Most of the researchers working on PP attachment seem to be satisfied with a PP attachment system; we have found hardly any work on integrating the results of such approaches into actual parsers. The only exceptions are Mehl et al. (1998) and Foth and Menzel (2006), both working with German data. Mehl et al. report a slight improvement of PP attachment from 475 correct PPs out of 681 PPs for the original parser to 481 PPs. Foth and Menzel report an improvement of overall accuracy from 90.7% to 92.2%. Both integrate statistical attachment preferences into a parser. First, we will investigate whether dependency parsing, which generally uses lexical information, shows the same performance on PP attachment as an independent PP attachment classifier does. Then we will investigate an approach that allows the integration of PP attachment information into the output of a parser without having to modify the parser: The results of an independent PP attachment classifier are integrated into the parse of a dependency parser for German in a postprocessing step.
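The 4-tuple formulation mentioned in this abstract (verb, noun, preposition, noun inside the PP) lends itself to a count-based back-off classifier. The Python sketch below is a simplified illustration in the spirit of the back-off model by Collins and Brooks (1995), not the classifier used in the paper; the class names, the toy English training data and the tie-breaking toward noun attachment are assumptions made for the example:

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class PPInstance(NamedTuple):
    verb: str
    noun1: str   # candidate noun attachment site
    prep: str
    noun2: str   # noun inside the PP


# Back-off levels, from the full 4-tuple down to the preposition alone.
# Every sub-tuple keeps the preposition.
LEVELS = [
    [("verb", "noun1", "prep", "noun2")],
    [("verb", "noun1", "prep"), ("verb", "prep", "noun2"), ("noun1", "prep", "noun2")],
    [("verb", "prep"), ("noun1", "prep"), ("prep", "noun2")],
    [("prep",)],
]


def _key(inst: PPInstance, fields: Tuple[str, ...]) -> tuple:
    return tuple((name, getattr(inst, name)) for name in fields)


class BackoffAttacher:
    def __init__(self) -> None:
        self.total: Counter = Counter()  # how often a sub-tuple was seen
        self.noun: Counter = Counter()   # how often it was seen with noun attachment

    def train(self, data: Iterable[Tuple[PPInstance, str]]) -> None:
        for inst, label in data:         # label is "N" (noun) or "V" (verb)
            for level in LEVELS:
                for fields in level:
                    key = _key(inst, fields)
                    self.total[key] += 1
                    if label == "N":
                        self.noun[key] += 1

    def predict(self, inst: PPInstance) -> str:
        for level in LEVELS:
            keys = [_key(inst, fields) for fields in level]
            seen = sum(self.total[k] for k in keys)
            if seen > 0:                 # use the most specific level with evidence
                p_noun = sum(self.noun[k] for k in keys) / seen
                return "N" if p_noun >= 0.5 else "V"
        return "N"                       # assumed default when nothing was seen


attacher = BackoffAttacher()
attacher.train([
    (PPInstance("eat", "pizza", "with", "fork"), "V"),
    (PPInstance("eat", "pizza", "with", "anchovies"), "N"),
    (PPInstance("see", "man", "with", "telescope"), "V"),
])
print(attacher.predict(PPInstance("eat", "salad", "with", "fork")))
```

In a postprocessing setup like the one the abstract describes, a decision of this kind could be used to re-attach a PP in the dependency parser's output; how the paper performs that integration is not shown here.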

Parsing coordinations (2009)

Kübler, Sandra ; Hinrichs, Erhard ; Maier, Wolfgang ; Klett, Eva

The present paper is concerned with statistical parsing of constituent structures in German. The paper presents four experiments that aim at improving parsing performance on coordinate structures: 1) reranking the n-best parses of a PCFG parser, 2) enriching the input to a PCFG parser by gold scopes for any conjunct, 3) reranking the parser output for all possible scopes for conjuncts that are permissible with regard to clause structure. Experiment 4 reranks a combination of parses from experiments 1 and 3. The experiments presented show that n-best parsing combined with reranking improves results by a large margin. Providing the parser with different scope possibilities and reranking the resulting parses results in an increase in F-score from 69.76 for the baseline to 74.69. While the F-score is similar to the one of the first experiment (n-best parsing and reranking), the first experiment results in higher recall (75.48% vs. 73.69%) and the third one in higher precision (75.43% vs. 73.26%). Combining the two methods results in the best result with an F-score of 76.69.

Is it really that difficult to parse German? (2006)

Kübler, Sandra ; Hinrichs, Erhard ; Maier, Wolfgang

This paper presents a comparative study of probabilistic treebank parsing of German, using the Negra and TüBa-D/Z treebanks. Experiments with the Stanford parser, which uses a factored PCFG and dependency model, show that, contrary to previous claims for other parsers, lexicalization of PCFG models boosts parsing performance for both treebanks. The experiments also show that there is a big difference in parsing performance between models trained on the Negra treebank and models trained on the TüBa-D/Z treebank. Parser performance for the models trained on TüBa-D/Z is comparable to parsing results for English with the Stanford parser when trained on the Penn treebank. This comparison at least suggests that German is not harder to parse than its West-Germanic neighbor language English.

From chunks to function-argument structure : a similarity-based approach (2001)

Kübler, Sandra ; Hinrichs, Erhard

Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. Such larger structures are not only desirable for a deeper syntactic analysis. They also constitute a necessary prerequisite for assigning function-argument structure. The present paper offers a similarity-based algorithm for assigning functional labels such as subject, object, head, complement, etc. to complete syntactic structures on the basis of prechunked input. The evaluation of the algorithm has concentrated on measuring the quality of functional labels. It was performed on a German and an English treebank using two different annotation schemes at the level of function-argument structure. The results of 89.73% correct functional labels for German and 90.40% for English validate the general approach.

TüSBL : a similarity-based chunk parser for robust syntactic processing (2001)

Kübler, Sandra ; Hinrichs, Erhard

Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. The TüSBL parser extends current chunk parsing techniques by a tree-construction component that extends partial chunk parses to complete tree structures including recursive phrase structure as well as function-argument structure. TüSBL's tree construction algorithm relies on techniques from memory-based learning that allow similarity-based classification of a given input structure relative to a pre-stored set of tree instances from a fully annotated treebank. A quantitative evaluation of TüSBL has been conducted using a semi-automatically constructed treebank of German that consists of approx. 67,000 fully annotated sentences. The basic PARSEVAL measures were used although they were developed for parsers that have as their main goal a complete analysis that spans the entire input. This runs counter to the basic philosophy underlying TüSBL, which has as its main goal robustness of partially analyzed structures.

Robustes chunkparsing mit variabler Analysetiefe (2000)

Kübler, Sandra ; Hinrichs, Erhard

Chunk parsing offers a particularly promising approach to robust, partial parsing with the goal of broad data coverage. The aim of chunk parsing is a partial, non-recursive syntactic structure. This extremely efficient parsing approach can be implemented as a cascade of finite-state transducers. This paper presents TüSBL, a system whose input consists of spontaneous spoken language, supplied to the parser in the form of a word hypothesis graph from a speech recognizer. Chunk parsing is particularly well suited to such an application because it can robustly handle fragmentary or ill-formed utterances. In addition, a tree construction component is presented that extends the partial chunk structures into complete trees with grammatical functions. The system is evaluated on manually checked system inputs, since the usual evaluation parameters are not suitable here.

Braucht Nominalphrasenerkennung linguistisches Wissen? (2001)

Kübler, Sandra

Machine learning is often used for the efficient annotation of large amounts of data. Research on machine learning methods is generally limited to comparing different learning methods or determining the optimal size of the training data. So far, however, it has not been investigated to what extent linguistic knowledge can have a positive effect on the definition of the task. This is examined here for the learning of base noun phrases with three different definitions. The definitions differ in the degree of linguistically motivated extensions added to a first, more practically motivated definition. The experiments showed that the number of misclassified words can be reduced by a third.

How do treebank annotation schemes influence parsing results? : or how not to compare apples and oranges (2005)

Kübler, Sandra

In the last decade, the Penn treebank has become the standard data set for evaluating parsers. The fact that most parsers are solely evaluated on this specific data set leaves the question unanswered how much these results depend on the annotation scheme of the treebank. In this paper, we will investigate the influence which different decisions in the annotation schemes of treebanks have on parsing. The investigation uses the comparison of similar treebanks of German, NEGRA and TüBa-D/Z, which are subsequently modified to allow a comparison of the differences. The results show that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality while a flat clause structure has a positive influence.

The PaGe 2008 shared task on parsing German (2008)

Kübler, Sandra

The ACL 2008 Workshop on Parsing German features a shared task on parsing German. The goal of the shared task was to find reasons for the radically different behavior of parsers on the different treebanks and between constituent and dependency representations. In this paper, we describe the task and the data sets. In addition, we provide an overview of the test results and a first analysis.

Towards case-based parsing : are chunks reliable indicators for syntax trees? (2006)

Kübler, Sandra

This paper presents an approach to the question whether it is possible to construct a parser based on ideas from case-based reasoning. Such a parser would employ a partial analysis of the input sentence to select a (nearly) complete syntax tree and then adapt this tree to the input sentence. The experiments performed on German data from the Tüba-D/Z treebank and the KaRoPars partial parser show that a wide range of levels of generality can be reached, depending on which types of information are used to determine the similarity between input sentence and training sentences. The results are such that it is possible to construct a case-based parser. The optimal setting out of those presented here needs to be determined empirically.

Learning a lexicalized grammar for German (1998)

Kübler, Sandra

In syntax, the trend nowadays is towards lexicalized grammar formalisms. It is now widely accepted that dividing words into word classes may serve as a labor-saving mechanism, but at the same time, it discards all detailed information on the idiosyncratic behavior of words. And that is exactly the type of information that may be necessary in order to parse a sentence. For learning approaches, however, lexicalized grammars represent a challenge for the very reason that they include so much detailed and specific information, which is difficult to learn. This paper will present an algorithm for learning a link grammar of German. The problem of data sparseness is tackled by using all the available information from partial parses as well as from an existing grammar fragment and a tagger. This is a report about work in progress, so there are no representative results available yet.

Parsing without grammar - using complete trees instead (2003)

Kübler, Sandra

The definition of similarity between sentences is formulated on the levels of words, POS tags, and chunks (Abney 91; Abney 96). The evaluation of this approach shows that while precision and recall based on the PARSEVAL measures (Black et al. 91) do not reach state-of-the-art parsers yet (F1=87.19 on syntactic constituents, F1=77.78 including function-argument structure), the parser shows a very reliable performance where function-argument structure is concerned (F1=96.52). The lower F-scores are very often due to unattached constituents.
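
As a rough illustration of the kind of score quoted in this abstract, the following is a minimal sketch of labeled-bracket precision, recall and F1 in the spirit of the PARSEVAL measures. It is not the evaluation code used in the paper; the function name `parseval`, the `(label, start, end)` bracket encoding and the toy brackets are assumptions made only for this example.

```python
from collections import Counter


def parseval(gold, predicted):
    """Labeled-bracket precision, recall and F1 for one sentence.

    Each bracket is a (label, start, end) tuple; this encoding is an
    assumption of the sketch, not something taken from the paper.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a bracket counts as matched at most as often
    # as it occurs in both the gold and the predicted analysis.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: the parser mislabels one constituent (PP instead of NP).
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
p, r, f = parseval(gold, pred)
print(p, r, f)
```

Three of the four brackets match in both directions, so precision, recall and F1 all come out at 0.75 for this toy sentence.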

Unterricht mit lernungewohnten Teilnehmern (2010)

Köse, Nuray

In our time, integration and social advancement are no longer possible without solid language skills. What was not done for decades is now being made up for, very successfully, through integration efforts abroad and in Germany. But German is unfortunately only the first, though perhaps the most important, step towards successful integration. The next question should now be: why is there a lack of integration in spite of good knowledge of German?

Suglasnički sustav južnomoslavačkih kajkavskih govora (2009)

Kuzmić, Martina

On the basis of the author's own field research and the literature, the paper presents the consonant system of the South Moslavina Kajkavian local dialects, its inventory, distribution and origin, using the example of three local dialects: Kutinsko Selo, Osekovo and Okešinec. The shared and the distinguishing features of these three dialects are presented. The South Moslavina Kajkavian local dialects belong to the South Moslavina or Lower Lonja dialect.

Deklinacija brojeva dva, oba, tri i četiri u kajkavskim pravnim tekstovima od 16. do 18. Stoljeća (2007)

Kuzmić, Boris ; Kuzmić, Martina

In this article the authors deal with the declension of the numerals dva, oba, tri and četiri ('two', 'both', 'three' and 'four') in Kajkavian legal texts from the 16th to the 18th century. The corpus for linguistic analysis comprises 23 texts from the 16th century, 40 texts from the 17th century and 19 texts from the 18th century. In the analysis, special attention is paid to the comparison between dual and plural forms in the declension of dva and oba, as well as to the development of plural forms in the declension of tri and četiri. The authors list all attested forms of the four numerals, compare their occurrence across the different periods and, on the basis of the results of the analysis, propose a declension type for these numerals. The declension of the numerals in the oblique cases is examined with regard to whether the numerals are part of prepositional or non-prepositional expressions, and the frequency of indeclinable forms is treated as a separate question.

O čakavštini s dušom i znanjem : (Josip Lisac: Hrvatska dijalektologija 2. Čakavsko narječje. Golden marketing–Tehnička knjiga, Zagreb, 2009.) (2009)

Kurtović Budja, Ivana

Der Deutsche Familiennamenatlas als Inspirationsquelle : Jürgen - Udolph - Sechzig - Fünf (2009)

Kunze, Konrad ; Nübling, Damaris

With the possibility of recording family names with great precision by inventory, number of bearers and spatial distribution on the basis of digital telephone connections, a new era of anthroponomastics has begun. The treasure of 850,661 different family names that were registered in 28,205,713 private landline connections in 2005 is immense, and the questions that can be asked in exploring it are inexhaustible in their orientation and in their number. In this situation, two tasks were most urgent: First, in view of population mobility growing from year to year, in view of the effects of recent naming legislation and in view of the rapidly increasing replacement of localized landline connections by mobile phones, the name inventory had to be secured and archived now at the latest, on the basis of the most reliable source and in a legitimately usable manner. The historically grown name landscapes are only just still preserved, and with astonishing stability. After clarification of the data protection issues, the data (as of June 2005) were made available to the Deutscher Familiennamenatlas by Deutsche Telekom, and their use for onomastic research was regulated by a contract dated 28 June 2005.

Der Deutsche Familiennamenatlas (DFA) : Konzept, Konturen, Kartenbeispiele (2007)

Kunze, Konrad ; Nübling, Damaris

Family names are the only area of the European languages whose pronounced spatial diversity is still very inadequately documented. The historically grown name landscapes are still preserved with astonishing stability. Within the Federal Republic of Germany, they are being documented by the 'Deutscher Familiennamenatlas' (DFA), which has been under way since 2005 as a cooperation between the universities of Freiburg and Mainz and is funded by the DFG, on the basis of telephone connections (as of 2005). This contribution presents and justifies the preliminary work, the aims and the overall design of the project, the systematics and representativeness of the selection of topics in the two main parts (grammatical and lexical part), and the criteria and methods for the conception of the content and the formal design of the maps and commentaries. The preliminary work mentioned also already reveals perspectives for the future evaluation of the materials archived in the databases and of the structures of the name landscapes documented by way of example in the atlas.

Die Kenning als typische Stilfigur der germanischen und keltischen Dichtersprache (1930)

Krause, Wolfgang

Filologija sjećanja i uronjenost u prostor (Zadarski filološki dani II. Zbornik radova. Petar Zoranić i njegovi suvremenici. Slavenski prostori u putopisnoj literaturi i književnosti. Znanstveni rad akademika Dalibora Brozovića. Urednica: Divna Mrdeža Antonina, Sveučilište u Zadru / Odjel za kroatistiku i slavistiku, Zadar, 2009.) (2009)

Kramarić, Martina

Profesoru na dar : (Kroatologija, časopis za hrvatsku kulturu Hrvatskih studija Sveučilišta u Zagrebu, 2010., br. 1, 366 str.) (2010)

Kovačić, Mislav

Parni prijedlozi (2007)

Kovačević, Barbara ; Matas Ivanković, Ivana

The paper deals with prepositions that frequently occur in pairs: od and do, iz and u, s and na. Through frequent use in these pairs, the prepositions go beyond their individual primary semantic and syntactic features. Their phraseological use also points to their structural and semantic unity.

Dugo iščekivan frazeološki dvojezičnik (Dalibor Vrgoč, Željka Fink Arsovski: Hrvatsko-engleski frazeološki rječnik, Naklada Ljevak, Zagreb, 2008.) (2008)

Kovačević, Barbara

Vježbenica jezične raščlambe : (Anđela Frančić, Boris Kuzmić: Jazik horvatski, Jezične raščlambe starih hrvatskih tekstova, Hrvatska sveučilišna naklada, Zagreb, 2009.) (2010)

Klinčić, Ivana

An HPSG-to-CFG Approximation of Japanese (2000)

Kiefer, Bernd ; Krieger, Hans-Ulrich ; Siegel, Melanie

We present a simple approximation method for turning a Head-Driven Phrase Structure Grammar into a context-free grammar. The approximation method can be seen as the construction of the least fixpoint of a certain monotonic function. We discuss an experiment with a large HPSG for Japanese.
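
The abstract characterizes the approximation as the least fixpoint of a monotonic function. The sketch below shows only the general idea of computing such a fixpoint by iteration on finite sets; it is not the authors' HPSG-to-CFG method. The toy grammar, the names `RULES`, `TERMINALS`, `least_fixpoint` and `add_productive`, and the choice of "productive nonterminals" as the example are all assumptions made for illustration.

```python
def least_fixpoint(step, bottom=frozenset()):
    """Apply a monotonic function on finite sets, starting from the
    bottom element, until the result no longer changes."""
    current = bottom
    while True:
        following = step(current)
        if following == current:
            return current
        current = following


# Hypothetical toy context-free grammar (not from the paper).
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["det", "n"]],
    "VP": [["v", "NP"]],
    "X": [["X", "Y"]],  # never productive: Y has no rule and is no terminal
}
TERMINALS = {"det", "n", "v"}


def add_productive(known):
    """One monotonic step: add every nonterminal that has a right-hand side
    consisting only of terminals and already known nonterminals."""
    derived = {
        lhs
        for lhs, alternatives in RULES.items()
        if any(all(sym in TERMINALS or sym in known for sym in rhs)
               for rhs in alternatives)
    }
    return frozenset(known | derived)


productive = least_fixpoint(add_productive)
print(sorted(productive))  # prints ['NP', 'S', 'VP']
```

Because the step function only ever adds elements and the set of nonterminals is finite, the iteration is guaranteed to stop, and the set it stops at is the smallest one closed under the step.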

Vom Deutschen Leben (IV) : eher unsystematische Anmerkungen zur deutsch-russischen Germanistik und zur Germanistik in Russland (2008)

Kelletat, Andreas F.

“Deutsch-russische Germanistik” (German-Russian German studies) is the heading Dirk Kemper has given the programme of this DAAD conference here at the RGGU in Moscow, and I was a little taken aback, wondering what kind of new subdiscipline of our field of German studies this “German-Russian German studies” might be.

Sarajevski žargon (Narcis Saračević: Rječnik sarajevskog žargona – prilog leksikografiji bosanskoga jezika, 2. izd.; Vrijeme, Zenica, 2007.) (2007)

Kekez, Stipe

Dalmatinska zagora rediviva – jezični prinos u sve većoj promidžbi Dalmatinske zagore (Ivica Gusić i Filip Gusić: Rječnik govora Dalmatinske zagore i zapadne Hercegovine, vlastita naklada, Zagreb, 2004.) (2007)

Kekez, Stipe

Wie durch gemeinsame, zielorientierte Projekte die Kooperation und der Austausch nicht zur Einbahnstraße werden (2008)

Karpov, Anatolij S.

Deep in the East, as it were “at the edge of the world”, in the Republic of Buryatia (Russian Federation), located beyond Lake Baikal and many thousands of kilometres away from major European cities, the acquisition of the German language is highly valued, especially by teachers of German, trainers of German teachers and students of German.

Wiederholungen im Konversationsunterricht als sprachliches Handeln (2010)

Karasu, Gönül

This work is concerned with the repetitions that students produce as linguistic actions in DaF (German as a foreign language) conversation lessons. Repetitions in classroom discourse differ functionally from repetitions in everyday discourse. The role of repetitions by students in classroom discourse is demonstrated here on the basis of examples. Recordings of DaF conversation lessons were transcribed and reconstructed according to HIAT. The study is limited to the kinds of repetition and their functions in these DaF conversation lessons. The findings should be taken into account consciously in order to better understand and react to students' repeating actions, such as inquiry, correction, confirmation, precautionary self-control and verification, most of which students perform for a certain aim, albeit unconsciously.

Naglasak imeničnih i-osnovâ u Orubici (2010)

Kapović, Mate

The article presents material on nominal i-stems collected through field research in the village of Orubica in western Posavina. The archaic Old Štokavian local dialect of Orubica is briefly introduced, and some accentual and morphological aspects of the i-declension in Orubica are analysed.

Nezapaženi ulomci "Muke Isuhrstove" (1514.) iz petrogradske Berciceve zbirke (2008)

Kapetanović, Amir

During his short stay in St Petersburg in 1912, I. Milčetić described the rich Berčić collection of Glagolitic manuscripts and printed books held in the Russian National Library, but he did not manage to study every component of Berčić's material in detail. Milčetić's description of codex no. 1 (Klimantović's miscellany, 1514) mentions the prologue of the Passion, but does not point out that the continuation of that prologue contains a fragment of a medieval play with the scene of Judas's betrayal of Jesus. In the Middle Ages that scene disturbed the common people most of all, because lying, betrayal and hypocrisy were then hated above everything else. The paper describes and publishes for the first time fragments of an unknown redaction of Muka Isuhrstova from the St Petersburg Berčić collection (shelf mark Bc 1), which represent the oldest prologue and scene of a Croatian medieval Passion play recorded so far. The verses of the fragments are compared with the later cyclical Muka Spasitelja našega from the Glagolitic Zbornik prikazanja (1556), with which these fragments correspond most closely within the corpus of Croatian medieval poetry.

Tree-local MCTAG with shared nodes : an analysis of word order variation in German and Korean (2004)

Kallmeyer, Laura ; Yoon, SinWon

Tree Adjoining Grammars (TAG) are known not to be powerful enough to deal with scrambling in free word order languages. The TAG-variants proposed so far in order to account for scrambling are not entirely satisfying. Therefore, an alternative extension of TAG is introduced based on the notion of node sharing. Considering data from German and Korean, it is shown that this TAG-extension can adequately analyse scrambling data, also in combination with extraposition and topicalization.

The TUSNELDA annotation standard : an XML encoding standard for multilingual corpora supporting various aspects of linguistic research (2000)

Kallmeyer, Laura ; Wagner, Andreas

This paper proposes a corpus encoding standard that meets the needs of linguistic research using a variety of linguistic data structures. The standard was developed in SFB 441, a research project at the University of Tuebingen. The principal concern of SFB 441 is the empirical data structures which feed into linguistic theory building. SFB 441 consists of several projects, most of which are building corpora to empirically investigate various linguistic phenomena in various languages (e.g. modal verbs in German, forms of address and politeness in Russian). These corpora will form the components of the "Tuebingen collection of reusable, empirical, linguistic data structures (TUSNELDA)". The TUSNELDA annotation standard aims at providing a uniform encoding scheme for all subcorpora and texts of TUSNELDA such that they can be processed with uniform standardized tools. To guarantee maximal reusability we use XML for encoding. Previous SGML standards for text encoding were provided by the Text Encoding Initiative (TEI) and the Expert Advisory Group on Language Engineering Standards (Corpus Encoding Standard, CES). The TUSNELDA standard is based on TEI and XCES (XML version of CES) but takes into account the specific needs of the SFB projects, i.e. the peculiarities of the examined languages and linguistic phenomena.

LTAG analysis for pied-piping and stranding of wh-phrases (2004)

Kallmeyer, Laura ; Scheffler, Tatjana

In this paper we propose a syntactic and semantic analysis of complex questions. We consider questions involving pied-piping and stranding, and we propose elementary trees and semantic representations that allow us to account for both constructions in a uniform way.

A polynomial-time parsing algorithm for TT-MCTAG (2009)

Kallmeyer, Laura ; Satta, Giorgio

This paper investigates the class of Tree-Tuple MCTAG with Shared Nodes, TT-MCTAG for short, an extension of Tree Adjoining Grammars that has been proposed for natural language processing, in particular for dealing with discontinuities and word order variation in languages such as German. It has been shown that the universal recognition problem for this formalism is NP-hard, but so far it was not known whether the class of languages generated by TT-MCTAG is included in PTIME. We provide a positive answer to this question, using a new characterization of TT-MCTAG.

LTAG semantics with semantic unification (2004)

Kallmeyer, Laura ; Romero, Maribel

This paper sets up a framework for LTAG (Lexicalized Tree Adjoining Grammar) semantics that brings together ideas from different recent approaches addressing some shortcomings of TAG semantics based on the derivation tree. Within this framework, several sample analyses are proposed, and it is shown that the framework allows the analysis of data that have been claimed to be problematic for derivation-tree-based approaches to LTAG semantics.

Quantifier scope in German : an MCTAG analysis (2006)

Kallmeyer, Laura ; Romero, Maribel

Relative quantifier scope in German depends, in contrast to English, very much on word order. The scope possibilities of a quantifier are determined by its surface position, its base position and the type of the quantifier. In this paper we propose a multicomponent analysis for German quantifiers computing the scope of the quantifier, in particular its minimal nuclear scope, depending on the syntactic configuration it occurs in.

Reflexives and reciprocals in LTAG (2007)

Kallmeyer, Laura ; Romero, Maribel

This paper presents an LTAG analysis of reflexives like himself and reciprocals like each other. These items need to find a c-commanding antecedent from which they retrieve (part of) their own denotation and with which they syntactically agree. The relation between anaphoric item and antecedent must satisfy the following important locality conditions (Chomsky (1981)).

Constraint-based computational semantics : a comparison between LTAG and LRS (2006)

Kallmeyer, Laura ; Richter, Frank

This paper compares two approaches to computational semantics, namely semantic unification in Lexicalized Tree Adjoining Grammars (LTAG) and Lexical Resource Semantics (LRS) in HPSG. There are striking similarities between the frameworks that make them comparable in many respects. We will exemplify the differences and similarities by looking at several phenomena. We will show, first of all, that many intuitions about the mechanisms of semantic computations can be implemented in similar ways in both frameworks. Secondly, we will identify some aspects in which the frameworks intrinsically differ due to more general differences between the approaches to formal grammar adopted by LTAG and HPSG.

On the relation between multicomponent tree adjoining grammars with tree tuples (TT-MCTAG) and range concatenation grammars (RCG) (2008)

Kallmeyer, Laura ; Parmentier, Yannick

This paper investigates the relation between TT-MCTAG, a formalism used in computational linguistics, and RCG. RCGs are known to describe exactly the class PTIME; simple RCG even have been shown to be equivalent to linear context-free rewriting systems, i.e., to be mildly context-sensitive. TT-MCTAG has been proposed to model free word order languages. In general, it is NP-complete. In this paper, we will put an additional limitation on the derivations licensed in TT-MCTAG. We show that TT-MCTAG with this additional limitation can be transformed into equivalent simple RCGs. This result is interesting for theoretical reasons (since it shows that TT-MCTAG in this limited form is mildly context-sensitive) and, furthermore, even for practical reasons: We use the proposed transformation from TT-MCTAG to RCG in an actual parser that we have implemented.

Convertir des grammaires d’arbres adjoints à composantes multiples avec tuples d’arbres (TT-MCTAG) en grammaires à concaténation d’intervalles (RCG) (2008)

Kallmeyer, Laura ; Parmentier, Yannick

This article studies the relation between multicomponent tree adjoining grammars with tree tuples (TT-MCTAG), a formalism used in computational linguistics, and range concatenation grammars (RCG). RCGs are known to describe exactly the class PTIME; it has furthermore been shown that "simple" RCGs are even equivalent to linear context-free rewriting systems (LCFRS), in other words, that they are mildly context-sensitive. TT-MCTAG has been proposed to model free word order languages. In general these languages are NP-complete. In this article, we define an additional constraint on the derivations licensed by the TT-MCTAG formalism. We then show how this restricted form of TT-MCTAG can be converted into an equivalent simple RCG. The result is interesting for theoretical reasons (since it shows that the restricted form of TT-MCTAG is mildly context-sensitive), but also for practical reasons (the transformation proposed here has been used to implement a parser for TT-MCTAG).

Un algorithme d'analyse de type earley pour grammaires à concaténation d'intervalles (2009)

Kallmeyer, Laura ; Maier, Wolfgang ; Parmentier, Yannick

We present several parsing algorithms for range concatenation grammars (RCG), including a new Earley-style algorithm, within the deductive parsing paradigm. Our work is motivated by the recent interest in this type of grammar and fills a gap in the existing literature.

An earley parsing algorithm for range concatenation grammars (2009)

Kallmeyer, Laura ; Maier, Wolfgang ; Parmentier, Yannick

We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation helps to considerably decrease the number of items in the chart.

TuLiPA : towards a multi-formalism parsing environment for grammar engineering (2008)

Kallmeyer, Laura ; Lichte, Timm ; Maier, Wolfgang ; Parmentier, Yannick ; Dellert, Johannes ; Evang, Kilian

In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German.

Developing a TT-MCTAG for German with an RCG-based parser (2008)

Kallmeyer, Laura ; Lichte, Timm ; Maier, Wolfgang ; Parmentier, Yannick ; Dellert, Johannes

Developing linguistic resources, in particular grammars, is known to be a complex task in itself, because of (amongst others) redundancy and consistency issues. Furthermore, some languages can prove hard to describe because of specific characteristics, e.g. the free word order in German. In this context, we present (i) a framework for describing tree-based grammars, and (ii) an actual fragment of a core multicomponent tree-adjoining grammar with tree tuples (TT-MCTAG) for German developed using this framework. This framework combines a metagrammar compiler and a parser based on range concatenation grammar (RCG) to check the consistency and the correctness of the grammar, respectively. The German grammar being developed within this framework already deals with a wide range of scrambling and extraction phenomena.

Factoring predicate argument and scope semantics : underspecified semantics with LTAG (1999)

Kallmeyer, Laura ; Joshi, Aravind K.

This paper proposes a compositional semantics for lexicalized tree adjoining grammars (LTAG). Tree-local multicomponent derivations allow separation of the semantic contribution of a lexical item into one component contributing to the predicate argument structure and a second component contributing to scope semantics. Based on this idea a syntax-semantics interface is presented where the compositional semantics depends only on the derivation structure. It is shown that the derivation structure allows an appropriate amount of underspecification. This is illustrated by investigating underspecified representations for quantifier scope ambiguities and related phenomena such as adjunct scope and island constraints.

Factoring Predicate Argument and Scope Semantics : underspecified Semantics with LTAG (2003)

Kallmeyer, Laura ; Joshi, Aravind K.

In this paper we propose a compositional semantics for lexicalized tree-adjoining grammar (LTAG). Tree-local multicomponent derivations allow separation of the semantic contribution of a lexical item into one component contributing to the predicate argument structure and a second component contributing to scope semantics. Based on this idea a syntax-semantics interface is presented where the compositional semantics depends only on the derivation structure. It is shown that the derivation structure (and indirectly the locality of derivations) allows an appropriate amount of underspecification. This is illustrated by investigating underspecified representations for quantifier scope ambiguities and related phenomena such as adjunct scope and island constraints.

Comparing lexicalized grammar formalisms in an empirically adequate way : the notion of generative attachment capacity (2006)

Kallmeyer, Laura

The work presented here addresses the question of how to determine whether a grammar formalism is powerful enough to describe natural languages. The expressive power of a formalism can be characterized in terms of i) the string languages it generates (weak generative capacity (WGC)) or ii) the tree languages it generates (strong generative capacity (SGC)). The notion of WGC is not enough to determine whether a formalism is adequate for natural languages. We argue that even SGC is problematic since the set of trees a grammar formalism for natural languages should be able to generate is difficult to determine. The concrete syntactic structures assumed for natural languages depend very much on theoretical stipulations, and empirical evidence for syntactic structures is rather hard to obtain. Therefore, for lexicalized formalisms, we propose to consider the ability to generate certain strings together with specific predicate argument dependencies as a criterion for adequacy for natural languages.

A descriptive characterization of multicomponent tree adjoining grammars (2005)

Kallmeyer, Laura

Multicomponent Tree Adjoining Grammars (MCTAG) is a formalism that has been shown to be useful for many natural language applications. The definition of MCTAG however is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. I.e., this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. Therefore, in this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees the MCTAG licenses.

Scrambling in german and the non-locality of local TDGs (2000)

Kallmeyer, Laura

Existing analyses of German scrambling phenomena within TAG-related formalisms all use non-local variants of TAG. However, there are good reasons to prefer local grammars, in particular with respect to the use of the derivation structure for semantics. Therefore this paper proposes to use local TDGs, a TAG-variant generating tree descriptions that shows a local derivation structure. However the construction of minimal trees for the derived tree descriptions is not subject to any locality constraint. This provides just the amount of non-locality needed for an adequate analysis of scrambling. To illustrate this a local TDG for some German scrambling data is presented.

Local tree description grammars (1997)

Kallmeyer, Laura

A lot of interest has recently been paid to constraint-based definitions and extensions of Tree Adjoining Grammars (TAG). Examples are the so-called quasi-trees, D-Tree Grammars and Tree Description Grammars (TDG). The latter are grammars consisting of a set of formulas denoting trees. TDGs are derivation-based: in each derivation step a conjunction is built of the old formula, a formula of the grammar and additional equivalences between node names of the two formulas. This formalism is more powerful than TAGs. TDGs offer the advantages of MC-TAG and D-Tree Grammars for natural languages and they allow underspecification. However, the problem is that TDGs might be unnecessarily powerful for natural languages. To solve this problem, in this paper, I will propose local TDGs, a restricted version of TDGs. Local TDGs still have the advantages of TDGs but they are semilinear and therefore more appropriate for natural languages. First, the notion of semilinearity is defined. Then local TDGs are introduced, and, finally, semilinearity of local Tree Description Languages is proven.

A hierarchy of local TDGs (1998)

Kallmeyer, Laura

Many recent variants of Tree Adjoining Grammars (TAG) allow an underspecification of the parent relation between nodes in a tree, i.e. they do not deal with fully specified trees as is the case with TAGs. Such TAG variants are for example Description Tree Grammars (DTG), Unordered Vector Grammars with Dominance Links (UVG-DL), a definition of TAGs via so-called quasi trees and Tree Description Grammars (TDG). The last TAG variant, local TDG, is an extension of TAG generating Tree Descriptions. Local TDGs even allow an underspecification of the dominance relation between node names and thereby provide the possibility to generate underspecified representations for structural ambiguities such as quantifier scope ambiguities. This abstract deals with formal properties of local TDGs. A hierarchy of local TDGs is established together with a pumping lemma for local TDGs of a certain rank.

A declarative characterization of different types of multicomponent tree adjoining grammars (2009)

Kallmeyer, Laura

Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG however is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. I.e., this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) the MCTAG licenses. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, they allow a more systematic comparison of different types of MCTAG, and, furthermore, they can be exploited for parsing.

A declarative characterization of different types of multicomponent tree adjoining grammars (2007)

Kallmeyer, Laura

Multicomponent Tree Adjoining Grammars (MCTAG) is a formalism that has been shown to be useful for many natural language applications. The definition of MCTAG however is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. This way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) the MCTAG licenses. This definition gives a better understanding of the formalism, it allows a more systematic comparison of different types of MCTAG, and, furthermore, it can be exploited for parsing.

Flexible composition in LTAG : quantifier scope and inverse linking (2003)

Joshi, Aravind K. ; Kallmeyer, Laura ; Romero, Maribel

This paper addresses the problem of constraints for relative quantifier scope, in particular in inverse linking readings where certain scope orders are excluded. We show how to account for such restrictions in the Tree Adjoining Grammar (TAG) framework by adopting a notion of flexible composition. In the semantics we use for TAG we introduce quantifier sets that group quantifiers that are "glued" together in the sense that no other quantifier can scopally intervene between them. The flexible composition approach allows us to obtain the desired quantifier sets and thereby the desired constraints for quantifier scope.

Linguistics in cognitive science : the state of the art (2007)

Jackendoff, Ray

The special issue of The Linguistic Review on "The Role of Linguistics in Cognitive Science" presents a variety of viewpoints that complement or contrast with the perspective offered in Foundations of Language (Jackendoff 2002a). The present article is a response to the special issue. It discusses what it would mean to integrate linguistics into cognitive science, then shows how the parallel architecture proposed in Foundations seeks to accomplish this goal by altering certain fundamental assumptions of generative grammar. It defends this approach against criticisms both from mainstream generative grammar and from a variety of broader attacks on the generative enterprise, and it reflects on the nature of Universal Grammar. It then shows how the parallel architecture applies directly to processing and defends this construal against various critiques. Finally, it contrasts views in the special issue with that of Foundations with respect to what is unique about language among cognitive capacities, and it conjectures about the course of the evolution of the language faculty.

POS tagging for German : how important is the right context? (2008)

Ivanova, Steliana ; Kübler, Sandra

Part-of-Speech tagging is generally performed by Markov models, based on bigram or trigram models. While Markov models have a strong concentration on the left context of a word, many languages require the inclusion of right context for correct disambiguation. We show for German that the best results are reached by a combination of left and right context. If only left context is available, then changing the direction of analysis and going from right to left improves the results. In a version of MBT (Daelemans et al., 1996) with default parameter settings, the inclusion of the right context improved POS tagging accuracy from 94.00% to 96.08%, thus corroborating our hypothesis. The version with optimized parameters reaches 96.73%.
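
The difference between left-only and left-plus-right context can be illustrated with a small feature extractor. This is a minimal sketch, not the feature set of MBT or of the paper: the function name `context_features`, the padding symbol `PAD`, the window sizes and the example sentence are all assumptions made for illustration, and the features are plain word forms rather than tags or ambiguity classes.

```python
from typing import Dict, List

PAD = "<pad>"  # hypothetical symbol for positions outside the sentence


def context_features(tokens: List[str], i: int,
                     left: int = 2, right: int = 1) -> Dict[str, str]:
    """Return the focus word plus `left` preceding and `right` following words."""
    feats = {"w0": tokens[i]}
    # Left context: words before position i, nearest first.
    for k in range(1, left + 1):
        j = i - k
        feats[f"w-{k}"] = tokens[j] if j >= 0 else PAD
    # Right context: words after position i, nearest first.
    for k in range(1, right + 1):
        j = i + k
        feats[f"w+{k}"] = tokens[j] if j < len(tokens) else PAD
    return feats


# Illustrative sentence (an assumption, not taken from the paper's data).
sentence = ["die", "Frau", "sieht", "die", "Kinder"]

# Left context only: the second "die" is described by "sieht" and "Frau".
left_only = context_features(sentence, 3, left=2, right=0)

# Left and right context: the following noun "Kinder" is visible as well.
both = context_features(sentence, 3, left=2, right=1)
```

With `right=0` the extractor sees only what a left-to-right model would see; adding `w+1` exposes the following word, which is the kind of information the abstract reports as helpful for German.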

Was hat NLP mit Sprachwissenschaft zu tun? (2010)

İnce, Sedat

This study investigates the relationship between NLP and linguistics. Korzybski, who was interested in the neurological aspect of language, argued that the verb “to be” establishes an artificial identification. The notion he developed from this connection forms the basic idea of NLP. Chomsky's contribution to NLP is the distinction between surface structure and deep structure in the Generative Transformational Grammar approach. According to this distinction, we express what we utter in daily speech through surface structure, but we give it meaning through deep structure. NLP has transformed this knowledge into various techniques and practices for more effective communication and a happier life.

"Da li, je li i li" : normativni status i raspodjela (2007)

Hudeček, Lana ; Vukojević, Luka

The paper surveys the problems connected with the normative status of the particle/conjunction groups da li and je li and of the particle/conjunction li. It shows that there are several errors in how the normative status and distribution of these groups and of this particle are interpreted, and it examines the normative rule according to which the group da li should be replaced in the standard language by the particle li. (This is often, quite wrongly, described as replacing da li with je li; yet the group je li, with the exception of the group je li da, which functions as a supplementary question, does not exist in the standard language as a particle/conjunction group, because its first member is always the third person present of the verb biti.) The normative status of the group je li is determined, i.e. it is shown that in Croatian it is either obsolete or belongs to the colloquial style. The paper also examines the rules according to which the normative status of the group da li in direct questions differs from its status in indirect questions, and according to which da li appears in the standard language as well when expressing affirmation and in alternative questions. The conditions under which the group da li can be replaced by the particle/conjunction li are given, i.e. the syntactic contexts in which this replacement is unnecessary or impossible are identified.

Veznička sinonimija i antonimija u hrvatskoj leksikografiji (2008)

Hudeček, Lana ; Mihaljević, Milica

The paper considers the problem of conjunction synonymy and antonymy. In the linguistic literature, synonymy and antonymy are as a rule discussed with respect to traditionally full-meaning (content) words, whereas the synonymy and antonymy of traditionally non-full-meaning (function) words are rarely written about. The authors have already written about the synonymy and antonymy of the other function words (interjections, pronouns, prepositions, relative adverbs and particles) in the paper Sinonimija i antonimija nepunoznačnih riječi u hrvatskoj leksikografiji. The present paper supplements that one and pays particular attention to conjunction synonymy and antonymy (which occurs only exceptionally) in Croatian lexicography. It considers the problem of syntactic synonymy and antonymy and shows that Croatian monolingual dictionaries pay very little attention to listing synonyms and antonyms in general, and especially for conjunctions (as well as for prepositions, pronouns, adverbs, interjections and particles), and that these are given, both in the definition of meaning and in a separate section, only exceptionally and unsystematically. The paper explains the principles by which synonymous and (exceptionally) antonymous series can be established for conjunctions. These principles were established and applied in compiling the Školski rječnik hrvatskog jezika, whose editors are the authors of this paper, and they are being elaborated and extended so that they can also be applied in the one-volume normative dictionary of Croatian being compiled at the Institut za hrvatski jezik i jezikoslovlje in Zagreb.

Homonimija kao leksikografski problem (2009)

Hudeček, Lana ; Mihaljević, Milica

The paper re-examines the usual definitions of homonymy and the criteria for delimiting homonymy from related phenomena. Homonymy is approached as a practical lexicographic problem, and concrete examples are given of the lexicographic treatment of homonymous headwords from the Školski rječnik hrvatskog jezika, which is being compiled at the Institut za hrvatski jezik i jezikoslovlje.

Tko je na "frazeološkom brodu" važniji: 'mali od palube' ili 'mali od kužine'? (2008)

Hrnjak, Anita

This paper focuses on two fixed word combinations that belong to Croatian maritime terminology and are often used with a phraseological meaning. It analyses their origin, their possible phraseological meanings and the contexts in which they are used. The potential idioms are also compared with their Russian phraseological semantic equivalents. Keywords: Croatian phraseology; Russian phraseology; Croatian maritime terminology; potential idioms mali od palube and mali od kužine

Kulinarski elementi u hrvatskoj i ruskoj frazeologiji (2007)

Hrnjak, Anita

This paper analyses part of the corpus of Croatian and Russian idioms that have culinary elements as components, together with those whose semantic residue contains an image connected with food. The aim is to show the symbolic, metaphorical and connotative potential of food as a phraseological component by analysing how phraseological meaning is built, and to highlight the most obvious similarities and the most interesting differences between this type of phraseology in Croatian and in Russian.

Posljedice internacionalizacije u hrvatskome jeziku (2010)

Horvat, Marijana ; Štebih Golub, Barbara

The consequences of globalisation processes are also visible in language, as a tendency towards internationalisation. Internationalisation, or perhaps more precisely Anglo-Americanisation, has affected all languages of the modern world. Croatian is no exception. In this paper we mention only some of the consequences of internationalisation, at the lexical, word-formation and semantic levels.

Tvorba glagola u djelu "Svašta po malo" Blaža Tadijanovića (2008)

Horvat, Marijana ; Ramadanović, Ermina

The paper analyses derived verbs in Tadijanović's linguistic handbook "Svašta po malo iliti kratko složenje imena i riči u ilirski i njemački jezik" (1761), continuing a comprehensive study of Tadijanović's word-formation models. The authors also raise some important theoretical questions in the field of verb formation and point to problems, inconsistencies and differing interpretations in the analysis of word-formation types found in the relevant literature.

O tvorbi riječi u Tadijanovićevu djelu "Svašta po malo" (2007)

Horvat, Marijana ; Ramadanović, Ermina

The paper deals with the ways of forming adjectives, adverbs, prepositions, pronouns and conjunctions, using examples from Tadijanović's work "Svašta po malo". Particular attention is drawn to formation types that are unusual because of the meaning of the derived word, to the formation of unusual derivatives on existing models, to the different approaches and interpretations in determining word-formation types, and to the relation between motivated and unmotivated words from the standpoint of historical and contemporary word formation. The analysed examples are compared with attestations in the "Rječnik hrvatskoga ili srpskoga jezika JAZU".

Sintaktička svojstva zamjenica u Marulićevu i Kašićevu prijevodu De Imitatione Christi (2009)

Horvat, Marijana ; Perić Gavrančić, Sanja

The paper analyses Latin syntactic influence on the use of pronouns in Marulić's and Kašić's translations of the popular medieval work De imitatione Christi. The following syntactic properties of pronouns are investigated: the expression of third-person possession by means of the genitive of third-person personal pronouns; the use of the first- and second-person possessive pronoun as against the reflexive-possessive pronoun svoj; the use of the reflexive-possessive pronoun svoj as against the third-person possessives; the use of the first- and second-person personal pronoun as against the reflexive pronoun; the use of relative pronouns at the beginning of a sentence; and the neuter plural of demonstrative, relative (koji) and indefinite (sav, svaki) pronouns with singular meaning. On the basis of these categories the authors seek to establish the similarities and differences between the two prose translations and to explain the translators' procedure. Keywords: De imitatione Christi; Marko Marulić; Bartol Kašić; use of pronouns; syntactic calques

Mažuranićeva Slovnica iznova (Antun Mažuranić: Slovnica Hèrvatska za gimnazije i realne škole. Dio I. Rĕčoslovje. Pretisak. Predgovor: Radoslav Katičić. Pogovor: Željka Brlobaš. Institut za hrvatski jezik i jezikoslovlje, Zagreb, 2008.) (2008)

Horvat, Marijana

The language situation in Luxembourg (2008)

Horner, Kristine ; Weber, Jean Jacques

This monograph describes the overall language situation in Luxembourg, a highly multilingual country in Western Europe, from a language policy and planning perspective. The first part discusses the social and historical contexts, including major societal changes and uncertainties about the future, which are bound up with Europeanisation and the accelerated processes of globalisation. It also deconstructs the notions of Luxembourgish as a 'minority language' and French as the 'language of prestige', and describes a two-pronged language ideology that allows for either monolingual identification with Luxembourgish or trilingual identification with the languages recognised by the language law of 1984 (Luxembourgish / German / French). The second part discusses the trilingual school-system, a system in which large numbers of romanophone students are forced to go through a German-language literacy programme. The third part provides an overview of language spread in the areas of the media and literary writing. The fourth part examines language purism and tensions concerning the standardisation of Luxembourgish, as well as the debates about language requirements for citizenship. The discussion shows how language policy scholarship needs to be approached from a multidimensional perspective, that is, by taking into account dynamics on the global, regional and local levels in addition to those at the state level.
