Linguistics
Manual development of deep linguistic resources is time-consuming and costly and is therefore often described as a bottleneck for traditional rule-based NLP. In my PhD thesis I present a treebank-based method for the automatic acquisition of LFG resources for German. The method automatically creates deep and rich linguistic representations from labelled data (treebanks) and can be applied to large data sets. My research is based on and substantially extends previous work on automatically acquiring wide-coverage, deep, constraint-based grammatical resources from the English Penn-II treebank (Cahill et al., 2002; Burke et al., 2004; Cahill, 2004). Best results for English show a dependency f-score of 82.73% (Cahill et al., 2008) against the PARC 700 dependency bank, outperforming the best hand-crafted grammar of Kaplan et al. (2004). Preliminary work has been carried out to test the approach on languages other than English, providing proof of concept for the applicability of the method (Cahill et al., 2003; Cahill, 2004; Cahill et al., 2005). While first results have been promising, a number of important research questions have been raised. The original approach, first presented in Cahill et al. (2002), is strongly tailored to English and to the data structures provided by the Penn-II treebank (Marcus et al., 1993). English is configurational and rather poor in inflectional forms. German, by contrast, features semi-free word order and a much richer morphology. Furthermore, treebanks for German differ considerably from the Penn-II treebank with regard to the data structures and encoding schemes underlying the grammar acquisition task. In my thesis I examine the impact of language-specific properties of German, as well as of linguistically motivated treebank design decisions, on PCFG parsing and LFG grammar acquisition.
I present experiments investigating the influence of treebank design on PCFG parsing and show which types of representation are useful for the PCFG and LFG grammar acquisition tasks. Furthermore, I present a novel approach to cross-treebank comparison, measuring the effect of controlled error insertion on treebank trees and parser output from different treebanks. I complement the cross-treebank comparison with a human evaluation using TePaCoC, a new test suite for testing parser performance on complex grammatical constructions. Manual evaluation on TePaCoC data provides new insights into the impact of flat vs. hierarchical annotation schemes on data-driven parsing. I present treebank-based LFG acquisition methodologies for two German treebanks. An extensive evaluation along different dimensions complements the investigation and provides valuable insights for the future development of treebanks.
Word-formation processes in Croatian are based on the concatenation of morphemes. This paper describes three word-formation processes that are absent from the native, inherited Croatian lexicon: one that is likewise based on morphemic segmentation (infixation) and two that rest on different foundations (reduplication and lexical fusion). The paper has three aims: (i) to point out certain inconsistencies in existing descriptions of Croatian morphology; (ii) to describe individual borrowed and native Croatian lexemes and constructions in which these three processes can be identified; and (iii) to predict whether, and to what extent, non-native word-formation processes can be imported from foreign languages, today primarily (indeed only) English.
This article presents a linguistic study that compares two languages that are not closely related, German and Czech, in the domain of word formation. The aim of the study is to investigate German determinative compounds and the corresponding word-formation constructions in Czech. A relatively young linguistic discipline, corpus linguistics, is introduced from a purely practical perspective, and the concrete results of the contrastive study are presented.
We present several parsing algorithms for Range Concatenation Grammars (RCG), including a new Earley-style algorithm, within the deductive parsing paradigm. Our work is motivated by the recent interest in this type of grammar and fills a gap in the existing literature.
In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serve to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail. We then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement, and argue that a deeper understanding of the phenomena involved allows problematic cases to be tackled in a more principled fashion than would be possible using only pre-theoretic intuitions.
The present study argues that variation across listeners in the perception of a non-native contrast is due to two factors: the listener-specific weighting of auditory dimensions and the listener-specific construction of new segmental representations. The interaction of both factors is shown to take place in the perception grammar, which can be modelled within an OT framework. These points are illustrated with the acquisition of the Dutch three-member labiodental contrast [ʋ v f] by German learners of Dutch, focussing on four types of learners from the perception study by Hamann and Sennema (2005a).
Verbalaktion ist Körperaktion : Bemerkungen zur metaphorischen Konzeptualisierung von Sprechakten
(2009)
This article deals with the conceptual metaphor "verbal action is bodily action". The author points out that this is a structural metaphor that underlies the understanding of speech. The article consists of five parts. First, the aims and methods of speech act theory are presented (chapters 1-3). The analysis of certain words and phrases is considered the most important part: idioms (chapter four) are also among the expressions in which conceptual metaphors are expected to occur. The analysis of the metaphor named in the title and an outlook form the fourth part of the article (chapters 6 and 7). The results are presented in the final chapter.
Proper names combine many special features. One of them is that, in the case of given names (= first names), we have direct and free access to a huge inventory of names; that is, parents can give their child, linguistically speaking a new referent, one (or several) names of their own choosing. Today they are completely free in this choice, i.e. names are selected almost exclusively according to taste (pleasant sound/euphony, harmony with the family name, etc.). This so-called free choice of names is not very old, only a little over 100 years. Until well into the 19th century, the so-called bound choice of names prevailed (more or less), i.e. children were named after family members, godparents, saints, rulers and other persons.