"Ich mag so Wasserpfeifeladen" : the interaction of grammar and information structure in Kiezdeutsch
(2008)
This article presents linguistic features of and educational approaches to a new variety of German that has emerged in multi-ethnic urban areas in Germany: Kiezdeutsch (‘Hood German’). From a linguistic point of view, Kiezdeutsch is very interesting, as it is a multi-ethnolect that combines features of a youth language with those of a contact language. We will present examples that illustrate the grammatical productivity and innovative potential of this variety. From an educational perspective, Kiezdeutsch also has great potential in many respects: school projects can help enrich intercultural communication and weaken derogatory attitudes. In grammar lessons, Kiezdeutsch can be a means to enhance linguistic competence by having the adolescents analyse their own language. Keywords: German, Kiezdeutsch, multi-ethnolect, migrants’ language, language change, educational proposals
In this article we examine and "exapt" Wurzel's concept of superstable markers in an innovative manner. We develop an extended view of superstability through a critical discussion of Wurzel's original definition and the status of marker-superstability versus allomorphy in Natural Morphology: As we understand it, superstability is - above and beyond a step towards uniformity - mainly a symptom for the weakening of the category affected (cf. 1.,2. and 4.). This view is exemplified in four short case studies on superstability in different grammatical categories of four Germanic languages: genitive case in Mainland Scandinavian and English (3.1), plural formation in Dutch (3.2), second person singular ending -st in German (3.3), and ablaut generalisation in Luxembourgish (3.4).
In order to understand the specific structures and features of German surnames, the most important facts about their emergence and history should be outlined and, at the same time, compared with Swedish surnames, because there are considerable differences (for further details cf. Nübling 1997 a, b). First of all, surnames in Germany emerged rather early, with the first instances occurring in the 11th century in southern Germany; by the 16th century surnames were common all over Germany. Differences are related to geography (from south to north), social class (from the upper to the lower classes) and urban versus rural areas.
This article examines the expression of natural gender in Icelandic nouns denoting human beings. Particular attention will be paid to the system's symmetry with regard to nouns denoting women and men. Our society consists more or less exactly of half women and half men. One would therefore assume that systems for terms denoting persons would also be symmetrically organised. Yet this assumption could not be further from the truth, and not just in single isolated cases, but in many languages: I will attempt to show that Icelandic has numerous methods for referring to women, but also many barriers and idiosyncrasies.
Extremely short verbs can be found in various Germanic languages and dialects; the stems of these verbs do not have a final consonant ((C-)C-V), and they always have a monosyllabic infinitive and usually monosyllabic finite forms as well. Examples of these kinds of short verbs are Swiss German hä 'to have', gö 'to go', g~ 'to give', n~ 'to take', which correspond to the Swedish verbs ha, gå, ge and ta. The last example shows that such short verb formations also occur with verbs having (nearly) identical meanings but which do not share the same etymology. Apart from their shortness, these verbs are characterized by a high degree of irregularity, often even by suppletion, which sometimes develops contrary to regular sound laws. Furthermore they are among the most-used verbs and often tend towards grammaticalization. The present paper compares the short verbs of seven Germanic languages; in addition, it describes their various ways of development and strategies of differentiation. Moreover, it examines the question of why some languages and dialects (e.g. Swiss German, Frisian, Swedish, Norwegian) have many short verbs while others (New High German, Icelandic, Faroese) only have few. Finally, the paper discusses the contribution of short verbs to questions concerning linguistic change and the morphological organization of languages.
German linking elements are sometimes classified as inflectional affixes, sometimes as derivational affixes, and in any case as morphological units with at least seven realisations (e.g. -s-, -es-, -(e)n-, -e-). This article seeks to show that linking elements are hybrid elements situated between morphology and phonology. On the one hand, they have a clear morphological status since they occur only within compounds (and before a very small set of suffixes) and support the listener in decoding them. On the other hand, they also have to be analysed on the phonological level, as will be shown in this article. Thus, they are marginal morphological units on the pathway to phonology (including prosodics). Although some alloforms can sometimes be considered former inflectional endings and in some cases even continue to demonstrate some inflectional behaviour (such as relatedness to gender and inflection class), they are on their way to becoming markers of ill-formed phonological words. In fact, linking elements, above all the linking -s-, which is extremely productive, help the listener decode compounds containing a bad phonological word as their first constituent, such as Geburt+s+tag ‘birthday’ or Religion+s+unterricht ‘religious education’. By marking the end of a first constituent that differs from an unmarked monopedal phonological word, the linking element aids the listener in correctly decoding and analysing the compound. German compounds are known for their length and complexity, both of which have increased over time—along with the occurrence of linking elements, especially -s-. Thus, a profound instance of language change can be observed in contemporary German, one indicating its typological shift from syllable language to word language.
A population of wild Rattus rattus living in the roofs of the laboratory buildings was studied by supplying food every evening and watching the behaviour of the animals at the feeding place. Some observations were also made on caged animals. The rats were predominantly of the black rattus variety but white-bellied greys appeared now and then. In breeding tests the grey colour behaved as though determined by a single recessive gene. The study covered two periods of approximately 9 months each, separated by an interval of 3 months during which a reduced quantity of food was provided and the rat population underwent a major decline. During the two periods of richer feeding the population first increased and then stabilized at a level where the animals remained in good condition and there was no starvation. In the first 9-month period, stabilization was achieved by emigration of young adults who colonized neighbouring buildings. Towards the end of the second period, stabilization was achieved by limitation of breeding. The rats accepted a wide variety of foods, including meat, and a number of instances of predation were seen. Small vertebrates as well as insects were killed and eaten. Small pieces of food were usually eaten in situ but large bits were taken up to the nests in the roof. Such differential treatment in relation to size may be a factor of some importance in the evolution of hoarding. The rats visiting the feeding place formed a unit with a definite social structure. A single dominant male, and never more than one, was always present and in certain circumstances a linear male hierarchy was formed. There were usually two or three mutually tolerant top ranking females who were subordinate to the top male but dominant to all other members of the group. Within the group attacks were directed downwards in the social scale. An attacked subordinate either fled or appeased and serious fights therefore did not develop.
The most essential component of the appeasement appeared to be a mouth to mouth contact which may be derived from the infantile pattern of 'mouth suckling'. Appeasement permitted superior rats to maintain their status without the necessity of carrying attacks on subordinates to the point where actual hurt was inflicted. A group territory round the feeding place was defended against interlopers. Both sexes took part in chasing out intruders but since males showed inhibition in attacking females, the exclusion of strange females was due principally to the activities of the home females. The point at which pursuit of an intruder stopped was regarded as the territorial boundary. This was also the limit beyond which a group member would not allow himself to be chased but it was not a prison wall. When agonistic tendencies were not aroused the animals no longer always turned back at the boundary and foraging beyond its limits allowed them to become familiar with an area larger than the territory. Although intruders were normally driven out, it was occasionally possible for a particularly determined animal of either sex to force its way in and ultimately become a member of the group. The patterns of behaviour seen are described, particularly those concerned with hostile encounters and with mating. Scent marking with urine drip trails was not seen but adults of both sexes marked by rubbing the cheeks and ventral surface on branches. The circumstances in which tooth gnashing was heard suggest that this behaviour is not a form of threat but a response to unfamiliar auditory or visual stimuli. There was some evidence that it functioned as an alarm signal within the group. Pilo-erection and a gait or posture with the hind legs much extended ('stegosauring') are considered to function as threats. Pilo-erection occurred in situations where there was little to suggest conflict and is considered to represent a form of threat which has undergone emancipation.
Various forms of displacement and ambivalent behaviour were seen. Rapid vibration of the tail occurred in thwarting situations, either during mating or when a defeated opponent suddenly vanished. There was no evidence that it acted as a signal. The common form of amicable behaviour was social grooming. Another amicable action was sitting together with the bodies in contact. Animals reared in cages remained shy and wary and even hand reared young developed the usual alarm responses to movement and noises. Females had their first litters at ages of 3 to 5 months. For first litters gestation periods were 21 to 22 days but in females that were simultaneously lactating they ranged from 23 to 29 days. Eight was the commonest litter number and ten the highest recorded. At birth the tail is very much shorter than the body but has outstripped it by the time the youngster emerges from the nest. This was found to be the result of a period of extremely rapid tail growth immediately preceding emergence. In Rattus norvegicus the peak in tail growth rate was found to be later and less striking. The difference is interpreted as related to the importance of the tail in climbing in the more arboreal R. rattus. During the second week of life an edge response (retreat from a declivity) and a clinging response made their appearance: these have the function of preventing accidental falls from a nest situated above ground level. Mouth suckling was seen only during a period of a few days towards the end of lactation. Play developed within a few days of emergence from the nest: locomotor and fighting play were the common types. Older animals occasionally joined in play with the young. In problem solving tests, first solutions were not insightful but once a solution had been found, the successful technique was at once adopted and subsequently perfected. 
There was no evidence of learning by imitation but the rats did learn from each other's behaviour that food could be obtained at a certain location and thus the solution of a problem by one rat accelerated its independent solution by others. The reasons for the differences between the behaviour of the free living population and the caged animals studied by other authors are discussed.
The impact of naval sonar on beaked whales is of increasing concern. In recent years the presence of gas and fat embolism consistent with decompression sickness (DCS) has been reported through postmortem analyses on beaked whales that stranded in connection with naval sonar exercises. In the present study, we use basic principles of diving physiology to model nitrogen tension and bubble growth in several tissue compartments during normal diving behavior and for several hypothetical dive profiles to assess the risk of DCS. Assuming that normal diving does not cause nitrogen tensions in excess of those shown to be safe for odontocetes, the modeling indicates that repetitive shallow dives, perhaps as a consequence of an extended avoidance reaction to sonar sound, can indeed pose a risk for DCS and that this risk should increase with the duration of the response. If the model is correct, then limiting the duration of sonar exposure to minimize the duration of any avoidance reaction therefore has the potential to reduce the risk of DCS.
Notes on Irish plants
(1909)
The siliceous claystone and chert lithologic units of the Triassic-Jurassic chert-clastic sequence are well exposed in the Inuyama, Mt. Kinkazan and Hisuikyo areas of the southeastern Mino Terrane. Twenty-one continuous sections from those areas were investigated in order to establish comprehensive radiolarian biozones and clarify the successive lithologic changes through the Triassic and lowest Jurassic. Twenty new radiolarian zones are established; the lowest two are assemblage zones and the others are defined by the first or last occurrence of index taxa. The definitions are as follows in chronological order: TR 0, Follicucullus Assemblage Zone (early Spathian or older); TR 1, Parentactinia nakatsugawaensis Assemblage Zone (late Spathian); TR 2A, Eptingium nakasekoi Lowest-occurrence Zone (early Anisian); TR 2B, Triassocampe coronata group Lowest-occurrence Zone (early Anisian); TR 2C, Triassocampe deweveri Lowest-occurrence Zone (late Anisian); TR 3A, Spine A2 (possibly derived from Oertlispongus inaequispinosus) Lowest-occurrence Zone (late Anisian); TR 3B, Yeharaia elegans group Lowest-occurrence Zone (early Ladinian); TR 4A, Muelleritortis cochleata Lowest-occurrence Zone (late Ladinian); TR 4B, Spongoserrula dehli Lowest-occurrence Zone (late Ladinian to early Carnian); TR 5A, Capnuchosphaera Lowest-occurrence Zone (early Carnian); TR 5B, Poulpus carcharus sp. nov. Lowest-occurrence Zone (early to late Carnian); TR 6A, Capnodoce-Trialatus Concurrent-range Zone (late Carnian to early Norian); TR 6B, Trialatus robustus-Lysemelas olbia gen. et sp. nov. Partial-range Zone (early Norian); TR 7, Lysemelas olbia gen. et sp. nov. Lowest-occurrence Zone (early to late Norian); TR 8A: Praemesosaturnalis multidentatus group Lowest-occurrence Zone (late Norian); TR 8B: Praemesosaturnalis pseudokahleri sp. nov.
Lowest-occurrence Zone (late Norian); TR 8C: Skirt F (possibly derived from Haeckelicyrtium takemurai) Lowest-occurrence Zone (late Norian to early Rhaetian); TR 8D: Haeckelicyrtium breviora sp. nov. Taxon-range Zone (early to late Rhaetian); JR 0A: Haeckelicyrtium breviora sp. nov.-Bipedis horiae sp. nov. Partial-range Zone (Hettangian); and JR 0B: Bipedis horiae sp. nov. Lowest-occurrence Zone (Hettangian/Sinemurian). These zones are correlated to previously established radiolarian assemblages and zones in Japan and other regions. Age assignment of the zones is also discussed on the basis of the correlation and other available chronological data. The original stratigraphic succession of the Triassic in the studied area, which ranges in age from Early Triassic to Early Jurassic, is more than 100 m in thickness and can be reconstructed in detail. The succession is subdivided into seven units based on lithologic features. Each unit was probably accumulated under a particular sedimentary condition, thus successive changes of paleoceanographic environments during Triassic time can be traced continuously. Nine new genera including Ayrtonius, Blonzella, Braginella, Bulbocampe, Enoplocampe, Lysemelas, Parvibrachiale, Spongoxystris and Veles, and 47 new species are described herein. A comprehensive list of identified taxa is presented.
The purpose of this study of early social-cognitive development was to assess the very young child's behaviorally expressed knowledge of people's visual-attentional acts and abilities. Boys and girls (N = 60) 1, 1 1/2, 2, 2 1/2, and 3 years of age were tested in their homes with their mothers' help. Three sorts of tasks were used: 1. Percept production. The child's task was to produce a visual percept in the other. Examples include pointing to objects ("productive pointing") and a wide variety of object-showing problems. 2. Percept deprivation. The opposite, exemplified by a variety of object-hiding problems. 3. Percept diagnosis. The child's task was to determine what the other was already visually attending to, either by looking where his or her finger was pointed ("receptive pointing") or where his eyes were directed. It was found that the majority of 1-year-olds produced and comprehended pointing, and would sometimes hold out a toy to show it, but did little else. The 3-year-olds were at ceiling on virtually all tasks. At 1 1/2 years, children usually showed a picture by holding it flat so that both they and the other could see it. From 2 on, they usually turned it toward the other in the adult fashion. Very few children of any age showed egocentrically - i.e., orienting the picture so only they could see it. By age 2, the children solved what were presumably novel showing problems for them: e.g., successfully showing to another a picture pasted on the inside bottom of a hollow cube. Hiding ability emerged later than showing ability but seemed well established by age 3. The role of the other's eyes in seeing appeared to be quite well understood at least by age 2-2 1/2. As examples, children of this age took the other's hands away from her or his eyes before trying to show her something, and could usually tell where she was looking from her eye orientation alone.
These age trends presumably reflect important developments in the area of social interaction and communication, as well as with respect to cognition about percepts.
Forty-two chemicals were tested for their ability to induce cytogenetic change in Chinese hamster ovary cells using assays for chromosome aberrations (ABS) and sister chromatid exchanges (SCE). These chemicals were included in the National Toxicology Program's evaluation of the ability of four in vitro short-term genetic toxicity assays to distinguish between rodent carcinogens and noncarcinogens. The conclusions of this comparison are presented in Zeiger et al. [Zeiger E, Haseman JK, Shelby MD, Margolin BH, Tennant RW (1990): Environ Molec Mutagen 16(Suppl 18): 1-14]. The in vitro cytogenetic testing was conducted at four laboratories, each using a standard protocol to evaluate coded chemicals with and without exogenous metabolic activation. Most chemicals were tested in a single laboratory; however, two chemicals, tribromomethane and p-chloroaniline, were tested at two laboratories as part of an interlaboratory comparison. Four chemicals (C.I. basic red 9 HCl, 2-mercaptobenzothiazole, oxytetracycline HCl, and rotenone) were tested for SCE in one laboratory and in a different laboratory for ABS. Tetrakis(hydroxymethyl)phosphonium sulfate was tested at one laboratory and the chloride form was tested at a different laboratory. Twenty-five of the 42 chemicals tested induced SCE. Sixteen of these also induced ABS; all chemicals that induced ABS also induced SCE. There was approximately 79% reproducibility of results in repeat tests; thus, we conclude that this protocol is effective and reproducible in detecting ABS and SCE.
The argument that I tried to elaborate on in this paper is that the conceptual problem behind the traditional competence/performance distinction does not go away, even if we abandon its original Chomskyan formulation. It returns as the question about the relation between the model of the grammar and the results of empirical investigations – the question of empirical verification. The theoretical concept of markedness is argued to be an ideal correlate of gradience. Optimality Theory, being based on markedness, is a promising framework for the task of bridging the gap between model and empirical world. However, this task not only requires a model of grammar, but also a theory of the methods that are chosen in empirical investigations and how their results are interpreted, and a theory of how to derive predictions for these particular empirical investigations from the model. Stochastic Optimality Theory is one possible formulation of a proposal that derives empirical predictions from an OT model. However, I hope to have shown that it is not enough to take frequency distributions and relative acceptabilities at face value, and simply construe some Stochastic OT model that fits the facts. These facts first of all need to be interpreted, and those factors that the grammar has to account for must be sorted out from those about which grammar should have nothing to say. This task, to my mind, is more complicated than the picture that a simplistic application of (not only) Stochastic OT might draw.
The aim of this paper is the exploration of an optimality theoretic architecture for syntax that is guided by the concept of "correspondence": syntax is understood as the mechanism of "translating" underlying representations into a surface form. In minimalism, this surface form is called "Phonological Form" (PF). Both semantic and abstract syntactic information are reflected by the surface form. The empirical domain in which this architecture is tested is that of minimal link effects, especially in the case of "wh"-movement. The OT constraints require the surface form to reflect the underlying semantic and syntactic representations as fully as possible. The means by which underlying relations and properties are encoded are precedence, adjacency, surface morphology and prosodic structure. Information that is not encoded in one of these ways remains unexpressed, and gets lost unless it is recoverable via the context. Different kinds of information are often expressed by the same means. The resulting conflicts are resolved by the relative ranking of the relevant correspondence constraints.
This paper argues for a particular architecture of OT syntax. This architecture has three core features: i) it is bidirectional: the usual production-oriented optimisation (called ‘first optimisation’ here) is accompanied by a second step that checks the recoverability of an underlying form; ii) this underlying form already contains a full-fledged syntactic specification; iii) especially the procedure checking for recoverability makes crucial use of semantic and pragmatic factors. The first section motivates the basic architecture. The second section shows, with two examples, how contextual factors are integrated. The third section examines its implications for learning theory, and the fourth section concludes with a broader discussion of the advantages and disadvantages of the proposed model.
Weak function word shift
(2004)
The fact that object shift only affects weak pronouns in mainland Scandinavian is seen as an instance of a more general observation that can be made in all Germanic languages: weak function words tend to avoid the edges of larger prosodic domains. This generalisation has been formulated within Optimality Theory in terms of alignment constraints on prosodic structure by Selkirk (1996) in explaining the distribution of prosodically strong and weak forms of English function words, especially modal verbs, prepositions and pronouns. But a purely phonological account fails to integrate the syntactic licensing conditions for object shift in an appropriate way. The standard semantico-syntactic accounts of object shift, on the other hand, fail to explain why it is only weak pronouns that undergo object shift. This paper develops an Optimality theoretic model of the syntax-phonology interface which is based on the interaction of syntactic and prosodic factors. The account can successfully be applied to further related phenomena in English and German.
Dialectal variation in German 3-verb clusters : a surface-oriented optimality theoretic account
(2004)
We present data from an empirical investigation on the dialectal variation in the syntax of German 3-verb clusters, consisting of a temporal auxiliary, a modal verb, and a predicative verb. The ordering possibilities vary greatly among the dialects. Some of the orders that we found occur only under particular stress assignments. We assume that these orders fulfil an information structural purpose and that the reordering processes are changes only in the linear order of the elements which is represented exclusively at the surface syntactic level, PF (Phonetic Form). Our Optimality theoretic account offers a multifactorial perspective on the phenomenon.
German dialects vary in which of the possible orders of the verbs in a 3-verb cluster they allow. In a still ongoing empirical investigation that I am undertaking together with Tanja Schmid, University of Stuttgart (Schmid and Vogel (2004)), we have already found that each of the six logically possible permutations of the 3-verb cluster in (1) can be found in German dialects.
This paper reports the results of a corpus investigation on case conflicts in German argument free relative constructions. We investigate how corpus frequencies reflect the relative markedness of free relative and correlative constructions, the relative markedness of different case conflict configurations, and the relative markedness of different conflict resolution strategies. Section 1 introduces the conception of markedness as used in Optimality Theory. Section 2 introduces the facts about German free relative clauses, and section 3 presents the results of the corpus study. By and large, markedness and frequency go hand in hand. However, configurations at the highest end of the markedness scale rarely show up in corpus data, and for the configuration at the lowest end we found an unexpected outcome: the more marked structure is preferred.
This paper is part of a research project on OT Syntax and the typology of the free relative (FR) construction. It concentrates on the details of an OT analysis and some of its consequences for OT syntax. I will not present a general discussion of the phenomenon and the many controversial issues it is famous for in generative syntax.
In offering this, the first treatise on the subject of Rope Manipulation and Releases I do so with the hope that it will popularize what has so far been a neglected branch of Magic. In the feverish search for something new and not overcommon in Magic the possibilities of this little known branch of Magic have been overlooked. Of course, Rope Manipulation is not exactly new, but as a complete act as treated herein it is so little seen that it is new to the public, and that is what really counts from the performer's point of view. ...
In linguistics and the philosophy of language, the mass/count distinction has traditionally been regarded as a bi-partition on the nominal domain, where typical instances are nouns like "beef" (mass) vs."cow" (count). In the present paper, we argue that this partition reveals a system that is based on both syntactic features and conceptual features, and present experimental evidence suggesting that the discrimination of the two kinds of features has a psychological reality.
Articulatory token-to-token variability not only depends on linguistic aspects like the phoneme inventory of a given language but also on speaker specific morphological and motor constraints. As has been noted previously (Perkell (1997), Mooshammer et al. (2004)), speakers with coronally high "domeshaped" palates exhibit more articulatory variability than speakers with coronally low "flat" palates. One explanation for that is based on perception oriented control by the speaker. The influence of articulatory variation on the cross sectional area and consequently on the acoustics should be greater for flat palates than for domeshaped ones. This should force speakers with flat palates to place their tongue very precisely whereas speakers with domeshaped palates might tolerate a greater variability. A second explanation could be a greater amount of lateral linguo-palatal contact for flat palates holding the tongue in position. In this study both hypotheses were tested.
Marcus Stiglegger revives a lost Gothic treasure in this brief discussion of Robert Sigl's Laurin—a rare case of German genre film-making and the heir to FW Murnau's legacy. Phantastic genre cinema is very rare in contemporary Germany—especially in the 1980s, the time when Italian horror reached another peak with Dario Argento's Opera (1985). The cliché of the German "easy comedy" ruled mainstream film production at the time, and so it appeared a kind of miracle when 27-year-old writer/director Robert Sigl was awarded the Bavarian Film Prize in 1988 for his debut feature: the Gothic horror fairytale Laurin.
The problematic economic situation in most parts of Russia today is nevertheless the ideal climate for the flourishing of the arts. Especially in St. Petersburg there grows a fascinating new experimental music scene, from Moscow we receive new impulses in literature such as the poet Alina Vituchnovskaja... Russian cinema always had a good reputation, and the new generation of Russian filmmakers clearly tries to keep up with it.
This paper investigates the class of Tree-Tuple MCTAG with Shared Nodes, TT-MCTAG for short, an extension of Tree Adjoining Grammars that has been proposed for natural language processing, in particular for dealing with discontinuities and word order variation in languages such as German. It has been shown that the universal recognition problem for this formalism is NP-hard, but so far it was not known whether the class of languages generated by TT-MCTAG is included in PTIME. We provide a positive answer to this question, using a new characterization of TT-MCTAG.
We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation helps to considerably decrease the number of items in the chart.
Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG however is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. That is, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) the MCTAG licenses. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, they allow a more systematic comparison of different types of MCTAG, and, furthermore, they can be exploited for parsing.
The classifications of the Hystricomorpha in English text-books of Zoology are based upon the one proposed by Alston in 1876 (P.Z.S. 1876, pp. 90-97), which was itself an amplification and in some particulars a modification of the arrangement suggested by Waterhouse in 1848. Alston added to the group the family Dinomyidae, which, following Peters, he placed between the Dasyproctidae and Caviidae; and the Ctenodactylinae, which he ranked as a subfamily of Octodontidae. He also transferred Petromys from the Echymyina (Echinomyinae), where it was placed by Waterhouse, to the Octodontinae. ...
Buli is an Oti-Volta tone language spoken in Northern Ghana. This paper outlines the basic features of its tonal system and explores whether and in which way pitch or phonemic tone is used as a means to indicate the pragmatic category of focus. Pursued are cases with focus-related surface tone changes as well as cases where pitch could help to disambiguate between broad and narrow foci. It is argued that focus is not consistently encoded by pitch or tone. Parallel findings for the closely related languages Kɔnni and Dagbani suggest that the apparent lack of significant prosodic focus signals in Buli might pertain to a larger group of tonal languages of the Gur family.
The present article illustrates that the specific articulatory and aerodynamic requirements for voiced but not voiceless alveolar or dental stops can cause tongue tip retraction and tongue mid lowering and thus retroflexion of front coronals. This retroflexion is shown to have occurred diachronically in the three typologically unrelated languages Dhao (Malayo-Polynesian), Thulung (Sino-Tibetan), and Afar (East-Cushitic). In addition to the diachronic cases, we provide synchronic data for retroflexion from an articulatory study with four speakers of German, a language usually described as having alveolar stops. With these combined data we supply evidence that voiced retroflex stops (as the only retroflex segments in a language) did not necessarily emerge from implosives, as argued by Haudricourt (1950), Greenberg (1970), Bhat (1973), and Ohala (1983). Instead, we propose that the voiced front coronal plosive /d/ is generally articulated in a way that favours retroflexion, that is, with a smaller and more retracted place of articulation and a lower tongue and jaw position than /t/.
The medium of (oral) language is mostly disregarded (or overlooked) in contemporary media theories. This "ignoring of language" in media studies is often accompanied by an inadequate transport model of communication, and it converges with an "ignoring of mediality" in mentalistic theories of language. In the present article it will be argued that this misleading opposition of language and media can only be overcome if one already regards oral language, not just written language, as a medium of the human mind. In my argumentation I fall back on Wittgenstein’s conception of language games to try to show how Wittgenstein’s ideas can help us to clear up the problem of the mediality of language and also to show to what extent the mentalistic conception of Chomskyan provenance cannot be adequate to the phenomenon of language.
Notes upon the emotionality of a schizophrenic patient and its relation to problems of technique
(1953)
It seems justifiable to inquire into the specific factors which make the emotionality of a schizophrenic patient different from that of other patients and to investigate to what extent this specificity of schizophrenic emotionality might require specific changes in the psychoanalytic technique. Although I do not think that this paper can really live up to the full requirements of such an ambitious undertaking, it nevertheless may contribute modestly to it. My speculations began during a phase of the treatment of a schizophrenic patient; long after her acute condition had subsided I thought I observed, within clinically pertinent areas, a specific relationship between the patient's ego structure and her emotions. It seems to me that this relationship might allow generalization in terms of a basic defect with which a schizophrenic patient has to struggle, although in various phases of the disease and of the treatment the phenomenology of schizophrenic emotionality differs unquestionably in significant aspects. However, before delving into the subject matter, a few general points must be raised in reference to the psychoanalytic theory of emotions.
The taxonomy, diversity, and distribution of the aquatic insect order Trichoptera, caddisflies, are reviewed. The order is among the most important and diverse of all aquatic taxa. Larvae are vital participants in aquatic food webs and their presence and relative abundance are used in the biological assessment and monitoring of water quality. The species described by Linnaeus are listed. The morphology of all life history stages (adults, larvae, and pupae) is diagnosed and major features of the anatomy are illustrated. Major components of life history and biology are summarized. A discussion of phylogenetic studies within the order is presented, including higher classification of the suborders and superfamilies, based on recent literature. Synopses of each of 45 families are presented, including the taxonomic history of the family, a list of all known genera in each family, their general distribution and relative species diversity, and a short overview of family-level biological features. The order contains 600 genera, and approximately 13,000 species.
In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serves to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail. We then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement, and argue that a deeper understanding of the phenomena involved allows us to tackle problematic cases in a more principled fashion than would be possible using only pre-theoretic intuitions.
Traditionally, parsers are evaluated against gold standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and the gold standard. A particular case in point is German, for which two treebanks (TiGer and TüBa-D/Z) are available with highly different annotation schemes for the acquisition of (e.g.) PCFG parsers. The differences between the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser evaluation difficult [7, 9, 12]. The resource (TEPACOC) presented in this paper takes a different approach to parser evaluation: instead of providing evaluation data in a single annotation scheme, TEPACOC uses comparable sentences and their annotations for 5 selected key grammatical phenomena (with 20 sentences per phenomenon) from both the TiGer and TüBa-D/Z resources. This provides a comparable testsuite of 2 times 100 sentences, which allows us to evaluate TiGer-trained parsers against the TiGer part of TEPACOC, and TüBa-D/Z-trained parsers against the TüBa-D/Z part of TEPACOC for key phenomena, instead of comparing them against a single (and potentially biased) gold standard. To overcome the problem of inconsistency in human evaluation and to bridge the gap between the two different annotation schemes, we provide an extensive error classification, which enables us to compare parser output across the two different treebanks. In the remaining part of the paper we present the testsuite and describe the grammatical phenomena covered in the data. We discuss the different annotation strategies used in the two treebanks to encode these phenomena and present our error classification of potential parser errors.
Distributional approximations to lexical semantics are very useful not only in helping the creation of lexical semantic resources (Kilgarriff et al., 2004; Snow et al., 2006), but also when directly applied in tasks that can benefit from large-coverage semantic knowledge such as coreference resolution (Poesio et al., 1998; Gasperin and Vieira, 2004; Versley, 2007), word sense disambiguation (McCarthy et al., 2004) or semantic role labeling (Gordon and Swanson, 2007). We present a model that is built from Web-based corpora using both shallow patterns for grammatical and semantic relations and a window-based approach, using singular value decomposition to decorrelate the feature space, which is otherwise too heavily influenced by the skewed topic distribution of Web corpora.
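The window-based part of such a model can be illustrated with a small sketch. It only counts co-occurrences in a fixed window and compares the resulting vectors by cosine similarity; the pattern-based features and the singular value decomposition step mentioned in the abstract are deliberately left out. The function names, the window size, and the toy sentence are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

def window_cooccurrences(tokens, window=2):
    """Count, for each token, the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[target][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sum(x * x for x in u.values()) ** 0.5
    norm_v = sum(x * x for x in v.values()) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy input (an assumption for illustration only)
tokens = "the cat sat on the mat".split()
vectors = window_cooccurrences(tokens, window=1)
print(dict(vectors["the"]))
print(cosine(vectors["cat"], vectors["mat"]))
```

In a full pipeline, the sparse count vectors would be assembled into a word-by-feature matrix before any dimensionality reduction is applied.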
Parsing coordinations
(2009)
The present paper is concerned with statistical parsing of constituent structures in German. The paper presents four experiments that aim at improving parsing performance on coordinate structures: 1) reranking the n-best parses of a PCFG parser, 2) enriching the input to a PCFG parser by gold scopes for any conjunct, 3) reranking the parser output for all possible scopes for conjuncts that are permissible with regard to clause structure. Experiment 4 reranks a combination of parses from experiments 1 and 3. The experiments presented show that n-best parsing combined with reranking improves results by a large margin. Providing the parser with different scope possibilities and reranking the resulting parses results in an increase in F-score from 69.76 for the baseline to 74.69. While the F-score is similar to the one of the first experiment (n-best parsing and reranking), the first experiment results in higher recall (75.48% vs. 73.69%) and the third one in higher precision (75.43% vs. 73.26%). Combining the two methods results in the best result with an F-score of 76.69.
Trubetzkoy's recognition of a delimitative function of phonology, serving to signal boundaries between morphological units, is expressed in terms of alignment constraints in Optimality Theory, where the relevant constraints require specific morphological boundaries to coincide with phonological structure (Trubetzkoy 1936, 1939, McCarthy & Prince 1993). The approach pursued in the present article is to investigate the distribution of phonological boundary signals to gain insight into the criteria underlying morphological analysis. The evidence from English and Swedish suggests that necessary and sufficient conditions for word-internal morphological analysis concern the recognizability of head constituents, which include the rightmost members of compounds and head affixes. The claim is that the stability of word-internal boundary effects in historical perspective cannot in general be sufficiently explained in terms of memorization and imitation of phonological word form. Rather, these effects indicate a morphological parsing mechanism based on the recognition of word-internal head constituents. Head affixes can be shown to contrast systematically with modifying affixes with respect to syntactic function, semantic content, and prosodic properties. That is, head affixes, which cannot be omitted, often lack inherent meaning and have relatively unmarked boundaries, which can be obscured entirely under specific phonological conditions. By contrast, modifying affixes, which can be omitted, consistently have inherent meaning and have stronger boundaries, which resist prosodic fusion in all phonological contexts. While these correlations are hardly specific to English and Swedish, it remains to be investigated to what extent they hold cross-linguistically. The observation that some of the constituents identified on the basis of prosodic evidence lack inherent meaning raises the issue of compositionality.
I will argue that certain systematic aspects of word meaning cannot be captured with reference to the syntagmatic level, but require reference to the paradigmatic level instead. The assumption is then that there are two dimensions of morphological analysis: syntagmatic analysis, which centers on the criteria for decomposing words in terms of labelled constituents, and paradigmatic analysis, which centers on the criteria for establishing relations among (whole) words in the mental lexicon. While meaning is intrinsically connected with paradigmatic analysis (e.g. base relations, oppositeness) it is not essential to syntagmatic analysis.
Popular culture is always in process; its meanings can never be identified in a text, for texts are activated, or made meaningful, only in social relations and in intertextual relations. This activation of the meaning potential of a text can occur only in the social and cultural relationship into which it enters. (Fiske, 1991a: 3)
The content of this book will explain: (A) the various reasons why Europeans and Germans left their homeland; (B) how they travelled, in groups and individually; (C) how they landed in South Australia; (D) the newcomers' reception in a British colony; (E) the treatment they received in Australia; (F) what the Germans and Europeans achieved in Australia.
There is a caricature of Marcel Proust in which the despairing writer is consoled by a friend saying, 'Aber, aber, mon cher Marcel, nun versuchen Sie sich doch zu erinnern, wo Sie die Zeit verloren haben.' ('Now, now, mon cher Marcel, do try to remember where you lost the time.')
Literature in general, not only A La Recherche du Temps Perdu, deals with a different form of memory than that of mnemonics, in which the hints of places lead to a retrieval of what has been stored there before. Nevertheless it is difficult to pinpoint the criteria that make this difference. How does literature transcend the technologically limited sense of memory in terms of a storage and retrieval system? ...
Giulio Camillo (1480 - 1544) was as well-known in his era as Bill Gates is now. Just like Gates he cherished a vision of a universal Storage and Retrieval System, and just like Microsoft Windows, his ‘Theatre of the Memory’ was, despite constant revision, never completed. Camillo’s legendary Theatre of Memory remained only a fragment, its benefits only an option for the future. When it was finished, the user - so he predicted - would have access to the knowledge of the whole universe. On account of his promising invention, Camillo’s contemporaries called him ‘the divine’. For others, like Erasmus or the Parisian scholars, he was just a ‘quack’, but this, too, only shows that his reception was as strong as that of the computer gurus of our day. Still, Camillo was forgotten immediately after his death. No trace is left of his spectacular databank - except a short treatise which he dictated on his deathbed and which was formulated in the future tense: ‘L’Idea del Theatro’ (1550). ...
When the concept of the auteur was coined in the 1950s and 1960s, it was an initiative to clarify the obscure matters of authorship in cinema. Because a film must necessarily be a collective work, understood as the result of a large number of creative contributions, it was often unclear who the decisive power behind a certain film was, who contributed the "distinctive quality". The control will usually belong to the director, the producer or the star (or all three in combination), but what singles out a given film could also come from the cinematographer, the scriptwriter, from the author of an adapted literary work, or from traditions in the studio or in the genre. Nothing can be taken for granted about a film's authorship; it can only be decided through a thorough analysis of each film's production process, an analysis that, in most cases, will be impossible to make. ...
As editor of the next iteration of the Köchel Catalogue, I have to deal with the current (sixth) edition’s Appendix C, devoted to "Doubtful and Misattributed Works." My goal is to reduce the potentially vast dimensions of that appendix to only those works for which some connection to Mozart cannot be ruled out. In the decades since 1964, when the current edition of Köchel was published, many of the works listed in Appendix C have been convincingly attributed to other composers. Other works therein can confidently be dismissed as never having had any meaningful connection to Mozart. Yet even after removing the reattributed and trivially misattributed works from the appendix, we are left with a handful of works that may possibly have had something to do with Mozart, even if clear evidence one way or the other remains elusive. One must, of course, be cautious in removing questionable and doubtful works from the catalogue, as the present case-study will illustrate. The work under consideration, catalogued as K6 Anh. C 9.07, is an unaccompanied piece for three or four voices with the text "Venerabilis barba capucinorum." ...
Intrinsic motivation, the causal mechanism for spontaneous exploration and curiosity, is a central concept in developmental psychology. It has been argued to be a crucial mechanism for open-ended cognitive development in humans, and as such has attracted growing interest from developmental roboticists in recent years. The goal of this paper is threefold. First, it provides a synthesis of the different approaches to intrinsic motivation in psychology. Second, by interpreting these approaches in a computational reinforcement learning framework, we argue that they are not operational and even sometimes inconsistent. Third, we set the ground for a systematic operational study of intrinsic motivation by presenting a formal typology of possible computational approaches. This typology is partly based on existing computational models, but also presents new ways of conceptualizing intrinsic motivation. We argue that this kind of computational typology might be useful for opening new avenues for research both in psychology and developmental robotics.
This paper describes the creation and preparation of TUSNELDA, a collection of corpus data built for linguistic research. This collection contains a number of linguistically annotated corpora which differ in various aspects such as language, text sorts / data types, encoded annotation levels, and linguistic theories underlying the annotation. The paper focuses on this variation on the one hand and on the way these heterogeneous data are integrated into one resource on the other hand.
We adopt Markert and Nissim's (2005) approach of using the World Wide Web to resolve cases of coreferent bridging for German and discuss the strengths and weaknesses of this approach. As the general approach of using surface patterns to get information on ontological relations between lexical items has only been tried on English, it is also interesting to see whether the approach works for German as well as it does for English and what differences between these languages need to be accounted for. We also present a novel approach for combining several patterns that yields an ensemble that outperforms the best-performing single patterns in terms of both precision and recall.
When a statistical parser is trained on one treebank, one usually tests it on another portion of the same treebank, partly due to the fact that a comparable annotation format is needed for testing. But the user of a parser may not be interested in parsing sentences from the same newspaper all over, or even wants syntactic annotations for a slightly different text type. Gildea (2001) for instance found that a parser trained on the WSJ portion of the Penn Treebank performs less well on the Brown corpus (the subset that is available in the PTB bracketing format) than a parser that has been trained only on the Brown corpus, although the latter one has only half as many sentences as the former. Additionally, a parser trained on both the WSJ and Brown corpora performs less well on the Brown corpus than on the WSJ one. This leads us to the following questions that we would like to address in this paper: - Is there a difference in usefulness of techniques that are used to improve parser performance between the same-corpus and the different-corpus case? - Are different types of parsers (rule-based and statistical) equally sensitive to corpus variation? To achieve this, we compared the quality of the parses of a hand-crafted constraint-based parser and a statistical PCFG-based parser that was trained on a treebank of German newspaper text.
In the past, a divide could be seen between ’deep’ parsers on the one hand, which construct a semantic representation out of their input, but usually have significant coverage problems, and more robust parsers on the other hand, which are usually based on a (statistical) model derived from a treebank and have larger coverage, but leave the problem of semantic interpretation to the user. More recently, approaches have emerged that combine the robustness of data-driven (statistical) models with more detailed linguistic interpretation such that the output could be used for deeper semantic analysis. Cahill et al. (2002) use a PCFG-based parsing model in combination with a set of principles and heuristics to derive functional (f-)structures of Lexical-Functional Grammar (LFG). They show that the derived functional structures have a better quality than those generated by a parser based on a state-of-the-art hand-crafted LFG grammar. Advocates of Dependency Grammar usually point out that dependencies already are a semantically meaningful representation (cf. Menzel, 2003). However, parsers based on dependency grammar normally create underspecified representations with respect to certain phenomena such as coordination, apposition and control structures. In these areas they are too "shallow" to be directly used for semantic interpretation. In this paper, we adopt a similar approach to Cahill et al. (2002) using a dependency-based analysis to derive functional structure, and demonstrate the feasibility of this approach using German data. A major focus of our discussion is on the treatment of coordination and other potentially underspecified structures of the dependency data input. F-structure is one of the two core levels of syntactic representation in LFG (Bresnan, 2001).
Independently of surface order, it encodes abstract syntactic functions that constitute predicate argument structure and other dependency relations such as subject, predicate, adjunct, but also further semantic information such as the semantic type of an adjunct (e.g. directional). Normally f-structure is captured as a recursive attribute value matrix, which is isomorphic to a directed graph representation. Figure 5 depicts an example target f-structure. As mentioned earlier, these deeper-level dependency relations can be used to construct logical forms as in the approaches of van Genabith and Crouch (1996), who construct underspecified discourse representations (UDRSs), and Spreyer and Frank (2005), who have robust minimal recursion semantics (RMRS) as their target representation. We therefore think that f-structures are a suitable target representation for automatic syntactic analysis in a larger pipeline of mapping text to interpretation. In this paper, we report on the conversion from dependency structures to f-structure. Firstly, we evaluate the f-structure conversion in isolation, starting from hand-corrected dependencies based on the TüBa-D/Z treebank and Versley's (2005) conversion. Secondly, we start from tokenized text to evaluate the combined process of automatic parsing (using Foth and Menzel's (2006) parser) and f-structure conversion. As a test set, we randomly selected 100 sentences from TüBa-D/Z which we annotated using a scheme very close to that of the TiGer Dependency Bank (Forst et al., 2004). In the next section, we sketch dependency analysis, the underlying theory of our input representations, and introduce four different representations of coordination. We also describe Weighted Constraint Dependency Grammar (WCDG), the dependency parsing formalism that we use in our experiments. Section 3 characterises the conversion of dependencies to f-structures.
Our evaluation is presented in section 4, and finally, section 5 summarises our results and gives an overview of problems remaining to be solved.
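The remark that an f-structure is a recursive attribute value matrix isomorphic to a directed graph can be made concrete with a small sketch. The nested-dictionary encoding, the node naming scheme, the attribute names and the example sentence content below are assumptions for illustration; they are not taken from the paper or from any LFG toolkit, and the sketch does not model structure sharing (reentrancy).

```python
def avm_to_edges(avm, node_id="f0"):
    """Flatten a nested attribute-value matrix into labelled graph edges.

    Each edge is (source_node, attribute, target), where the target is
    either a new node id (for an embedded matrix) or an atomic value.
    """
    edges = []
    counter = [0]  # mutable counter for fresh node ids

    def fresh():
        counter[0] += 1
        return f"f{counter[0]}"

    def visit(matrix, source):
        for attr, val in matrix.items():
            if isinstance(val, dict):      # embedded f-structure
                target = fresh()
                edges.append((source, attr, target))
                visit(val, target)
            elif isinstance(val, list):    # set-valued attribute, e.g. adjuncts
                for item in val:           # items are assumed to be matrices
                    target = fresh()
                    edges.append((source, attr, target))
                    visit(item, target)
            else:                          # atomic value
                edges.append((source, attr, val))

    visit(avm, node_id)
    return edges

# Hypothetical f-structure with made-up attribute names
example = {
    "PRED": "schlafen",
    "SUBJ": {"PRED": "Kind", "NUM": "sg"},
    "ADJUNCT": [{"PRED": "hier", "ADJ-TYPE": "loc"}],
}
for edge in avm_to_edges(example):
    print(edge)
```

Reading the edges back as a graph gives one node per (sub-)matrix and one labelled arc per attribute, which is the sense in which the two representations correspond.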
This paper is concerned with the tagging of spatial expressions in German newspaper articles: assigning a meaning to each expression, classifying its usage, and linking the derived referent to an event description. In our system, we implemented the activation of concepts in a very simple fashion: a concept is activated once (with a cost depending on the item that activated it) and is left activated thereafter. As an example, a city also activates the nodes for the region and the country it is part of, so that cities from one country are chosen over cities from different countries. Several disambiguation strategies were tested on a corpus of 12 German newspaper articles. Disambiguation was carried out via a beam search to find an approximately cost-optimal solution for the conflict set of potential grounding candidates for the tagged spatial expressions. Tests showed that the disambiguation strategies improved accuracy significantly.
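The combination of one-time concept activation and beam search can be sketched as follows. This is a minimal illustration, not the system described above: it assumes a uniform activation cost (the abstract says the cost depends on the activating item), assumes every mention has at least one candidate, and uses made-up candidate lists. All names are hypothetical.

```python
def beam_search_grounding(mentions, beam_width=3, activation_cost=1.0):
    """Pick one grounding candidate per mention with low total activation cost.

    mentions: list of candidate lists; each candidate is (name, concepts),
              where `concepts` are the nodes it activates (city, region, ...).
    Returns (cost, chosen_names) for the cheapest hypothesis in the final beam.
    """
    # A hypothesis is (total cost, chosen names, set of already active concepts).
    beam = [(0.0, (), frozenset())]
    for candidates in mentions:
        expanded = []
        for cost, chosen, active in beam:
            for name, concepts in candidates:
                # Only concepts that are not yet active incur a cost.
                new_concepts = frozenset(concepts) - active
                expanded.append((
                    cost + activation_cost * len(new_concepts),
                    chosen + (name,),
                    active | new_concepts,
                ))
        # Prune: keep only the cheapest hypotheses.
        expanded.sort(key=lambda hyp: hyp[0])
        beam = expanded[:beam_width]
    best_cost, best_names, _ = beam[0]
    return best_cost, list(best_names)

# Made-up example: "Frankfurt" is ambiguous, "Potsdam" is not.
mentions = [
    [("Potsdam", ["Potsdam", "Brandenburg", "Germany"])],
    [("Frankfurt (Oder)", ["Frankfurt (Oder)", "Brandenburg", "Germany"]),
     ("Frankfurt am Main", ["Frankfurt am Main", "Hessen", "Germany"])],
]
print(beam_search_grounding(mentions))
```

Because region and country nodes stay active once paid for, a candidate that shares them with earlier choices is cheaper, which is the preference for cities from the same area that the abstract describes.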
Using a qualitative analysis of disagreements from a referentially annotated newspaper corpus, we show that, in coreference annotation, vague referents are prone to greater disagreement. We show how potentially problematic cases can be dealt with in a way that is practical even for larger-scale annotation, considering a real-world example from newspaper text.
We investigate methods to improve the recall in coreference resolution by also trying to resolve those definite descriptions where no earlier mention of the referent shares the same lexical head (coreferent bridging). The problem, which is notably harder than identifying coreference relations among mentions which have the same lexical head, has been tackled with several rather different approaches, and we attempt to provide a meaningful classification along with a quantitative comparison. Based on the different merits of the methods, we discuss possibilities to improve them and show how they can be effectively combined.
In this paper, we investigate the usefulness of a wide range of features for the resolution of nominal coreference, both as hard constraints (i.e. completely removing elements from the list of possible candidates) and as soft constraints (where an accumulation of violations of soft constraints makes it less likely that a candidate is chosen as the antecedent). We present a state-of-the-art system based on such constraints and weights estimated with a maximum entropy model, using lexical information to resolve cases of coreferent bridging.
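The distinction between hard and soft constraints can be illustrated with a short sketch. The weights here are set by hand purely for illustration, whereas the abstract describes weights estimated with a maximum entropy model; the constraint functions, feature names and example mentions are likewise assumptions.

```python
def choose_antecedent(mention, candidates, hard_constraints, soft_constraints):
    """Return the best-scoring candidate, or None if all are filtered out.

    hard_constraints: predicates (mention, candidate) -> bool; all must hold.
    soft_constraints: list of (weight, predicate); every violated predicate
                      subtracts its weight from the candidate's score.
    """
    best, best_score = None, float("-inf")
    for cand in candidates:
        # Hard constraints remove a candidate from consideration entirely.
        if not all(check(mention, cand) for check in hard_constraints):
            continue
        # Soft constraints accumulate penalties for each violation.
        score = -sum(weight for weight, check in soft_constraints
                     if not check(mention, cand))
        if score > best_score:
            best, best_score = cand, score
    return best

# Hypothetical constraints and hand-set weights
def same_number(m, c):
    return m["number"] == c["number"]

def same_head(m, c):
    return m["head"] == c["head"]

def nearby(m, c):
    return m["sentence"] - c["sentence"] <= 2

mention = {"head": "Stadt", "number": "sg", "sentence": 5}
candidates = [
    {"head": "Städte", "number": "pl", "sentence": 4},
    {"head": "Stadt", "number": "sg", "sentence": 1},
    {"head": "Metropole", "number": "sg", "sentence": 4},
]
print(choose_antecedent(mention, candidates,
                        [same_number], [(2.0, same_head), (1.0, nearby)]))
```

With these made-up weights, the distant same-head candidate is preferred over the nearby one with a different head, while the plural candidate never enters the comparison.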
In recent years, research in parsing has extended in several new directions. One of these directions is concerned with parsing languages other than English. Treebanks have become available for many European languages, but also for Arabic, Chinese, or Japanese. However, it was shown that parsing results on these treebanks depend on the types of treebank annotations used. Another direction in parsing research is the development of dependency parsers. Dependency parsing profits from the non-hierarchical nature of dependency relations, thus lexical information can be included in the parsing process in a much more natural way. Machine-learning-based approaches are especially successful (cf. e.g.). The results achieved by these dependency parsers are very competitive although comparisons are difficult because of the differences in annotation. For English, the Penn Treebank has been converted to dependencies. For this version, Nivre et al. report an accuracy rate of 86.3%, as compared to an F-score of 92.1 for Charniak's parser. The Penn Chinese Treebank is also available in a constituent and a dependency representation. The best results reported for parsing experiments with this treebank give an F-score of 81.8 for the constituent version and 79.8% accuracy for the dependency version. The general trend in comparisons between constituent and dependency parsers is that the dependency parser performs slightly worse than the constituent parser. The only exception occurs for German, where F-scores for constituent plus grammatical function parses range between 51.4 and 75.3, depending on the treebank, NEGRA or TüBa-D/Z. The dependency parser based on a converted version of TüBa-D/Z, in contrast, reached an accuracy of 83.4%, i.e. 12 percentage points better than the best constituent analysis including grammatical functions.
This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper 'die tageszeitung' (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.
This paper presents an approach to the question whether it is possible to construct a parser based on ideas from case-based reasoning. Such a parser would employ a partial analysis of the input sentence to select a (nearly) complete syntax tree and then adapt this tree to the input sentence. The experiments performed on German data from the TüBa-D/Z treebank and the KaRoPars partial parser show that a wide range of levels of generality can be reached, depending on which types of information are used to determine the similarity between input sentence and training sentences. The results are such that it is possible to construct a case-based parser. The optimal setting out of those presented here needs to be determined empirically.
Quantitative evaluation of parsers has traditionally centered around the PARSEVAL measures of crossing brackets, (labeled) precision, and (labeled) recall. However, it is well known that these measures do not give an accurate picture of the quality of the parser's output. Furthermore, we will show that they are especially unsuited for partial parsers. In recent years, research has concentrated on dependency-based evaluation measures. We will show in this paper that such a dependency-based evaluation scheme is particularly suitable for partial parsers. TüBa-D, the treebank used here for evaluation, contains all the necessary dependency information so that the conversion of trees into a dependency structure does not have to rely on heuristics. Therefore, the dependency representations are not only reliable, they are also linguistically motivated and can be used for linguistic purposes.
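For orientation, labeled precision and recall in the PARSEVAL style can be computed from sets of labeled brackets, as in the sketch below. The bracket encoding as (label, start, end) triples, the function name and the example brackets are assumptions for illustration; the crossing-brackets measure and the dependency-based scheme proposed in the paper are not covered.

```python
from collections import Counter

def parseval(gold_brackets, test_brackets):
    """Labeled precision, recall and F1 over (label, start, end) brackets."""
    gold, test = Counter(gold_brackets), Counter(test_brackets)
    matched = sum((gold & test).values())  # multiset intersection
    precision = matched / sum(test.values()) if test else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up bracketings of a five-word sentence
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5), ("NP", 4, 5)]
precision, recall, f1 = parseval(gold, test)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

A bracket counts as correct only if label and both span boundaries match, so any constituent that a parser leaves unattached or unbuilt simply goes missing from the test set and lowers recall.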
The purpose of this paper is to describe the TüBa-D/Z treebank of written German and to compare it to the independently developed TIGER treebank (Brants et al., 2002). Both treebanks, TIGER and TüBa-D/Z, use an annotation framework that is based on phrase structure grammar and that is enhanced by a level of predicate-argument structure. The comparison between the annotation schemes of the two treebanks focuses on the different treatments of free word order and discontinuous constituents in German as well as on differences in phrase-internal annotation.
The ACL 2008 Workshop on Parsing German features a shared task on parsing German. The goal of the shared task was to find reasons for the radically different behavior of parsers on the different treebanks and between constituent and dependency representations. In this paper, we describe the task and the data sets. In addition, we provide an overview of the test results and a first analysis.
The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task has been devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.
Recent approaches to Word Sense Disambiguation (WSD) generally fall into two classes: (1) information-intensive approaches and (2) information-poor approaches. Our hypothesis is that for memory-based learning (MBL), a reduced amount of data is more beneficial than the full range of features used in the past. Our experiments show that MBL combined with a restricted set of features and a feature selection method that minimizes the feature set leads to competitive results, outperforming all systems that participated in the SENSEVAL-3 competition on the Romanian data. Thus, with this specific method, a tightly controlled feature set improves the accuracy of the classifier, reaching 74.0% in the fine-grained and 78.7% in the coarse-grained evaluation.
The purpose of this paper is to describe recent developments in the morphological, syntactic, and semantic annotation of the TüBa-D/Z treebank of German. The TüBa-D/Z annotation scheme is derived from the Verbmobil treebank of spoken German [4, 10], but has been extended along various dimensions to accommodate the characteristics of written texts. TüBa-D/Z uses as its data source the "die tageszeitung" (taz) newspaper corpus. The Verbmobil treebank annotation scheme distinguishes four levels of syntactic constituency: the lexical level, the phrasal level, the level of topological fields, and the clausal level. The primary ordering principle of a clause is the inventory of topological fields, which characterize the word order regularities among different clause types of German, and which are widely accepted among descriptive linguists of German [3, 6]. The TüBa-D/Z annotation relies on a context-free backbone (i.e. proper trees without crossing branches) of phrase structure combined with edge labels that specify the grammatical function of the phrase in question. The syntactic annotation scheme of the TüBa-D/Z is described in more detail in [12, 11]. TüBa-D/Z currently comprises approximately 15 000 sentences, with approximately 7 000 sentences being in the correction phase. The latter will be released along with an updated version of the existing treebank before the end of this year. The treebank is available in an XML format, in the NEGRA export format [1] and in the Penn treebank bracketing format. The XML format contains all types of information as described above, the NEGRA export format contains all sentence-internal information, while the Penn treebank format includes only those layers of information that can be expressed as pure tree structures. Over the course of the last year, more fine-grained linguistic annotations have been added along the following dimensions: 1. the basic Stuttgart-Tübingen tagset (STTS) [9] labels have been enriched by relevant features of inflectional morphology, 2. named entity information has been encoded as part of the syntactic annotation, and 3. a set of anaphoric and coreference relations has been added to link referentially dependent noun phrases. In the following sections, we will describe each of these innovations in turn and will demonstrate how the additional annotations can be incorporated into one comprehensive annotation scheme.
The definition of similarity between sentences is formulated on the levels of words, POS tags, and chunks (Abney 91; Abney 96). The evaluation of this approach shows that while precision and recall based on the PARSEVAL measures (Black et al. 91) do not reach state-of-the-art parsers yet (F1=87.19 on syntactic constituents, F1=77.78 including function-argument structure), the parser shows a very reliable performance where function-argument structure is concerned (F1=96.52). The lower F-scores are very often due to unattached constituents.
The problem of vocalization, or diacritization, is essential to many tasks in Arabic NLP. Arabic is generally written without the short vowels, which leads to one written form having several pronunciations, with each pronunciation carrying its own meaning(s). In the experiments reported here, we define vocalization as a classification problem in which we decide for each character in the unvocalized word whether it is followed by a short vowel. We investigate the importance of different types of context. Our results show that the combination of using memory-based learning with only a word-internal context leads to a word error rate of 6.64%. If a lexical context is added, the results deteriorate slowly.
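The per-character classification framing described in this abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions: the window size, the padding symbol "_", the label "-" for "no short vowel", the function name `make_instances`, and the transliterated example are all hypothetical choices for this sketch and not the feature layout used in the paper.

```python
def make_instances(letters, vowels, window=2):
    """Build one (features, label) pair per letter of an unvocalized word.

    letters: the unvocalized word as a string (here in Latin transliteration)
    vowels:  one entry per letter, the short vowel following it or "" for none
    """
    if len(letters) != len(vowels):
        raise ValueError("need exactly one vowel slot per letter")
    # Pad so that every letter has a full word-internal context window.
    padded = "_" * window + letters + "_" * window
    instances = []
    for i, vowel in enumerate(vowels):
        # Features: the focus letter with `window` letters of context on each side.
        features = tuple(padded[i : i + 2 * window + 1])
        instances.append((features, vowel or "-"))
    return instances


# Transliterated example: the letters k-t-b vocalized as "kataba".
for features, label in make_instances("ktb", ["a", "a", "a"]):
    print(features, label)
```

Instances of this shape could then be passed to any classifier, memory-based or otherwise; a lexical context would add features drawn from neighboring words.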
In syntax, the trend nowadays is towards lexicalized grammar formalisms. It is now widely accepted that dividing words into word classes may serve as a labor-saving mechanism - but at the same time, it discards all detailed information on the idiosyncratic behavior of words. And that is exactly the type of information that may be necessary in order to parse a sentence. For learning approaches, however, lexicalized grammars represent a challenge for the very reason that they include so much detailed and specific information, which is difficult to learn. This paper will present an algorithm for learning a link grammar of German. The problem of data sparseness is tackled by using all the available information from partial parses as well as from an existing grammar fragment and a tagger. This is a report about work in progress, so there are no representative results available yet.
This paper presents a comparative study of probabilistic treebank parsing of German, using the Negra and TüBa-D/Z treebanks. Experiments with the Stanford parser, which uses a factored PCFG and dependency model, show that, contrary to previous claims for other parsers, lexicalization of PCFG models boosts parsing performance for both treebanks. The experiments also show that there is a big difference in parsing performance between models trained on the Negra treebank and models trained on the TüBa-D/Z treebank. Parser performance for the models trained on TüBa-D/Z is comparable to parsing results for English with the Stanford parser when trained on the Penn treebank. This comparison at least suggests that German is not harder to parse than its West-Germanic neighbor language English.
How to compare treebanks
(2008)
Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question of how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new testsuite for the evaluation of parsers on complex German grammatical constructions. The testsuite provides a well thought-out error classification, which enables us to compare parser output for parsers trained on treebanks with different encoding schemes and provides interesting insights into the impact of treebank annotation schemes on specific constructions like PP attachment or non-constituent coordination.
In the last decade, the Penn treebank has become the standard data set for evaluating parsers. The fact that most parsers are solely evaluated on this specific data set leaves the question unanswered how much these results depend on the annotation scheme of the treebank. In this paper, we will investigate the influence which different decisions in the annotation schemes of treebanks have on parsing. The investigation uses the comparison of similar treebanks of German, NEGRA and TüBa-D/Z, which are subsequently modified to allow a comparison of the differences. The results show that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality while a flat clause structure has a positive influence.
Transforming constituent-based annotation into dependency-based annotation has been shown to work for different treebanks and annotation schemes (e.g. Lin (1995) has transformed the Penn treebank, and Kübler and Telljohann (2002) the Tübinger Baumbank des Deutschen (TüBa-D/Z)). These ventures are usually triggered by the conflict between theory-neutral annotation, which targets most needs of a wider audience, and theory-specific annotation, which provides more fine-grained information for a smaller audience. As a compromise, it has been pointed out that treebanks can be designed to support more than one theory from the start (Nivre, 2003). We argue that information can also be added to an existing annotation scheme so that it supports additional theory-specific annotations. We also argue that such a transformation is useful for improving and extending the original annotation scheme with respect to both ambiguous annotation and annotation errors. We show this by analysing problems that arise when generating dependency information from the constituent-based TüBa-D/Z.
Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. Such larger structures are not only desirable for a deeper syntactic analysis. They also constitute a necessary prerequisite for assigning function-argument structure. The present paper offers a similarity-based algorithm for assigning functional labels such as subject, object, head, complement, etc. to complete syntactic structures on the basis of prechunked input. The evaluation of the algorithm has concentrated on measuring the quality of functional labels. It was performed on a German and an English treebank using two different annotation schemes at the level of function-argument structure. The results of 89.73% correct functional labels for German and 90.40% for English validate the general approach.
In this paper, we investigate the role of sub-optimality in training data for part-of-speech tagging. In particular, we examine to what extent the size of the training corpus and certain types of errors in it affect the performance of the tagger. We distinguish four types of errors: If a word is assigned a wrong tag, this tag can belong to the ambiguity class of the word (i.e. to the set of possible tags for that word) or not; furthermore, the major syntactic category (e.g. "N" or "V") can be correctly assigned (e.g. if a finite verb is classified as an infinitive) or not (e.g. if a verb is classified as a noun). We empirically explore the decrease of performance that each of these error types causes for different sizes of the training set. Our results show that those types of errors that are easier to eliminate have a particularly negative effect on the performance. Thus, it is worthwhile concentrating on the elimination of these types of errors, especially if the training corpus is large.
Prepositional phrase (PP) attachment is one of the major sources of errors in traditional statistical parsers. The reason for that lies in the type of information necessary for resolving structural ambiguities. For parsing, it is assumed that distributional information of parts-of-speech and phrases is sufficient for disambiguation. For PP attachment, in contrast, lexical information is needed. The problem of PP attachment has sparked much interest ever since Hindle and Rooth (1993) formulated the problem in a way that can be easily handled by machine learning approaches: In their approach, PP attachment is reduced to the decision between noun and verb attachment; and the relevant information is reduced to the two possible attachment sites (the noun and the verb) and the preposition of the PP. Brill and Resnik (1994) extended the feature set to the now standard 4-tuple also containing the noun inside the PP. Among many publications on the problem of PP attachment, Volk (2001; 2002) describes the only system for German. He uses a combination of supervised and unsupervised methods. The supervised method is based on the back-off model by Collins and Brooks (1995), the unsupervised part consists of heuristics such as "If there is a support verb construction present, choose verb attachment". Volk trains his back-off model on the Negra treebank (Skut et al., 1998) and extracts frequencies for the heuristics from the "Computerzeitung". The latter also serves as test data set. Consequently, it is difficult to compare Volk's results to other results for German, including the results presented here, since he not only uses a combination of supervised and unsupervised learning but also performs domain adaptation. Most of the researchers working on PP attachment seem to be satisfied with a PP attachment system; we have found hardly any work on integrating the results of such approaches into actual parsers. The only exceptions are Mehl et al. (1998) and Foth and Menzel (2006), both working with German data. Mehl et al. report a slight improvement of PP attachment from 475 correct PPs out of 681 PPs for the original parser to 481 PPs. Foth and Menzel report an improvement of overall accuracy from 90.7% to 92.2%. Both integrate statistical attachment preferences into a parser. First, we will investigate whether dependency parsing, which generally uses lexical information, shows the same performance on PP attachment as an independent PP attachment classifier does. Then we will investigate an approach that allows the integration of PP attachment information into the output of a parser without having to modify the parser: The results of an independent PP attachment classifier are integrated into the parse of a dependency parser for German in a postprocessing step.
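To make the 4-tuple formulation and the counts reported for Mehl et al. concrete, here is a minimal Python sketch. The class name `PPInstance`, its field names, the English example tuple, and the helper `accuracy` are assumptions introduced for illustration only; the counts 475, 481, and 681 are the ones quoted in the abstract.

```python
from typing import NamedTuple


class PPInstance(NamedTuple):
    """One PP attachment decision in the 4-tuple formulation."""
    verb: str         # candidate verb attachment site
    noun: str         # candidate noun attachment site
    preposition: str  # head of the PP
    pp_noun: str      # noun inside the PP


def accuracy(correct, total):
    """Share of correctly attached PPs."""
    return correct / total


# Hypothetical instance; a classifier would label it "verb" or "noun".
example = PPInstance(verb="eat", noun="pizza", preposition="with", pp_noun="fork")

# Counts quoted for Mehl et al.: 475 vs. 481 correct PPs out of 681.
baseline = accuracy(475, 681)
improved = accuracy(481, 681)
print(f"{baseline:.2%} -> {improved:.2%}")
```

Expressed as percentages, the reported change corresponds to a gain of six correctly attached PPs, that is, less than one percentage point.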
This report explores the question of compatibility between annotation projects including translating annotation formalisms to each other or to common forms. Compatibility issues are crucial for systems that use the results of multiple annotation projects. We hope that this report will begin a concerted effort in the field to track the compatibility of annotation schemes for part of speech tagging, time annotation, treebanking, role labeling and other phenomena.
This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an on-going project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and anaphoric information. The paper discusses how this unified annotation scheme relates to other formats currently discussed in the literature, in particular the annotation graph model of Bird and Liberman (2001) and the pie-in-the-sky scheme for semantic annotation.
Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. The TüSBL parser extends current chunk parsing techniques by a tree-construction component that extends partial chunk parses to complete tree structures including recursive phrase structure as well as function-argument structure. TüSBL's tree construction algorithm relies on techniques from memory-based learning that allow similarity-based classification of a given input structure relative to a pre-stored set of tree instances from a fully annotated treebank. A quantitative evaluation of TüSBL has been conducted using a semi-automatically constructed treebank of German that consists of approximately 67,000 fully annotated sentences. The basic PARSEVAL measures were used although they were developed for parsers that have as their main goal a complete analysis that spans the entire input. This runs counter to the basic philosophy underlying TüSBL, which has as its main goal robustness of partially analyzed structures.
This paper provides an overview of current research on a hybrid and robust parsing architecture for the morphological, syntactic and semantic annotation of German text corpora. The novel contribution of this research lies not in the individual parsing modules, each of which relies on state-of-the-art algorithms and techniques. Rather what is new about the present approach is the combination of these modules into a single architecture. This combination provides a means to significantly optimize the performance of each component, resulting in an increased accuracy of annotation.
Much attention has recently been paid to constraint-based definitions and extensions of Tree Adjoining Grammars (TAG). Examples are the so-called quasi-trees, D-Tree Grammars and Tree Description Grammars (TDG). The latter are grammars consisting of a set of formulas denoting trees. TDGs are derivation-based: in each derivation step, a conjunction is built of the old formula, a formula of the grammar and additional equivalences between node names of the two formulas. This formalism is more powerful than TAGs. TDGs offer the advantages of MC-TAG and D-Tree Grammars for natural languages and they allow underspecification. However, the problem is that TDGs might be unnecessarily powerful for natural languages. To solve this problem, in this paper, I will propose local TDGs, a restricted version of TDGs. Local TDGs still have the advantages of TDGs but they are semilinear and therefore more appropriate for natural languages. First, the notion of semilinearity is defined. Then local TDGs are introduced, and, finally, semilinearity of local Tree Description Languages is proven.
This paper proposes a compositional semantics for lexicalized tree adjoining grammars (LTAG). Tree-local multicomponent derivations allow separation of the semantic contribution of a lexical item into one component contributing to the predicate argument structure and a second component contributing to scope semantics. Based on this idea, a syntax-semantics interface is presented where the compositional semantics depends only on the derivation structure. It is shown that the derivation structure allows an appropriate amount of underspecification. This is illustrated by investigating underspecified representations for quantifier scope ambiguities and related phenomena such as adjunct scope and island constraints.
A hierarchy of local TDGs
(1998)
Many recent variants of Tree Adjoining Grammars (TAG) allow an underspecification of the parent relation between nodes in a tree, i.e. they do not deal with fully specified trees as is the case with TAGs. Such TAG variants are, for example, Description Tree Grammars (DTG), Unordered Vector Grammars with Dominance Links (UVG-DL), a definition of TAGs via so-called quasi trees and Tree Description Grammars (TDG). The last TAG variant, local TDG, is an extension of TAG generating Tree Descriptions. Local TDGs even allow an underspecification of the dominance relation between node names and thereby provide the possibility to generate underspecified representations for structural ambiguities such as quantifier scope ambiguities. This abstract deals with formal properties of local TDGs. A hierarchy of local TDGs is established together with a pumping lemma for local TDGs of a certain rank.
Tree-local MCTAG with shared nodes : an analysis of word order variation in German and Korean
(2004)
Tree Adjoining Grammars (TAG) are known not to be powerful enough to deal with scrambling in free word order languages. The TAG-variants proposed so far in order to account for scrambling are not entirely satisfying. Therefore, an alternative extension of TAG is introduced based on the notion of node sharing. Considering data from German and Korean, it is shown that this TAG-extension can adequately analyse scrambling data, also in combination with extraposition and topicalization.
In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German.
This paper proposes a corpus encoding standard that meets the needs of linguistic research using a variety of linguistic data structures. The standard was developed in SFB 441, a research project at the University of Tuebingen. The principal concern of SFB 441 is the empirical data structures which feed into linguistic theory building. SFB 441 consists of several projects, most of which are building corpora to empirically investigate various linguistic phenomena in various languages (e.g. modal verbs in German, forms of address and politeness in Russian). These corpora will form the components of the "Tuebingen collection of reusable, empirical, linguistic data structures (TUSNELDA)". The TUSNELDA annotation standard aims at providing a uniform encoding scheme for all subcorpora and texts of TUSNELDA such that they can be processed with uniform standardized tools. To guarantee maximal reusability we use XML for encoding. Previous SGML standards for text encoding were provided by the Text Encoding Initiative (TEI) and the Expert Advisory Group on Language Engineering Standards (Corpus Encoding Standard, CES). The TUSNELDA standard is based on TEI and XCES (XML version of CES) but takes into account the specific needs of the SFB projects, i.e. the peculiarities of the examined languages and linguistic phenomena.
Existing analyses of German scrambling phenomena within TAG-related formalisms all use non-local variants of TAG. However, there are good reasons to prefer local grammars, in particular with respect to the use of the derivation structure for semantics. Therefore, this paper proposes to use local TDGs, a TAG-variant generating tree descriptions that shows a local derivation structure. However, the construction of minimal trees for the derived tree descriptions is not subject to any locality constraint. This provides just the amount of non-locality needed for an adequate analysis of scrambling. To illustrate this, a local TDG for some German scrambling data is presented.
This paper develops a framework for TAG (Tree Adjoining Grammar) semantics that brings together ideas from different recent approaches. Then, within this framework, an analysis of scope is proposed that accounts for the different scopal properties of quantifiers, adverbs, raising verbs and attitude verbs. Finally, including situation variables in the semantics, different situation binding possibilities are derived for different types of quantificational elements.
This paper presents an LTAG analysis of reflexives like himself and reciprocals like each other. These items need to find a c-commanding antecedent from which they retrieve (part of) their own denotation and with which they syntactically agree. The relation between anaphoric item and antecedent must satisfy the following important locality conditions (Chomsky (1981)).
Relative quantifier scope in German depends, in contrast to English, very much on word order. The scope possibilities of a quantifier are determined by its surface position, its base position and the type of the quantifier. In this paper we propose a multicomponent analysis for German quantifiers computing the scope of the quantifier, in particular its minimal nuclear scope, depending on the syntactic configuration it occurs in.