We adopt Markert and Nissim (2005)’s approach of using the World Wide Web to resolve cases of coreferent bridging for German and discuss the strengths and weaknesses of this approach. As the general approach of using surface patterns to get information on ontological relations between lexical items has only been tried on English, it is also interesting to see whether the approach works for German as well as it does for English and what differences between these languages need to be accounted for. We also present a novel approach for combining several patterns that yields an ensemble that outperforms the best-performing single patterns in terms of both precision and recall.
Multicomponent Tree Adjoining Grammars (MCTAG) is a formalism that has been shown to be useful for many natural language applications. The definition of MCTAG, however, is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. This way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) the MCTAG licenses. This definition gives a better understanding of the formalism, it allows a more systematic comparison of different types of MCTAG, and, furthermore, it can be exploited for parsing.
In this paper, we investigate the usefulness of a wide range of features in the resolution of nominal coreference, both as hard constraints (i.e. completely removing elements from the list of possible candidates) and as soft constraints (where an accumulation of violations of soft constraints makes it less likely that a candidate is chosen as the antecedent). We present a state-of-the-art system based on such constraints and weights estimated with a maximum entropy model, using lexical information to resolve cases of coreferent bridging.
The work presented here addresses the question of how to determine whether a grammar formalism is powerful enough to describe natural languages. The expressive power of a formalism can be characterized in terms of i) the string languages it generates (weak generative capacity (WGC)) or ii) the tree languages it generates (strong generative capacity (SGC)). The notion of WGC is not enough to determine whether a formalism is adequate for natural languages. We argue that even SGC is problematic since the set of trees a grammar formalism for natural languages should be able to generate is difficult to determine. The concrete syntactic structures assumed for natural languages depend very much on theoretical stipulations, and empirical evidence for syntactic structures is rather hard to obtain. Therefore, for lexicalized formalisms, we propose to consider the ability to generate certain strings together with specific predicate argument dependencies as a criterion for adequacy for natural languages.
This review lists Agama smithii Boulenger 1896 as a synonym of Agama agama (Linnaeus 1758), Agama trachypleura Peters 1982 as a synonym of Acanthocercus phillipsii (Boulenger 1895) and describes for the first time Acanthocercus guentherpetersi n. sp. Without more convincing evidence, Chamaeleon ruspolii Boettger 1893 cannot be accepted as specifically distinct from Chamaeleo dilepis Leach 1819, nor Chamaeleo calcaricarens Böhme 1985 from C. africanus Laurenti 1768. Consequently, 101 species of lizard are currently recognised in Ethiopia, of which some 40% appear to be denizens of the Somali-arid zone. This significant proportion is attributable in part to the importance of the Horn of Africa as a centre for reptilian diversification and endemicity, in part to the fact that this lowland fauna was rather extensively sampled during the 1930s, but also to the conspicuous neglect of lizards in other regions of the country. Mountain and forested habitats are widespread in Ethiopia, so it seems extraordinary to record only five saurian species which are believed to be endemic in such environments. The inference that there are many more still to be discovered has important implications for conservation, because montane forest is known to be among the most threatened of Ethiopian biomes and there is clearly an urgent need for its herpetofauna to be more thoroughly researched and documented.
The medium of (oral) language is mostly disregarded (or overlooked) in contemporary media theories. This "ignoring of language" in media studies is often accompanied by an inadequate transport model of communication, and it converges with an "ignoring of mediality" in mentalistic theories of language. In the present article it will be argued that this misleading opposition of language and media can only be overcome if one already regards oral language, not just written language, as a medium of the human mind. In my argumentation I fall back on Wittgenstein’s conception of language games to try to show how Wittgenstein’s ideas can help us to clear up the problem of the mediality of language and also to show to what extent the mentalistic conception of Chomskyan provenance cannot be adequate to the phenomenon of language.
This paper presents an approach to the question of whether it is possible to construct a parser based on ideas from case-based reasoning. Such a parser would employ a partial analysis of the input sentence to select a (nearly) complete syntax tree and then adapt this tree to the input sentence. The experiments performed on German data from the TüBa-D/Z treebank and the KaRoPars partial parser show that a wide range of levels of generality can be reached, depending on which types of information are used to determine the similarity between input sentence and training sentences. The results are such that it is possible to construct a case-based parser. The optimal setting out of those presented here needs to be determined empirically.
This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper ‘die tageszeitung’ (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.
In recent years, research in parsing has expanded in several new directions. One of these directions is concerned with parsing languages other than English. Treebanks have become available for many European languages, but also for Arabic, Chinese, or Japanese. However, it was shown that parsing results on these treebanks depend on the types of treebank annotations used. Another direction in parsing research is the development of dependency parsers. Dependency parsing profits from the non-hierarchical nature of dependency relations; thus lexical information can be included in the parsing process in a much more natural way. Machine-learning-based approaches are especially successful (cf. e.g.). The results achieved by these dependency parsers are very competitive, although comparisons are difficult because of the differences in annotation. For English, the Penn Treebank has been converted to dependencies. For this version, Nivre et al. report an accuracy rate of 86.3%, as compared to an F-score of 92.1 for Charniak's parser. The Penn Chinese Treebank is also available in a constituent and a dependency representation. The best results reported for parsing experiments with this treebank give an F-score of 81.8 for the constituent version and 79.8% accuracy for the dependency version. The general trend in comparisons between constituent and dependency parsers is that the dependency parser performs slightly worse than the constituent parser. The only exception occurs for German, where F-scores for constituent plus grammatical function parses range between 51.4 and 75.3, depending on the treebank, NEGRA or TüBa-D/Z. The dependency parser based on a converted version of TüBa-D/Z, in contrast, reached an accuracy of 83.4%, i.e. 12 percentage points better than the best constituent analysis including grammatical functions.
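The two kinds of figures contrasted in this abstract are computed over different units, which is one reason the comparison is hard to make. The following is a minimal sketch, assuming the common conventions of a bracket-based F-score for constituent parses and per-token head accuracy for dependency parses; the function names and the toy data are hypothetical and are not taken from the paper.

```python
def labelled_f_score(gold, predicted):
    """F-score over sets of labelled constituents (label, start, end).

    Duplicated brackets collapse because sets are used; this is a
    simplification made for the sketch.
    """
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def attachment_accuracy(gold_heads, predicted_heads):
    """Share of tokens whose predicted head index equals the gold head."""
    if not gold_heads or len(gold_heads) != len(predicted_heads):
        raise ValueError("head lists must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Toy three-token sentence (hypothetical data).
gold_brackets = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)}
pred_brackets = {("S", 0, 3), ("NP", 0, 1), ("VP", 2, 3)}
gold_heads = [2, 0, 2]   # head index per token, 0 = root
pred_heads = [2, 0, 1]

f_score = labelled_f_score(gold_brackets, pred_brackets)
accuracy = attachment_accuracy(gold_heads, pred_heads)
```

In the sketch, one wrong bracket lowers both precision and recall, whereas one wrong head lowers accuracy by exactly one token's share, so the two numbers are not on a common scale.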
This paper presents a comparative study of probabilistic treebank parsing of German, using the Negra and TüBa-D/Z treebanks. Experiments with the Stanford parser, which uses a factored PCFG and dependency model, show that, contrary to previous claims for other parsers, lexicalization of PCFG models boosts parsing performance for both treebanks. The experiments also show that there is a big difference in parsing performance between models trained on the Negra treebank and models trained on the TüBa-D/Z treebank. Parser performance for the models trained on TüBa-D/Z is comparable to parsing results for English with the Stanford parser when trained on the Penn treebank. This comparison at least suggests that German is not harder to parse than its West-Germanic neighbor language English.
This report explores the question of compatibility between annotation projects including translating annotation formalisms to each other or to common forms. Compatibility issues are crucial for systems that use the results of multiple annotation projects. We hope that this report will begin a concerted effort in the field to track the compatibility of annotation schemes for part of speech tagging, time annotation, treebanking, role labeling and other phenomena.
Using a qualitative analysis of disagreements from a referentially annotated newspaper corpus, we show that, in coreference annotation, vague referents are prone to greater disagreement. We show how potentially problematic cases can be dealt with in a way that is practical even for larger-scale annotation, considering a real-world example from newspaper text.
In the past, a divide could be seen between ‘deep’ parsers on the one hand, which construct a semantic representation out of their input, but usually have significant coverage problems, and more robust parsers on the other hand, which are usually based on a (statistical) model derived from a treebank and have larger coverage, but leave the problem of semantic interpretation to the user. More recently, approaches have emerged that combine the robustness of data-driven (statistical) models with more detailed linguistic interpretation such that the output could be used for deeper semantic analysis. Cahill et al. (2002) use a PCFG-based parsing model in combination with a set of principles and heuristics to derive functional (f-)structures of Lexical-Functional Grammar (LFG). They show that the derived functional structures have a better quality than those generated by a parser based on a state-of-the-art hand-crafted LFG grammar. Advocates of Dependency Grammar usually point out that dependencies already are a semantically meaningful representation (cf. Menzel, 2003). However, parsers based on dependency grammar normally create underspecified representations with respect to certain phenomena such as coordination, apposition and control structures. In these areas they are too "shallow" to be directly used for semantic interpretation. In this paper, we adopt a similar approach to Cahill et al. (2002) using a dependency-based analysis to derive functional structure, and demonstrate the feasibility of this approach using German data. A major focus of our discussion is on the treatment of coordination and other potentially underspecified structures of the dependency data input. F-structure is one of the two core levels of syntactic representation in LFG (Bresnan, 2001).
Independently of surface order, it encodes abstract syntactic functions that constitute predicate argument structure and other dependency relations such as subject, predicate, adjunct, but also further semantic information such as the semantic type of an adjunct (e.g. directional). Normally f-structure is captured as a recursive attribute value matrix, which is isomorphic to a directed graph representation. Figure 5 depicts an example target f-structure. As mentioned earlier, these deeper-level dependency relations can be used to construct logical forms as in the approaches of van Genabith and Crouch (1996), who construct underspecified discourse representations (UDRSs), and Spreyer and Frank (2005), who have robust minimal recursion semantics (RMRS) as their target representation. We therefore think that f-structures are a suitable target representation for automatic syntactic analysis in a larger pipeline of mapping text to interpretation. In this paper, we report on the conversion from dependency structures to f-structure. Firstly, we evaluate the f-structure conversion in isolation, starting from hand-corrected dependencies based on the TüBa-D/Z treebank and Versley (2005)’s conversion. Secondly, we start from tokenized text to evaluate the combined process of automatic parsing (using Foth and Menzel (2006)’s parser) and f-structure conversion. As a test set, we randomly selected 100 sentences from TüBa-D/Z which we annotated using a scheme very close to that of the TiGer Dependency Bank (Forst et al., 2004). In the next section, we sketch dependency analysis, the underlying theory of our input representations, and introduce four different representations of coordination. We also describe Weighted Constraint Dependency Grammar (WCDG), the dependency parsing formalism that we use in our experiments. Section 3 characterises the conversion of dependencies to f-structures.
Our evaluation is presented in section 4, and finally, section 5 summarises our results and gives an overview of problems remaining to be solved.
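The statement that an f-structure is a recursive attribute value matrix isomorphic to a directed graph can be illustrated with a small sketch. It assumes a nested-dictionary encoding; the attribute names (PRED, SUBJ, XCOMP, ADJUNCT, ADJ-TYPE), the example values and the function name `avm_to_edges` are hypothetical illustrations and do not reproduce the paper's annotation scheme.

```python
from itertools import count


def avm_to_edges(avm):
    """Flatten a nested AVM into labelled edges (source, attribute, target).

    Complex values become integer nodes, atomic values stay as leaves,
    and list values (set-valued attributes) yield one edge per member.
    A sub-structure that is shared (the same dict object reached twice)
    maps to a single node, so the result is a graph rather than a tree.
    """
    ids = count()
    seen = {}    # id(dict object) -> node number
    edges = []

    def visit(node):
        if id(node) in seen:
            return seen[id(node)]
        node_id = next(ids)
        seen[id(node)] = node_id
        for attr, value in node.items():
            members = value if isinstance(value, list) else [value]
            for member in members:
                target = visit(member) if isinstance(member, dict) else member
                edges.append((node_id, attr, target))
        return node_id

    visit(avm)
    return edges


# Toy control structure: the subject is shared between matrix and
# embedded predicate (hypothetical example).
subject = {"PRED": "Maria", "NUM": "sg"}
f_structure = {
    "PRED": "try",
    "SUBJ": subject,
    "XCOMP": {"PRED": "sleep", "SUBJ": subject},
    "ADJUNCT": [{"PRED": "here", "ADJ-TYPE": "loc"}],
}

edges = avm_to_edges(f_structure)
```

Keying visited nodes by object identity is a design choice of this sketch: it lets structure sharing, as in control constructions, appear as two edges pointing at the same node.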
This paper compares two approaches to computational semantics, namely semantic unification in Lexicalized Tree Adjoining Grammars (LTAG) and Lexical Resource Semantics (LRS) in HPSG. There are striking similarities between the frameworks that make them comparable in many respects. We will exemplify the differences and similarities by looking at several phenomena. We will show, first of all, that many intuitions about the mechanisms of semantic computations can be implemented in similar ways in both frameworks. Secondly, we will identify some aspects in which the frameworks intrinsically differ due to more general differences between the approaches to formal grammar adopted by LTAG and HPSG.
Relative quantifier scope in German depends, in contrast to English, very much on word order. The scope possibilities of a quantifier are determined by its surface position, its base position and the type of the quantifier. In this paper we propose a multicomponent analysis for German quantifiers computing the scope of the quantifier, in particular its minimal nuclear scope, depending on the syntactic configuration it occurs in.
In order to understand the specific structures and features of German surnames, the most important facts about their emergence and history should be outlined and, at the same time, be compared with Swedish surnames because there are considerable differences (for further details cf. Nübling 1997a, b). First of all, surnames in Germany emerged rather early, with the first instances occurring in the 11th century in southern Germany; by the 16th century surnames were common all over Germany. Differences are related to geography (from south to north), social class (from the upper to the lower classes) and urban versus rural areas.
As editor of the next iteration of the Köchel Catalogue, I have to deal with the current (sixth) edition’s Appendix C, devoted to "Doubtful and Misattributed Works." My goal is to reduce the potentially vast dimensions of that appendix to only those works for which some connection to Mozart cannot be ruled out. In the decades since 1964, when the current edition of Köchel was published, many of the works listed in Appendix C have been convincingly attributed to other composers. Other works therein can confidently be dismissed as never having had any meaningful connection to Mozart. Yet even after removing the reattributed and trivially misattributed works from the appendix, we are left with a handful of works that may possibly have had something to do with Mozart, even if clear evidence one way or the other remains elusive. One must, of course, be cautious in removing questionable and doubtful works from the catalogue, as the present case-study will illustrate. The work under consideration, catalogued as K6 Anh. C 9.07, is an unaccompanied piece for three or four voices with the text "Venerabilis barba capucinorum." ...
Advantageous fragmentation? : reimagining metropolitan governance and spatial planning in Rhine-Main
(2006)
This paper traces the latest round of debates about appropriate scales and scopes of government and governance in Rhine-Main - an economically highly integrated but politically, territorially and emotionally divided region. We identify a downscaling of political power from the regional to the municipal level, and an upscaling of informal networking and image building to an extended regional scale. These countertrends are signs of a more complex geographical rearrangement in municipal and institutional relations. The inherent contradictions in the rescaling and reimagining of Rhine-Main are evident in the Strategic Vision for Frankfurt/Rhein-Main 2020. Its new conceptualization of Rhine-Main postulates complementary polycentricity as a competitive asset but remains firmly grounded in an institutional territorial logic that contravenes its own economically-driven agenda.
Meadowbird populations in The Netherlands are under great pressure. Recently, predation has been named increasingly often as one of the key factors contributing to the declines. A four-year research project (2001-2005) aimed to collect (as yet mostly nonexistent) data to provide a factual basis for this discussion. A country-wide inventory based on data for wader nests found by volunteers who mark nests for their protection from grazing/mowing indicated that above-average predation losses are found predominantly in the half-open landscapes of northern and eastern Netherlands, but also locally in the low-lying open grasslands which are the key areas for meadowbirds. Nest predation has increased in recent years, but the same is true for agricultural losses, at least in areas where no nest protection takes place. At a local scale, predation losses vary greatly from area to area and from year to year. Temperature loggers in nests showed that diurnal and nocturnal predators contribute equally to total predation losses up to 50%, but higher predation losses are mainly caused by nocturnal predators. As many as 10 animal species were identified as nest predators on nests under surveillance with video cameras. Chick survival, investigated using radiotelemetry, was very low. About 60-80% were lost to predation, 5-15% to agricultural activities and 10-15% to all kinds of other losses. At least 15 predator species were implicated, with an apparently larger share taken by birds (notably Buzzard (16%) and Grey Heron (7-18%)) than mammals, with one exception: stoat (16%). Of the most-discussed predator species, Carrion Crows were remarkably rarely involved in both nest and chick predation, while Red Foxes take a large toll of clutches in some areas, but not in others. Of all losses during the reproductive cycle about 75% and 60% were due to predation in Lapwing and Black-tailed Godwit respectively. Predation on chicks by birds had the largest effect on total breeding success, but at the same time elimination of this loss factor (if at all possible) alone would not be sufficient to establish a self-sustaining population. Predation seems to have become a factor of importance in some areas, in combination with already existing other losses. Our findings suggest that solutions to predation problems probably have to be found in locally/regionally targeted, specific action on multiple fronts rather than countrywide measures.
W. Teunissen et al. Osnabrücker Naturwiss. Mitt. 32, 2006
Black-tailed Godwits (Limosa limosa) have been declining for decades in The Netherlands and so far this has not been slowed by conservation measures. A new form of agri-environment scheme was tried out in 2003-2005 at 6 sites where a ‘grassland mosaic’ (200-300 ha) was created by collectives of farmers through a diverse use of fields including postponed and staggered mowing, (early) grazing, creating ‘refuge strips’ during mowing, and active nest protection. We measured breeding success of godwits in each of the experimental sites and nearby, paired controls. Breeding success was higher (0.28 chicks fledged / pair) in mosaics than in controls, but due to lower agricultural nest losses only. Chick survival was 11 % in both mosaics and controls. The amount of late-mown and other grassland suitable for chicks hardly differed between treatments during the fledging period, mainly due to rainfall delaying postponed mowing in all sites. Chick survival was however positively correlated with site variation in the amount of high grass (>18 cm). Breeding success was high enough to compensate for adult mortality (ca. 0.6) in only one mosaic site. Chick survival was lower than in previous Godwit studies, indicating that additional loss factors have increased. Predation (50-80 % of chicks, mostly by birds) is a candidate, but changes in the suitability of late-mown grassland (insect abundance and sward density in grass monocultures) may also play a role. Consequently a higher management investment is needed to achieve a self-sustaining population.
In this study, we report the results of a long-term investigation on changes in population size and fledging success of Northern Lapwing on Wangerooge, a German Wadden Sea island. This population has been increasing over a period of 34 years, in contrast to numerous populations in North-western Europe. The reproductive success, however, declines over time and also with population density. The two effects cannot be considered separately due to autocorrelation. However, it is noted that the population on Wangerooge is not sustained by local recruitment only. This outcome is even more alarming as coastal areas and islands are considered rare high-quality meadow bird habitats. According to the present results, Wangerooge cannot be considered a source habitat for Northern Lapwings in North-western Germany.
Human impacts on the landscape have increased the penalties for Black-tailed Godwits laying their eggs too late, especially in the very intensive agricultural landscapes of The Netherlands. Thus, godwits have experienced a dramatic change of their fitness landscape, because the advance in mowing date has made late clutches worthless, destroying either eggs or chicks. To determine the driving forces of the recent population decline, we study the individual variation in timing of breeding with respect to reproductive success in a population unaffected by mowing. Our results show that even in a low-intensity agricultural area it is very important for godwits to breed early in the season.
The retreat of BE as perfect auxiliary in the history of English is examined. Corpus data are presented showing that the initial advance of HAVE was most closely connected to a restriction against BE in past counterfactuals. Other factors which have been reported to favor the spread of HAVE are either dependent on the counterfactual effect, or significantly weaker in comparison. It is argued that the effect can be traced to the semantics of the BE perfect, which denoted resultativity rather than anteriority proper. Related data from other older Germanic and Romance languages are presented, and finally implications for existing theories of auxiliary selection stemming from the findings presented are discussed.
In this article we examine and "exapt" Wurzel's concept of superstable markers in an innovative manner. We develop an extended view of superstability through a critical discussion of Wurzel's original definition and the status of marker-superstability versus allomorphy in Natural Morphology: as we understand it, superstability is, above and beyond a step towards uniformity, mainly a symptom of the weakening of the category affected (cf. 1., 2. and 4.). This view is exemplified in four short case studies on superstability in different grammatical categories of four Germanic languages: genitive case in Mainland Scandinavian and English (3.1), plural formation in Dutch (3.2), second person singular ending -st in German (3.3), and ablaut generalisation in Luxembourgish (3.4).
In this text, we describe the development of a broad coverage grammar for Japanese that has been built for and used in different application contexts. The grammar is based on work done in the Verbmobil project (Siegel 2000) on machine translation of spoken dialogues in the domain of travel planning. The second application for JACY was the automatic email response task. Grammar development was described in Oepen et al. (2002a). Third, it was applied to the task of understanding material on mobile phones available on the internet, while embedded in the project DeepThought (Callmeier et al. 2004, Uszkoreit et al. 2004). Currently, it is being used for treebanking and ontology extraction from dictionary definition sentences by the Japanese company NTT (Bond et al. 2004).
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.
This demo abstract describes the SmartWeb Ontology-based Information Extraction System (SOBIE). A key feature of SOBIE is that all information is extracted and stored with respect to the SmartWeb ontology. In this way, other components of the system, which use the same ontology, can access this information in a straightforward way. We will show how information extracted by SOBIE is visualized within its original context, thus enhancing the browsing experience of the end user.
In this paper I present five alternations of the verb system of Modern Greek, which are recurrently mapped on the syntactic frame NPi__NP. The actual claim is that only the participation in alternations and/or the allocation to an alternation variant can reliably determine the relation between a verb derivative and its base. In the second part, the conceptual structures and semantic/situational fields of a large number of “-ízo” derivatives appearing inside alternation classes are presented. The restricted character of the conceptual and situational preferences inside alternation classes suggests the dominant character of the alternations component.
Effects of BPA in snails
(2006)
It is an ethical requirement that new findings be presented in light of and in conjunction with a balanced evaluation of the current knowledge and published literature. We believe that Oehlmann et al. (2006) violated this general principle in several ways. For example, the authors inferred that prosobranch snails have a functional estrogen receptor and therefore a much higher sensitivity to estrogens and endocrine-disrupting compounds (EDCs) than other species previously reported in the literature. We found several other problems in their article...
In the last decade, the Penn treebank has become the standard data set for evaluating parsers. The fact that most parsers are solely evaluated on this specific data set leaves the question unanswered how much these results depend on the annotation scheme of the treebank. In this paper, we will investigate the influence which different decisions in the annotation schemes of treebanks have on parsing. The investigation uses the comparison of similar treebanks of German, NEGRA and TüBa-D/Z, which are subsequently modified to allow a comparison of the differences. The results show that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality while a flat clause structure has a positive influence.
This paper develops a framework for TAG (Tree Adjoining Grammar) semantics that brings together ideas from different recent approaches. Then, within this framework, an analysis of scope is proposed that accounts for the different scopal properties of quantifiers, adverbs, raising verbs and attitude verbs. Finally, including situation variables in the semantics, different situation binding possibilities are derived for different types of quantificational elements.
This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper ‘die tageszeitung’ (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.
Multicomponent Tree Adjoining Grammars (MCTAG) is a formalism that has been shown to be useful for many natural language applications. The definition of MCTAG, however, is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. That is, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. Therefore, in this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees the MCTAG licenses.
When a statistical parser is trained on one treebank, one usually tests it on another portion of the same treebank, partly due to the fact that a comparable annotation format is needed for testing. But the user of a parser may not be interested in parsing sentences from the same newspaper all over, or may even want syntactic annotations for a slightly different text type. Gildea (2001), for instance, found that a parser trained on the WSJ portion of the Penn Treebank performs less well on the Brown corpus (the subset that is available in the PTB bracketing format) than a parser that has been trained only on the Brown corpus, although the latter has only half as many sentences as the former. Additionally, a parser trained on both the WSJ and Brown corpora performs less well on the Brown corpus than on the WSJ one. This leads us to the following questions that we would like to address in this paper: - Is there a difference in usefulness of techniques that are used to improve parser performance between the same-corpus and the different-corpus case? - Are different types of parsers (rule-based and statistical) equally sensitive to corpus variation? To answer these questions, we compared the quality of the parses of a hand-crafted constraint-based parser and a statistical PCFG-based parser that was trained on a treebank of German newspaper text.
This paper describes the creation and preparation of TUSNELDA, a collection of corpus data built for linguistic research. This collection contains a number of linguistically annotated corpora which differ in various aspects such as language, text sorts / data types, encoded annotation levels, and linguistic theories underlying the annotation. The paper focuses on this variation on the one hand and on the way these heterogeneous data are integrated into one resource on the other hand.
The current study was part of a series of environment related studies of the Jabal Akhdar sponsored by the Sultan Qaboos University, Al Khoud, Sultanate of Oman. The present study aimed to establish the range, habitat, status and population of breeding species in the area, review the historical perspective and list migrant and visitor species noted during the survey.
Trubetzkoy's recognition of a delimitative function of phonology, serving to signal boundaries between morphological units, is expressed in terms of alignment constraints in Optimality Theory, where the relevant constraints require specific morphological boundaries to coincide with phonological structure (Trubetzkoy 1936, 1939, McCarthy & Prince 1993). The approach pursued in the present article is to investigate the distribution of phonological boundary signals to gain insight into the criteria underlying morphological analysis. The evidence from English and Swedish suggests that necessary and sufficient conditions for word-internal morphological analysis concern the recognizability of head constituents, which include the rightmost members of compounds and head affixes. The claim is that the stability of word-internal boundary effects in historical perspective cannot in general be sufficiently explained in terms of memorization and imitation of phonological word form. Rather, these effects indicate a morphological parsing mechanism based on the recognition of word-internal head constituents. Head affixes can be shown to contrast systematically with modifying affixes with respect to syntactic function, semantic content, and prosodic properties. That is, head affixes, which cannot be omitted, often lack inherent meaning and have relatively unmarked boundaries, which can be obscured entirely under specific phonological conditions. By contrast, modifying affixes, which can be omitted, consistently have inherent meaning and have stronger boundaries, which resist prosodic fusion in all phonological contexts. While these correlations are hardly specific to English and Swedish it remains to be investigated to which extent they hold cross-linguistically. The observation that some of the constituents identified on the basis of prosodic evidence lack inherent meaning raises the issue of compositionality. 
I will argue that certain systematic aspects of word meaning cannot be captured with reference to the syntagmatic level, but require reference to the paradigmatic level instead. The assumption is then that there are two dimensions of morphological analysis: syntagmatic analysis, which centers on the criteria for decomposing words in terms of labelled constituents, and paradigmatic analysis, which centers on the criteria for establishing relations among (whole) words in the mental lexicon. While meaning is intrinsically connected with paradigmatic analysis (e.g. base relations, oppositeness) it is not essential to syntagmatic analysis.
This paper proposes an annotation scheme that encodes honorifics (respectful words). Honorifics are used extensively in Japanese, reflecting the social relationship (e.g. social ranks and age) of the referents. This referential information is vital for resolving zero pronouns and improving machine translation outputs. Annotating honorifics is a complex task that involves identifying a predicate with honorifics, assigning ranks to referents of the predicate, calibrating the ranks, and connecting referents with their predicates.
While the sortal constraints associated with Japanese numeral classifiers are well-studied, less attention has been paid to the details of their syntax. We describe an analysis implemented within a broad-coverage HPSG that handles an intricate set of numeral classifier construction types and compositionally relates each to an appropriate semantic representation, using Minimal Recursion Semantics.
The Deep Linguistic Processing with HPSG Initiative (DELPH-IN) provides the infrastructure needed to produce open-source semantic transfer-based machine translation systems. We have made available a prototype Japanese-English machine translation system built from existing resources including parsers, generators, bidirectional grammars and a transfer engine.
Articulatory token-to-token variability not only depends on linguistic aspects like the phoneme inventory of a given language but also on speaker specific morphological and motor constraints. As has been noted previously (Perkell (1997), Mooshammer et al. (2004)), speakers with coronally high "domeshaped" palates exhibit more articulatory variability than speakers with coronally low "flat" palates. One explanation for that is based on perception oriented control by the speaker. The influence of articulatory variation on the cross sectional area and consequently on the acoustics should be greater for flat palates than for domeshaped ones. This should force speakers with flat palates to place their tongue very precisely whereas speakers with domeshaped palates might tolerate a greater variability. A second explanation could be a greater amount of lateral linguo-palatal contact for flat palates holding the tongue in position. In this study both hypotheses were tested.
LTAG semantics for questions
(2004)
This paper presents a compositional semantic analysis of interrogative clauses in LTAG (Lexicalized Tree Adjoining Grammar) that captures the scopal properties of wh- and non-wh-quantificational elements. It is shown that the present approach derives the correct semantics for examples claimed to be problematic for LTAG semantic approaches based on the derivation tree. The paper further provides an LTAG semantics for embedded interrogatives.
Weak function word shift
(2004)
The fact that object shift only affects weak pronouns in mainland Scandinavian is seen as an instance of a more general observation that can be made in all Germanic languages: weak function words tend to avoid the edges of larger prosodic domains. This generalisation has been formulated within Optimality Theory in terms of alignment constraints on prosodic structure by Selkirk (1996) in explaining the distribution of prosodically strong and weak forms of English function words, especially modal verbs, prepositions and pronouns. But a purely phonological account fails to integrate the syntactic licensing conditions for object shift in an appropriate way. The standard semantico-syntactic accounts of object shift, on the other hand, fail to explain why it is only weak pronouns that undergo object shift. This paper develops an Optimality theoretic model of the syntax-phonology interface which is based on the interaction of syntactic and prosodic factors. The account can successfully be applied to further related phenomena in English and German.
German dialects vary in which of the possible orders of the verbs in a 3-verb cluster they allow. In a still ongoing empirical investigation that I am undertaking together with Tanja Schmid, University of Stuttgart (Schmid and Vogel (2004)) we already found that each of the six logically possible permutations of the 3-verb cluster in (1) can be found in German dialects.
This paper reports the results of a corpus investigation on case conflicts in German argument free relative constructions. We investigate how corpus frequencies reflect the relative markedness of free relative and correlative constructions, the relative markedness of different case conflict configurations, and the relative markedness of different conflict resolution strategies. Section 1 introduces the conception of markedness as used in Optimality Theory. Section 2 introduces the facts about German free relative clauses, and section 3 presents the results of the corpus study. By and large, markedness and frequency go hand in hand. However, configurations at the highest end of the markedness scale rarely show up in corpus data, and for the configuration at the lowest end we found an unexpected outcome: the more marked structure is preferred.
The purpose of this paper is to describe the TüBa-D/Z treebank of written German and to compare it to the independently developed TIGER treebank (Brants et al., 2002). Both treebanks, TIGER and TüBa-D/Z, use an annotation framework that is based on phrase structure grammar and that is enhanced by a level of predicate-argument structure. The comparison between the annotation schemes of the two treebanks focuses on the different treatments of free word order and discontinuous constituents in German as well as on differences in phrase-internal annotation.
Tree-local MCTAG with shared nodes : an analysis of word order variation in German and Korean
(2004)
Tree Adjoining Grammars (TAG) are known not to be powerful enough to deal with scrambling in free word order languages. The TAG-variants proposed so far in order to account for scrambling are not entirely satisfying. Therefore, an alternative extension of TAG is introduced based on the notion of node sharing. Considering data from German and Korean, it is shown that this TAG-extension can adequately analyse scrambling data, also in combination with extraposition and topicalization.
Transforming constituent-based annotation into dependency-based annotation has been shown to work for different treebanks and annotation schemes (e.g. Lin (1995) has transformed the Penn treebank, and Kübler and Telljohann (2002) the Tübinger Baumbank des Deutschen (TüBa-D/Z)). These ventures are usually triggered by the conflict between theory-neutral annotation, that targets most needs of a wider audience, and theory-specific annotation, that provides more fine-grained information for a smaller audience. As a compromise, it has been pointed out that treebanks can be designed to support more than one theory from the start (Nivre, 2003). We argue that information can also be added to an existing annotation scheme so that it supports additional theory-specific annotations. We also argue that such a transformation is useful for improving and extending the original annotation scheme with respect to both ambiguous annotation and annotation errors. We show this by analysing problems that arise when generating dependency information from the constituent-based TüBa-D/Z.
This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an on-going project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and anaphoric information. The paper discusses how this unified annotation scheme relates to other formats currently discussed in the literature, in particular the annotation graph model of Bird and Liberman (2001) and the pie-in-the-sky scheme for semantic annotation.
The purpose of this paper is to describe recent developments in the morphological, syntactic, and semantic annotation of the TüBa-D/Z treebank of German. The TüBa-D/Z annotation scheme is derived from the Verbmobil treebank of spoken German [4, 10], but has been extended along various dimensions to accommodate the characteristics of written texts. TüBa-D/Z uses as its data source the "die tageszeitung" (taz) newspaper corpus. The Verbmobil treebank annotation scheme distinguishes four levels of syntactic constituency: the lexical level, the phrasal level, the level of topological fields, and the clausal level. The primary ordering principle of a clause is the inventory of topological fields, which characterize the word order regularities among different clause types of German, and which are widely accepted among descriptive linguists of German [3, 6]. The TüBa-D/Z annotation relies on a context-free backbone (i.e. proper trees without crossing branches) of phrase structure combined with edge labels that specify the grammatical function of the phrase in question. The syntactic annotation scheme of the TüBa-D/Z is described in more detail in [12, 11]. TüBa-D/Z currently comprises approximately 15 000 sentences, with approximately 7 000 sentences being in the correction phase. The latter will be released along with an updated version of the existing treebank before the end of this year. The treebank is available in an XML format, in the NEGRA export format [1] and in the Penn treebank bracketing format. The XML format contains all types of information as described above, the NEGRA export format contains all sentence-internal information, while the Penn treebank format includes only those layers of information that can be expressed as pure tree structures. Over the course of the last year, more fine-grained linguistic annotations have been added along the following dimensions: 1. the basic Stuttgart-Tübingen tagset, STTS, [9] labels have been enriched by relevant features of inflectional morphology, 2. named entity information has been encoded as part of the syntactic annotation, and 3. a set of anaphoric and coreference relations has been added to link referentially dependent noun phrases. In the following sections, we will describe each of these innovations in turn and will demonstrate how the additional annotations can be incorporated into one comprehensive annotation scheme.
This paper is concerned with the tagging of spatial expressions in German newspaper articles: assigning a meaning to each expression, classifying its usage, and linking the derived referent to an event description. In our system, we implemented the activation of concepts in a very simple fashion: a concept is activated once (with a cost depending on the item that activated it) and is left activated thereafter. As an example, a city also activates the nodes for the region and the country it is part of, so that cities from one country are chosen over cities from different countries. Several disambiguation strategies were tested on a corpus of 12 German newspaper articles. Disambiguation was carried out via a beam search to find an approximately cost-optimal solution for the conflict set of potential grounding candidates for the tagged spatial expression. Tests showed that the disambiguation strategies improved accuracy significantly.
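The activation-and-beam-search idea described in this abstract can be illustrated with a small sketch. It is not the authors' system: the function name `ground_mentions`, the toy candidate lists, and the uniform activation costs are all assumptions made for illustration. Each candidate referent activates broader concepts (region, country); a concept costs something only the first time it is activated, so referents that share a region or country are cheaper together.

```python
from typing import Dict, FrozenSet, List, Tuple

# A candidate referent: (name, {concept it activates: activation cost}).
# Names, concepts and costs are hypothetical toy data.
Candidate = Tuple[str, Dict[str, float]]
Hypothesis = Tuple[float, Tuple[str, ...], FrozenSet[str]]


def ground_mentions(mentions: List[List[Candidate]], beam_width: int = 3) -> Hypothesis:
    """Beam search over one referent per mention, approximately minimising total cost."""
    beam: List[Hypothesis] = [(0.0, (), frozenset())]
    for candidates in mentions:
        expanded: List[Hypothesis] = []
        for cost, chosen, active in beam:
            for name, concepts in candidates:
                # Only concepts that are not yet active incur a cost;
                # once activated, a concept stays active.
                added = sum(c for concept, c in concepts.items() if concept not in active)
                expanded.append((cost + added, chosen + (name,), active | frozenset(concepts)))
        expanded.sort(key=lambda hyp: hyp[0])
        beam = expanded[:beam_width]  # keep only the cheapest hypotheses
    return beam[0]


mentions = [
    [  # candidates for the expression "Frankfurt"
        ("Frankfurt am Main", {"Hessen": 1.0, "Germany": 1.0}),
        ("Frankfurt (Oder)", {"Brandenburg": 1.0, "Germany": 1.0}),
        ("Frankfort, Kentucky", {"Kentucky": 1.0, "USA": 1.0}),
    ],
    [  # candidates for the expression "Potsdam"
        ("Potsdam, Brandenburg", {"Brandenburg": 1.0, "Germany": 1.0}),
        ("Potsdam, New York", {"New York State": 1.0, "USA": 1.0}),
    ],
]
best_cost, best_choice, _ = ground_mentions(mentions)
print(best_cost, best_choice)
```

In this toy setting the two referents in Brandenburg are chosen together because the second one activates nothing new. A beam of fixed width only approximates the cost-optimal assignment, which matches the abstract's wording.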
This paper sets up a framework for LTAG (Lexicalized Tree Adjoining Grammar) semantics that brings together ideas from different recent approaches addressing some shortcomings of TAG semantics based on the derivation tree. Within this framework, several sample analyses are proposed, and it is shown that the framework allows the analysis of data that have been claimed to be problematic for derivation tree based LTAG semantics approaches.
Dialectal variation in German 3-verb clusters : a surface-oriented optimality theoretic account
(2004)
We present data from an empirical investigation on the dialectal variation in the syntax of German 3-verb clusters, consisting of a temporal auxiliary, a modal verb, and a predicative verb. The ordering possibilities vary greatly among the dialects. Some of the orders that we found occur only under particular stress assignments. We assume that these orders fulfil an information structural purpose and that the reordering processes are changes only in the linear order of the elements which is represented exclusively at the surface syntactic level, PF (Phonetic Form). Our Optimality theoretic account offers a multifactorial perspective on the phenomenon.
CONTENTS: WHITHER THE SOUTH AFRICAN PUBLISHING INDUSTRY? 4;
APNET MESSAGE TO AFRICAN PUBLISHERS ON WORLD BOOK DAY 11;
FUNDING OPPORTUNITIES FOR OPERATORS IN CULTURE-RELATED INDUSTRIES 13;
4TH SALON INTERNATIONAL DU LIVRE D’ABIDJAN (SILA) 2004 16;
THE NIGERIA INTERNATIONAL BOOK FAIR (NIBF) 2004 20;
THE NOMA AWARD 2003 PRESENTATION 22;
A NEW CONSULTANCY FIRM IS FORMED 27;
EDILIS HOLD DEDICATION CEREMONY 30;
LETTERS TO THE EDITOR 34;
NEWS FROM PARTNER ORGANISATIONS 41;
NOTICE 44;
PROMOTIONS 50
When the concept of the auteur was coined in the 1950s and 1960s, it was an initiative to clarify the obscure matters of authorship in cinema. Because a film must necessarily be a collective work, understood as the result of a large number of creative contributions, it was often unclear who the decisive power behind a certain film was, who contributed the "distinctive quality". The control will usually belong to the director, the producer or the star (or all three in combination), but what singles out a given film could also come from the cinematographer, the scriptwriter, from the author of an adapted literary work, or from traditions in the studio or in the genre. Nothing can be taken for granted about a film's authorship; it can only be decided through a thorough analysis of each film's production process, an analysis that, in most cases, will be impossible to make. ...
The aim of this paper is the exploration of an optimality theoretic architecture for syntax that is guided by the concept of "correspondence": syntax is understood as the mechanism of "translating" underlying representations into a surface form. In minimalism, this surface form is called "Phonological Form" (PF). Both semantic and abstract syntactic information are reflected by the surface form. The empirical domain in which this architecture is tested is that of minimal link effects, especially in the case of "wh"-movement. The OT constraints require the surface form to reflect the underlying semantic and syntactic representations as fully as possible. The means by which underlying relations and properties are encoded are precedence, adjacency, surface morphology and prosodic structure. Information that is not encoded in one of these ways remains unexpressed, and gets lost unless it is recoverable via the context. Different kinds of information are often expressed by the same means. The resulting conflicts are resolved by the relative ranking of the relevant correspondence constraints.
The argument that I tried to elaborate on in this paper is that the conceptual problem behind the traditional competence/performance distinction does not go away, even if we abandon its original Chomskyan formulation. It returns as the question about the relation between the model of the grammar and the results of empirical investigations – the question of empirical verification. The theoretical concept of markedness is argued to be an ideal correlate of gradience. Optimality Theory, being based on markedness, is a promising framework for the task of bridging the gap between model and empirical world. However, this task not only requires a model of grammar, but also a theory of the methods that are chosen in empirical investigations and how their results are interpreted, and a theory of how to derive predictions for these particular empirical investigations from the model. Stochastic Optimality Theory is one possible formulation of a proposal that derives empirical predictions from an OT model. However, I hope to have shown that it is not enough to take frequency distributions and relative acceptabilities at face value, and simply construe some Stochastic OT model that fits the facts. These facts first of all need to be interpreted, and those factors that the grammar has to account for must be sorted out from those about which grammar should have nothing to say. This task, to my mind, is more complicated than the picture that a simplistic application of (not only) Stochastic OT might draw.
Japanese is often taken to be strictly head-final in its syntax. In our work on a broad-coverage, precision implemented HPSG for Japanese, we have found that while this is generally true, there are nonetheless a few minor exceptions to the broad trend. In this paper, we describe the grammar engineering project, present the exceptions we have found, and conclude that this kind of phenomenon motivates on the one hand the HPSG type hierarchical approach which allows for the statement of both broad generalizations and exceptions to those generalizations and on the other hand the usefulness of grammar engineering as a means of testing linguistic hypotheses.
While the sortal constraints associated with Japanese numeral classifiers are well-studied, less attention has been paid to the details of their syntax. We describe an analysis implemented within a broad-coverage HPSG that handles an intricate set of numeral classifier construction types and compositionally relates each to an appropriate semantic representation, using Minimal Recursion Semantics.
Hybrid robust deep and shallow semantic processing for creativity support in document production
(2004)
The research performed in the DeepThought project (http://www.project-deepthought.net) aims at demonstrating the potential of deep linguistic processing if added to existing shallow methods that ensure robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. We use this approach to demonstrate the feasibility of three ambitious applications, one of which is a tool for creativity support in document production and collective brainstorming. This application is described in detail in this paper. Common to all three applications, and the basis for their development, is a platform for integrated linguistic processing. This platform is based on a generic software architecture that combines multiple NLP components and on Robust Minimal Recursion Semantics (RMRS) as a uniform representation language.
The research performed in the DeepThought project aims at demonstrating the potential of deep linguistic processing if combined with shallow methods for robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. On the basis of this approach, the feasibility of three ambitious applications will be demonstrated, namely: precise information extraction for business intelligence; email response management for customer relationship management; creativity support for document production and collective brainstorming. Common to these applications, and the basis for their development is the XML-based, RMRS-enabled core architecture framework that will be described in detail in this paper. The framework is not limited to the applications envisaged in the DeepThought project, but can also be employed e.g. to generate and make use of XML standoff annotation of documents and linguistic corpora, and in general for a wide range of NLP-based applications and research purposes.
The problematic economic situation in most parts of Russia today is nevertheless the ideal climate for the flourishing of the arts. A fascinating new experimental music scene is growing, especially in St. Petersburg, and from Moscow we receive new impulses in literature such as the poet Alina Vituchnovskaja... Russian cinema always had a good reputation, and the new generation of Russian filmmakers clearly tries to keep up with it.
The definition of similarity between sentences is formulated on the levels of words, POS tags, and chunks (Abney 91; Abney 96). The evaluation of this approach shows that while precision and recall based on the PARSEVAL measures (Black et al. 91) do not reach state-of-the-art parsers yet (F1=87.19 on syntactic constituents, F1=77.78 including function-argument structure), the parser shows a very reliable performance where function-argument structure is concerned (F1=96.52). The lower F-scores are very often due to unattached constituents.
"[...] In 1639, Martin Opitz rescued for us the only complete surviving text of the Annolied (circa 1083), and now Graeme Dunphy has made available a reprint of the Opitz edition and with it Opitz’s prologue and notes, a new English translation, and the translator’s informative notes on the translation and on Opitz’s commentary. In his prologue Opitz expresses the purpose of the edition, which is to demonstrate that the German language was inherited by his contemporaries in an unbroken line from earliest times. This is a strikingly early formulation of the romantic thesis the Grimm brothers developed later. Thus by including Opitz’s prologue and notes on his sources and philological explanations, Dunphy gives us the essential tools to re-invigorate research in three areas: Opitz, who is too frequently thought of as a narrowly focused poeticist, the serious study of philology and history in the sixteenth century, and most importantly, the Annolied itself. [...]" Quelle: Maria Dobozy : http://www.iaslonline.de/index.php?vorgang_id=751
In this paper we propose a compositional semantics for lexicalized tree-adjoining grammar (LTAG). Tree-local multicomponent derivations allow separation of the semantic contribution of a lexical item into one component contributing to the predicate argument structure and a second component contributing to scope semantics. Based on this idea a syntax-semantics interface is presented where the compositional semantics depends only on the derivation structure. It is shown that the derivation structure (and indirectly the locality of derivations) allows an appropriate amount of underspecification. This is illustrated by investigating underspecified representations for quantifier scope ambiguities and related phenomena such as adjunct scope and island constraints.
This paper addresses the problem of constraints for relative quantifier scope, in particular in inverse linking readings where certain scope orders are excluded. We show how to account for such restrictions in the Tree Adjoining Grammar (TAG) framework by adopting a notion of flexible composition. In the semantics we use for TAG we introduce quantifier sets that group quantifiers that are "glued" together in the sense that no other quantifier can scopally intervene between them. The flexible composition approach allows us to obtain the desired quantifier sets and thereby the desired constraints for quantifier scope.
This paper argues for a particular architecture of OT syntax. This architecture has three core features: i) it is bidirectional, the usual production-oriented optimisation (called ‘first optimisation’ here) is accompanied by a second step that checks the recoverability of an underlying form; ii) this underlying form already contains a full-fledged syntactic specification; iii) especially the procedure checking for recoverability makes crucial use of semantic and pragmatic factors. The first section motivates the basic architecture. The second section shows, with two examples, how contextual factors are integrated. The third section examines its implications for learning theory, and the fourth section concludes with a broader discussion of the advantages and disadvantages of the proposed model.
Most systematic discussion of dyad morphemes has focussed on Australian languages, owing to a combination of their relative prevalence there, and the development of a descriptive tradition that investigates them in some depth. In the course of researching this paper, however, I became aware of functionally and semantically similar morphemes in many other parts of the world, almost invariably described in isolation from any typological reference point. I have incorporated such data as far as I am aware of it, in the hope that a systematic study will encourage other investigators to identify, and investigate in detail, similar constructions in a range of languages. The current state of our research, however, as well as some interesting geographical skewings that I discuss below, such that outside Australia dyad constructions almost exclusively employ reciprocal morphology, means that most of this paper will focus on Australian languages.
CONTENTS
NEPAD AND AFRICAN PUBLISHING 2
HISTORY AND CULTURES IN AFRICA : THE MOVEMENT OF BOOKS 4
CURRENT OPPORTUNITIES AND CHALLENGES FACING AFRICAN PUBLISHERS 8
SAFEGUARDS AUTHORS’ WORKS 10
THE INTERNATIONAL CONFERENCE ON PUBLISHING IN THE CARIBBEAN 11
2002 NOMA AWARD WINNER 14
A REPORT OF THE ZIMBABWE INTERNATIONAL BOOK FAIR (ZIBF) 16
THE UNIVERSITY TRAINING COURSE 18
APNET AT THE 2003 NAIROBI INTERNATIONAL BOOK FAIR 21
THE JOMO KENYATTA PRIZE 24
BUSINESS OPPORTUNITIES 25
REPORT OF THE 4TH FOIRE INTERNATIONALE DU LIVRE DE OUAGADOUGOU 30
APNET’S SECOND STRATEGIC PLAN 32
FIFTH PAN AFRICAN BOOKSELLERS ASSOCIATION CONVENTION 35
NOTICES 37
CHALLENGES AND OPPORTUNITIES OF INTRA-AFRICAN TRADE IN EAST AFRICA 38
PROMOTIONS 42
Marcus Stiglegger revives a lost Gothic treasure in this brief discussion of Robert Sigl's Laurin—a rare case of German genre film-making and the heir to FW Murnau's legacy. Phantastic genre cinema is very rare in contemporary Germany—especially in the 1980s, the time when Italian horror reached another peak with Dario Argento's Opera (1985). The cliché of the German "easy comedy" ruled mainstream film production at the time, and so it appeared a kind of miracle when 27-year-old writer/director Robert Sigl was awarded the Bavarian Film Prize in 1988 for his debut feature: the Gothic horror fairytale Laurin.
We present an effort for the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible for the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
Quantitative evaluation of parsers has traditionally centered around the PARSEVAL measures of crossing brackets, (labeled) precision, and (labeled) recall. However, it is well known that these measures do not give an accurate picture of the quality of the parser's output. Furthermore, we will show that they are especially unsuited for partial parsers. In recent years, research has concentrated on dependency-based evaluation measures. We will show in this paper that such a dependency-based evaluation scheme is particularly suitable for partial parsers. TüBa-D, the treebank used here for evaluation, contains all the necessary dependency information so that the conversion of trees into a dependency structure does not have to rely on heuristics. Therefore, the dependency representations are not only reliable, they are also linguistically motivated and can be used for linguistic purposes.
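As a rough illustration of the measures named in this abstract, the sketch below computes labeled precision, recall and F1 over sets of labeled items. The function name `labeled_prf` and the toy data are assumptions for illustration only, not the evaluation tool used in the paper. The same function can score constituent brackets, written as (label, start, end), or dependency triples, written as (relation, head, dependent).

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def labeled_prf(gold: Sequence[Hashable], predicted: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Labeled precision, recall and F1 between gold and predicted items."""
    # Multiset intersection: an item counts as matched at most as often
    # as it occurs in both the gold and the predicted analysis.
    matched = sum((Counter(gold) & Counter(predicted)).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: a partial parser finds both noun phrases but builds no
# clause-level structure, so precision is perfect and recall is low.
gold_brackets = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
partial_brackets = [("NP", 0, 2), ("NP", 3, 5)]
precision, recall, f1 = labeled_prf(gold_brackets, partial_brackets)
print(precision, recall, f1)
```

The toy case shows one way a partial parser is affected by bracket-based scoring: every bracket it omits lowers recall, even when everything it does output is correct.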
This paper provides an overview of current research on a hybrid and robust parsing architecture for the morphological, syntactic and semantic annotation of German text corpora. The novel contribution of this research lies not in the individual parsing modules, each of which relies on state-of-the-art algorithms and techniques. Rather what is new about the present approach is the combination of these modules into a single architecture. This combination provides a means to significantly optimize the performance of each component, resulting in an increased accuracy of annotation.
This article examines the expression of natural gender in Icelandic nouns denoting human beings. Particular attention will be paid to the system's symmetry with regard to nouns denoting women and men. Our society consists more or less exactly of half women and half men. One would therefore assume that systems for terms denoting persons would also be symmetrically organised. Yet this assumption could not be further from the truth, and not just in single isolated cases, but in many languages: I will attempt to show that Icelandic has numerous methods for referring to women, but also many barriers and idiosyncrasies.
There are many aspects of Haas' life and experiences in India which deserve greater attention. I would like to refer briefly only to his attempts as a litterateur to come to terms with 'India' as presented in his autobiographical recollection and to some comparative cultural reflections in his essays. Like all reconstructions his autobiographical recollection of India is also a construct in which the site of India as a place of exile is justified by an achieved awareness between conscious individual choice and inevitability. An individual acts out a personal history, the prefiguration of which he only becomes aware of in the form of a subsequent epiphanic realization. Given Haas' literary background, it is not surprising that this is articulated through a literary association.
Based on a detailed case study of parallel grammar development distributed across two sites, we review some of the requirements for regression testing in grammar engineering, summarize our approach to systematic competence and performance profiling, and discuss our experience with grammar development for a commercial application. If possible, the workshop presentation will be organized around a software demonstration.
We present a broad-coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real-world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages.
We present an architecture for the integration of shallow and deep NLP components which is aimed at flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe details of the integration methods, and show how information extraction and language checking applications for real-world German text benefit from a deep grammatical analysis.
Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. Such larger structures are not only desirable for a deeper syntactic analysis. They also constitute a necessary prerequisite for assigning function-argument structure. The present paper offers a similarity-based algorithm for assigning functional labels such as subject, object, head, complement, etc. to complete syntactic structures on the basis of prechunked input. The evaluation of the algorithm has concentrated on measuring the quality of functional labels. It was performed on a German and an English treebank using two different annotation schemes at the level of function-argument structure. The results of 89.73% correct functional labels for German and 90.40% for English validate the general approach.
Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. The TüSBL parser extends current chunk parsing techniques by a tree-construction component that extends partial chunk parses to complete tree structures including recursive phrase structure as well as function-argument structure. TüSBL's tree construction algorithm relies on techniques from memory-based learning that allow similarity-based classification of a given input structure relative to a pre-stored set of tree instances from a fully annotated treebank. A quantitative evaluation of TüSBL has been conducted using a semi-automatically constructed treebank of German that consists of approximately 67,000 fully annotated sentences. The basic PARSEVAL measures were used although they were developed for parsers that have as their main goal a complete analysis that spans the entire input. This runs counter to the basic philosophy underlying TüSBL, which has as its main goal robustness of partially analyzed structures.
The goals of this exercise are essentially threefold: (1) to rescrutinize, archaeologically, epigraphically and linguistically, the pre-Roman inscriptions of the justly famous Negau A and B helmets, (2) to identify "eastward graphemic drift" in pre-Roman northern Italy and (3) to reconsider and perhaps identify the origin of the Germanic runes in light of (1) and (2). While moving toward these goals, we cite but a sampling of the burgeoning literature, some of which may not be generally known or easily accessible, in these rapidly expanding venues; see Ellis (1998) for a recent overview in English.
In linguistics and the philosophy of language, the mass/count distinction has traditionally been regarded as a bi-partition on the nominal domain, where typical instances are nouns like "beef" (mass) vs. "cow" (count). In the present paper, we argue that this partition reveals a system that is based on both syntactic features and conceptual features, and present experimental evidence suggesting that the discrimination of the two kinds of features has a psychological reality.
MED (Media EDitor) is a program designed to facilitate the transcription of digitized soundfiles into textfiles. It was written by Hans Drexler and Daan Broeder, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands. [...] The aim of MED is to facilitate the transcription of sound into text using a single program. It works on the principle of the coexistence and interaction of two basic elements, the waveform display window and the text window. [...] This means that you no longer need to use both a sound editor and a word processor at the same time in order to transcribe digitized speech files. Instead, you can directly type the sound you hear (and see) via MED into the text window. Furthermore, you can directly link sound portions of the waveform display window to text portions of the text window, so that you can easily locate and listen to the original source of your transcription once the links have been set. In this function the waveform display window and the text window virtually interact with each other.
Giulio Camillo (1480 - 1544) was as well-known in his era as Bill Gates is now. Just like Gates he cherished a vision of a universal Storage and Retrieval System, and just like Microsoft Windows, his ‘Theatre of the Memory’ was, despite constant revision, never completed. Camillo’s legendary Theatre of Memory remained only a fragment, its benefits only an option for the future. When it was finished, the user - so he predicted - would have access to the knowledge of the whole universe. On account of his promising invention, Camillo’s contemporaries called him ‘the divine’. For others, like Erasmus or the Parisian scholars, he was just a ‘quack’, but this too only shows that his reception was as strong as that of the computer gurus of our day. Still, Camillo was forgotten immediately after his death. No trace is left of his spectacular databank - except a short treatise which he dictated on his deathbed and which was formulated in the future tense: ‘L’Idea del Theatro’ (1550). ...
Harman Dahl's legacy
(2001)
It was midnight on Friday, 31 December 1999. Harman Dahl fell off his seat at the sound of all hell letting loose around him. He held on to the bench on which he had dozed off and wobbled onto his feet. His senses returned, even though he was still tipsy, under the influence of alcohol. He had been drinking with colleagues for most of the day. ...
The two papers included in this volume have developed from work with the CHILDES tools and the Media Editor in the two research projects, "Second language acquisition of German by Russian learners", sponsored by the Max Planck Institute for Psycholinguistics, Nijmegen, from 1998 to 1999 (directed by Ursula Stephany, University of Cologne, and Wolfgang Klein, Max Planck Institute for Psycholinguistics, Nijmegen) and "The age factor in the acquisition of German as a second language", sponsored by the German Science Foundation (DFG), Bonn, since 2000 (directed by Ursula Stephany, University of Cologne, and Christine Dimroth, Max Planck Institute for Psycholinguistics, Nijmegen). The CHILDES Project has been developed and is being continuously improved at Carnegie Mellon University, Pittsburgh, under the supervision of Brian MacWhinney. Having used the CHILDES tools for more than ten years for transcribing and analyzing Greek child data, there was no question that I would also use them for research into the acquisition of German as a second language and analyze the large amount of spontaneous speech gathered from two Russian girls with the help of the CLAN programs. When, in the spring of 1997, Steven Gillis from the University of Antwerp (in collaboration with Gert Durieux) developed a lexicon-based automatic coding system based on the CLAN program MOR and suitable for coding languages with richer morphologies than English, such as Modern Greek, coding huge amounts of data became much quicker and more comfortable, so that I decided to adopt this system for German as well. The paper "Working with the CHILDES Tools" is based on two earlier manuscripts which have grown out of my research on Greek child language and the many CHILDES workshops taught in Germany, Greece, Portugal, and Brazil over the years. Its contents have now been adapted to the requirements of research into the acquisition of German as a second language and for use on Windows.
This paper is part of a research project on OT Syntax and the typology of the free relative (FR) construction. It concentrates on the details of an OT analysis and some of its consequences for OT syntax. I will not present a general discussion of the phenomenon and the many controversial issues it is famous for in generative syntax.
In this paper we show an approach to the customization of GermaNet to the German HPSG grammar lexicon developed in the Verbmobil project. GermaNet has a broad coverage of the German base vocabulary and fine-grained semantic classification, while the HPSG grammar lexicon is comparatively small and has a coarse-grained semantic classification. In our approach, we have developed a mapping algorithm to relate the synsets in GermaNet with the semantic sorts in HPSG. The evaluation result shows that this approach is useful for the lexical extension of our deep grammar development to cope with real-world text understanding.
It has been the goal of this review to describe the functional interrelations between Deiters' vestibular nucleus and numerous brain structures. Emphasis is placed on dynamic and integrative properties of linkages between the neurons of Deiters' nucleus and many other brain structures in order to begin considering the capabilities of the loops in the light of motor control and coordination of movement. The problem of somatotopy within the loops is also considered. Putting this information together, the possible roles of Deiters' nucleus in the control of movements are described. It is suggested that Deiters' nucleus, in co-operation with cerebral cortex, cerebellum, subcortical and brainstem structures, is responsible for the integration and realization of different movements.
This paper proposes a corpus encoding standard that meets the needs of linguistic research using a variety of linguistic data structures. The standard was developed in SFB 441, a research project at the University of Tuebingen. The principal concern of SFB 441 is the empirical data structures which feed into linguistic theory building. SFB 441 consists of several projects, most of which are building corpora to empirically investigate various linguistic phenomena in various languages (e.g. modal verbs in German, forms of address and politeness in Russian). These corpora will form the components of the "Tuebingen collection of reusable, empirical, linguistic data structures (TUSNELDA)". The TUSNELDA annotation standard aims at providing a uniform encoding scheme for all subcorpora and texts of TUSNELDA such that they can be processed with uniform standardized tools. To guarantee maximal reusability we use XML for encoding. Previous SGML standards for text encoding were provided by the Text Encoding Initiative (TEI) and the Expert Advisory Group on Language Engineering Standards (Corpus Encoding Standard, CES). The TUSNELDA standard is based on TEI and XCES (XML version of CES) but takes into account the specific needs of the SFB projects, i.e. the peculiarities of the examined languages and linguistic phenomena.
Existing analyses of German scrambling phenomena within TAG-related formalisms all use non-local variants of TAG. However, there are good reasons to prefer local grammars, in particular with respect to the use of the derivation structure for semantics. Therefore, this paper proposes to use local TDGs, a TAG variant generating tree descriptions that shows a local derivation structure. However, the construction of minimal trees for the derived tree descriptions is not subject to any locality constraint. This provides just the amount of non-locality needed for an adequate analysis of scrambling. To illustrate this, a local TDG for some German scrambling data is presented.
In this paper, we investigate the role of sub-optimality in training data for part-of-speech tagging. In particular, we examine to what extent the size of the training corpus and certain types of errors in it affect the performance of the tagger. We distinguish four types of errors: If a word is assigned a wrong tag, this tag can belong to the ambiguity class of the word (i.e. to the set of possible tags for that word) or not; furthermore, the major syntactic category (e.g. "N" or "V") can be correctly assigned (e.g. if a finite verb is classified as an infinitive) or not (e.g. if a verb is classified as a noun). We empirically explore the decrease of performance that each of these error types causes for different sizes of the training set. Our results show that those types of errors that are easier to eliminate have a particularly negative effect on the performance. Thus, it is worthwhile concentrating on the elimination of these types of errors, especially if the training corpus is large.
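The four error types distinguished in the abstract above arise from two independent yes/no questions about a wrongly assigned tag. The following minimal sketch makes that cross-classification explicit. The function name `classify_error`, the toy lexicon, the STTS-style tag names, and the shortcut of reading the major category off the first letter of the tag are all assumptions made for this illustration, not the authors' implementation.

```python
def classify_error(word, gold_tag, assigned_tag, lexicon):
    """Classify a tagging error along the two dimensions named in the abstract.

    Returns a pair of booleans:
      in_ambiguity_class  - the wrong tag is one of the word's possible tags
      same_major_category - the wrong tag keeps the major category (e.g. "V")

    Assumption: the major category is the first letter of the tag.
    """
    if gold_tag == assigned_tag:
        raise ValueError("not an error: assigned tag equals gold tag")
    in_ambiguity_class = assigned_tag in lexicon.get(word, set())
    same_major_category = gold_tag[0] == assigned_tag[0]
    return in_ambiguity_class, same_major_category


# Hypothetical lexicon mapping a word to its ambiguity class.
lexicon = {"gehen": {"VVFIN", "VVINF"}}

# Finite verb tagged as infinitive: inside the class, major category kept.
mild = classify_error("gehen", "VVFIN", "VVINF", lexicon)

# Verb tagged as noun: outside the class, major category lost.
severe = classify_error("gehen", "VVFIN", "NN", lexicon)
```

The two booleans yield the four combinations the abstract refers to; counting each combination separately in a training corpus would be one way to set up the comparison of their effects.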
The bringing together of the two realms, that of Tristan and Isolde and that of Arthur, thus has a mutually corrosive effect. However, in the further course of the action Tristan and Isolde’s love regains some of its absoluteness: for instance Heinrich refrains from taking over the quarrel of lovers from Eilhart. He plays a double game, on the one hand reducing the absoluteness and self-sufficiency of love, on the other hand building it up again and thus preventing the establishment of a firm doctrine in the course of the narrative (…), as neither the Arthurian court nor the love of Tristan and Isolde provides an absolute norm. Heinrich wrote his romance for the Bohemian noble Raimund von Lichtenburg, and the account of the foundation of the Round Table and the self-directed activities of the knights have belonged (…). The initial Arthurian ideal has become a confirmatory ritual for an exclusive body of noblemen – that matches the spirit of the knightly societies.