An Earley parsing algorithm for range concatenation grammars (2009)

Kallmeyer, Laura ; Maier, Wolfgang ; Parmentier, Yannick

We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley parser is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation helps to considerably decrease the number of items in the chart.

Vagueness and referential ambiguity in a large-scale annotated corpus (2009)

Versley, Yannick

In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serves to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail. We then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement, and argue that a deeper understanding of the phenomena involved allows problematic cases to be tackled in a more principled fashion than would be possible using only pre-theoretic intuitions.

A testsuite for testing parser performance on complex German grammatical constructions [TePaCoC - a corpus for testing parser performance on complex German grammatical constructions] (2009)

Kübler, Sandra ; Rehbein, Ines ; Genabith, Josef van

Traditionally, parsers are evaluated against gold standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and the gold standard. A particular case in point is German, for which two treebanks (TiGer and TüBa-D/Z) are available with highly different annotation schemes for the acquisition of (e.g.) PCFG parsers. The differences between the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser evaluation difficult [7, 9, 12]. The resource (TEPACOC) presented in this paper takes a different approach to parser evaluation: instead of providing evaluation data in a single annotation scheme, TEPACOC uses comparable sentences and their annotations for 5 selected key grammatical phenomena (20 sentences per phenomenon) from both the TiGer and TüBa-D/Z resources. This provides a comparable testsuite of 2 × 100 sentences, which allows us to evaluate TiGer-trained parsers against the TiGer part of TEPACOC, and TüBa-D/Z-trained parsers against the TüBa-D/Z part of TEPACOC for key phenomena, instead of comparing them against a single (and potentially biased) gold standard. To overcome the problem of inconsistency in human evaluation and to bridge the gap between the two different annotation schemes, we provide an extensive error classification, which enables us to compare parser output across the two different treebanks. In the remaining part of the paper we present the testsuite and describe the grammatical phenomena covered in the data. We discuss the different annotation strategies used in the two treebanks to encode these phenomena and present our error classification of potential parser errors.

Parsing coordinations (2009)

Kübler, Sandra ; Hinrichs, Erhard ; Maier, Wolfgang ; Klett, Eva

The present paper is concerned with statistical parsing of constituent structures in German. The paper presents four experiments that aim at improving parsing performance on coordinate structures: 1) reranking the n-best parses of a PCFG parser, 2) enriching the input to a PCFG parser by gold scopes for any conjunct, 3) reranking the parser output for all possible scopes for conjuncts that are permissible with regard to clause structure. Experiment 4 reranks a combination of parses from experiments 1 and 3. The experiments presented show that n-best parsing combined with reranking improves results by a large margin. Providing the parser with different scope possibilities and reranking the resulting parses results in an increase in F-score from 69.76 for the baseline to 74.69. While the F-score is similar to the one of the first experiment (n-best parsing and reranking), the first experiment results in higher recall (75.48% vs. 73.69%) and the third one in higher precision (75.43% vs. 73.26%). Combining the two methods results in the best result with an F-score of 76.69.

Decorrelation and shallow semantic patterns for distributional clustering of nouns and verbs (2009)

Versley, Yannick

Distributional approximations to lexical semantics are very useful not only in helping the creation of lexical semantic resources (Kilgarriff et al., 2004; Snow et al., 2006), but also when directly applied in tasks that can benefit from large-coverage semantic knowledge such as coreference resolution (Poesio et al., 1998; Gasperin and Vieira, 2004; Versley, 2007), word sense disambiguation (McCarthy et al., 2004) or semantic role labeling (Gordon and Swanson, 2007). We present a model that is built from Web-based corpora using both shallow patterns for grammatical and semantic relations and a window-based approach, using singular value decomposition to decorrelate the feature space, which is otherwise too heavily influenced by the skewed topic distribution of Web corpora.

Biogenesis of the cell wall and other glycoconjugates of Mycobacterium tuberculosis (2009)

Kaur, Devinder ; Guerin, Marcelo E. ; Škovierová, Henrieta ; Brennan, Patrick J. ; Jackson, Mary

The re-emergence of tuberculosis in its present-day manifestations - single, multiple and extensive drug-resistant forms and as HIV-TB coinfections - has resulted in renewed research on fundamental questions such as the nature of the organism itself, Mycobacterium tuberculosis, the molecular basis of its pathogenesis, definition of the immunological response in animal models and humans, and development of new intervention strategies such as vaccines and drugs. Foremost among these developments has been the precise chemical definition of the complex and distinctive cell wall of M. tuberculosis, elucidation of the relevant pathways and underlying genetics responsible for the synthesis of the hallmark moieties of the tubercle bacillus such as the mycolic acid-arabinogalactan-peptidoglycan complex, the phthiocerol- and trehalose-containing effector lipids, the phosphatidylinositol-containing mannosides, lipomannosides and lipoarabinomannosides, major immunomodulators, and others. In this review, the laboratory personnel who have been the focal point of some of these developments review recent progress towards a comprehensive understanding of the basic physiology and functions of the cell wall of M. tuberculosis.

‘Turning many to righteousness’ : Religious didacticism in the ›Speculum humanae salvationis‹ and the similitude of the oak tree (2009)

Palmer, Nigel F.

In this contribution I shall be interested, among other things, in finding a place for the European phenomenon of the ›Speculum humanae salvationis‹ within German literary history, which will inescapably involve revisiting the unfashionable discussion of date and origins. I also intend to ask about the place of this text in the ‘didactic’ literature of the Middle Ages. Is a religious text structured according to sacred history didactic? Much didactic poetry is in the vernacular: What does it mean that the ›Speculum‹ was composed in Latin? And what place should be accorded to its vernacular reception? The ›Speculum‹ is inscribed within a set of oppositions that would appear to be recurrent in the didactic literature of the later Middle Ages: Latin and vernacular, verse and prose, words and pictures, religious and profane, moral teaching and devotion, clerical and lay. In view of its exceptionally broad transmission in the German lands, both in Latin and in vernacular reworkings, is it possible to describe this text so that it takes a place within a larger picture? In some respects it may stand at a threshold in the history of European didacticism.

Diversity and distribution of type specimens deposited in the Invertebrate section of the Museum of Zoology QCAZ, Quito, Ecuador (2009)

Donoso, David A. ; Salazar, Fernanda ; Maza, Florencio ; Cárdenas, Rafael E. ; Dangles, Olivier

The Invertebrate section of the Museum of Zoology QCAZ at the Pontifical Catholic University of Ecuador in Quito maintains nearly two million curated specimens, and comprises Ecuador's largest collection of native taxa. We review 1902 type specimens from 6 subspecies and 320 species in 121 genera and 42 families, currently kept in the Museum. The list includes 116 holotypes, 10 allotypes, 1774 paratypes and 2 neoparatypes. The collection of type specimens is particularly strong in the Coleoptera (families Carabidae and Staphylinidae) and Hymenoptera. However, other insect orders such as Diptera and Lepidoptera and non-insect arthropods such as Acari, Aranea and Scorpiones, are moderately represented in the collection. This report provides original data from labels of every type specimen record. An analysis of the geographic distribution of type localities showed that collection sites are clustered geographically, with most of them found towards the northern region of Ecuador, in Pichincha, Cotopaxi and Napo provinces. Sites are mainly located in highly accessible areas near highways and towns. Localities with a high number of type species include the cloud forest reserve Bosque Integral Otonga and Parque Nacional Yasuní in the Amazon rainforest near PUCE's Yasuní Scientific Station. Type localities are not well represented in the Ecuadorian National System of Protected Areas. Future fieldwork should include localities in the southern region of Ecuador but also target less accessible areas not located near highways or towns. We discuss the value of the collection as a source of information for conservation and biodiversity policies in Ecuador.
