Refine
Document Type
Language
- English (2) (remove)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2)
Keywords
- XML (2) (remove)
Institute
- Extern (1)
This paper describes a set of guidelines for the citation of zoological and botanical specimens in the European Journal of Taxonomy. The guidelines stipulate controlled vocabularies and precise formats for presenting the specimens examined within a taxonomic publication, which allow for the rich data associated with the primary research material to be harvested, distributed and interlinked online via international biodiversity data aggregators. Herein we explain how the EJT editorial standard was defined and how this initiative fits into the journal's project to semantically enhance its publications using the Plazi TaxPub DTD extension. By establishing a standardised format for the citation of taxonomic specimens, the journal intends to widen the distribution of and improve accessibility to the data it publishes. Authors who conform to these guidelines will benefit from higher visibility and new ways of visualising their work. In a wider context, we hope that other taxonomy journals will adopt this approach to their publications, adapting their working methods to enable domain-specific text mining to take place. If specimen data can be efficiently cited, harvested and linked to wider resources, we propose that there is also the potential to develop alternative metrics for assessing impact and productivity within the natural sciences.
We present an architecture for the integration of shallow and deep NLP components which is aimed at flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe details of the integration methods, and show how information extraction and language checking applications for realworld German text benefit from a deep grammatical analysis.