To promote the accessibility of biodiversity data in historic and contemporary literature, we introduce a new interdisciplinary project called BIOfid (FID = Fachinformationsdienst, a specialized information service). The project aims to mobilize data currently available only in print by combining the digitization of scientific biodiversity literature with the development of innovative text mining tools for complex, eventually semantic searches throughout the complete text corpus. A major prerequisite for developing such search tools is the provision of sophisticated anatomy ontologies on the one hand, and of complete global lists of species names (those currently considered valid as well as all synonyms) on the other. In the initial stage, we chose examples from German publications of the past 250 years dealing with the geographic distribution and ecology of vascular plants (Tracheophyta), birds (Aves), and moths and butterflies (Lepidoptera) in Germany. These taxa were prioritized according to the current demands of German research groups (about 50 sites) that aim to analyze and model distribution patterns and their changes through time. In the long term, we aim to provide data and open source software applicable to any taxon and geographic region. For this purpose, a platform for open access journals will be established to ensure the long-term availability of professional e-journals. All generated data will also be made accessible through GFBio (German Federation for Biological Data). BIOfid is supported by the LIS (Scientific Library Services and Information Systems) program of the German Research Foundation (DFG).
The Specialized Information Service Biodiversity Research (BIOfid) has been launched to mobilize valuable biological data from printed literature that has been hidden in German libraries over the past 250 years. In this project, we annotate German texts converted by OCR from historical scientific literature on the biodiversity of plants, birds, moths and butterflies. Our work enables the automatic extraction of biological information previously buried in the mass of papers and volumes. For this purpose, we generated training data for the tasks of Named Entity Recognition (NER) and Taxa Recognition (TR) in biological documents. We use this data to train a number of leading machine learning tools and to create a gold standard for TR in biodiversity literature. More specifically, we perform a practical analysis of our newly generated BIOfid dataset through various downstream-task evaluations and establish a new state of the art for TR with an F-score of 80.23%. In this sense, our paper lays the foundations for future work on information extraction from biological texts.
The 99th Annual Meeting of the Geologische Vereinigung (GV) and International Conference on Earth Control on Planetary Life and Environment, held in October 2009 at the Geosciences Centre of the Georg-August-Universität Göttingen, brings together researchers from all fields of Earth Sciences and beyond to shape an attractive interdisciplinary program on the geological history of Planet Earth and its control over and interaction with biological evolution, development of habitats, environmental and climate change as well as history and culture of Homo sapiens. This volume contains the abstracts of invited keynote lectures as well as all oral and poster presentations.