Refine
Document Type
- Conference Proceeding (4) (remove)
Language
- English (4)
Has Fulltext
- yes (4)
Is part of the Bibliography
- no (4)
Keywords
Institute
Biodiversity research heavily relies on recent and older literature, and the data contained therein. Despite great effort, large parts of the literature and the data it holds are still not available in appropriate formats needed for efficient compilation and analysis. As a part of the current funding strategy of the German Research Council (Deutsche Forschungsgemeinschaft, DFG), and resulting from an extensive dialogue with the scientific community in Germany, a "Specialised Information Service" (Fachinformationsdienst, FID) for Biodiversity Research will be established with the objective of making further segments of literature about biodiversity available in up-to-date formats. This project, starting 2017, is conducted by the University Library Johann Christian Senckenberg (Frankfurt/Main, Germany) together with the Senckenberg Gesellschaft für Naturforschung and the Text Technology Lab of the Goethe University (Frankfurt/Main).
The new Specialised Information Service for Biodiversity Research (FID Biodiversitätsforschung) comprises four core elements: (A) A text mining approach which encompasses advanced text technologies and a large body of 20th century literature; (B) the digitisation of selected German biodiversity literature; (C) a platform für Open Access journals; and (D) Acquisition of specialised print literature.
In an ideal world, extraction of machine-readable data and knowledge from natural-language biodiversity literature would be done automatically, but not so currently. The BIOfid project has developed some tools that can help with important parts of this highly demanding task, while certain parts of the workflow cannot be automated yet. BIOfid focuses on the 20th century legacy literature, a large part of which is only available in printed form. In this workshop, we will present the current state of the art in mobilisation of data from our corpus, as well as some challenges ahead of us. Together with the participants, we will exercise or explain the following tasks (some of which can be performed by the participants themselves, while other tasks currently require execution by our specialists with special equipment): Preparation of text files as an input; pre-processing with TextImager/TextAnnotator; semiautomated annotation and linking of named entities; generation of output in various formats; evaluation of the output. The workshop will also provide an outlook for further developments regarding extraction of statements from natural-language literature, with the long-term aim to produce machine-readable data from literature that can extend biodiversity databases and knowledge graphs.
With the ongoing loss of global biodiversity, long-term recordings of species distribution patterns are increasingly becoming important to investigate the causes and consequences for their change. Therefore, the digitization of scientific literature, both modern and historical, has been attracting growing attention in recent years. To meet this growing demand the Specialised Information Service for Biodiversity Research (BIOfid) was launched in 2017 with the aim of increasing the availability and accessibility of biodiversity information. Closely tied to the research community the interdisciplinary BIOfid team is digitizing data sources of biodiversity related research and provides a modern and professional infrastructure for hosting and sharing them. As a pilot project, German publications on the distribution and ecology of vascular plants, birds, moths and butterflies covering the past 250 years are prioritized. Large parts of the text corpus defined in accordance with the needs of the relevant German research community have already been transferred to a machine-readable format and will be publicly accessible soon. Software tools for text mining, semantic annotation and analysis with respect to the current trends in machine learning are developed to maximize bioscientific data output through user-specific queries that can be created via the BIOfid web portal (https://www.biofid.de/). To boost knowledge discovery, specific ontologies focusing on morphological traits and taxonomy are being prepared and will continuously be extended to keep up with an ever-expanding volume of literature sources.
In order to promote the accessibility of biodiversity data in historic and contemporary literature, we introduce a new interdisciplinary project called BIOfid (FID=Fachinformationsdienst, a service for providing specialized information). The project aims at a mobilization of data available in print only by combining digitization of scientific biodiversity literature with the development of innovative text mining tools for complex, eventually semantic searches throughout the complete text corpus. A major prerequisite for the development of such search tools is the provision of sophisticated anatomy ontologies on the one hand, and of complete lists of species names (currently considered valid as well as all synonyms) at a global scale on the other hand. In the initial stage, we chose examples from German publications of the past 250 years dealing with the geographic distribution and ecology of vascular plants (Tracheophyta), birds (Aves), as well as moths and butterflies (Lepidoptera) in Germany. These taxa have been prioritized according to current demands of German research groups (about 50 sites) aiming at analyses and modeling of distribution patterns and their changes through time. In the long term, we aim at providing data and open source software applicable for any taxon and geographic region. For this purpose, a platform for open access journals for long-term availability of professional e-journals will be established. All generated data will also be made accessible through GFBio (German Federation for Biological Data). BIOfid is supported by the LIS-Scientific Library Services and Information Systems program of the German Research Foundation (DFG).