Application of BIOfid tools for extracting data from biodiversity literature

  • Ideally, machine-readable data and knowledge would be extracted from natural-language biodiversity literature automatically, but this is not yet possible. The BIOfid project has developed tools that support important parts of this highly demanding task, while other parts of the workflow cannot yet be automated. BIOfid focuses on 20th-century legacy literature, a large part of which is available only in print. In this workshop, we will present the current state of the art in mobilising data from our corpus, as well as some of the challenges ahead. Together with the participants, we will practise or explain the following tasks, some of which participants can perform themselves, while others currently require our specialists and special equipment: preparation of text files as input; pre-processing with TextImager/TextAnnotator; semi-automated annotation and linking of named entities; generation of output in various formats; and evaluation of the output. The workshop will also provide an outlook on further developments in extracting statements from natural-language literature, with the long-term aim of producing machine-readable data from literature that can extend biodiversity databases and knowledge graphs.

Author: Gerwin Kasperek, Giuseppe Abrami, Christine Driller, Andy Lücking, Alexander Mehler, Carlos Alberto Martínez-Muñoz, Adrian Pachzelt
Parent Title (German): SPNHC 2022, Edinburgh 5th-10th June 2022
Place of publication: Frankfurt am Main
Document Type: Conference Proceeding
Date of Publication (online): 2022/06/09
Year of first Publication: 2022
Publishing Institution: Universitätsbibliothek Johann Christian Senckenberg
Release Date: 2022/09/12
Page Number: 58
Dewey Decimal Classification: 5 Natural sciences and mathematics / 57 Life sciences; biology / 570 Life sciences; biology
Collections: Biology collection / Special subject collection full texts
Licence (German): Deutsches Urheberrecht (German copyright law)