OPUS 4 | Universitätsbibliothek

BIOfid dataset: publishing a German gold standard for named entity recognition in historical biodiversity literature (2019)

Ahmed, Sajawel ; Stoeckel, Manuel ; Driller, Christine ; Pachzelt, Adrian ; Mehler, Alexander

The Specialized Information Service Biodiversity Research (BIOfid) has been launched to mobilize valuable biological data from printed literature that has been hidden in German libraries for the past 250 years. In this project, we annotate German texts converted by OCR from historical scientific literature on the biodiversity of plants, birds, moths and butterflies. Our work enables the automatic extraction of biological information previously buried in the mass of papers and volumes. For this purpose, we generated training data for the tasks of Named Entity Recognition (NER) and Taxa Recognition (TR) in biological documents. We use this data to train a number of leading machine learning tools and create a gold standard for TR in biodiversity literature. More specifically, we perform a practical analysis of our newly generated BIOfid dataset through various downstream-task evaluations and establish a new state of the art for TR with an F-score of 80.23%. In this sense, our paper lays the foundations for future work in the field of information extraction from biological texts.

University of Frankfurt reports its 2019 APC expenditures (2020)

Broschinski, Christoph

The Goethe University Frankfurt has updated its APC expenditures, providing data for the 2019 period. The University Library Johann Christian Senckenberg is in charge of the University’s Open Access Publishing Fund, which is supported under the DFG’s Open Access Publishing Programme. The contact person is Roland Wagner.

From raw data to rich(er) data: Lessons learned while aggregating metadata (2020)

Beck, Julia

The Specialised Information Service Performing Arts (SIS PA) is part of a funding programme by the German Research Foundation that enables libraries to develop tailor-made services for individual disciplines in order to give researchers direct access to relevant materials and resources from their field. For the field of performing arts, the SIS PA is aggregating metadata about theater and dance resources from cultural heritage institutions, currently mostly German-speaking ones, in a VuFind-based search portal. In this article, we focus on metadata quality and its impact on the aggregation workflow by describing the different process stages, some of them specific to individual data providers, that improve data quality in order to achieve a searchable, interlinked knowledge base. We also describe lessons learned and limitations of the process.

Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology (2021)

Lücking, Andy ; Driller, Christine ; Stoeckel, Manuel ; Abrami, Giuseppe ; Pachzelt, Adrian ; Mehler, Alexander

Biodiversity information is contained in countless digitized and unprocessed scholarly texts. Although automated extraction of these data has been gaining momentum for years, there are still innumerable text sources that are poorly accessible and require a more advanced range of methods to extract relevant information. To improve access to semantic biodiversity information, we have launched the BIOfid project (www.biofid.de) and have developed a portal to access the semantics of German-language biodiversity texts, mainly from the 19th and 20th centuries. However, to make such a portal work, several methods had to be developed or adapted first. In particular, text-technological information extraction methods were needed to extract the required information from the texts. Such methods draw on machine learning techniques, which in turn are trained on training data. To this end, among other things, we gathered the BIOfid text corpus, which is a cooperatively built resource, developed by biologists, text technologists, and linguists. A special feature of BIOfid is its multiple annotation approach, which takes into account both general and biology-specific classifications, and thereby goes beyond previous, typically taxon- or ontology-driven proper name detection. We describe the design decisions and the genuine Annotation Hub Framework underlying the BIOfid annotations and present agreement results. The tools used to create the annotations are introduced, and the use of the data in the semantic portal is described. Finally, some general lessons are drawn, in particular for multiple annotation projects.

Blogging histories of knowledge in Washington, D. C. (2021)

Stoneman, Mark R. ; Krone, Kerstin von der

The authors reflect on their experiences as the founding editors of the History of Knowledge blog. Situating the project in its specific institutional, geographical, and historiographical contexts, they highlight its role in scholarly communication and research alongside journals and books in a research domain that is still young, especially when viewed from an international perspective. At the same time, the authors discuss the blog’s role as a tool for classifying and structuring a corpus of work as it grows over time and as new themes and connections emerge from the contributions of its many authors.

Non-native woody plant species in urban forests of Frankfurt/Main (Germany) (2021)

Gregor, Thomas ; Kasperek, Gerwin

In 23 survey areas with woodland vegetation or woodland succession in Frankfurt/Main, with a total size of 134 hectares, woody species were surveyed (excluding species occurring only as planted individuals). We found 149 woody taxa, 42% of them indigenous and 58% non-native. Of the 86 non-native taxa, 49 were naturalized in Frankfurt while 37 were considered casual. Among non-native taxa, East Asian taxa formed the largest phytogeographic group. We found taxa originating from horticulture (cultigens) to be an important part of the woody flora of Frankfurt/Main. The most common taxa were Acer pseudoplatanus, A. platanoides, Betula pendula, and Sambucus nigra; the two Acer species were regarded as naturalized. Non-native woody species were generally common (with percentages ranging from 24% to 79% in individual areas).

Current research on theory and practice of digital libraries: best papers from TPDL 2019 & 2020 (2022)

Aalberg, Trond ; Duchateau, Fabien ; Hall, Mark ; Merčun, Tanja ; Risse, Thomas

This special issue presents selected papers from the 2019 & 2020 editions of the International Conference on Theory and Practice of Digital Libraries (TPDL). The papers cover different research areas within Digital Libraries, from Ontology and Linked Data to quality in Web Archives and Topic Detection. We first provide a brief overview of both TPDL editions and then introduce the selected papers.

"ein/aus gepackt. Die Kinderbuchsammlung Benjamin" is extended: nominated for the Dr. Marschner Ausstellungspreis (2023)

Albus, Adolf

The exhibition in the University Library is being extended until 26 February 2023
