OPUS 4 | Universitätsbibliothek

Current research on theory and practice of digital libraries: best papers from TPDL 2019 & 2020 (2022)

Aalberg, Trond ; Duchateau, Fabien ; Hall, Mark ; Merčun, Tanja ; Risse, Thomas

This volume presents a special issue on selected papers from the 2019 & 2020 editions of the International Conference on Theory and Practice of Digital Libraries (TPDL). They cover different research areas within Digital Libraries, from Ontology and Linked Data to quality in Web Archives and Topic Detection. We first provide a brief overview of both TPDL editions, and we introduce the selected papers.

"ein/aus gepackt. Die Kinderbuchsammlung Benjamin" is extended: nominated for the Dr. Marschner Exhibition Prize (2023)

Albus, Adolf

The exhibition in the University Library has been extended until 26 February 2023.

Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology (2021)

Lücking, Andy ; Driller, Christine ; Stoeckel, Manuel ; Abrami, Giuseppe ; Pachzelt, Adrian ; Mehler, Alexander

Biodiversity information is contained in countless digitized and unprocessed scholarly texts. Although automated extraction of these data has been gaining momentum for years, there are still innumerable text sources that are poorly accessible and require a more advanced range of methods to extract relevant information. To improve the access to semantic biodiversity information, we have launched the BIOfid project (www.biofid.de) and have developed a portal to access the semantics of German language biodiversity texts, mainly from the 19th and 20th century. However, to make such a portal work, a couple of methods had to be developed or adapted first. In particular, text-technological information extraction methods were needed, which extract the required information from the texts. Such methods draw on machine learning techniques, which in turn are trained by learning data. To this end, among others, we gathered the BIOfid text corpus, which is a cooperatively built resource, developed by biologists, text technologists, and linguists. A special feature of BIOfid is its multiple annotation approach, which takes into account both general and biology-specific classifications, and by this means goes beyond previous, typically taxon- or ontology-driven proper name detection. We describe the design decisions and the genuine Annotation Hub Framework underlying the BIOfid annotations and present agreement results. The tools used to create the annotations are introduced, and the use of the data in the semantic portal is described. Finally, some general lessons, in particular with multiple annotation projects, are drawn.

BIOfid dataset: publishing a German gold standard for named entity recognition in historical biodiversity literature (2019)

Ahmed, Sajawel ; Stoeckel, Manuel ; Driller, Christine ; Pachzelt, Adrian ; Mehler, Alexander

The Specialized Information Service Biodiversity Research (BIOfid) has been launched to mobilize valuable biological data from printed literature hidden in German libraries over the past 250 years. In this project, we annotate German texts converted by OCR from historical scientific literature on the biodiversity of plants, birds, moths and butterflies. Our work enables the automatic extraction of biological information previously buried in the mass of papers and volumes. For this purpose, we generated training data for the tasks of Named Entity Recognition (NER) and Taxa Recognition (TR) in biological documents. We use these data to train a number of leading machine learning tools and create a gold standard for TR in biodiversity literature. More specifically, we perform a practical analysis of our newly generated BIOfid dataset through various downstream-task evaluations and establish a new state of the art for TR with 80.23% F-score. In this sense, our paper lays the foundations for future work in the field of information extraction in biology texts.

Blogging histories of knowledge in Washington, D. C. (2021)

Stoneman, Mark R. ; Krone, Kerstin von der

The authors reflect on their experiences as the founding editors of the History of Knowledge blog. Situating the project in its specific institutional, geographical, and historiographical contexts, they highlight its role in scholarly communication and research alongside journals and books in a research domain that is still young, especially when viewed from an international perspective. At the same time, the authors discuss the blog’s role as a tool for classifying and structuring a corpus of work as it grows over time and as new themes and connections emerge from the contributions of its many authors.

Non-native woody plant species in urban forests of Frankfurt/Main (Germany) (2021)

Gregor, Thomas ; Kasperek, Gerwin

In 23 survey areas with woodland vegetation or woodland succession in Frankfurt/Main with a total size of 134 hectares, woody species were surveyed (excluding species only occurring as planted individuals). We found 149 woody taxa; 42% of them indigenous, and 58% non-native. Out of the 86 non-native taxa, 49 were naturalized in Frankfurt while 37 were considered as casual. Among non-native taxa, East Asian taxa formed the largest phytogeographic group. We found taxa originating from horticulture (cultigens) to be an important part of the woody flora of Frankfurt/Main. The most common taxa were Acer pseudoplatanus, A. platanoides, Betula pendula, and Sambucus nigra; the two Acer species were regarded as naturalized. Non-native woody species were generally common (with percentages ranging from 24% to 79% in individual areas).

An architecture blueprint for knowledge-based e-Science (2007)

Niederée, Claudia ; Risse, Thomas ; Paukert, Marco ; Stein, Adelheit

The scientific innovation process embraces the steps from problem definition through the development and evaluation of innovative solutions to their successful exploitation. The challenges imposed by this process can be answered by the creation of a powerful and flexible next-generation e-Science infrastructure, which exploits leading-edge information and knowledge technologies and enables a comprehensive and intelligent means of supporting this process. This paper describes our vision of a knowledge-based e-Science infrastructure, which is based on the results of an in-depth study of the researchers' requirements. Furthermore, it introduces the Fraunhofer e-Science Cockpit as a first implementation of our vision.

Terminology evolution in web archiving: Open issues (2008)

Tahmasebi, Nina ; Iofciu, Tereza ; Risse, Thomas ; Niederée, Claudia ; Siberski, Wolf

The correspondence between the terminology used for querying and the one used in content objects to be retrieved is a crucial prerequisite for effective retrieval technology. However, as terminology is evolving over time, a growing gap opens up between older documents in (long-term) archives and the active language used for querying such archives. Thus, technologies for detecting and systematically handling terminology evolution are required to ensure "semantic" accessibility of (Web) archive content in the long run. As a starting point for dealing with terminology evolution, this paper formalizes the problem and discusses issues, first ideas and relevant technologies.

NEER: An unsupervised method for named entity evolution recognition (2012)

Tahmasebi, Nina ; Gossen, Gerhard ; Kanhabua, Nattiya ; Holzmann, Helge ; Risse, Thomas

High impact events, political changes and new technologies are reflected in our language and lead to constant evolution of terms, expressions and names. Not knowing about names used in the past for referring to a named entity can severely decrease the performance of many computational linguistic algorithms. We propose NEER, an unsupervised method for named entity evolution recognition independent of external knowledge sources. We find time periods with high likelihood of evolution. By analyzing only these time periods using a sliding window co-occurrence method we capture evolving terms in the same context. We thus avoid comparing terms from widely different periods in time and overcome a severe limitation of existing methods for named entity evolution, as shown by the high recall of 90% on the New York Times corpus. We compare several relatedness measures for filtering to improve precision. Furthermore, using machine learning with minimal supervision improves precision to 94%.
