OPUS 4 | Universitätsbibliothek

The ARCOMEM architecture for social- and semantic-driven web archiving (2014)

Risse, Thomas ; Demidova, Elena ; Dietze, Stefan ; Peters, Wim ; Papailiou, Nikolaos ; Doka, Katerina ; Stavrakas, Yannis ; Plachouras, Vassilis ; Senellart, Pierre ; Carpentier, Florent ; Mantrach, Amin ; Cautis, Bogdan ; Siehndel, Patrick ; Spiliotopoulos, Dimitris

The constantly growing amount of Web content and the success of the Social Web lead to increasing needs for Web archiving. These needs go beyond the pure preservation of Web pages. Web archives are turning into “community memories” that aim at building a better understanding of the public view on, e.g., celebrities, court decisions and other events. Due to the size of the Web, the traditional “collect-all” strategy is in many cases not the best method to build Web archives. In this paper, we present the ARCOMEM (From Collect-All Archives to Community Memories) architecture and implementation that uses semantic information, such as entities, topics and events, complemented with information from the Social Web to guide a novel Web crawler. The resulting archives are automatically enriched with semantic meta-information to ease the access and allow retrieval based on conditions that involve high-level concepts.

Analysing and enriching focused semantic web archives for parliament applications (2014)

Demidova, Elena ; Barbieri, Nicola ; Dietze, Stefan ; Funk, Adam ; Holzmann, Helge ; Maynard, Diana ; Papailiou, Nikolaos ; Peters, Wim ; Risse, Thomas ; Spiliotopoulos, Dimitris

The web and the social web play an increasingly important role as an information source for Members of Parliament and their assistants, journalists, political analysts and researchers. They provide important background information, such as reactions to political events and comments made by the general public. The case study presented in this paper is driven by two European parliaments (the Greek and the Austrian parliament) and targets an effective exploration of political web archives. In this paper, we describe semantic technologies deployed to ease the exploration of the archived web and social web content and present evaluation results.

Cultural heritage reconstructed - Compact Memory and the Frankfurt Digital Judaica Collection (2014)

Heuberger, Rachel

Compact Memory, the internet archive of German Jewish periodicals, provides free global internet access to the vast majority of German-Jewish newspapers and periodicals of the 19th and 20th century. Jewish historical newspapers are invaluable sources that supply direct and detailed information on the transformation process of Jewry and offer new insights into European Jewish history. The use of these historical sources, however, is extremely difficult, as complete sets of periodicals are very rarely found and they are scattered all over the world in different libraries and archives and in different physical formats (paper, microfilm). Compact Memory contains the 110 most important Jewish German newspapers and periodicals in Central Europe in the period from 1806 to 1938, covering the complete range of religious, political, social, cultural and academic aspects of Jewish life. The texts are available partly as full texts, processed by OCR, and partly as graphic documents with corresponding index options. The database offers advanced search options as well as downloading and printing of articles. Thousands of essays by more than 10,000 individual contributors have been bibliographically indexed. Compact Memory was established by the Judaica Division of the University Library Frankfurt am Main and is run today in cooperation with the Aachen Chair of German-Jewish Literary History and the Cologne library Germania Judaica. Compact Memory is one database within the Digital Collection Judaica, which, being part of Europeana and other digital portals, offers resources for the reconstruction and representation of Jewish cultural heritage.

Commercial publishers in the world of total access : (an inside view) (2015)

Haank, Derk

Visions and open challenges for a knowledge-based culturomics (2015)

Tahmasebi, Nina ; Borin, Lars ; Capannini, Gabriele ; Dubhashi, Devdatt ; Exner, Peter ; Forsberg, Markus ; Gossen, Gerhard ; Johansson, Fredrik D. ; Johansson, Richard ; Kågebäck, Mikael ; Mogren, Olof ; Nugues, Pierre ; Risse, Thomas

The concept of culturomics was born out of the availability of massive amounts of textual data and the interest to make sense of cultural and language phenomena over time. Thus far, however, culturomics has only made use of, and shown the great potential of, statistical methods. In this paper, we present a vision for a knowledge-based culturomics that complements traditional culturomics. We discuss the possibilities and challenges of combining knowledge-based methods with statistical methods and address major challenges that arise due to the nature of the data: diversity of sources, changes in language over time, and temporal dynamics of information in general. We address all layers needed for knowledge-based culturomics, from natural language processing and relations to summaries and opinions.

Lin|gu|is|tik: building the linguist's pathway to bibliographies, libraries, language resources and linked open data (2016)

Chiarcos, Christian ; Fäth, Christian ; Renner-Westermann, Heike ; Abromeit, Frank ; Dimitrova, Vanya

This paper introduces a novel research tool for the field of linguistics: The Lin|gu|is|tik web portal provides a virtual library which offers scientific information on every linguistic subject. It comprises selected internet sources and databases as well as catalogues for linguistic literature, and addresses an interdisciplinary audience. The virtual library is the most recent outcome of the Special Subject Collection Linguistics of the German Research Foundation (DFG), and also integrates the knowledge accumulated in the Bibliography of Linguistic Literature. In addition to the portal, we describe long-term goals and prospects with a special focus on ongoing efforts regarding an extension towards integrating language resources and Linguistic Linked Open Data.

Jewish Studies, Israel Studies : new specialized information service : creating the central access point, offering high performance information infrastructure, providing resources (2016)

Extracting event-centric document collections from large-scale web archives (2017)

Gossen, Gerhard ; Demidova, Elena ; Risse, Thomas

Web archives created by the Internet Archive (IA) (https://archive.org), national libraries and other archiving services contain large amounts of information collected for a time period of over twenty years. These archives constitute a valuable source for research in many disciplines, including the digital humanities and the historical sciences by offering a unique possibility to look into past events and their representation on the Web. Most Web archive services aim to capture the entire Web (IA) or national top-level domains and are therefore broad in their scope, diverse regarding the topics they contain and the time intervals they cover. Due to the large size and the broad scope it is difficult for interested researchers to locate relevant information in the archives as search facilities are very limited. Many users are more interested in studying smaller and topically coherent event-centric collections of documents contained in a Web archive [1,2]. Such collections can reflect specific events such as elections, or natural disasters, e.g. the Fukushima nuclear disaster (2011) or the German federal elections.

Setup of BIOfid, a new specialised information service for biodiversity research (2017)

Koch, Markus ; Kasperek, Gerwin ; Hörnschemeyer, Thomas ; Mehler, Alexander ; Weiland, Claus ; Hausinger, Angela

In order to promote the accessibility of biodiversity data in historic and contemporary literature, we introduce a new interdisciplinary project called BIOfid (FID=Fachinformationsdienst, a service for providing specialized information). The project aims at a mobilization of data available in print only by combining digitization of scientific biodiversity literature with the development of innovative text mining tools for complex, eventually semantic searches throughout the complete text corpus. A major prerequisite for the development of such search tools is the provision of sophisticated anatomy ontologies on the one hand, and of complete lists of species names (currently considered valid as well as all synonyms) at a global scale on the other hand. In the initial stage, we chose examples from German publications of the past 250 years dealing with the geographic distribution and ecology of vascular plants (Tracheophyta), birds (Aves), as well as moths and butterflies (Lepidoptera) in Germany. These taxa have been prioritized according to current demands of German research groups (about 50 sites) aiming at analyses and modeling of distribution patterns and their changes through time. In the long term, we aim at providing data and open source software applicable for any taxon and geographic region. For this purpose, a platform for open access journals for long-term availability of professional e-journals will be established. All generated data will also be made accessible through GFBio (German Federation for Biological Data). BIOfid is supported by the LIS-Scientific Library Services and Information Systems program of the German Research Foundation (DFG).

Finding individual word sense changes and their delay in appearance (2017)

Tahmasebi, Nina ; Risse, Thomas

We present a method for detecting word sense changes by utilizing automatically induced word senses. Our method works on the level of individual senses and allows a word to have, e.g., one stable sense and then add a novel sense that later experiences change. Senses are grouped based on polysemy to find linguistic concepts, and we can find broadening and narrowing as well as novel (polysemous and homonymic) senses. We evaluate on a test set and present recall and estimates of the time between expected and found change.

A Specialised Information Service for Biodiversity Research, involving large-scale data mobilisation by mining German biodiversity literature (2017)

Kasperek, Gerwin ; Schmidt, Marco

Biodiversity research heavily relies on recent and older literature, and the data contained therein. Despite great effort, large parts of the literature and the data it holds are still not available in appropriate formats needed for efficient compilation and analysis. As a part of the current funding strategy of the German Research Council (Deutsche Forschungsgemeinschaft, DFG), and resulting from an extensive dialogue with the scientific community in Germany, a "Specialised Information Service" (Fachinformationsdienst, FID) for Biodiversity Research will be established with the objective of making further segments of literature about biodiversity available in up-to-date formats. This project, starting in 2017, is conducted by the University Library Johann Christian Senckenberg (Frankfurt/Main, Germany) together with the Senckenberg Gesellschaft für Naturforschung and the Text Technology Lab of the Goethe University (Frankfurt/Main). The new Specialised Information Service for Biodiversity Research (FID Biodiversitätsforschung) comprises four core elements: (A) a text mining approach which encompasses advanced text technologies and a large body of 20th century literature; (B) the digitisation of selected German biodiversity literature; (C) a platform for Open Access journals; and (D) the acquisition of specialised print literature.

Workflow and current achievements of BIOfid, an information service mobilizing biodiversity data from literature sources (2018)

Driller, Christine ; Koch, Markus ; Schmidt, Marco ; Weiland, Claus ; Hörnschemeyer, Thomas ; Hickler, Thomas ; Abrami, Giuseppe ; Ahmed, Sajawel ; Gleim, Rüdiger ; Hemati, Wahed ; Uslu, Tolga ; Mehler, Alexander ; Pachzelt, Adrian ; Rexhepi, Jashar ; Risse, Thomas ; Schuster, Janina ; Kasperek, Gerwin ; Hausinger, Angela

BIOfid is a specialized information service currently being developed to mobilize biodiversity data dormant in printed historical and modern literature and to offer a platform for open access journals on the science of biodiversity. Our team of librarians, computer scientists and biologists produce high-quality text digitizations, develop new text-mining tools and generate detailed ontologies enabling semantic text analysis and semantic search by means of user-specific queries. In a pilot project we focus on German publications on the distribution and ecology of vascular plants, birds, moths and butterflies extending back to the Linnaeus period about 250 years ago. The three organism groups have been selected according to current demands of the relevant research community in Germany. The text corpus defined for this purpose comprises over 400 volumes with more than 100,000 pages to be digitized and will be complemented by journals from other digitization projects, copyright-free and project-related literature. With TextImager (Natural Language Processing & Text Visualization) and TextAnnotator (Discourse Semantic Annotation) we have already extended and launched tools that focus on the text-analytical section of our project. Furthermore, taxonomic and anatomical ontologies elaborated by us for the taxa prioritized by the project’s target group - German institutions and scientists active in biodiversity research - are constantly improved and expanded to maximize scientific data output. Our poster describes the general workflow of our project ranging from literature acquisition via software development, to data availability on the BIOfid web portal (http://biofid.de/), and the implementation into existing platforms which serve to promote global accessibility of biodiversity data.

BIOfid dataset: publishing a German gold standard for named entity recognition in historical biodiversity literature (2019)

Ahmed, Sajawel ; Stoeckel, Manuel ; Driller, Christine ; Pachzelt, Adrian ; Mehler, Alexander

The Specialized Information Service Biodiversity Research (BIOfid) has been launched to mobilize valuable biological data from printed literature hidden in German libraries for the past 250 years. In this project, we annotate German texts converted by OCR from historical scientific literature on the biodiversity of plants, birds, moths and butterflies. Our work enables the automatic extraction of biological information previously buried in the mass of papers and volumes. For this purpose, we generated training data for the tasks of Named Entity Recognition (NER) and Taxa Recognition (TR) in biological documents. We use this data to train a number of leading machine learning tools and create a gold standard for TR in biodiversity literature. More specifically, we perform a practical analysis of our newly generated BIOfid dataset through various downstream-task evaluations and establish a new state of the art for TR with 80.23% F-score. In this sense, our paper lays the foundations for future work in the field of information extraction in biology texts.

University of Frankfurt reports its 2019 APC expenditures (2020)

Broschinski, Christoph

The Goethe University Frankfurt has updated its APC expenditures, providing data for the 2019 period. The University Library Johann Christian Senckenberg is in charge of the University’s Open Access Publishing Fund, which is supported under the DFG’s Open Access Publishing Programme. Contact Person is Roland Wagner.

From raw data to rich(er) data: Lessons learned while aggregating metadata (2020)

Beck, Julia

The Specialised Information Service Performing Arts (SIS PA) is part of a funding programme by the German Research Foundation that enables libraries to develop tailor-made services for individual disciplines in order to provide researchers direct access to relevant materials and resources from their field. For the field of performing arts, the SIS PA is aggregating metadata about theater and dance resources from currently, mostly, German-speaking cultural heritage institutions in a VuFind-based search portal. In this article, we focus on metadata quality and its impact on the aggregation workflow by describing the different, possibly data provider-specific, process stages of improving data quality in order to achieve a searchable, interlinked knowledge base. We also describe lessons learned and limitations of the process.

Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology (2021)

Lücking, Andy ; Driller, Christine ; Stoeckel, Manuel ; Abrami, Giuseppe ; Pachzelt, Adrian ; Mehler, Alexander

Biodiversity information is contained in countless digitized and unprocessed scholarly texts. Although automated extraction of these data has been gaining momentum for years, there are still innumerable text sources that are poorly accessible and require a more advanced range of methods to extract relevant information. To improve the access to semantic biodiversity information, we have launched the BIOfid project (www.biofid.de) and have developed a portal to access the semantics of German language biodiversity texts, mainly from the 19th and 20th century. However, to make such a portal work, a couple of methods had to be developed or adapted first. In particular, text-technological information extraction methods were needed, which extract the required information from the texts. Such methods draw on machine learning techniques, which in turn are trained by learning data. To this end, among others, we gathered the BIOfid text corpus, which is a cooperatively built resource, developed by biologists, text technologists, and linguists. A special feature of BIOfid is its multiple annotation approach, which takes into account both general and biology-specific classifications, and by this means goes beyond previous, typically taxon- or ontology-driven proper name detection. We describe the design decisions and the genuine Annotation Hub Framework underlying the BIOfid annotations and present agreement results. The tools used to create the annotations are introduced, and the use of the data in the semantic portal is described. Finally, some general lessons, in particular with multiple annotation projects, are drawn.

Non-native woody plant species in urban forests of Frankfurt/Main (Germany) (2021)

Gregor, Thomas ; Kasperek, Gerwin

In 23 survey areas with woodland vegetation or woodland succession in Frankfurt/Main with a total size of 134 hectares, woody species were surveyed (excluding species only occurring as planted individuals). We found 149 woody taxa; 42% of them indigenous, and 58% non-native. Out of the 86 non-native taxa, 49 were naturalized in Frankfurt while 37 were considered as casual. Among non-native taxa, East Asian taxa formed the largest phytogeographic group. We found taxa originating from horticulture (cultigens) to be an important part of the woody flora of Frankfurt/Main. The most common taxa were Acer pseudoplatanus, A. platanoides, Betula pendula, and Sambucus nigra; the two Acer species were regarded as naturalized. Non-native woody species were generally common (with percentages ranging from 24% to 79% in individual areas).

Blogging histories of knowledge in Washington, D. C. (2021)

Stoneman, Mark R. ; Krone, Kerstin von der

The authors reflect on their experiences as the founding editors of the History of Knowledge blog. Situating the project in its specific institutional, geographical, and historiographical contexts, they highlight its role in scholarly communication and research alongside journals and books in a research domain that is still young, especially when viewed from an international perspective. At the same time, the authors discuss the blog’s role as a tool for classifying and structuring a corpus of work as it grows over time and as new themes and connections emerge from the contributions of its many authors.

Current research on theory and practice of digital libraries: best papers from TPDL 2019 & 2020 (2022)

Aalberg, Trond ; Duchateau, Fabien ; Hall, Mark ; Merčun, Tanja ; Risse, Thomas

This volume presents a special issue on selected papers from the 2019 & 2020 editions of the International Conference on Theory and Practice of Digital Libraries (TPDL). They cover different research areas within Digital Libraries, from Ontology and Linked Data to quality in Web Archives and Topic Detection. We first provide a brief overview of both TPDL editions, and we introduce the selected papers.

StolperSeiten: Nazi-looted property in the University Library Frankfurt am Main (2022)

Dudde, Daniel ; Vogl, Ulrike

The article discusses the University Library Frankfurt am Main’s current exhibition focusing on the background of and the systematic search for looted assets in the library holdings as part of a wider provenance research project. It offers an overview of various topical areas reaching from initial changes in 1933 to raids throughout Europe by Nazi organisations and restitution procedures during the post-war period. The scope and first findings of the provenance research project will also be addressed.

"ein/aus gepackt. Die Kinderbuchsammlung Benjamin" is extended : nominated for the Dr. Marschner Ausstellungspreis (2023)

Albus, Adolf

The exhibition in the University Library is being extended until 26 February 2023.

Open Research Data – Bright New Future or Just a Flash in the Pan? ; Workshop in the context of the Frankfurt Open Science Initiative ; 1st July 2024 (2024)

Grossmann, Yves Vincent

Introduction to software management plans ; NFDI infra-dmp meeting, remote ; 24.05.2024 (2024)

Grossmann, Yves Vincent