Extracting event-centric document collections from large-scale web archives

Web archives created by the Internet Archive (IA) (https://archive.org), national libraries and other archiving services contain large amounts of information collected over a period of more than twenty years. These archives constitute a valuable source for research in many disciplines, including the digital humanities and the historical sciences, because they offer a unique opportunity to look into past events and their representation on the Web. Most Web archive services aim to capture the entire Web (IA) or national top-level domains and are therefore broad in scope and diverse in the topics they contain and the time intervals they cover. Because of this size and breadth, and because search facilities are very limited, it is difficult for interested researchers to locate relevant information in the archives. Many users are more interested in studying smaller, topically coherent, event-centric collections of documents contained in a Web archive [1,2]. Such collections can reflect specific events such as elections or natural disasters, e.g. the Fukushima nuclear disaster (2011) or the German federal elections.

Author:Gerhard Gossen, Elena Demidova, Thomas Risse
Parent Title (English):EuropeanaTech Insight
Publisher:Europeana Foundation
Place of publication:Den Haag, Netherlands
Document Type:Conference Proceeding
Year of Completion:2017
Year of first Publication:2017
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Creating Corporation:TPDL (21. : 2017 : Thessaloniki)
Release Date:2020/06/24
Issue:8: TPDL (2017)
Page Number:4
All texts are CC BY-SA, images and media licensed individually.
Institutes:Zentrale Einrichtung / Universitätsbibliothek
Dewey Decimal Classification:0 Computer science, information and general works / 02 Library and information sciences / 020 Library and information sciences
Licence (German):Creative Commons - Attribution-ShareAlike