TY  - CONF
A1  - Gossen, Gerhard
A1  - Demidova, Elena
A1  - Risse, Thomas
T1  - Extracting event-centric document collections from large-scale web archives
T2  - EuropeanaTech Insight
N2  - Web archives created by the Internet Archive (IA) (https://archive.org), national libraries and other archiving services contain large amounts of information collected over a period of more than twenty years. These archives constitute a valuable source for research in many disciplines, including the digital humanities and the historical sciences, by offering a unique opportunity to look into past events and their representation on the Web.
Most Web archive services aim to capture either the entire Web (IA) or national top-level domains and are therefore broad in scope and diverse in the topics they contain and the time intervals they cover. Because of this large size and broad scope, it is difficult for interested researchers to locate relevant information in the archives, as search facilities are very limited. Many users are more interested in studying smaller, topically coherent, event-centric collections of documents contained in a Web archive [1,2]. Such collections can reflect specific events such as elections or natural disasters, e.g. the German federal elections or the Fukushima nuclear disaster (2011).
Y1  - 2017
UR  - http://publikationen.ub.uni-frankfurt.de/frontdoor/index/index/docId/54250
UR  - https://nbn-resolving.org/urn:nbn:de:hebis:30:3-542504
UR  - https://pro.europeana.eu/page/issue-8-tpdl
N1  - All texts are CC BY-SA, images and media licensed individually.
IS  - 8: TPDL (2017)
PB  - Europeana Foundation
CY  - Den Haag, Netherlands
ER  -