TY  - CONF
A1  - Gossen, Gerhard
A1  - Demidova, Elena
A1  - Risse, Thomas
T1  - Extracting event-centric document collections from large-scale web archives
T2  - EuropeanaTech Insight
N2  - Web archives created by the Internet Archive (IA) (https://archive.org), national libraries and other archiving services contain large amounts of information collected for a time period of over twenty years. These archives constitute a valuable source for research in many disciplines, including the digital humanities and the historical sciences by offering a unique possibility to look into past events and their representation on the Web. Most Web archive services aim to capture the entire Web (IA) or national top-level domains and are therefore broad in their scope, diverse regarding the topics they contain and the time intervals they cover. Due to the large size and the broad scope it is difficult for interested researchers to locate relevant information in the archives as search facilities are very limited. Many users are more interested in studying smaller and topically coherent event-centric collections of documents contained in a Web archive [1,2]. Such collections can reflect specific events such as elections, or natural disasters, e.g. the Fukushima nuclear disaster (2011) or the German federal elections.
Y1  - 2017
UR  - http://publikationen.ub.uni-frankfurt.de/frontdoor/index/index/docId/54250
UR  - https://nbn-resolving.org/urn:nbn:de:hebis:30:3-542504
UR  - https://pro.europeana.eu/page/issue-8-tpdl
N1  - All texts are CC BY-SA, images and media licensed individually.
IS  - 8: TPDL (2017)
PB  - Europeana Foundation
CY  - Den Haag, Netherlands
ER  -