TY  - UNPD
A1  - Karapanagiotis, Pantelis
A1  - Liebald, Marius
T1  - Entity matching with similarity encoding: a supervised learning recommendation framework for linking (big) data
N2  - In this study, we introduce a novel entity matching (EM) framework. It combines state-of-the-art EM approaches based on Artificial Neural Networks (ANN) with a new similarity encoding derived from matching techniques that are prevalent in finance and economics. Our framework is on par with or outperforms alternative end-to-end frameworks in standard benchmark cases. Because similarity encoding is constructed using (edit) distances instead of semantic similarities, it avoids out-of-vocabulary problems when matching dirty data. We highlight this property by applying an EM application to dirty financial firm-level data extracted from historical archives.
T3  - SAFE working paper - 398
KW  - Entity matching
KW  - Entity resolution
KW  - Database linking
KW  - Machine learning
KW  - Record resolution
KW  - Similarity encoding
Y1  - 2023
UR  - http://publikationen.ub.uni-frankfurt.de/frontdoor/index/index/docId/70399
UR  - https://nbn-resolving.org/urn:nbn:de:hebis:30:3-703996
UR  - https://ssrn.com/abstract=4541376
N1  - We gratefully acknowledge research support from the Leibniz Institute for Financial Research SAFE.
PB  - SAFE
CY  - Frankfurt am Main
ER  -