Entity matching with similarity encoding: a supervised learning recommendation framework for linking (big) data

In this study, we introduce a novel entity matching (EM) framework. It combines state-of-the-art EM approaches based on artificial neural networks (ANN) with a new similarity encoding derived from matching techniques that are prevalent in finance and economics. Our framework performs on par with or outperforms alternative end-to-end frameworks in standard benchmark cases. Because the similarity encoding is constructed from (edit) distances rather than semantic similarities, it avoids out-of-vocabulary problems when matching dirty data. We highlight this property by applying the framework to dirty firm-level financial data extracted from historical archives.
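This page does not define the encoding itself. The following is a minimal sketch that assumes, as an illustration only and not as the paper's definition, that each record pair is encoded as a vector of per-attribute normalized Levenshtein similarities. It shows why a distance-based feature has no vocabulary to fall out of: a misspelled firm name still scores close to its clean counterpart, whereas a token missing from a word-embedding vocabulary would receive no meaningful vector. The function names, the lowercasing step, and the example records are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions (dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row we iterate over
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means the strings are identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def similarity_encoding(left: dict, right: dict, attributes: list) -> list:
    """Hypothetical pair encoding: one edit-based similarity per attribute (lowercased)."""
    return [
        edit_similarity(str(left.get(k, "")).lower(), str(right.get(k, "")).lower())
        for k in attributes
    ]


if __name__ == "__main__":
    clean = {"name": "Deutsche Bank AG", "city": "Berlin"}
    dirty = {"name": "Deutsche Bnak A.G.", "city": "Berlin"}  # OCR-style typos
    print(similarity_encoding(clean, dirty, ["name", "city"]))
```

In a supervised setup such as the one named in the title, vectors like these could serve as classifier features. Normalizing by the longer string keeps every feature in [0, 1] regardless of how long the attribute values are.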

Metadata
Authors: Pantelis Karapanagiotis, Marius Liebald
URN: urn:nbn:de:hebis:30:3-703996
URL: https://ssrn.com/abstract=4541376
DOI: https://doi.org/10.2139/ssrn.4541376
Series (serial number): SAFE working paper (398)
Publisher: SAFE
Place of publication: Frankfurt am Main
Document type: Working Paper
Language: English
Year of completion: 2023
Year of first publication: 2023
Publishing institution: Universitätsbibliothek Johann Christian Senckenberg
Release date: 2023/08/22
Tags: Database linking; Entity matching; Entity resolution; Machine learning; Record resolution; Similarity encoding
Edition: This version: August 15, 2023
Pages: 31
Note:
We gratefully acknowledge research support from the Leibniz Institute for Financial Research SAFE.
HeBIS-PPN: 511906757
Institutes: Economics and Business / Economics and Business
Scientific Centers and Coordinated Programs / House of Finance (HoF)
Scientific Centers and Coordinated Programs / Center for Financial Studies (CFS)
Scientific Centers and Coordinated Programs / Sustainable Architecture for Finance in Europe (SAFE)
Dewey Decimal Classification: 3 Social sciences / 33 Economics / 330 Economics
JEL classification: C Mathematical and Quantitative Methods / C8 Data Collection and Data Estimation Methodology; Computer Programs
Collections: University publications
Licence: German copyright law (Deutsches Urheberrecht)