Document Type
- Working Paper (2)
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2)
Keywords
- Database linking (1)
- Entity matching (1)
- Entity resolution (1)
- Machine learning (1)
- Record resolution (1)
- Similarity encoding (1)
Institute
- Center for Financial Studies (CFS) (2)
In this study, we introduce a novel entity matching (EM) framework. It combines state-of-the-art EM approaches based on Artificial Neural Networks (ANN) with a new similarity encoding derived from matching techniques that are prevalent in finance and economics. Our framework is on par with or outperforms alternative end-to-end frameworks in standard benchmark cases. Because the similarity encoding is constructed from (edit) distances rather than semantic similarities, it avoids out-of-vocabulary problems when matching dirty data. We highlight this property by applying the framework to dirty financial firm-level data extracted from historical archives.
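To illustrate why a distance-based encoding sidesteps out-of-vocabulary problems, the following is a minimal sketch, not the encoding used in the paper. It assumes a plain Levenshtein distance normalized by the longer string's length; the function names (`levenshtein`, `similarity`, `encode_pair`) and the record fields are hypothetical and chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1]; 1.0 means identical strings.

    Assumption: normalization by the longer string's length.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def encode_pair(rec_a: dict, rec_b: dict, fields: list) -> list:
    """Hypothetical encoding: one similarity score per compared attribute."""
    return [
        similarity(str(rec_a.get(f, "")).lower(), str(rec_b.get(f, "")).lower())
        for f in fields
    ]


# Illustrative records with a typo of the kind found in transcribed archives.
a = {"name": "Deutsche Bank AG", "city": "Frankfurt"}
b = {"name": "Deutsche Bnak AG", "city": "Frankfurt"}
features = encode_pair(a, b, ["name", "city"])
```

Because the scores depend only on characters, a misspelled token such as "Bnak" still yields a high similarity, whereas a vocabulary-based embedding might treat it as unknown. A vector like `features` could then serve as input to a downstream classifier.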
Broad, long-term financial and economic datasets are a scarce resource, particularly in the European context. In this paper, we present an approach for an extensible data model, i.e., one that is adaptable to future changes in technologies and sources, that may constitute a basis for digitized and structured long-term historical datasets. The data model covers the peculiarities of historical financial and economic data and is flexible enough to accommodate data of different types (quantitative as well as qualitative) from different historical sources, hence achieving extensibility. Furthermore, based on historical German company and stock market data, we discuss a relational implementation of this approach.
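One way such extensibility might look in a relational setting is sketched below. This is not the schema from the paper: the tables (`source`, `company`, `observation`), their columns, and the helper `add_observation` are assumptions made for illustration. The idea shown is that each observation is stored as a row that names its variable and its source, so new variables or sources need no schema change, and quantitative and qualitative values sit side by side.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE source (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL UNIQUE
);
CREATE TABLE company (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE observation (
    id         INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL REFERENCES company(id),
    source_id  INTEGER NOT NULL REFERENCES source(id),
    variable   TEXT NOT NULL,   -- e.g. 'share_capital', 'legal_form'
    obs_date   TEXT NOT NULL,   -- ISO 8601 date string
    value_num  REAL,            -- quantitative value, if any
    value_text TEXT             -- qualitative value, if any
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def _get_or_create(conn, table: str, column: str, value: str) -> int:
    """Return the id of the row with the given value, inserting it if missing."""
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        f"INSERT INTO {table} ({column}) VALUES (?)", (value,)
    ).lastrowid


def add_observation(conn, company, source, variable, obs_date, value) -> None:
    """Store one observation; numbers go to value_num, everything else to value_text."""
    company_id = _get_or_create(conn, "company", "name", company)
    source_id = _get_or_create(conn, "source", "title", source)
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    conn.execute(
        "INSERT INTO observation "
        "(company_id, source_id, variable, obs_date, value_num, value_text) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (company_id, source_id, variable, obs_date,
         float(value) if is_number else None,
         None if is_number else str(value)),
    )


# Placeholder values only; these are not data from the paper.
conn = build_db()
add_observation(conn, "Example Company AG", "Example yearbook",
                "share_capital", "1900-12-31", 1000000.0)
add_observation(conn, "Example Company AG", "Example yearbook",
                "legal_form", "1900-12-31", "AG")
```

The trade-off of this long, narrow layout is that queries must pivot rows into columns, in exchange for being able to add variables and sources without altering tables.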