C8 Data Collection and Data Estimation Methodology; Computer Programs
Refine
Document Type
- Article (1)
- Working Paper (1)
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2)
Keywords
- Machine learning (2)
- Artificial intelligence (1)
- Database linking (1)
- Entity matching (1)
- Entity resolution (1)
- Information systems (1)
- Record resolution (1)
- Similarity encoding (1)
Artificial Intelligence (AI) and Machine Learning (ML) are currently hot topics in industry and business practice, while management-oriented research disciplines seem reluctant to adopt these sophisticated data analytics methods as research instruments. Even the Information Systems (IS) discipline with its close connections to Computer Science seems to be conservative when conducting empirical research endeavors. To assess the magnitude of the problem and to understand its causes, we conducted a bibliographic review on publications in high-level IS journals. We reviewed 1,838 articles that matched corresponding keyword-queries in journals from the AIS senior scholar basket, Electronic Markets and Decision Support Systems (Ranked B). In addition, we conducted a survey among IS researchers (N = 110). Based on the findings from our sample we evaluate different potential causes that could explain why ML methods are rather underrepresented in top-tier journals and discuss how the IS discipline could successfully incorporate ML methods in research undertakings.
In this study, we introduce a novel entity matching (EM) framework. It com-bines state-of-the-art EM approaches based on Artificial Neural Networks (ANN) with a new similarity encoding derived from matching techniques that are preva-lent in finance and economics. Our framework is on-par or outperforms alternative end-to-end frameworks in standard benchmark cases. Because similarity encod-ing is constructed using (edit) distances instead of semantic similarities, it avoids out-of-vocabulary problems when matching dirty data. We highlight this property by applying an EM application to dirty financial firm-level data extracted from historical archives.