Enhancing explainable machine learning by reconsidering initially unselected items in feature selection for classification
Feature selection is a common data preprocessing step that precedes machine learning. It reduces the data space and the computational cost of processing or obtaining the data. Filtering out uninformative variables is also important for knowledge discovery. By reducing the data space to only those components that are informative about the class structure, feature selection can simplify models so that researchers in the field can interpret them more easily, in the spirit of explainable artificial intelligence. Knowledge discovery in complex data thus benefits from feature selection that aims to understand feature sets in the thematic context from which the data set originates. However, a single variable selected from a very small number of variables that are technically sufficient for AI training may make little immediate thematic sense, whereas additionally considering a variable discarded during feature selection could make the scientific discovery explicit. In this report, we propose an approach to explainable feature selection (XFS) based on a systematic reconsideration of unselected features. The difference between the classification performance obtained when training the algorithms with the selected features and that obtained with the unselected features provides a valid estimate of whether the relevant features in a data set have been selected and uninformative or trivial information has been filtered out. It is shown that revisiting originally unselected variables in multivariate data sets allows the detection of pathologies and errors in feature selection that occasionally resulted in a failure to identify the most appropriate variables.
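The comparison described in the abstract can be illustrated with a minimal sketch: train one classifier on the selected features and one on the unselected features, then look at the difference in held-out accuracy. Everything in the block is an illustrative assumption rather than the authors' implementation: the synthetic data, the nearest-centroid classifier (a stand-in for whatever algorithm is actually used), and the hypothetical names `make_data`, `nearest_centroid_accuracy`, and `compare_selected_vs_unselected`.

```python
import random


def make_data(n=400, shift=3.0, seed=0):
    """Synthetic two-class data (assumption): columns 0 and 1 carry class
    signal, columns 2 and 3 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate classes so every split contains both
        mu = shift * label
        X.append([
            rng.gauss(mu, 1.0),
            rng.gauss(mu, 1.0),
            rng.gauss(0.0, 1.0),
            rng.gauss(0.0, 1.0),
        ])
        y.append(label)
    return X, y


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, cols):
    """Fit a nearest-centroid classifier on the columns in `cols` and
    return its accuracy on the test split."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]

    def squared_distance(x, label):
        return sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[label]))

    correct = 0
    for x, t in zip(X_test, y_test):
        predicted = min(centroids, key=lambda label: squared_distance(x, label))
        if predicted == t:
            correct += 1
    return correct / len(y_test)


def compare_selected_vs_unselected(X, y, selected, n_train=None):
    """Return (accuracy with selected features, accuracy with unselected
    features, difference between the two)."""
    if n_train is None:
        n_train = len(X) // 2
    unselected = [c for c in range(len(X[0])) if c not in selected]
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]
    acc_sel = nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, selected)
    acc_unsel = nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, unselected)
    return acc_sel, acc_unsel, acc_sel - acc_unsel


X, y = make_data()
# A selection that kept the informative columns: large positive difference expected.
print(compare_selected_vs_unselected(X, y, selected=[0, 1]))
# A faulty selection that kept only noise: the unselected set classifies better.
print(compare_selected_vs_unselected(X, y, selected=[2, 3]))
```

In this toy setting, a clearly positive difference suggests that the selection captured the informative variables, while a difference near zero or below zero suggests that informative variables were left among the unselected ones and are worth revisiting.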
Author: | Jörn Lötsch, Alfred Ultsch |
---|---|
URN: | urn:nbn:de:hebis:30:3-755686 |
DOI: | https://doi.org/10.3390/biomedinformatics2040047 |
ISSN: | 2673-7426 |
Parent Title (English): | BioMedInformatics |
Publisher: | MDPI |
Place of publication: | Basel |
Document Type: | Article |
Language: | English |
Date of Publication (online): | 2022/12/12 |
Date of first Publication: | 2022/12/12 |
Publishing Institution: | Universitätsbibliothek Johann Christian Senckenberg |
Release Date: | 2023/09/11 |
Tag: | artificial intelligence; data science; digital medicine; machine-learning |
Volume: | 2 |
Issue: | 4 |
Page Number: | 14 |
First Page: | 701 |
Last Page: | 714 |
HeBIS-PPN: | 513120300 |
Institutes: | Medicine |
Dewey Decimal Classification: | 0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science; 6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health |
Collections: | University publications |
Licence (German): | Creative Commons - Attribution 4.0 |