The search result changed since you submitted your search request. Documents might be displayed in a different sort order.
  • search hit 28 of 33
Back to Result List

Recursive computed ABC (cABC) analysis as a precise method for reducing machine learning based feature sets to their minimum informative size

  • Selecting the k best features is a common task in machine learning. Typically, a few features have high importance, but many have low importance (right-skewed distribution). This report proposes a numerically precise method to address this skewed feature importance distribution in order to reduce a feature set to the informative minimum of items. Computed ABC analysis (cABC) is an item categorization method that aims to identify the most important items by partitioning a set of non-negative numerical items into subsets "A", "B", and "C" such that subset "A" contains the "few important" items based on specific properties of ABC curves defined by their relationship to Lorenz curves. In its recursive form, the cABC analysis can be applied again to subset "A". A generic image dataset and three biomedical datasets (lipidomics and two genomics datasets) with a large number of variables were used to perform the experiments. The experimental results show that the recursive cABC analysis limits the dimensions of the data projection to a minimum where the relevant information is still preserved and directs the feature selection in machine learning to the most important class-relevant information, including filtering feature sets for nonsense variables. Feature sets were reduced to 10% or less of the original variables and still provided accurate classification in data not used for feature selection. cABC analysis, in its recursive variant, provides a computationally precise means of reducing information to a minimum. The minimum is the result of a computation of the number of k most relevant items, rather than a decision to select the k best items from a list. In addition, there are precise criteria for stopping the reduction process. The reduction to the most important features can improve the human understanding of the properties of the data set. The cABC method is implemented in the Python package "cABCanalysis" available at https://pypi.org/project/cABCanalysis/.

Download full text files

Export metadata

Additional Services

Share in Twitter Search Google Scholar
Metadaten
Author:Jörn LötschORCiDGND, Alfred UltschGND
URN:urn:nbn:de:hebis:30:3-755743
DOI:https://doi.org/10.1038/s41598-023-32396-9
ISSN:2045-2322
Parent Title (English):Scientific Reports
Publisher:Macmillan Publishers Limited, part of Springer
Place of publication:London
Document Type:Article
Language:English
Date of Publication (online):2023/04/04
Date of first Publication:2023/04/04
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Release Date:2023/09/11
Volume:13
Issue:5470
Page Number:18
HeBIS-PPN:514228288
Institutes:Medizin
Dewey Decimal Classification:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
6 Technik, Medizin, angewandte Wissenschaften / 61 Medizin und Gesundheit / 610 Medizin und Gesundheit
Sammlungen:Universitätspublikationen
Licence (German):License LogoCreative Commons - CC BY - Namensnennung 4.0 International