
The Fundamental Clustering and Projection Suite (FCPS): a dataset collection to test the performance of clustering and data projection algorithms

  • In the context of data science, data projection and clustering are common procedures. The chosen analysis method is crucial to avoid faulty pattern recognition. It is therefore necessary to know the properties and especially the limitations of projection and clustering algorithms. This report describes a collection of datasets that are grouped together in the Fundamental Clustering and Projection Suite (FCPS). The FCPS contains 10 datasets with the names "Atom", "Chainlink", "EngyTime", "Golfball", "Hepta", "Lsun", "Target", "Tetra", "TwoDiamonds", and "WingNut". Common clustering methods occasionally identified non-existent clusters or assigned data points to the wrong clusters in the FCPS suite. Likewise, common data projection methods could only partially reproduce the data structure correctly on a two-dimensional plane. In conclusion, the FCPS dataset collection addresses general challenges for clustering and projection algorithms such as lack of linear separability, different or small inner class spacing, classes defined by data density rather than data spacing, no cluster structure at all, outliers, or classes that are in contact. The suite is designed to address specific problems of structure discovery in high-dimensional spaces.

Metadata
Author:Alfred Ultsch, Jörn Lötsch
URN:urn:nbn:de:hebis:30:3-544292
DOI:https://doi.org/10.3390/data5010013
Parent Title (English):Data
Publisher:MDPI
Place of publication:Basel
Document Type:Article
Language:English
Date of Publication (online):2020/01/30
Date of first Publication:2020/01/30
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Release Date:2020/03/17
Tag:benchmark standards; clustering; data projection; high dimensional complex data; performance tests
Volume:5
Issue:13
Page Number:10
Note:
This is an open access article distributed under the Creative Commons Attribution License https://creativecommons.org/licenses/by/4.0/ which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited
HeBIS-PPN:464656877
Institutes:Biochemistry, Chemistry and Pharmacy
Dewey Decimal Classification:6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health
Collections:University publications
Licence (German):Creative Commons - Attribution 4.0 (Namensnennung 4.0)