Refine
Year of publication
- 2023 (1)
Document Type
- Master's Thesis (1)
Language
- English (1)
Has Fulltext
- yes (1)
Is part of the Bibliography
- no (1)
Keywords
- MIMIC-III (1)
- data analysis (1)
- machine learning (1)
- prediction quality (1)
- stroke (1)
Institute
- Informatik (1)
Analysis of machine learning prediction quality for automated subgroups within the MIMIC III dataset
(2023)
The motivation for this master’s thesis is to explore the potential of predictive data analytics in the field of medicine. For this, the MIMIC-III dataset offers an extensive foundation for the construction of prediction models, including Random Forest, XGBOOST, and deep learning networks. These models were implemented to forecast the mortality of 2,655 stroke patients.
The first part of the thesis involved conducting a comprehensive data analysis of the filtered MIMIC-III dataset.
Subsequently, the effectiveness and fairness of the predictive models were evaluated. Although the performance levels of the developed models did not match those reported in related research, their potential became evident. The results obtained demonstrated promising capabilities and highlighted the effectiveness of the applied methodologies. Moreover, the feature relevance within the XGBOOST model was examined to increase model explainability.
Finally, relevant subgroups were identified to perform a comparative analysis of the prediction performance across these subgroups. While this approach can be regarded as a valuable methodology, it was not possible to investigate underlying reasons for potential unfairness across clusters. Inside the test data, not enough instances remained per subgroup for further fairness or feature relevance analysis.
In conclusion, the implementation of an alternative use case with a higher patient count is recommended.
The code for this analysis is made available via a GitHub repository and includes a frontend to visualize the results.