TY - INPR
A1 - Winter, Nils R.
A1 - Blanke, Julian
A1 - Leenings, Ramona
A1 - Ernsting, Jan
A1 - Fisch, Lukas
A1 - Sarink, Kelvin
A1 - Barkhau, Carlotta
A1 - Thiel, Katharina
A1 - Flinkenflügel, Kira
A1 - Winter, Alexandra
A1 - Goltermann, Janik
A1 - Meinert, Susanne
A1 - Dohm, Katharina
A1 - Repple, Jonathan
A1 - Gruber, Marius
A1 - Leehr, Elisabeth Johanna
A1 - Opel, Nils
A1 - Grotegerd, Dominik
A1 - Redlich, Ronny
A1 - Nitsch, Robert
A1 - Bauer, Jochen
A1 - Heindel, Walter
A1 - Groß, Joachim
A1 - Andlauer, Till
A1 - Forstner, Andreas Josef
A1 - Nöthen, Markus Maria
A1 - Rietschel, Marcella
A1 - Hofmann, Stefan G.
A1 - Pfarr, Julia-Katharina
A1 - Teutenberg, Lea
A1 - Usemann, Paula
A1 - Thomas-Odenthal, Florian
A1 - Wroblewski, Adrian
A1 - Brosch, Katharina
A1 - Stein, Frederike
A1 - Jansen, Andreas
A1 - Jamalabadi, Hamidreza
A1 - Alexander, Nina
A1 - Straube, Benjamin
A1 - Nenadić, Igor
A1 - Kircher, Tilo
A1 - Dannlowski, Udo
A1 - Hahn, Tim
T1 - A systematic evaluation of machine learning-based biomarkers for major depressive disorder across modalities
T2 - medRxiv
N2 - Background: Biological psychiatry aims to understand mental disorders in terms of altered neurobiological pathways. However, for one of the most prevalent and disabling mental disorders, Major Depressive Disorder (MDD), patients differ only marginally from healthy individuals at the group level. Whether Precision Psychiatry can resolve this discrepancy and provide specific, reliable biomarkers remains unclear, as current Machine Learning (ML) studies suffer from shortcomings in methods and data that lead to substantial over- as well as underestimation of true model accuracy.
Methods: Addressing these issues, we quantify classification accuracy on a single-subject level in N=1,801 patients with MDD and healthy controls, employing an extensive multivariate approach across a comprehensive range of neuroimaging modalities in a well-curated cohort, including structural and functional Magnetic Resonance Imaging, Diffusion Tensor Imaging, as well as a polygenic risk score for depression. Findings: Training and testing a total of 2.4 million ML models, we find accuracies for diagnostic classification between 48.1% and 62.0%. Multimodal data integration of all neuroimaging modalities does not improve model performance. Similarly, training ML models on individuals stratified based on age, sex, or remission status does not lead to better classification. Even under simulated conditions of perfect reliability, performance does not substantially improve. Importantly, model error analysis identifies symptom severity as one potential target for MDD subgroup identification. Interpretation: Although multivariate neuroimaging markers increase predictive power compared to univariate analyses, single-subject classification, even under conditions of extensive, best-practice Machine Learning optimization in a large, harmonized sample of patients diagnosed using state-of-the-art clinical assessments, does not reach clinically relevant performance. Based on this evidence, we sketch a course of action for Precision Psychiatry and future MDD biomarker research.
Y1 - 2023
UR - http://publikationen.ub.uni-frankfurt.de/frontdoor/index/index/docId/73540
UR - https://nbn-resolving.org/urn:nbn:de:hebis:30:3-735402
IS - 2023.02.27.23286311
ER -