Background: Biological psychiatry aims to understand mental disorders in terms of altered neurobiological pathways. However, in Major Depressive Disorder (MDD), one of the most prevalent and disabling mental disorders, patients differ only marginally from healthy individuals at the group level. It remains unclear whether Precision Psychiatry can resolve this discrepancy and provide specific, reliable biomarkers. Current Machine Learning (ML) studies suffer from methodological and data-related shortcomings, and these lead to substantial over- as well as underestimation of true model accuracy.
Methods: To address these issues, we quantify single-subject classification accuracy in N=1,801 patients with MDD and healthy controls from a well-curated cohort. We employ an extensive multivariate approach across a comprehensive range of neuroimaging modalities, including structural and functional Magnetic Resonance Imaging and Diffusion Tensor Imaging, as well as a polygenic risk score for depression.
Findings: Training and testing a total of 2.4 million ML models, we find diagnostic classification accuracies between 48.1% and 62.0%. Integrating all neuroimaging modalities into multimodal models does not improve performance. Similarly, training ML models on individuals stratified by age, sex, or remission status does not lead to better classification. Even under simulated conditions of perfect reliability, performance does not substantially improve. Importantly, model error analysis identifies symptom severity as one potential target for MDD subgroup identification.
Interpretation: Although multivariate neuroimaging markers increase predictive power compared to univariate analyses, single-subject classification does not reach clinically relevant performance. This holds even with extensive, best-practice Machine Learning optimization in a large, harmonized sample of patients diagnosed using state-of-the-art clinical assessments. Based on this evidence, we sketch a course of action for Precision Psychiatry and future MDD biomarker research.
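Single-subject accuracy estimates are only trustworthy when the tested subjects never influence the model that classifies them. The following is a minimal, hypothetical sketch of k-fold cross-validation with a nearest-centroid classifier, written in plain Python. The classifier, the fold scheme, and all function names are illustrative assumptions and do not reproduce the authors' pipeline. Any hyperparameter tuning would need its own inner loop (nested cross-validation), which is omitted here.

```python
import random
from statistics import fmean


def nearest_centroid_fit(X, y):
    """Compute one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def cross_validated_accuracy(X, y, k=5, seed=0):
    """Estimate out-of-sample accuracy with k-fold cross-validation.

    The centroids are refit on each training split only, so no test subject
    ever influences the model used to classify it.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # every index lands in exactly one fold
    correct = 0
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in idx if i not in test_set]
        model = nearest_centroid_fit([X[i] for i in train_idx],
                                     [y[i] for i in train_idx])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i]
                       for i in test_idx)
    return correct / len(X)
```

When features carry no group difference, this estimate fluctuates around chance level, which is about 50% for balanced classes. Fitting the centroids on all subjects before splitting would leak test information and inflate the estimate.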
Graph data is an omnipresent way to represent information in machine learning. In neuroscience research in particular, data from Diffusion Tensor Imaging (DTI) and functional Magnetic Resonance Imaging (fMRI) are commonly represented as graphs. Exploiting the graph structure of these modalities with graph-specific machine learning methods is currently hampered by a lack of easy-to-use software. PHOTONAI Graph aims to close the gap between machine learning experts, graph experts, and neuroscientists. It leverages the rapid model development features of the Python machine learning API PHOTONAI so that practitioners can design, optimize, and evaluate reliable graph machine learning models. It provides easy access to custom graph machine learning pipelines, including hyperparameter optimization and algorithm evaluation, while ensuring reproducibility and valid performance estimates. Because it integrates established algorithms such as graph neural networks, graph embeddings, and graph kernels, researchers without significant coding experience can build and optimize complex graph machine learning models in a few lines of code. We showcase the versatility of this toolbox by building pipelines for both resting-state fMRI and DTI data. We hope that it will increase the adoption of graph-specific machine learning algorithms in neuroscience research.
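To make concrete how resting-state fMRI data can be represented as a graph, here is a minimal sketch in plain Python. Brain regions (ROIs) become nodes, and pairs of regions whose time series correlate strongly become weighted edges. This sketch does not use the PHOTONAI Graph API. The function names, the absolute-correlation threshold of 0.5, and the adjacency-dict output format are hypothetical choices for illustration only. The sketch assumes that no ROI time series is constant.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def connectivity_graph(timeseries, threshold=0.5):
    """Turn ROI time series into an undirected, weighted graph.

    timeseries: dict mapping ROI name -> list of BOLD samples (same length).
    An edge is kept when |r| >= threshold, and its weight is the correlation r.
    Returns an adjacency dict: node -> {neighbour: weight}.
    """
    adjacency = {roi: {} for roi in timeseries}
    for u, v in combinations(timeseries, 2):
        r = pearson(timeseries[u], timeseries[v])
        if abs(r) >= threshold:
            adjacency[u][v] = r  # store both directions: the graph is undirected
            adjacency[v][u] = r
    return adjacency


ts = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 3, 4, 1, 2]}
graph = connectivity_graph(ts, threshold=0.9)  # keeps only the A-B edge (r = 1)
```

The threshold controls graph sparsity. A DTI graph would instead take edge weights from a structural measure such as streamline counts, with the same node-and-edge layout.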
Mapping cortical brain asymmetry in 17,141 healthy individuals worldwide via the ENIGMA Consortium
(2017)
Bipolar disorder (BD) is a heritable mental illness with complex etiology. While the largest published genome-wide association study identified 64 BD risk loci, the causal SNPs and genes within these loci remain unknown. We applied a suite of statistical and functional fine-mapping methods to these loci and prioritized 22 likely causal SNPs for BD. We mapped these SNPs to genes and investigated their likely functional consequences by integrating variant annotations, brain cell-type epigenomic annotations, brain quantitative trait loci, and results from rare variant exome sequencing in BD. Convergent lines of evidence supported the roles of SCN2A, TRANK1, DCLK3, INSYN2B, SYNE1, THSD7A, CACNA1B, TUBBP5, PLCB3, PRDX5, KCNK4, AP001453.3, TRPT1, FKBP2, DNAJC4, RASGRP1, FURIN, FES, YWHAE, DPH1, GSDMB, MED24, THRA, EEF1A2, and KCNQ2 in BD. These genes are promising candidates for functional experiments aimed at understanding biological mechanisms and therapeutic potential. Additionally, we demonstrate that fine-mapping effect sizes can improve the performance and transferability of BD polygenic risk scores across ancestrally diverse populations. We also present a high-throughput fine-mapping pipeline (https://github.com/mkoromina/SAFFARI).
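A polygenic risk score is conventionally a weighted sum of effect-allele dosages, PRS_i = Σ_j w_j g_ij. The following minimal sketch shows where fine-mapping could enter: it swaps the weights w_j. The weighting in `finemapped_weights` is a hypothetical assumption. It multiplies each SNP's posterior inclusion probability (PIP) by its effect size given causality, which approximates a posterior mean effect. This sketch is not the SAFFARI pipeline or the authors' exact scoring method, and the SNP identifiers are placeholders.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages: PRS = sum_j w_j * g_j.

    dosages: dict SNP id -> effect-allele count (0, 1, 2) or imputed dosage.
    weights: dict SNP id -> per-allele effect size (e.g. log odds ratio).
    SNPs present in only one of the two dicts are skipped.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def finemapped_weights(pip, beta_given_causal):
    """Hypothetical weighting: posterior mean effect ~ PIP * effect given causality.

    SNPs with low posterior inclusion probability contribute little to the score.
    """
    return {snp: pip[snp] * beta_given_causal[snp]
            for snp in pip if snp in beta_given_causal}


# Placeholder example: two SNPs in one locus, only snp1 is likely causal.
gwas_beta = {"snp1": 0.10, "snp2": 0.08}
pip = {"snp1": 0.9, "snp2": 0.05}
person = {"snp1": 2, "snp2": 1}

prs_gwas = polygenic_risk_score(person, gwas_beta)                     # 0.28
prs_fm = polygenic_risk_score(person, finemapped_weights(pip, gwas_beta))  # about 0.184
```

Down-weighting SNPs that merely tag a causal variant is one plausible reason why fine-mapped weights could transfer better across ancestries with different linkage patterns. This is offered as intuition, not as a result from the abstract.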
Investigators in the cognitive neurosciences have turned to Big Data to address persistent replication and reliability issues by increasing sample sizes, statistical power, and the representativeness of data. While open data sharing has tremendous potential to advance science, these efforts raise a host of new questions about how to integrate data arising from distinct sources and instruments. We focus on the most frequently assessed area of cognition, memory, and demonstrate a process for reliable data harmonization across three common measures. We aggregated raw data from 53 studies from around the world that administered at least one of three distinct auditory verbal learning tests (AVLTs), totaling N = 10,505 healthy and brain-injured individuals. We conducted a mega-analysis, using empirical Bayes harmonization to isolate and remove site effects, followed by linear models that adjusted for common covariates. After these corrections, a continuous item response theory (IRT) model estimated each subject's latent verbal learning ability while accounting for item difficulties. Harmonization significantly reduced inter-site variance by 37% while preserving covariate effects. The effects of age, sex, and education on scores were highly consistent across memory tests. IRT methods for equating scores across AVLTs agreed with held-out data from dually administered tests, and these tools are freely available online. This work demonstrates that large-scale data sharing and harmonization initiatives can help address reproducibility and integration challenges across the behavioral sciences.
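The following is a deliberately simplified, hypothetical illustration of the empirical Bayes idea behind site harmonization, applied to a single test score. The method estimates each site's offset from the grand mean and shrinks it toward zero according to how noisy that estimate is. It then subtracts the shrunken offset. This sketch is not the ComBat-style procedure used in the study. It does not model site differences in variance (scale), it does not pool information across multiple measures, and it does not protect covariates such as age, sex, or education, which a real analysis would include in the model. The function name is hypothetical, and the sketch assumes there are more subjects than sites.

```python
from statistics import fmean


def eb_site_adjust(scores, sites):
    """Subtract empirical-Bayes-shrunken site offsets from one score.

    scores: list of floats; sites: list of site labels of the same length.
    Returns (adjusted_scores, shrunken_offsets_by_site).
    """
    grand = fmean(scores)
    by_site = {}
    for score, site in zip(scores, sites):
        by_site.setdefault(site, []).append(score)

    # Pooled within-site variance (sigma^2): residual spread around site means.
    ss_within = 0.0
    for values in by_site.values():
        m = fmean(values)
        ss_within += sum((x - m) ** 2 for x in values)
    sigma2 = ss_within / (len(scores) - len(by_site))

    # Raw site offsets from the grand mean.
    offsets = {site: fmean(v) - grand for site, v in by_site.items()}

    # Method-of-moments between-site variance (tau^2), floored at zero:
    # E[offset^2] is roughly tau^2 + sigma^2 / n_site.
    tau2 = max(0.0, fmean(o ** 2 for o in offsets.values())
               - fmean(sigma2 / len(v) for v in by_site.values()))

    # Normal-normal shrinkage: small or noisy sites are pulled toward zero offset.
    shrunk = {}
    for site, values in by_site.items():
        noise = sigma2 / len(values)
        factor = tau2 / (tau2 + noise) if tau2 > 0 else 0.0
        shrunk[site] = factor * offsets[site]

    adjusted = [score - shrunk[site] for score, site in zip(scores, sites)]
    return adjusted, shrunk
```

Consider two sites with five subjects each whose means differ by 10 points. Here the between-site variance dominates the noise in each site mean, so the shrinkage factor is close to 1 and most of the offset is removed. When sites do not differ, the estimated between-site variance is zero and the scores are left unchanged.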