TY - INPR A1 - Baumgarten, Nina A1 - Rumpf, Laura A1 - Keßler, Thorsten A1 - Schulz, Marcel Holger T1 - A statistical approach to identify regulatory DNA variations T2 - bioRxiv N2 - Non-coding variations located within regulatory elements may alter gene expression by modifying Transcription Factor (TF) binding sites and thereby lead to functional consequences like various traits or diseases. To understand these molecular mechanisms, different TF models are being used to assess the effect of DNA sequence variations, such as Single Nucleotide Polymorphisms (SNPs). However, few statistical approaches exist to compute statistical significance of results but they often are slow for large sets of SNPs, such as data obtained from a genome-wide association study (GWAS) or allele-specific analysis of chromatin data. Results We investigate the distribution of maximal differential TF binding scores for general computational models that assess TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo data sets showed that our new approach improves on an existing method in terms of performance and speed. In applications on large sets of eQTL and GWAS SNPs we could illustrate the usefulness of the novel statistic to highlight cell type specific regulators and TF target genes. Conclusions Our approach allows the evaluation of DNA changes that induce differential TF binding in a fast and accurate manner, permitting computations on large mutation data sets. An implementation of the novel approach is freely available at https://github.com/SchulzLab/SNEEP. Y1 - 2023 UR - http://publikationen.ub.uni-frankfurt.de/frontdoor/index/index/docId/73167 UR - https://nbn-resolving.org/urn:nbn:de:hebis:30:3-731671 IS - 2023.01.31.526404 ER -