Using fingerprints and machine learning tools for the prediction of novel dual active compounds for leukotriene A4 hydrolase and soluble epoxide hydrolase

  • The aim of this work was to establish a new way of predicting novel dual active compounds by combining classical fingerprint representations with state-of-the-art machine learning algorithms. The advantages and disadvantages of the applied 2D- and 3D-fingerprints were investigated, and the impact of various machine learning algorithms was analyzed. The method developed in this work was used to predict compounds that inhibit two different targets (LTA4H and sEH) involved in the same disease pattern (inflammation). The development of multitarget drugs has become more important in recent years. Many widespread diseases, such as metabolic syndrome or cancer, are multifactorial in nature, which makes them hard to treat effectively with a single drug. The in silico method presented here can help to accelerate the design and development of multitarget drugs, saving time and effort. The study was enabled by the now readily available access to a large number of 3D-structures of biological targets and to published activity data for millions of synthesized compounds, which served as its starting point. Four different data sets were compiled (crystallized ligands from the PDB, active and inactive compounds from ChEMBL23, and newly designed compounds from a combinatorial library). These data sets were collected and processed using an automated KNIME workflow. The automation makes it easy to change or update compound sources and to adapt the processing steps. In the next step, the compounds from the compiled data sets were represented using a variety of well-established 2D- and 3D-fingerprints (PLIF, AtomPair, Morgan, FeatMorgan, MACCS). All of these fingerprints share the same underlying bit string scheme but differ in how they describe the molecular structure. The difference between 2D- and 3D-fingerprints was investigated in particular. 2D-fingerprints are based solely on ligand information.
3D-fingerprints, on the other hand, are based on X-ray structure information of protein-ligand complexes. One major difference in usage is that 3D-fingerprints require a 3D-conformation (pose) of the compound in the targets of interest. This additional step is time-consuming and introduces further uncertainty into the method. Based on the calculated fingerprints, state-of-the-art machine learning algorithms (SVC, RF, XGB and ADA) were used to predict novel dual active compounds. The models were evaluated by 10-fold cross-validation, with accuracy maximized as the primary measure of model performance. In addition, individual parameters of the four machine learning algorithms were optimized in a grid search to achieve maximal accuracy using the optimized partitioning scheme. Overall accuracies, regardless of fingerprint and machine learning algorithm, were slightly better for LTA4H than for sEH. Dual active compounds were predicted by comparing the sets of compounds predicted to be active for LTA4H and for sEH. For the 3D-fingerprint PLIF, the Random Forest algorithm was chosen, and compounds for synthesis and testing were selected from its predictions. Of 115 compounds predicted to be active, six were hand-picked. Two of them showed very good to moderate dual inhibitory activity. Of the 2D-fingerprints, the AtomPair fingerprint in combination with Random Forest was chosen as the basis for selecting compounds for synthesis and testing. 116 compounds were predicted to be dual active against LTA4H and sEH. One of those compounds showed good dual inhibitory activity. This work demonstrated advantages and disadvantages of using 2D- and 3D-fingerprints in combination with machine learning algorithms. Both strategies (2D: ligand-based, 3D: structure-based) led to the prediction of novel dual active compounds with moderate to very good inhibitory activity.
The method developed in this work is able to predict dual active compounds with very good inhibitory activity and novel (previously unknown) scaffolds that inhibit the targets LTA4H and sEH. This contribution to in silico drug design is promising and can be used for the prediction of novel dual active compounds. These compounds can be further optimized with regard to binding affinity, solubility, and other pharmacological and physicochemical properties.
  • The aim of this work is to predict novel compounds that inhibit not just a single protein but two different proteins at once. The target proteins of this work, leukotriene A4 hydrolase (LTA4H) and soluble epoxide hydrolase (sEH), are part of the arachidonic acid (AA) cascade and are associated with various inflammatory diseases (e.g. asthma, rheumatoid arthritis, dermatitis and atherosclerosis). The AA cascade shows intensive crosstalk between its individual metabolic pathways. Inhibiting only one pathway allows AA to be metabolized via the other two pathways, which reduces the beneficial effects of administered drugs. However, this phenomenon can be overcome if two different pathways are inhibited simultaneously. This can be achieved either by administering several drugs or by a single drug that inhibits several proteins (a dual-acting drug). A dual-acting drug minimizes the risk of unpredictable drug interactions that can arise when two different drugs are administered...
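
The final selection step described in the abstract, comparing the compounds predicted to be active for LTA4H with those predicted to be active for sEH, can be illustrated with a small sketch. It assumes that each per-target model returns a 0/1 label per compound and reads "comparing" as a set intersection; both points are assumptions, and the compound IDs, labels, and function names are hypothetical placeholders rather than data or code from the thesis.

```python
def predicted_actives(predictions):
    """Return the IDs of compounds whose predicted label is 1 (active)."""
    return {compound_id for compound_id, label in predictions.items() if label == 1}


def dual_actives(predictions_a, predictions_b):
    """Return the IDs predicted active against both targets (set intersection)."""
    return predicted_actives(predictions_a) & predicted_actives(predictions_b)


# Placeholder outputs of two per-target classifiers (1 = active, 0 = inactive).
# These values are invented for illustration only.
lta4h_predictions = {"cpd_001": 1, "cpd_002": 0, "cpd_003": 1, "cpd_004": 1}
seh_predictions = {"cpd_001": 1, "cpd_002": 1, "cpd_003": 0, "cpd_004": 1}

dual = sorted(dual_actives(lta4h_predictions, seh_predictions))
print(dual)  # ['cpd_001', 'cpd_004']
```

Keeping the two target models separate and intersecting their outputs means that each model can be trained and tuned on its own activity data; a compound is only carried forward when both models agree.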

Author: Lena Hefke
Place of publication: Frankfurt am Main
Referee: Ewgenij Proschak, Stefan Knapp
Document Type: Doctoral Thesis
Date of Publication (online): 2020/12/28
Year of first Publication: 2021
Publishing Institution: Universitätsbibliothek Johann Christian Senckenberg
Granting Institution: Johann Wolfgang Goethe-Universität
Date of final exam: 2020/11/12
Release Date: 2021/01/21
Page Number: 114
Institutes: Biochemie, Chemie und Pharmazie / Biochemie und Chemie
Dewey Decimal Classification: 5 Natural sciences and mathematics / 57 Life sciences; biology / 570 Life sciences; biology
Licence (German): Deutsches Urheberrecht (German copyright law)