-
Virtual screening for PPAR-gamma ligands using the ISOAK molecular graph kernel and gaussian processes
(2009)
- For a virtual screening study, we introduce a combination of machine learning techniques, employing a graph kernel, Gaussian process regression and clustered cross-validation. The aim was to find ligands of peroxisome proliferator-activated receptor gamma (PPAR-γ). The receptors in the PPAR family belong to the steroid-thyroid-retinoid superfamily of nuclear receptors and act as transcription factors. They play a role in the regulation of lipid and glucose metabolism in vertebrates and are linked to various human processes and diseases [1]. For this study, we used a dataset of 176 PPAR-γ agonists published by Ruecker et al. [2]. Gaussian process (GP) models can provide a confidence estimate for each individual prediction, which makes it possible to assess which compounds lie inside the model's domain of applicability. This feature is useful in virtual screening, where a large fraction of the tested compounds may be outside of the model's domain of applicability. In cheminformatics, GPs have been applied to different classification and regression tasks using either radial basis function or rational quadratic kernels based on vectorial descriptors [4,5]. We used a graph kernel based on iterative similarity and optimal assignments (ISOAK, [3]) for non-linear Bayesian regression with Gaussian process priors (GP regression, [4]). A number of kernel-based learning algorithms (including GPs) are capable of multiple kernel learning [5], which allows heterogeneous information to be combined by using multiple kernels at the same time. In this work, we combined rational quadratic kernels for vectorial molecular descriptors (MOE2D, CATS2D and Ghose-Crippen fragment descriptors) with the ISOAK graph kernel. We evaluated our methodology in different ranking and regression settings. Ranking performance was assessed using the number of false positives within the top k predicted compounds. Predicted compounds were ranked based on both predicted binding affinity and the confidence in each prediction. In the regression setting, we employed standard loss functions such as mean absolute error (MAE) and root mean squared error. The established linear ridge regression (LRR) and support vector regression (SVR) algorithms served as baseline methods. In addition to standard test/training splits and cross-validation, we used a clustered cross-validation strategy in which whole clusters of compounds are left out when constructing training sets. This yields less optimistic performance estimates, but has the advantage of favouring more robust and potentially extrapolation-capable algorithms than standard training/test splits and normal cross-validation. In the regression setting, both GP and SVR models performed well, yielding MAEs as low as 0.66 ± 0.08 log units (clustered CV) and 0.51 ± 0.3 log units (normal CV). In the ranking setting, GPs slightly outperformed SVR (0.21 ± 0.09 log units vs. 0.3 ± 0.08 log units). In conclusion, Gaussian process regression that combines, via multiple kernel learning, the ISOAK molecular graph kernel and the rational quadratic kernel (with standard molecular descriptors) performs excellently in retrospective evaluation. A prospective evaluation study is currently in progress.
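The clustered cross-validation and the two regression loss functions mentioned in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the function names, the toy activities and cluster labels are hypothetical, and a training-set mean stands in for the GP and SVR models.

```python
import math
from collections import defaultdict


def clustered_folds(cluster_ids):
    """Yield (train_idx, test_idx) pairs, leaving out one whole cluster per fold."""
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    for cid in sorted(members):
        test = members[cid]
        held_out = set(test)
        train = [i for i in range(len(cluster_ids)) if i not in held_out]
        yield train, test


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared deviation."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy data: activities in log units and one cluster label per compound.
activities = [5.0, 5.2, 6.8, 7.0, 8.1, 7.9]
clusters = ["a", "a", "b", "b", "c", "c"]

fold_maes = []
for train, test in clustered_folds(clusters):
    # Placeholder model (assumption): predict the training-set mean.
    mean_train = sum(activities[i] for i in train) / len(train)
    preds = [mean_train] * len(test)
    fold_maes.append(mean_absolute_error([activities[i] for i in test], preds))
```

Because every test compound comes from a cluster the model has never seen, the per-fold errors in `fold_maes` are typically larger than those from a random split, which matches the less optimistic estimates described above.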
-
Virtual chemical reactions for drug design
(2009)
- Two methods for fast, fragment-based combinatorial molecule assembly were developed. The software COLIBREE® (Combinatorial Library Breeding) generates candidate structures from scratch, based on stochastic optimization [1]. The structures resulting from a COLIBREE design run are based on a fixed scaffold and variable linkers and side-chains. Linkers representing virtual chemical reactions and side-chain building blocks obtained from pseudo-retrosynthetic dissection of large compound databases are exchanged during optimization. The process of molecule design employs a discrete version of Particle Swarm Optimization (PSO) [2]. Assembled compounds are scored according to their similarity to known reference ligands. Distance to reference molecules is computed in the space of the topological pharmacophore descriptor CATS [3]. In a case study, the approach was applied to the de novo design of potential peroxisome proliferator-activated receptor (PPAR gamma) selective agonists. In a second approach, we developed the formal grammar Reaction-MQL [4] for the in silico representation and application of chemical reactions. Chemical transformation schemes are defined by functional groups participating in known organic reactions. The substructures are specified by the linear Molecular Query Language (MQL) [5]. The developed software package contains a parser for Reaction-MQL expressions and enables users to design, test and virtually apply chemical reactions. The program has already been used to create combinatorial libraries for virtual screening studies. It was also applied in fragmentation studies with different sets of retrosynthetic reactions and various compound libraries.
-
Unterwegs in chemischen Räumen : Chemieinformatik und Moleküldesign [Travelling through chemical spaces: cheminformatics and molecular design]
(2003)
- How does one find a new drug? Pharmaceutical-chemical research faces a seemingly unsolvable task here, because the "chemical space" of all drug-like molecules is unimaginably large. It has been estimated that suitable candidates can in principle be selected from 10^60 to 10^100 different compounds. For comparison: "only" about 10^18 seconds, roughly 14 billion years, are thought to have passed since the Big Bang. This means that chemical space is practically infinite. At least two conclusions can be drawn from this consideration: on the one hand, there is well-founded hope that a molecule with the desired activity exists; on the other hand, the question arises of how this unimaginably large number of chemical compounds can be screened systematically. Yet the situation is not as hopeless as it appears at first glance, as the successful development of ever new medicines shows. The research field of cheminformatics is concerned with developing intelligent approaches that can help chemists in this search for the "needles in the giant haystack".
-
The plasmodium export element revisited
(2008)
- We performed a bioinformatic analysis of protein export elements (PEXEL) in the putative proteome of the malaria parasite Plasmodium falciparum. A protein family-specific conservation of physicochemical residue profiles was found for PEXEL-flanking sequence regions. We demonstrate that the family members can be clustered based on the flanking regions only and display characteristic hydrophobicity patterns. This raises the possibility that the flanking regions may contain additional information for a family-specific role of PEXEL. We further show that signal peptide cleavage results in a positional alignment of PEXEL motifs from proteins both with and without a signal peptide.
-
SQUIRRELnovo : de novo design of a PPARalpha agonist by bioisosteric replacement
(2009)
- Shape complementarity is a compulsory condition for molecular recognition [1]. In our 3D ligand-based virtual screening approach called SQUIRREL, we combine shape-based rigid body alignment [2] with fuzzy pharmacophore scoring [3]. Retrospective validation studies on the family of peroxisome proliferator-activated receptors (PPARs) demonstrate the superiority of methods that combine shape and pharmacophore information. We demonstrate the real-life applicability of SQUIRREL by a prospective virtual screening study, in which a potent PPARalpha agonist with an EC50 of 44 nM and 100-fold selectivity against PPARgamma was identified. SQUIRREL molecular superposition is based on a graph-matching routine [4] and allows partial matching. We used this advantage to search for bioisosteric replacement suggestions in a database of molecular fragments derived from a collection of drug-like compounds [5]. The bioisosteric groups suggested by our tool SQUIRRELnovo can be used for ligand-based de novo design by a human expert. Using the fibrate derivative GW590735 [6] as query, we designed a novel lead structure by substitution of the acidic head group and hydrophobic tail. Synthesis and subsequent testing in a cell-based reporter gene assay [7,8] revealed that the designed structure activates PPARalpha with an EC50 of 510 nM.
-
Spherical harmonics coefficients for ligand-based virtual screening of cyclooxygenase inhibitors
(2011)
- Background: Molecular descriptors are essential for many applications in computational chemistry, such as ligand-based similarity searching. Spherical harmonics have previously been suggested as comprehensive descriptors of molecular structure and properties. We investigate a spherical harmonics descriptor for shape-based virtual screening. Methodology/Principal Findings: We introduce and validate a partially rotation-invariant three-dimensional molecular shape descriptor based on the norm of spherical harmonics expansion coefficients. Using this molecular representation, we parameterize molecular surfaces, i.e., isosurfaces of spatial molecular property distributions. We validate the shape descriptor in a comprehensive retrospective virtual screening experiment. In a prospective study, we virtually screen a large compound library for cyclooxygenase inhibitors, using a self-organizing map as a pre-filter and the shape descriptor for candidate prioritization. Conclusions/Significance: 12 compounds were tested in vitro for direct enzyme inhibition and in a whole blood assay. Active compounds containing a triazole scaffold were identified as direct cyclooxygenase-1 inhibitors. This outcome corroborates the usefulness of spherical harmonics for representation of molecular shape in virtual screening of large compound collections. The combination of pharmacophore and shape-based filtering of screening candidates proved to be a straightforward approach to finding novel bioactive chemotypes with minimal experimental effort.
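As a brief illustration of why coefficient norms are attractive for shape description, consider the generic spherical harmonics expansion below. It is a textbook sketch, not necessarily the exact descriptor of the paper; the symbols f, c_{lm} and the maximum degree L are assumed notation. Rotating the function about the expansion centre mixes only coefficients of the same degree l, so the per-degree norm stays unchanged.

```latex
f(\theta,\varphi) \approx \sum_{l=0}^{L} \sum_{m=-l}^{l} c_{lm}\, Y_{lm}(\theta,\varphi),
\qquad
\lVert c_l \rVert = \sqrt{\sum_{m=-l}^{l} \lvert c_{lm} \rvert^{2}}
```

A descriptor vector of the form (‖c_0‖, ‖c_1‖, ..., ‖c_L‖) can then be compared between molecules without first aligning them rotationally.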
-
Sequential anti-cytomegalovirus response monitoring may allow prediction of cytomegalovirus reactivation after allogeneic stem cell transplantation
(2012)
- Background: Reconstitution of cytomegalovirus-specific CD3+CD8+ T cells (CMV-CTLs) after allogeneic hematopoietic stem cell transplantation (HSCT) is necessary to bring cytomegalovirus (CMV) reactivation under control. However, the parameters determining protective CMV-CTL reconstitution remain unclear to date. Design and Methods: In a prospective tri-center study, CMV-CTL reconstitution was analyzed in the peripheral blood from 278 patients during the year following HSCT using 7 commercially available tetrameric HLA-CMV epitope complexes. All patients included could be monitored with at least one CMV-specific tetramer. Results: CMV-CTL reconstitution was detected in 198 patients (71%) after allogeneic HSCT. Most importantly, reconstitution with 1 CMV-CTL per µl blood between day +50 and day +75 post-HSCT discriminated between patients with and without CMV reactivation in the R+/D+ patient group, independent of the CMV epitope recognized. In addition, CMV-CTLs expanded more dramatically in patients experiencing only one CMV reactivation than in those without or those with multiple CMV reactivations. Monitoring using at least 2 tetramers was possible in 63% (n = 176) of the patients. The combinations of particular HLA molecules influenced the numbers of CMV-CTLs detected. The highest CMV-CTL count obtained for an individual tetramer also changed over time in 11% of these patients (n = 19), resulting in higher levels of HLA-B*0801 (IE-1) recognizing CMV-CTLs in 14 patients. Conclusions: Our results indicate that 1 CMV-CTL per µl blood between day +50 and day +75 marks the beginning of an immune response against CMV in the R+/D+ group. Detection of CMV-CTL expansion thereafter indicates successful resolution of the CMV reactivation. Thus, sequential monitoring of CMV-CTL reconstitution can be used to predict patients at risk for recurrent CMV reactivation.
-
SBE13, a newly identified inhibitor of inactive polo-like kinase 1
(2010)
- Poster presentation at the 5th German Conference on Cheminformatics: 23. CIC-Workshop, Goslar, Germany, 8-10 November 2009. Protein kinases are important targets for drug development. The almost identical protein fold of kinases and the common co-substrate ATP lead to the problem of inhibitor selectivity. Type II inhibitors, targeting the inactive conformation of kinases, occupy a hydrophobic pocket with less conserved surrounding amino acids [1]. Human polo-like kinase 1 (Plk1) represents a promising target for approaches to identify new therapeutic agents. Plk1 belongs to a family of highly conserved serine/threonine kinases and is a key player in mitosis, where it modulates the spindle checkpoint at the metaphase/anaphase transition. Plk1 is over-expressed in all human tumors of different origin analyzed to date and serves as a negative prognostic marker in cancer patients. The newly identified inhibitor SBE13, a vanillin derivative, targets Plk1 in its inactive conformation [2]. This leads to selectivity within the Plk family and towards Aurora A. This selectivity can be explained by docking studies of SBE13 into the binding pocket of homology models of Plk1, Plk2 and Plk3 in their inactive conformation. SBE13 showed anti-proliferative effects in cancer cell lines of different origins, with EC50 values between 5 µM and 39 µM, and induced apoptosis. Increasing concentrations of SBE13 resulted in increasing numbers of cells in G2/M phase 13 hours after double thymidine block of HeLa cells. The kinase activity of Plk1 was inhibited with an IC50 of 200 pM. Taken together, we show that carefully designed structure-based virtual screening is well suited to identifying selective type II kinase inhibitors targeting Plk1 as potential anti-cancer therapeutics.
-
Pseudoreceptor-based pocket selection in a molecular dynamics simulation of the histamine H4 receptor
(2009)
- There is a renewed interest in pseudoreceptor models, which enable computational chemists to bridge the gap between ligand- and receptor-based drug design [1]. We developed a pseudoreceptor model for the histamine H4 receptor (H4R) based on five potent antagonists representing different chemotypes. Here we present the selection of potential ligand binding pockets that occur during molecular dynamics (MD) simulations of a homology-based receptor model. We present a method for prioritizing receptor models according to their match with the consensus ligand-binding mode represented by the pseudoreceptor. In this way, ligand information can be transferred to receptor-based modelling. We use Geometric Hashing to match three-dimensional points in Cartesian space [2]. This allows for the rapid translation- and rotation-free comparison of atom coordinates, and also permits partial matching. The only prerequisite is a hash table that uses distance triplets as hash keys. Each time a distance triplet occurring in the candidate point set corresponds to an existing key, the match is recorded as a vote for the respective key. Finally, the global match of both point sets can be easily extracted by selecting the voted distance triplets. The results revealed a preferred ligand-binding pocket in H4R, which would not have been identified using an unrefined homology model of the protein. The key idea was to rely on ligand information by pseudoreceptor modelling.
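The distance-triplet voting described in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function names, the bin width used to discretize distances and the toy coordinates are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

BIN_WIDTH = 0.5  # assumed distance tolerance; not taken from the abstract


def triplet_key(p, q, r, bin_width=BIN_WIDTH):
    """Sorted, binned pairwise distances: unchanged by translation and rotation."""
    dists = sorted((math.dist(p, q), math.dist(p, r), math.dist(q, r)))
    return tuple(round(d / bin_width) for d in dists)


def build_table(points, bin_width=BIN_WIDTH):
    """Hash table mapping each distance-triplet key to the point triples producing it."""
    table = defaultdict(list)
    for (i, p), (j, q), (k, r) in combinations(enumerate(points), 3):
        table[triplet_key(p, q, r, bin_width)].append((i, j, k))
    return dict(table)


def vote(table, candidate_points, bin_width=BIN_WIDTH):
    """Cast one vote per candidate triplet whose key already exists in the table."""
    votes = Counter()
    for p, q, r in combinations(candidate_points, 3):
        key = triplet_key(p, q, r, bin_width)
        if key in table:
            votes[key] += 1
    return votes


# Hypothetical toy point sets.
reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 1.2)]
# The same points rotated 90 degrees about z and then translated.
candidate = [(-y + 10.0, x - 2.0, z + 5.0) for x, y, z in reference]
unrelated = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 20.0, 0.0)]

table = build_table(reference)
votes = vote(table, candidate)
```

Because the keys depend only on inter-point distances, the rotated and translated copy collects a vote for every one of its triplets, while `vote(table, unrelated)` collects none. Partial matches show up as a subset of voted keys.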
-
Prediction of type III secretion signals in genomes of gram-negative bacteria
(2009)
- Background: Pathogenic bacteria infecting animals as well as plants use various mechanisms to transport virulence factors across their cell membranes and channel these proteins into the infected host cell. The type III secretion system represents such a mechanism. Proteins transported via this pathway ("effector proteins") have to be distinguished from all other proteins that are not exported from the bacterial cell. Although a special targeting signal at the N-terminal end of effector proteins has been proposed in the literature, its exact characteristics remain unknown. Methodology/Principal Findings: In this study, we demonstrate that the signals encoded in the sequences of type III secretion system effectors can be consistently recognized and predicted by machine learning techniques. Known protein effectors were compiled from the literature and sequence databases, and served as training data for artificial neural networks and support vector machine classifiers. Common sequence features were most pronounced in the first 30 amino acids of the effector sequences. Classification yielded a cross-validated Matthews correlation of 0.63 and allowed for genome-wide prediction of potential type III secretion system effectors in 705 proteobacterial genomes (12% predicted candidate proteins), their chromosomes (11%) and plasmids (13%), as well as 213 Firmicute genomes (7%). Conclusions/Significance: We present a signal prediction method together with a comprehensive survey of potential type III secretion system effectors extracted from 918 published bacterial genomes. Our study demonstrates that the analyzed signal features are common across a wide range of species, and provides a substantial basis for the identification of exported pathogenic proteins as targets for future therapeutic intervention. The prediction software is publicly accessible from our web server ( www.modlab.org ).
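For readers unfamiliar with the Matthews correlation reported above, the standard formula for a binary classifier can be sketched as follows. The function name and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def matthews_correlation(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 6 true positives, 3 true negatives,
# 1 false positive, 2 false negatives.
example_mcc = matthews_correlation(tp=6, tn=3, fp=1, fn=2)
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it more informative than plain accuracy when effectors are far rarer than non-effectors.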
