Year of publication
- 2009 (12)
- Distance phenomena in high-dimensional chemical descriptor spaces : consequences for similarity-based approaches (2009)
- SQUIRRELnovo : de novo design of a PPARalpha agonist by bioisosteric replacement (2009)
- Shape complementarity is a compulsory condition for molecular recognition. In our 3D ligand-based virtual screening approach, SQUIRREL, we combine shape-based rigid-body alignment with fuzzy pharmacophore scoring. Retrospective validation studies on the family of peroxisome proliferator-activated receptors (PPARs) demonstrate the superiority of methods that combine shape and pharmacophore information. We demonstrate the real-life applicability of SQUIRREL in a prospective virtual screening study that identified a potent PPARalpha agonist with an EC50 of 44 nM and 100-fold selectivity over PPARgamma. SQUIRREL molecular superposition is based on a graph-matching routine and allows partial matching. We exploited this to search for bioisosteric replacement suggestions in a database of molecular fragments derived from a collection of drug-like compounds. The bioisosteric groups suggested by our tool SQUIRRELnovo can be used by a human expert for ligand-based de novo design. Using the fibrate derivative GW590735 as the query, we designed a novel lead structure by replacing the acidic head group and the hydrophobic tail. Synthesis and subsequent testing in a cell-based reporter gene assay [7,8] revealed that the designed compound activates PPARalpha with an EC50 of 510 nM.
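SQUIRREL's graph-matching routine is not described in detail here. Once some matching step has paired atoms or pharmacophore points of a query with those of a candidate, a least-squares rigid-body fit can be computed. The sketch below uses the Kabsch algorithm as a generic stand-in for that fitting step. It assumes the correspondences are already known, is not SQUIRREL's own code, and the function name `kabsch_superpose` is hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Least-squares rigid-body fit of matched points (Kabsch algorithm).

    mobile, target: (n, 3) arrays whose rows are already paired up.
    Returns (R, t, rmsd) such that mobile @ R.T + t best fits target.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    # Center both point sets on their centroids.
    mu_m = mobile.mean(axis=0)
    mu_t = target.mean(axis=0)
    P = mobile - mu_m
    Q = target - mu_t
    # SVD of the 3x3 covariance matrix.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - mu_m @ R.T
    fitted = mobile @ R.T + t
    rmsd = float(np.sqrt(((fitted - target) ** 2).sum(axis=1).mean()))
    return R, t, rmsd
```

For partial matches, only the paired subset of points would be passed in; the resulting RMSD then measures the quality of that partial superposition.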
- PocketGraph : graph representation of binding site volumes (2009)
- The representation of small molecules as molecular graphs is a common technique in various fields of cheminformatics. This approach employs abstract descriptions of topology and properties for rapid analysis and comparison. In contrast, receptor-based methods mostly depend on more complex representations, which impede simplified analysis and limit the possibilities for property assignment. In this study we demonstrate that ligand-based methods can be applied to receptor-derived binding site analysis. We introduce the new method PocketGraph, which translates representations of binding site volumes into linear graphs and enables the application of graph-based methods to protein pockets. The method uses the PocketPicker algorithm to characterize binding site volumes and employs a Growing Neural Gas procedure to derive graph representations of pocket topologies. Self-organizing map (SOM) projections revealed a limited number of pocket topologies. We argue that only a small set of pocket shapes is realized in the known ligand-receptor complexes.
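A Growing Neural Gas turns a point cloud into a graph whose nodes follow the shape of the data. The sketch below is a compact version of the textbook GNG procedure (Fritzke's formulation); the input points stand in for PocketPicker grid probes, and every parameter value is an illustrative assumption rather than a PocketGraph setting. The function name `growing_neural_gas` is hypothetical.

```python
import numpy as np


def growing_neural_gas(points, max_nodes=20, n_steps=3000, eps_b=0.05,
                       eps_n=0.005, max_age=50, insert_every=100,
                       alpha=0.5, decay=0.995, seed=0):
    """Fit a Growing Neural Gas graph to a point cloud.

    Returns (positions, edges): a dict node_id -> coordinate vector and a
    set of frozenset({id_a, id_b}) edges.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Start with two nodes placed on randomly chosen input points.
    first, second = rng.choice(len(points), size=2, replace=False)
    pos = {0: points[first].copy(), 1: points[second].copy()}
    err = {0: 0.0, 1: 0.0}
    age = {}  # frozenset({a, b}) -> edge age
    next_id = 2

    def neighbours(n):
        return [m for e in age if n in e for m in e if m != n]

    for step in range(1, n_steps + 1):
        x = points[rng.integers(len(points))]
        ids = list(pos)
        d2 = np.array([np.sum((pos[i] - x) ** 2) for i in ids])
        order = np.argsort(d2)
        s1, s2 = ids[order[0]], ids[order[1]]
        # Accumulate the winner's squared error and age its edges.
        err[s1] += d2[order[0]]
        for e in age:
            if s1 in e:
                age[e] += 1
        # Move the winner and its topological neighbours towards the sample.
        pos[s1] += eps_b * (x - pos[s1])
        for n in neighbours(s1):
            pos[n] += eps_n * (x - pos[n])
        # Create (or refresh) the edge between the two closest nodes.
        age[frozenset((s1, s2))] = 0
        # Remove stale edges, then nodes left without any edge.
        age = {e: a for e, a in age.items() if a <= max_age}
        linked = set().union(*age)
        for n in [n for n in pos if n not in linked]:
            del pos[n], err[n]
        # Periodically insert a node where the accumulated error is largest.
        if step % insert_every == 0 and len(pos) < max_nodes:
            q = max(err, key=err.get)
            f = max(neighbours(q), key=err.get)
            r = next_id
            next_id += 1
            pos[r] = 0.5 * (pos[q] + pos[f])
            del age[frozenset((q, f))]
            age[frozenset((q, r))] = 0
            age[frozenset((r, f))] = 0
            err[q] *= alpha
            err[f] *= alpha
            err[r] = err[q]
        # Global error decay.
        for n in err:
            err[n] *= decay
    return pos, set(age)
```

The returned node/edge sets form the kind of graph on which graph-based comparison or SOM projection could then operate.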
- Pseudoreceptor-based pocket selection in a molecular dynamics simulation of the histamine H4 receptor (2009)
- There is renewed interest in pseudoreceptor models, which enable computational chemists to bridge the gap between ligand- and receptor-based drug design. We developed a pseudoreceptor model for the histamine H4 receptor (H4R) based on five potent antagonists representing different chemotypes. Here we present the selection of potential ligand-binding pockets that occur during molecular dynamics (MD) simulations of a homology-based receptor model. We present a method for prioritizing receptor models according to their match with the consensus ligand-binding mode represented by the pseudoreceptor. In this way, ligand information can be transferred to receptor-based modelling. We use Geometric Hashing to match three-dimensional points in Cartesian space. This allows rapid, translation- and rotation-invariant comparison of atom coordinates and also permits partial matching. The only prerequisite is a hash table that uses distance triplets as hash keys. Each time a distance triplet in the candidate point set corresponds to an existing key, the match is recorded as a vote for that key. Finally, the global match of both point sets can easily be extracted by selecting the voted distance triplets. The results revealed a preferred ligand-binding pocket in H4R that would not have been identified using an unrefined homology model of the protein. The key idea was to exploit ligand information by means of pseudoreceptor modelling.
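The voting scheme described above can be sketched in a few lines. Each triangle of points is keyed by its three side lengths, sorted (so vertex order does not matter) and binned to tolerate small coordinate deviations; because only distances are used, the keys are unaffected by translation and rotation. The bin width of 0.5 and the function names are illustrative assumptions, not the parameters of the published method.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def triplet_key(a, b, c, bin_width):
    """Order-independent hash key: sorted, binned side lengths of a triangle."""
    sides = (math.dist(a, b), math.dist(b, c), math.dist(a, c))
    return tuple(sorted(round(s / bin_width) for s in sides))


def build_hash_table(reference, bin_width=0.5):
    """Map each distance-triplet key to the reference triangles producing it."""
    table = defaultdict(list)
    for i, j, k in combinations(range(len(reference)), 3):
        key = triplet_key(reference[i], reference[j], reference[k], bin_width)
        table[key].append((i, j, k))
    return table


def vote(table, candidate, bin_width=0.5):
    """Cast one vote per candidate triangle whose key exists in the table."""
    votes = Counter()
    for i, j, k in combinations(range(len(candidate)), 3):
        key = triplet_key(candidate[i], candidate[j], candidate[k], bin_width)
        if key in table:
            votes[key] += 1
    return votes
```

The fraction of reference keys that receive votes, `len(votes) / len(table)`, gives a simple overall match score; a partial match simply leaves some keys unvoted. Near bin boundaries a distance can fall into a neighbouring bin, which practical implementations usually handle by also checking adjacent bins.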
- Virtual chemical reactions for drug design (2009)
- Two methods for fast, fragment-based combinatorial molecule assembly were developed. The software COLIBREE® (Combinatorial Library Breeding) generates candidate structures from scratch based on stochastic optimization. Structures generated in a COLIBREE design run consist of a fixed scaffold with variable linkers and side chains. Linkers representing virtual chemical reactions and side-chain building blocks obtained from pseudo-retrosynthetic dissection of large compound databases are exchanged during optimization. The molecule design process employs a discrete version of Particle Swarm Optimization (PSO). Assembled compounds are scored according to their similarity to known reference ligands. The distance to reference molecules is computed in the space of the topological pharmacophore descriptor CATS. In a case study, the approach was applied to the de novo design of potential selective agonists of peroxisome proliferator-activated receptor gamma (PPARgamma). In a second approach, we developed the formal grammar Reaction-MQL for the in silico representation and application of chemical reactions. Chemical transformation schemes are defined by the functional groups participating in known organic reactions. The substructures are specified in the linear Molecular Query Language (MQL). The developed software package contains a parser for Reaction-MQL expressions and enables users to design, test and virtually apply chemical reactions. The program has already been used to create combinatorial libraries for virtual screening studies. It was also applied in fragmentation studies with different sets of retrosynthetic reactions and various compound libraries.
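COLIBREE scores assembled compounds by their distance to reference ligands in CATS space. The sketch below computes a CATS-like topological pharmacophore vector: pairs of pharmacophore types counted by shortest-path bond distance (0 to 9 bonds) and scaled by the number of heavy atoms, giving 15 type pairs × 10 distances = 150 values. It assumes atoms have already been typed by hand as donor (D), acceptor (A), positive (P), negative (N) or lipophilic (L); the typing rules, the choice of scaling and the helper names are illustrative assumptions, not the published CATS implementation.

```python
from collections import deque
from itertools import combinations_with_replacement, product
import math

TYPES = ("D", "A", "P", "N", "L")  # donor, acceptor, positive, negative, lipophilic
PAIRS = list(combinations_with_replacement(TYPES, 2))  # 15 unordered type pairs
MAX_DIST = 9  # topological distances 0..9 bonds


def shortest_paths(adjacency, start):
    """Bond-count distances from `start` via breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def cats_like_vector(adjacency, atom_types):
    """150-dim pair-count vector scaled by the number of heavy atoms.

    adjacency: dict atom_index -> list of bonded atom indices (0..n-1).
    atom_types: list of sets of pharmacophore type labels per atom.
    """
    index = {key: k for k, key in enumerate(product(PAIRS, range(MAX_DIST + 1)))}
    counts = [0.0] * len(index)
    n = len(adjacency)
    for i in range(n):
        dist = shortest_paths(adjacency, i)
        for j in range(i, n):
            d = dist.get(j)
            if d is None or d > MAX_DIST:
                continue
            for ti in atom_types[i]:
                for tj in atom_types[j]:
                    if i == j and ti >= tj:
                        continue  # same atom: count each distinct type pair once
                    pair = tuple(sorted((ti, tj), key=TYPES.index))
                    counts[index[(pair, d)]] += 1
    return [c / n for c in counts]
```

Two compounds can then be compared with `math.dist(query_vec, candidate_vec)`, where a smaller value indicates greater pharmacophoric similarity.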
- Virtual screening for PPAR-gamma ligands using the ISOAK molecular graph kernel and gaussian processes (2009)
- For a virtual screening study, we introduce a combination of machine learning techniques employing a graph kernel, Gaussian process regression and clustered cross-validation. The aim was to find ligands of peroxisome proliferator-activated receptor gamma (PPAR-gamma). The receptors of the PPAR family belong to the steroid-thyroid-retinoid superfamily of nuclear receptors and act as transcription factors. They play a role in the regulation of lipid and glucose metabolism in vertebrates and are linked to various human processes and diseases. For this study, we used a dataset of 176 PPAR-gamma agonists published by Ruecker et al. Gaussian process (GP) models can provide a confidence estimate for each individual prediction, which makes it possible to assess whether a compound lies inside the model's domain of applicability. This feature is useful in virtual screening, where a large fraction of the tested compounds may lie outside the model's domain of applicability. In cheminformatics, GPs have been applied to various classification and regression tasks using either radial basis function or rational quadratic kernels based on vectorial descriptors [4,5]. We used a graph kernel based on iterative similarity and optimal assignments (ISOAK) for non-linear Bayesian regression with Gaussian process priors (GP regression). A number of kernel-based learning algorithms (including GPs) are capable of multiple kernel learning, which allows heterogeneous information to be combined by using multiple kernels at the same time. In this work, we combined rational quadratic kernels for vectorial molecular descriptors (MOE2D, CATS2D and Ghose-Crippen fragment descriptors) with the ISOAK graph kernel. We evaluated our methodology in different ranking and regression settings. Ranking performance was assessed using the number of false positives within the top k predicted compounds. Predicted compounds were ranked based on both predicted binding affinity and the confidence in each prediction.
In the regression setting, we employed standard loss functions such as mean absolute error (MAE) and root mean squared error. The established linear ridge regression (LRR) and support vector regression (SVR) algorithms served as baseline methods. In addition to standard training/test splits and cross-validation, we used a clustered cross-validation strategy in which whole clusters of compounds are left out when constructing training sets. Compared with standard training/test splits and normal cross-validation, this yields less optimistic performance estimates but favours more robust algorithms that are potentially capable of extrapolation. In the regression setting, both GP and SVR models performed well, yielding MAEs as low as 0.66 ± 0.08 log units (clustered CV) and 0.51 ± 0.3 log units (normal CV). In the ranking setting, GPs slightly outperform SVR (0.21 ± 0.09 log units vs. 0.3 ± 0.08 log units). In conclusion, Gaussian process regression that combines the ISOAK molecular graph kernel and the rational quadratic kernel (with standard molecular descriptors) via multiple kernel learning performs excellently in retrospective evaluation. A prospective evaluation study is currently in progress.
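The key property used above is that a GP returns a predictive standard deviation alongside each predicted affinity. The sketch below shows GP regression with a weighted sum of rational quadratic kernels, one per descriptor block. It is a simplification under stated assumptions: the ISOAK graph kernel is not implemented (a second vectorial block stands in for it), kernel weights are fixed rather than learned by multiple kernel learning, the prior mean is the training-set mean, and ranking by a lower confidence bound is just one plausible way to combine affinity and confidence. All function names are hypothetical.

```python
import numpy as np


def rational_quadratic(X1, X2, length_scale=1.0, alpha=1.0):
    """k(x, x') = (1 + |x - x'|^2 / (2 * alpha * l^2)) ** (-alpha)."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return (1.0 + sq / (2.0 * alpha * length_scale ** 2)) ** (-alpha)


def combined_kernel(views1, views2, weights):
    """Fixed-weight sum of per-descriptor-block kernels (simplified MKL)."""
    return sum(w * rational_quadratic(a, b)
               for w, a, b in zip(weights, views1, views2))


def gp_predict(train_views, y, test_views, weights, noise=0.1):
    """GP posterior mean and standard deviation for each test compound."""
    K = combined_kernel(train_views, train_views, weights)
    K_s = combined_kernel(test_views, train_views, weights)
    K_ss_diag = np.diag(combined_kernel(test_views, test_views, weights))
    # Cholesky factor of the noisy training covariance.
    L = np.linalg.cholesky(K + noise ** 2 * np.eye(len(y)))
    y_mean = y.mean()
    alpha_vec = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    mean = y_mean + K_s @ alpha_vec
    v = np.linalg.solve(L, K_s.T)
    var = K_ss_diag - (v ** 2).sum(axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def rank_by_confidence(mean, std, kappa=1.0):
    """Order compounds by a lower confidence bound (mean - kappa * std), best first."""
    return np.argsort(-(mean - kappa * std))
```

Compounds far from the training data receive a predictive standard deviation close to the prior kernel variance, which is how the model signals that they lie outside its domain of applicability.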
- Identification of Plk1 type II inhibitors by structure-based virtual screening (2009)
- Protein kinases are targets for drug development. Dysregulation of kinase activity leads to various diseases, e.g. cancer, inflammation and diabetes. Human polo-like kinase 1 (Plk1), a serine/threonine kinase, is encoded by a cancer-relevant gene and is a potential drug target that is attracting increasing attention in cancer therapy. Plk1 is a key player in mitosis: it modulates entry into mitosis and the spindle checkpoint at the metaphase/anaphase transition. Plk1 overexpression is observed in various human tumors and is a negative prognostic factor for cancer patients. Because kinases share the same catalytic mechanism and the same co-substrate (ATP), achieving inhibitor selectivity is difficult. One strategy to address this problem is to target the inactive conformation of kinases. Kinases switch between an active and an inactive conformation; in the inactive conformation an additional hydrophobic pocket is formed, whose surrounding amino acids are less conserved. Because no crystal structure of Plk1 in its inactive conformation is known, a homology model of this conformation was constructed using a crystal structure of Aurora A kinase as the template. This homology model was used for a receptor-based pharmacophore search with the SYBYL 7.3 software. The raw hits were filtered by physico-chemical properties. The resulting hits were docked using GOLD 3.2, and 13 candidates were manually selected for biological testing. Three of the 13 tested compounds exhibited anti-proliferative effects in HeLa cancer cells. The most potent inhibitor, SBE13, was further tested in various cancer cell lines of different origins and displayed EC50 values between 12 µM and 39 µM. Cancer cells incubated with SBE13 showed induction of apoptosis, detected by cleavage of PARP (poly(ADP-ribose) polymerase), caspase 9 activation and DAPI staining of apoptotic nuclei.
- Fuzzy virtual ligands for virtual screening (2009)
- A new method to bridge the gap between ligand- and receptor-based methods in virtual screening (VS) is presented. We introduce a structure-derived virtual ligand (VL) model as an extension of a previously published pseudo-ligand technique: LIQUID fuzzy pharmacophore virtual screening is combined with grid-based protein binding site predictions from PocketPicker. This approach might help reduce the bias introduced by manual selection of binding site residues, and it introduces pocket shape information into the VL. It allows several protein structure models to be combined into a single "fuzzy" VL representation, which can be used to scan screening compound collections for ligand structures with a similar potential pharmacophore. PocketPicker employs an elaborate grid-based scanning procedure to determine buried cavities and depressions on the protein's surface. Potential binding sites are represented by clusters of grid probes characterizing the shape and accessibility of a cavity. A rule-based system is then applied to project reverse pharmacophore types onto the grid probes of a selected pocket. The pocket pharmacophore types are assigned depending on the properties and geometry of the protein residues surrounding the pocket, with regard to their position relative to the grid probes. LIQUID is used to cluster representative pocket probes by their pharmacophore types, yielding a fuzzy VL model. The VL is encoded in a correlation vector, which can then be compared to a database of pre-calculated ligand models. A retrospective screening using the fuzzy VL and several protein structures was evaluated by tenfold cross-validation with the ROC-AUC and BEDROC metrics and yielded a significant enrichment of actives. Future work will be devoted to prospective screening using a novel protein target of Helicobacter pylori and compounds from commercial providers.
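The retrospective evaluation above reports ROC-AUC and BEDROC. A minimal sketch of both metrics follows, computed from screening scores (higher means predicted more likely active) and binary activity labels. BEDROC uses the closed form given by Truchon and Bayly with 1-based ranks; `alpha=20` is a commonly used early-recognition setting and only an assumption here, since the value used in the study is not stated. The function names are hypothetical.

```python
import math


def roc_auc(scores, labels):
    """Probability that a random active outscores a random inactive (ties count 1/2)."""
    act = [s for s, l in zip(scores, labels) if l]
    inact = [s for s, l in zip(scores, labels) if not l]
    wins = sum((a > i) + 0.5 * (a == i) for a in act for i in inact)
    return wins / (len(act) * len(inact))


def bedroc(scores, labels, alpha=20.0):
    """BEDROC from 1-based ranks after sorting by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    N = len(ranked)
    ranks = [r for r, (_, l) in enumerate(ranked, start=1) if l]
    n = len(ranks)
    ra = n / N
    # RIE: observed mean of exp(-alpha * r / N) over actives vs. its random expectation.
    observed = sum(math.exp(-alpha * r / N) for r in ranks) / n
    expected = (1 - math.exp(-alpha)) / (N * (math.exp(alpha / N) - 1))
    rie = observed / expected
    # Rescale RIE linearly so that the worst ranking maps to 0 and the best to 1.
    factor = ra * math.sinh(alpha / 2) / (
        math.cosh(alpha / 2) - math.cosh(alpha / 2 - alpha * ra))
    constant = 1 / (1 - math.exp(alpha * (1 - ra)))
    return rie * factor + constant
```

Unlike ROC-AUC, which weights all ranks equally, BEDROC emphasizes actives found near the top of the ranked list, which is what matters when only the top hits are tested.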
- Domain organization of long autotransporter signal sequences (2009)
- Bacterial autotransporters represent a diverse family of proteins that cross the inner membrane of Gram-negative bacteria via the Sec complex and then autonomously translocate across the outer bacterial membrane. They often possess exceptionally long N-terminal signal sequences. We analyzed 90 long signal sequences of bacterial autotransporters and members of the two-partner secretion pathway in silico and describe a common domain organization found in 79 of these sequences. The domains are in agreement with previously published experimental data. Our algorithmic approach allows for the systematic identification of functionally different domains in long signal sequences. Keywords: bacterial autotransporter, sequence analysis, pattern, protein targeting, signal peptide, protein trafficking
- Prediction of type III secretion signals in genomes of gram-negative bacteria (2009)
- Background: Pathogenic bacteria infecting both animals and plants use various mechanisms to transport virulence factors across their cell membranes and channel these proteins into the infected host cell. The type III secretion system represents such a mechanism. Proteins transported via this pathway ("effector proteins") have to be distinguished from all other proteins that are not exported from the bacterial cell. Although a special targeting signal at the N-terminal end of effector proteins has been proposed in the literature, its exact characteristics remain unknown. Methodology/Principal Findings: In this study, we demonstrate that the signals encoded in the sequences of type III secretion system effectors can be consistently recognized and predicted by machine learning techniques. Known protein effectors were compiled from the literature and sequence databases and served as training data for artificial neural networks and support vector machine classifiers. Common sequence features were most pronounced in the first 30 amino acids of the effector sequences. Classification yielded a cross-validated Matthews correlation coefficient of 0.63 and allowed for genome-wide prediction of potential type III secretion system effectors in 705 proteobacterial genomes (12% of proteins predicted as candidates), their chromosomes (11%) and plasmids (13%), as well as in 213 Firmicute genomes (7%). Conclusions/Significance: We present a signal prediction method together with a comprehensive survey of potential type III secretion system effectors extracted from 918 published bacterial genomes. Our study demonstrates that the analyzed signal features are common across a wide range of species, and it provides a substantial basis for the identification of exported pathogenic proteins as targets for future therapeutic intervention. The prediction software is publicly accessible from our web server (www.modlab.org).
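The reported performance is a cross-validated Matthews correlation coefficient (MCC). The sketch below shows one simple way to turn the first 30 residues into a fixed-length feature vector, namely amino acid composition, and how MCC is computed from binary predictions. The composition encoding is an assumption for illustration; the encoding actually fed to the neural networks and support vector machines is not specified here, and the helper names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_terminal_composition(sequence, window=30):
    """Relative amino acid frequencies within the first `window` residues."""
    head = sequence[:window].upper()
    if not head:
        return [0.0] * len(AMINO_ACIDS)
    return [head.count(aa) / len(head) for aa in AMINO_ACIDS]


def matthews_cc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and unlike plain accuracy it stays informative when effectors are a small minority of all proteins, as in genome-wide screens.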