- SQUIRRELnovo : de novo design of a PPARalpha agonist by bioisosteric replacement (2009)
- Shape complementarity is a prerequisite for molecular recognition. In our 3D ligand-based virtual screening approach, SQUIRREL, we combine shape-based rigid-body alignment with fuzzy pharmacophore scoring. Retrospective validation studies on the family of peroxisome proliferator-activated receptors (PPARs) demonstrate the superiority of methods that combine shape and pharmacophore information. We demonstrate the real-life applicability of SQUIRREL in a prospective virtual screening study, which identified a potent PPARalpha agonist with an EC50 of 44 nM and 100-fold selectivity over PPARgamma. SQUIRREL molecular superposition is based on a graph-matching routine and allows partial matching. We exploited this property to search for bioisosteric replacement suggestions in a database of molecular fragments derived from a collection of drug-like compounds. The bioisosteric groups suggested by our tool SQUIRRELnovo can be used by a human expert for ligand-based de novo design. Using the fibrate derivative GW590735 as the query, we designed a novel lead structure by substituting the acidic head group and the hydrophobic tail. Synthesis and subsequent testing in a cell-based reporter gene assay [7,8] revealed that the designed structure activates PPARalpha with an EC50 of 510 nM.
- Inhibitors of Helicobacter pylori protease HtrA found by "virtual ligand" screening combat bacterial invasion of epithelia (2011)
- Background: The human pathogen Helicobacter pylori (H. pylori) is a major cause of gastric inflammation and cancer. Increasing bacterial resistance to antibiotics demands innovative strategies for therapeutic intervention. Methodology/Principal Findings: We present a method for structure-based virtual screening that is based on the comprehensive prediction of ligand binding sites on a protein model and the automated construction of a ligand-receptor interaction map. Pharmacophoric features of the map are clustered and transformed into a correlation vector (‘virtual ligand’) for rapid virtual screening of compound databases. This computer-based technique was validated for 18 different targets of pharmaceutical interest in a retrospective screening experiment. Prospective screening for inhibitory agents was performed for the protease HtrA from the human pathogen H. pylori using a homology model of the target protein. Of 22 tested compounds, six block E-cadherin cleavage by HtrA in vitro and result in reduced scattering and wound healing of gastric epithelial cells, thereby preventing bacterial infiltration of the epithelium. Conclusions/Significance: This study demonstrates that receptor-based virtual screening with a permissive (‘fuzzy’) pharmacophore model can help identify small bioactive agents for combating bacterial infection.
- Spherical harmonics coefficients for ligand-based virtual screening of cyclooxygenase inhibitors (2011)
- Background: Molecular descriptors are essential for many applications in computational chemistry, such as ligand-based similarity searching. Spherical harmonics have previously been suggested as comprehensive descriptors of molecular structure and properties. We investigate a spherical harmonics descriptor for shape-based virtual screening. Methodology/Principal Findings: We introduce and validate a partially rotation-invariant three-dimensional molecular shape descriptor based on the norm of spherical harmonics expansion coefficients. Using this molecular representation, we parameterize molecular surfaces, i.e., isosurfaces of spatial molecular property distributions. We validate the shape descriptor in a comprehensive retrospective virtual screening experiment. In a prospective study, we virtually screen a large compound library for cyclooxygenase inhibitors, using a self-organizing map as a pre-filter and the shape descriptor for candidate prioritization. Conclusions/Significance: 12 compounds were tested in vitro for direct enzyme inhibition and in a whole blood assay. Active compounds containing a triazole scaffold were identified as direct cyclooxygenase-1 inhibitors. This outcome corroborates the usefulness of spherical harmonics for representation of molecular shape in virtual screening of large compound collections. The combination of pharmacophore and shape-based filtering of screening candidates proved to be a straightforward approach to finding novel bioactive chemotypes with minimal experimental effort.
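
The rotation invariance mentioned in the abstract above can be illustrated with the standard per-degree norm of a spherical harmonics expansion. This is a sketch of the general construction only: the abstract does not give its formula, its descriptor is described as only partially rotation-invariant and may differ in detail, and the symbols $f$, $c_{lm}$, $d_l$ and the truncation degree $L$ are notation assumed here.

```latex
f(\theta,\varphi) \approx \sum_{l=0}^{L} \sum_{m=-l}^{l} c_{lm}\, Y_l^m(\theta,\varphi),
\qquad
d_l = \Bigl( \sum_{m=-l}^{l} |c_{lm}|^2 \Bigr)^{1/2}, \quad l = 0, \dots, L .
```

A rotation mixes the coefficients $c_{lm}$ only among the orders $m$ of the same degree $l$, and it does so through a unitary matrix, so each $d_l$ is unchanged. The vector $(d_0, \dots, d_L)$ can therefore be compared between molecules without aligning them first.
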
- Molecular similarity for machine learning in drug development : poster presentation (2008)
- Poster presentation. In pharmaceutical research and drug development, machine learning methods play an important role in virtual screening and ADME/Tox prediction. Applying such methods requires a formal measure of similarity between molecules. Such a measure, in turn, depends on the underlying molecular representation. Input samples have traditionally been modeled as vectors. Consequently, molecules are presented to machine learning algorithms in vectorized form using molecular descriptors. While this approach is straightforward, it has shortcomings. Among other things, the learned model can be difficult to interpret, e.g., when using fingerprints or hashing. Structured representations of the input are an alternative to vector-based representations and have been a trend in machine learning in recent years. For molecules, there is a rich choice of such representations. Popular examples include the molecular graph, molecular shape, and the electrostatic field. We have developed a molecular similarity measure defined directly on the (annotated) molecular graph, a long-established topological model for molecules. It is based on the concepts of optimal atom assignments and iterative graph similarity. In the latter, two atoms are considered similar if their neighbors are similar. This recursive definition leads to a non-linear system of equations. We show how to solve these equations iteratively and give bounds on the computational complexity of the procedure. Advantages of our similarity measure include interpretability (atoms of two molecules are assigned to each other, each pair with a score expressing local similarity; this can be visualized to show similar regions of two molecules and the degree of their similarity) and the possibility of introducing knowledge about the target where available. We retrospectively tested our similarity measure using support vector machines for virtual screening on several pharmaceutical and toxicological datasets, with encouraging results. Prospective studies are under way.
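
The recursive idea in the abstract above (two atoms are similar if their neighbors are similar) can be sketched as a fixed-point iteration. This is a minimal illustration, not the authors' implementation: the update rule, the weight `alpha`, the exact-match element comparison, the brute-force neighbor assignment, and all function names are assumptions made for this example.

```python
from itertools import permutations


def neighbor_score(sim, nbrs_a, nbrs_b):
    """Best one-to-one matching of the two neighbor sets, averaged over the
    larger set (brute force; adequate for typical atom degrees of <= 4)."""
    if not nbrs_a or not nbrs_b:
        return 0.0
    if len(nbrs_a) <= len(nbrs_b):
        best = max(sum(sim[a][b] for a, b in zip(nbrs_a, perm))
                   for perm in permutations(nbrs_b, len(nbrs_a)))
    else:
        best = max(sum(sim[a][b] for a, b in zip(perm, nbrs_b))
                   for perm in permutations(nbrs_a, len(nbrs_b)))
    return best / max(len(nbrs_a), len(nbrs_b))


def iterative_similarity(labels_a, adj_a, labels_b, adj_b,
                         alpha=0.5, n_iter=50, tol=1e-9):
    """Atom-by-atom similarity matrix between two molecular graphs.

    Assumed update: sim[i][j] = (1 - alpha) * label_match(i, j)
                                + alpha * neighbor_score(neighbors of i and j).
    """
    base = [[1.0 if la == lb else 0.0 for lb in labels_b] for la in labels_a]
    sim = [row[:] for row in base]
    for _ in range(n_iter):
        new = [[(1 - alpha) * base[i][j]
                + alpha * neighbor_score(sim, adj_a[i], adj_b[j])
                for j in range(len(labels_b))]
               for i in range(len(labels_a))]
        delta = max(abs(new[i][j] - sim[i][j])
                    for i in range(len(labels_a))
                    for j in range(len(labels_b)))
        sim = new
        if delta < tol:  # stop once the matrix no longer changes
            break
    return sim


# Heavy-atom graphs: ethanol (C-C-O) and ethylamine (C-C-N).
ethanol = (["C", "C", "O"], [[1], [0, 2], [1]])
ethylamine = (["C", "C", "N"], [[1], [0, 2], [1]])

self_sim = iterative_similarity(*ethanol, *ethanol)
cross_sim = iterative_similarity(*ethanol, *ethylamine)
```

Each entry of the resulting matrix is a local score for one atom pair, which is the kind of per-pair output the abstract describes as interpretable. With `alpha` below 1 the update is a contraction, so the iteration settles quickly on small graphs.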
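- Ideenschmiede mit Praxisbezug : fünf Jahre Beilstein-Stiftungsprofessur für Chemieinformatik [A think tank with practical relevance : five years of the Beilstein Endowed Chair for Cheminformatics] (2007)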
- An endowed professorship enables focused research in a specialized field and creates the freedom needed to try out new ideas. In particular, it can serve to build bridges between disciplines. With this aim, the Beilstein Endowed Chair for Cheminformatics was established five years ago at the Johann Wolfgang Goethe-Universität. Funded by the Beilstein-Institut zur Förderung der Chemischen Wissenschaften, which is based in Frankfurt am Main, it was conceived in close cooperation with the Institute of Organic Chemistry and Chemical Biology under the leadership of Prof. Dr. Michael Göbel. After the five-year funding period ended in March 2007, the endowed professorship was transferred seamlessly into regular university operations. This is an occasion to take stock.
- Kernel learning for ligand-based virtual screening : discovery of a new PPARgamma agonist (2010)
- Poster presentation at the 5th German Conference on Cheminformatics: 23. CIC-Workshop, Goslar, Germany, 8-10 November 2009. We demonstrate the theoretical and practical application of modern kernel-based machine learning methods to ligand-based virtual screening by successful prospective screening for novel agonists of the peroxisome proliferator-activated receptor gamma (PPARgamma). PPARgamma is a nuclear receptor involved in lipid and glucose metabolism, and related to type-2 diabetes and dyslipidemia. Applied methods included a graph kernel designed for molecular similarity analysis, kernel principal component analysis, multiple kernel learning, and Gaussian process regression. In the machine learning approach to ligand-based virtual screening, one uses the similarity principle to identify potentially active compounds based on their similarity to known reference ligands. Kernel-based machine learning uses the "kernel trick", a systematic approach to deriving non-linear versions of linear algorithms such as separating hyperplanes and regression. Kernel learning requires similarity measures that are positive semidefinite (kernels). The iterative similarity optimal assignment graph kernel (ISOAK) is defined directly on the annotated structure graph and was designed specifically for the comparison of small molecules. In our virtual screening study, its use improved results, e.g., in principal component analysis-based visualization and Gaussian process regression. Following a thorough retrospective validation using a data set of 176 published PPARgamma agonists, we screened a vendor library for novel agonists. Subsequent testing of 15 compounds in a cell-based transactivation assay yielded four active compounds. The most interesting hit, a natural product derivative with a cyclobutane scaffold, is a selective full PPARgamma agonist (EC50 = 10 ± 0.2 microM, inactive on PPARalpha and PPARbeta/delta at 10 microM). We demonstrate how the interplay of several modern kernel-based machine learning approaches can successfully improve ligand-based virtual screening results.
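
The "kernel trick" mentioned in the abstract above can be shown on a toy case. The sketch below uses a degree-2 polynomial kernel on 2-D vectors purely for illustration; it is not the ISOAK graph kernel, and the function names and input values are assumptions made for this example.

```python
import math


def poly_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def feature_map(x):
    """Explicit feature space of the kernel above (valid for 2-D inputs only)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, y)                     # computed in input space
explicit = dot(feature_map(x), feature_map(y))   # computed in feature space
```

Both values agree, so a linear algorithm written only in terms of inner products becomes non-linear in the inputs once `dot` is replaced by the kernel, without ever building the feature vectors. A graph kernel plays the same role for molecules, where no explicit vector representation is needed.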
- Optimized Particle Swarm Optimization (OPSO) and its application to artificial neural network training (2006)
- Background: Particle Swarm Optimization (PSO) is an established method for parameter optimization. It is a population-based adaptive optimization technique that is influenced by several "strategy parameters". Choosing reasonable parameter values for the PSO is crucial for its convergence behavior and depends on the optimization task. We present a method for parameter meta-optimization based on PSO and its application to neural network training. The concept of the Optimized Particle Swarm Optimization (OPSO) is to optimize the free parameters of the PSO by having swarms within a swarm. We assessed the performance of the OPSO method on a set of five artificial fitness functions and compared it to the performance of two popular PSO implementations. Results: Our results indicate that PSO performance can be improved if meta-optimized parameter sets are applied. In addition, we could improve optimization speed and quality over the other PSO methods in the majority of our experiments. We applied the OPSO method to neural network training with the aim of building a quantitative model for predicting blood-brain barrier permeation of small organic molecules. On average, training time decreased by a factor of four and two in comparison to the other PSO methods, respectively. By applying the OPSO method, a prediction model showing good correlation with training, test, and validation data was obtained. Conclusion: Optimizing the free parameters of the PSO method can result in performance gain. The OPSO approach yields parameter combinations that improve overall optimization performance. Its conceptual simplicity makes implementing the method a straightforward task.
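
The "swarms within a swarm" idea in the abstract above can be sketched with a plain global-best PSO that is reused at two levels. This is an illustration under assumptions, not the published OPSO implementation: the update rule is the common inertia-weight form, the strategy parameters are taken to be `w`, `c1`, and `c2`, and the sphere test function, bounds, swarm sizes, and seeds are arbitrary choices for this example.

```python
import random


def pso(fitness, dim, bounds=(-5.0, 5.0), n_particles=20, n_iter=100,
        w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Minimize `fitness` with a global-best particle swarm.

    w is the inertia weight; c1 and c2 weight the pull towards each
    particle's own best position and towards the swarm's best position.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy fitness function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def meta_fitness(params):
    """Outer-swarm fitness: how well a short inner PSO run performs
    when it uses the strategy parameters (w, c1, c2) in `params`."""
    w, c1, c2 = params
    _, value = pso(sphere, dim=2, n_iter=30, w=w, c1=c1, c2=c2, seed=1)
    return value


# Inner swarm alone, then an outer swarm searching over (w, c1, c2).
solution, solution_val = pso(sphere, dim=2)
best_params, best_meta_val = pso(meta_fitness, dim=3, bounds=(0.0, 2.0),
                                 n_particles=8, n_iter=10, seed=2)
```

Each outer particle encodes one parameter set, and evaluating it means running a complete inner swarm, which is why meta-optimization multiplies the cost of a single PSO run. A fixed inner seed keeps the outer fitness deterministic here; averaging over several seeds would be a more robust choice.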
- Prediction of extracellular proteases of the human pathogen Helicobacter pylori reveals proteolytic activity of the Hp1018/19 protein HtrA (2008)
- Exported proteases of Helicobacter pylori (H. pylori) are potentially involved in pathogen-associated disorders leading to gastric inflammation and neoplasia. By comprehensive sequence screening of the H. pylori proteome for predicted secreted proteases, we retrieved several candidate genes. We detected caseinolytic activities of several such proteases, which are released independently of the H. pylori type IV secretion system encoded by the cag pathogenicity island (cagPAI). Among these, we found the predicted serine protease HtrA (Hp1019), which was previously identified in the bacterial secretome of H. pylori. Importantly, we further found that the H. pylori genes hp1018 and hp1019 represent a single gene likely coding for an exported protein. Here, we directly verified proteolytic activity of HtrA in vitro and identified the HtrA protease in zymograms by mass spectrometry. Overexpressed and purified HtrA exhibited pronounced proteolytic activity, which is abolished by mutation of Ser205 to alanine in the predicted active center of HtrA. These data demonstrate that H. pylori secretes HtrA as an active protease, which might represent a novel candidate target for therapeutic intervention strategies.
- Domain organization of long signal peptides of single-pass integral membrane proteins reveals multiple functional capacity (2008)
- Targeting signals direct proteins to their extra- or intracellular destination such as the plasma membrane or cellular organelles. Here we investigated the structure and function of exceptionally long signal peptides encompassing at least 40 amino acid residues. We discovered a two-domain organization ("NtraC model") in many long signals from vertebrate precursor proteins. Accordingly, long signal peptides may contain an N-terminal domain (N-domain) and a C-terminal domain (C-domain) with different signal or targeting capabilities, separable by a presumably turn-rich transition area (tra). Individual domain functions were probed by cellular targeting experiments with fusion proteins containing parts of the long signal peptide of human membrane protein shrew-1 and secreted alkaline phosphatase as a reporter protein. As predicted, the N-domain of the fusion protein alone was shown to act as a mitochondrial targeting signal, whereas the C-domain alone functions as an export signal. Selective disruption of the transition area in the signal peptide impairs the export efficiency of the reporter protein. Altogether, the results of cellular targeting studies provide a proof-of-principle for our NtraC model and highlight the particular functional importance of the predicted transition area, which critically affects the rate of protein export. In conclusion, the NtraC approach enables the systematic detection and prediction of cryptic targeting signals present in one coherent sequence, and provides a structurally motivated basis for decoding the functional complexity of long protein targeting signals.