- Spherical harmonics coefficients for ligand-based virtual screening of cyclooxygenase inhibitors (2011)
- Background: Molecular descriptors are essential for many applications in computational chemistry, such as ligand-based similarity searching. Spherical harmonics have previously been suggested as comprehensive descriptors of molecular structure and properties. We investigate a spherical harmonics descriptor for shape-based virtual screening. Methodology/Principal Findings: We introduce and validate a partially rotation-invariant three-dimensional molecular shape descriptor based on the norm of spherical harmonics expansion coefficients. Using this molecular representation, we parameterize molecular surfaces, i.e., isosurfaces of spatial molecular property distributions. We validate the shape descriptor in a comprehensive retrospective virtual screening experiment. In a prospective study, we virtually screen a large compound library for cyclooxygenase inhibitors, using a self-organizing map as a pre-filter and the shape descriptor for candidate prioritization. Conclusions/Significance: Twelve compounds were tested in vitro for direct enzyme inhibition and in a whole blood assay. Active compounds containing a triazole scaffold were identified as direct cyclooxygenase-1 inhibitors. This outcome corroborates the usefulness of spherical harmonics for representing molecular shape in virtual screening of large compound collections. The combination of pharmacophore and shape-based filtering of screening candidates proved to be a straightforward approach to finding novel bioactive chemotypes with minimal experimental effort.
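A minimal sketch of a rotation-invariant spherical harmonics shape descriptor, assuming the molecular surface is star-shaped so that it can be written as a radius function of the unit direction (a hypothetical `radius(x, y, z)`). The surface function is projected onto complex spherical harmonics by exact quadrature, and for each degree l the norm of the coefficients c_lm over m = -l..l is kept. Rotations mix coefficients only within a degree and preserve their norm, so these per-degree norms are fully rotation-invariant; the paper describes its descriptor as only partially rotation-invariant, so its grouping of coefficients may differ from this illustration. All function names are hypothetical and only NumPy is assumed.

```python
import math
import numpy as np

def assoc_legendre(l, m, x):
    """P_l^m(x) for m >= 0 (Condon-Shortley phase), via the standard upward recurrence."""
    pmm = np.ones_like(x)
    if m > 0:
        somx2 = np.sqrt((1.0 - x) * (1.0 + x))
        fact = 1.0
        for _ in range(m):
            pmm = -pmm * fact * somx2
            fact += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):
        pmm, pmmp1 = pmmp1, (x * (2 * ll - 1) * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
    return pmmp1

def sh_coefficients(f, lmax, n_theta=16, n_phi=32):
    """Project f(x, y, z) on the unit sphere onto Y_l^m (Gauss-Legendre in cos(theta), uniform phi)."""
    t, wt = np.polynomial.legendre.leggauss(n_theta)  # nodes t = cos(theta)
    phi = 2.0 * np.pi * np.arange(n_phi) / n_phi
    T, PHI = np.meshgrid(t, phi, indexing="ij")
    S = np.sqrt(1.0 - T**2)
    values = f(S * np.cos(PHI), S * np.sin(PHI), T)
    weights = wt[:, None] * (2.0 * np.pi / n_phi)
    coeffs = {}
    for l in range(lmax + 1):
        for m in range(l + 1):
            norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                             * math.factorial(l - m) / math.factorial(l + m))
            Y = norm * assoc_legendre(l, m, T) * np.exp(1j * m * PHI)
            coeffs[(l, m)] = np.sum(weights * values * np.conj(Y))
            if m > 0:
                # Y_l^{-m} = (-1)^m conj(Y_l^m), hence conj(Y_l^{-m}) = (-1)^m Y_l^m
                coeffs[(l, -m)] = np.sum(weights * values * (-1) ** m * Y)
    return coeffs

def sh_shape_descriptor(f, lmax=4):
    """Per-degree coefficient norms: invariant under rotations of the surface."""
    c = sh_coefficients(f, lmax)
    return np.array([math.sqrt(sum(abs(c[(l, m)]) ** 2 for m in range(-l, l + 1)))
                     for l in range(lmax + 1)])
```

Discarding the phases within each degree is what yields invariance, at the cost of some discriminative power: different shapes can share the same per-degree norms.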
- Bioassays to monitor taspase1 function for the identification of pharmacogenetic inhibitors (2011)
- Background: Threonine Aspartase 1 (Taspase1) mediates cleavage of the mixed lineage leukemia (MLL) protein and leukemia-provoking MLL fusions. In contrast to other proteases, the understanding of Taspase1's (patho)biological relevance and function is limited, since neither small molecule inhibitors nor cell-based functional assays for Taspase1 are currently available. Methodology/Findings: Efficient cell-based assays to probe Taspase1 function in vivo are presented here. These are composed of glutathione S-transferase, autofluorescent protein variants, Taspase1 cleavage sites and rational combinations of nuclear import and export signals. The biosensors localize predominantly to the cytoplasm, whereas expression of biologically active Taspase1, but not of inactive Taspase1 mutants or of the protease Caspase3, triggers their proteolytic cleavage and nuclear accumulation. Compared to in vitro assays using recombinant components, the in vivo assay was highly efficient. Employing an optimized nuclear translocation algorithm, the triple-color assay could be adapted to a high-throughput microscopy platform (Z'-factor = 0.63). Automated high-content data analysis was used to screen a focused compound library, selected by an in silico pharmacophore screening approach, as well as a collection of fungal extracts. Screening identified two compounds, N-[2-[(4-amino-6-oxo-3H-pyrimidin-2-yl)sulfanyl]ethyl]benzenesulfonamide and 2-benzyltriazole-4,5-dicarboxylic acid, which partially inhibited Taspase1 cleavage in living cells. Additionally, the assay was exploited to probe endogenous Taspase1 in solid tumor cell models and to identify an improved consensus sequence for efficient Taspase1 cleavage. This allowed the in silico identification of novel putative Taspase1 targets. These include the FERM Domain-Containing Protein 4B, the Tyrosine-Protein Phosphatase Zeta, and DNA Polymerase Zeta.
Cleavage site recognition and proteolytic processing of these substrates were verified in the context of the biosensor. Conclusions: The assay not only allows genetic probing of Taspase1 structure and function in vivo but is also applicable to high-content screening for Taspase1 inhibitors. Such tools will provide novel insights into Taspase1's function and its potential therapeutic relevance.
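The Z'-factor quoted above is commonly computed from positive- and negative-control wells using the definition of Zhang et al. (1999), Z' = 1 − 3(σ₊ + σ₋)/|μ₊ − μ₋|. A minimal sketch, assuming each well yields a single readout (for example a hypothetical nuclear-to-cytoplasmic biosensor intensity ratio) and using the sample standard deviation, which the abstract does not specify:

```python
import statistics

def z_prime(positive_controls, negative_controls):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    mu_p = statistics.mean(positive_controls)
    mu_n = statistics.mean(negative_controls)
    # Sample standard deviations (an assumption; population SDs are also used in practice).
    sd_p = statistics.stdev(positive_controls)
    sd_n = statistics.stdev(negative_controls)
    return 1.0 - 3.0 * (sd_p + sd_n) / abs(mu_p - mu_n)
```

By the usual convention, 0.5 ≤ Z' < 1 indicates excellent separation between the control populations, so the reported value of 0.63 falls in that range.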
- DOGS: reaction-driven de novo design of bioactive compounds (2012)
- We present a computational method for the reaction-based de novo design of drug-like molecules. The software DOGS (Design of Genuine Structures) features a ligand-based strategy for automated ‘in silico’ assembly of potentially novel bioactive compounds. The quality of the designed compounds is assessed by a graph kernel method measuring their similarity to known bioactive reference ligands in terms of structural and pharmacophoric features. We implemented a deterministic compound construction procedure that explicitly considers compound synthesizability, based on a compilation of 25,144 readily available synthetic building blocks and 58 established reaction principles. This enables the software to suggest a synthesis route for each designed compound. Two prospective case studies are presented together with details on the algorithm and its implementation. De novo designed ligand candidates for the human histamine H4 receptor and γ-secretase were synthesized as suggested by the software. The computational approach proved to be suitable for scaffold-hopping from known ligands to novel chemotypes, and for generating bioactive molecules with drug-like properties.
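The deterministic, reaction-driven assembly described above can be illustrated with a toy sketch. It is not the DOGS algorithm: molecules are reduced to hypothetical counts of reactive groups and pharmacophore-like features, the graph kernel is replaced by a Tanimoto similarity on those counts, and at each step the single product most similar to the reference ligand is kept. Recording the reaction and building block at every step yields a synthesis route for the final design. All names and data here are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Reaction:
    name: str
    consumes_mol: str    # reactive group required on the growing molecule
    consumes_block: str  # reactive group required on the building block
    forms: str           # feature created by the new bond

def tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto similarity on feature counts (stand-in for a graph kernel)."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 0.0

def react(mol: Counter, block: Counter, rxn: Reaction) -> Optional[Counter]:
    """Combine two fragments if both carry the required groups; else None."""
    if mol[rxn.consumes_mol] == 0 or block[rxn.consumes_block] == 0:
        return None
    product = mol + block
    product[rxn.consumes_mol] -= 1
    product[rxn.consumes_block] -= 1
    product[rxn.forms] += 1
    return +product  # unary plus drops zero counts

def design(start: str, blocks: Dict[str, Counter], reactions: List[Reaction],
           reference: Counter, max_steps: int = 3) -> Tuple[Counter, List[str]]:
    """Greedy, deterministic growth that keeps one best product per step."""
    mol, route = blocks[start], [start]
    for _ in range(max_steps):
        best = None  # (score, product, step description)
        for rxn in reactions:
            for name in sorted(blocks):  # fixed order -> reproducible tie-breaking
                product = react(mol, blocks[name], rxn)
                if product is None:
                    continue
                score = tanimoto(product, reference)
                if best is None or score > best[0]:
                    best = (score, product, f"{rxn.name} + {name}")
        if best is None or best[0] <= tanimoto(mol, reference):
            break  # nothing applicable improves similarity to the reference
        mol = best[1]
        route.append(best[2])
    return mol, route
```

A greedy choice keeps the procedure deterministic and cheap; a real design tool would typically also explore several candidates per step and apply drug-likeness filters.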
- On the move in chemical spaces : cheminformatics and molecular design (2003)
- How does one find a new drug? Pharmaceutical and medicinal chemistry research faces a seemingly insoluble task, because the "chemical space" of all drug-like molecules is unimaginably large. It has been estimated that suitable candidates could, in principle, be selected from 10⁶⁰ to 10¹⁰⁰ different compounds. For comparison, "only" about 10¹⁸ seconds, roughly 14 billion years, are thought to have elapsed since the Big Bang. This means that chemical space is practically infinite. At least two conclusions can be drawn from this: on the one hand, there is justified hope that a molecule with the desired activity exists; on the other, the question arises of how this unimaginably large number of chemical compounds can be searched systematically. Yet the situation is not as hopeless as it first appears, as the successful development of ever new medicines shows. The research field of cheminformatics develops intelligent approaches that can help chemists in this search for "needles in a giant haystack".
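The comparison can be made explicit with an order-of-magnitude estimate; the screening rate of one compound per second since the Big Bang is a hypothetical assumption chosen only for illustration:

```latex
14 \times 10^{9}\,\mathrm{yr} \times 3.156 \times 10^{7}\,\mathrm{s\,yr^{-1}}
  \approx 4.4 \times 10^{17}\,\mathrm{s} \sim 10^{18}\,\mathrm{s}

\frac{10^{18}\ \text{compounds evaluated}}{10^{60}\ \text{compounds (lower estimate)}} = 10^{-42}
```

Even under this generous assumption, exhaustive enumeration would cover a vanishing fraction of chemical space, which is why guided search strategies are needed.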
- Correction: Prediction of type III secretion signals in genomes of gram-negative bacteria (2009)
- This corrects the article "Prediction of Type III Secretion Signals in Genomes of Gram-Negative Bacteria" in PLoS ONE, e5917 (urn:nbn:de:hebis:30-82663). A file was unintentionally omitted from the Supporting Information section of the published article: "Text S1. Training data." The file can be viewed here.
- Molecular similarity for machine learning in drug development : poster presentation (2008)
- Poster presentation. In pharmaceutical research and drug development, machine learning methods play an important role in virtual screening and ADME/Tox prediction. For the application of such methods, a formal measure of similarity between molecules is essential. Such a measure, in turn, depends on the underlying molecular representation. Input samples have traditionally been modeled as vectors. Consequently, molecules are represented to machine learning algorithms in a vectorized form using molecular descriptors. While this approach is straightforward, it has its shortcomings. Among other drawbacks, the learned model can be difficult to interpret, e.g., when using fingerprints or hashing. Structured representations of the input, a trend in machine learning in recent years, constitute an alternative to vector-based representations. For molecules, there is a rich choice of such representations. Popular examples include the molecular graph, molecular shape and the electrostatic field. We have developed a molecular similarity measure defined directly on the (annotated) molecular graph, a long-established topological model for molecules. It is based on the concepts of optimal atom assignments and iterative graph similarity. In the latter, two atoms are considered similar if their neighbors are similar. This recursive definition leads to a non-linear system of equations. We show how to solve these equations iteratively and give bounds on the computational complexity of the procedure. Advantages of our similarity measure include interpretability (atoms of two molecules are assigned to each other, each pair with a score expressing local similarity; this can be visualized to show similar regions of two molecules and the degree of their similarity) and the possibility to introduce knowledge about the target where available.
We retrospectively tested our similarity measure using support vector machines for virtual screening on several pharmaceutical and toxicological datasets, with encouraging results. Prospective studies are under way.
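The iterative graph similarity can be sketched as a fixed-point iteration. This is a simplified variant written for illustration, not the published update rule: the atom kernel is a hypothetical label-equality test, bonds are unlabelled, the neighbour term is the best assignment of neighbours normalised by the larger degree, and assignments are found by brute force rather than a polynomial-time method such as the Hungarian algorithm. With 0 ≤ alpha < 1 the update is a contraction, so the iteration converges to the unique solution of the non-linear system. Molecules are hypothetical `(labels, adjacency lists)` pairs of heavy atoms.

```python
from itertools import permutations

def best_assignment(a, b, w):
    """Maximum total weight w(x, y) over injective matchings between lists a and b.
    Brute force: adequate only for atom neighbourhoods and very small molecules."""
    if not a or not b:
        return 0.0
    if len(a) <= len(b):
        return max(sum(w(x, y) for x, y in zip(a, p)) for p in permutations(b, len(a)))
    return max(sum(w(x, y) for x, y in zip(p, b)) for p in permutations(a, len(b)))

def atom_similarity(g1, g2, alpha=0.5, tol=1e-9, max_iter=500):
    """Atoms are similar if their labels match and their neighbours are similar."""
    (lab1, adj1), (lab2, adj2) = g1, g2
    kv = [[1.0 if a == b else 0.0 for b in lab2] for a in lab1]
    X = [row[:] for row in kv]
    for _ in range(max_iter):
        new = [[0.0] * len(lab2) for _ in lab1]
        for i in range(len(lab1)):
            for j in range(len(lab2)):
                d = max(len(adj1[i]), len(adj2[j]))
                if d == 0:  # two isolated atoms: only the label term is available
                    new[i][j] = kv[i][j]
                    continue
                nb = best_assignment(adj1[i], adj2[j], lambda u, v: X[u][v]) / d
                new[i][j] = (1 - alpha) * kv[i][j] + alpha * nb
        delta = max(abs(new[i][j] - X[i][j])
                    for i in range(len(lab1)) for j in range(len(lab2)))
        X = new
        if delta < tol:  # alpha-contraction: the error shrinks geometrically
            break
    return X

def graph_similarity(g1, g2, alpha=0.5):
    """Optimal atom assignment normalised by the larger atom count (self-similarity 1)."""
    X = atom_similarity(g1, g2, alpha)
    n1, n2 = len(g1[0]), len(g2[0])
    total = best_assignment(list(range(n1)), list(range(n2)), lambda i, j: X[i][j])
    return total / max(n1, n2)
```

The pairwise matrix returned by `atom_similarity` is what makes such a measure interpretable: each entry scores one atom pair, and the final assignment shows which atoms of the two molecules correspond.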
- A forge of ideas with practical relevance : five years of the Beilstein Endowed Professorship for Cheminformatics (2007)
- An endowed professorship enables focused research in a specialized field and creates the freedom needed to try out new ideas. In particular, it can serve to build bridges between disciplines. With this aim, the Beilstein Endowed Professorship for Cheminformatics was established at the Johann Wolfgang Goethe-Universität five years ago. Funded by the Beilstein-Institut zur Förderung der Chemischen Wissenschaften, based in Frankfurt am Main, it was conceived in close collaboration with the Institut für Organische Chemie und Chemische Biologie under the leadership of Prof. Dr. Michael Göbel. After the five-year funding period ended in March 2007, the endowed professorship was seamlessly incorporated into the university's regular operations. This is an occasion to take stock.
- Kernel learning for ligand-based virtual screening: discovery of a new PPARgamma agonist (2010)
- Poster presentation at the 5th German Conference on Cheminformatics: 23rd CIC-Workshop, Goslar, Germany, 8-10 November 2009. We demonstrate the theoretical and practical application of modern kernel-based machine learning methods to ligand-based virtual screening by successful prospective screening for novel agonists of the peroxisome proliferator-activated receptor gamma (PPARgamma). PPARgamma is a nuclear receptor involved in lipid and glucose metabolism and is related to type 2 diabetes and dyslipidemia. Applied methods included a graph kernel designed for molecular similarity analysis, kernel principal component analysis, multiple kernel learning, and Gaussian process regression. In the machine learning approach to ligand-based virtual screening, one uses the similarity principle to identify potentially active compounds based on their similarity to known reference ligands. Kernel-based machine learning uses the "kernel trick", a systematic approach to deriving non-linear versions of linear algorithms such as separating hyperplanes and regression. Prerequisites for kernel learning are similarity measures with the mathematical property of positive semidefiniteness (kernels). The iterative similarity optimal assignment graph kernel (ISOAK) is defined directly on the annotated structure graph and was designed specifically for the comparison of small molecules. In our virtual screening study, its use improved results, e.g., in principal component analysis-based visualization and Gaussian process regression. Following a thorough retrospective validation using a data set of 176 published PPARgamma agonists, we screened a vendor library for novel agonists. Subsequent testing of 15 compounds in a cell-based transactivation assay yielded four active compounds. The most interesting hit, a natural product derivative with a cyclobutane scaffold, is a selective full PPARgamma agonist (EC50 = 10 ± 0.2 microM, inactive on PPARalpha and PPARbeta/delta at 10 microM).
We demonstrate how the interplay of several modern kernel-based machine learning approaches can successfully improve ligand-based virtual screening results.
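Because kernel methods need only pairwise similarity values, any positive semidefinite similarity matrix can be plugged into, for example, Gaussian process regression. A minimal sketch using the standard Cholesky-based GP equations on precomputed kernel values; in the study such values would come from ISOAK, whereas the check below substitutes a Gaussian (RBF) kernel on toy one-dimensional inputs. Function names and the `noise` level are assumptions.

```python
import numpy as np

def is_psd(K, tol=1e-10):
    """A valid kernel matrix is symmetric with no significantly negative eigenvalues."""
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)

def gp_predict(K_train, K_cross, k_test_diag, y, noise=1e-3):
    """GP posterior mean and variance.
    K_train: (n, n) train-train kernel; K_cross: (m, n) test-train kernel;
    k_test_diag: (m,) self-similarities of test compounds; y: (n,) targets."""
    L = np.linalg.cholesky(K_train + noise**2 * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + noise^2 I)^-1 y
    mean = K_cross @ alpha
    v = np.linalg.solve(L, K_cross.T)
    var = k_test_diag - np.sum(v**2, axis=0)
    return mean, var
```

Checking `is_psd` first matters because a similarity matrix with negative eigenvalues is not a valid kernel and can make the factorisation fail or yield negative predictive variances.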
- PhAST : pharmacophore alignment search tool (2009)
- We developed the Pharmacophore Alignment Search Tool (PhAST), a text-based technique for rapid hit and lead structure searching in large compound databases. For each molecule, a two-dimensional graph of potential pharmacophoric points (PPPs) is created, which has the same topology as the original molecule with implicit hydrogen atoms. Each vertex is coloured by a symbol representing the corresponding PPP. The vertices of the graph are canonically labelled. The symbols associated with the vertices are combined into a so-called PhAST-Sequence, beginning with the vertex with the lowest canonical label. Owing to the canonical labelling, the resulting PhAST-Sequence is characteristic of each molecule. For similarity assessment, PhAST-Sequences are compared using the sequence identity of their global pairwise alignment. The alignment score lies between 0 (no similarity) and 1 (identical PhAST-Sequences). To use global pairwise sequence alignment, a score matrix for pharmacophoric symbols was developed and gap penalties were optimized. In retrospective virtual screenings using the COBRA collection of drugs and lead structures, PhAST performed comparably to, and sometimes better than, other similarity search tools (CATS2D, MOE pharmacophore quadruples). Most importantly, the PhAST alignment technique allows the computation of significance estimates that help prioritize a virtual hit list.
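Comparing PhAST-Sequences by global pairwise alignment can be sketched with the Needleman-Wunsch algorithm. The symbol alphabet (for example H, A, D), the match/mismatch scores and the linear gap penalty below are hypothetical placeholders for the optimized score matrix and gap penalties, and identity is taken here as the fraction of identical aligned positions over the alignment length, one common convention that keeps the value between 0 and 1.

```python
def pair_score(p, q):
    """Hypothetical scoring: identical PPP symbols +2, different symbols -1."""
    return 2 if p == q else -1

def global_align(a, b, score=pair_score, gap=-2):
    """Needleman-Wunsch global alignment; returns (alignment score, sequence identity)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back one optimal alignment, counting identical aligned symbols.
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return F[n][m], (matches / length if length else 1.0)
```

When several alignments tie for the best score, the traceback follows one of them, so the identity value can depend on tie-breaking; the significance estimates mentioned above are not covered by this sketch.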