- Text-based similarity searching for hit- and lead-candidate identification (2010)
- This work investigated the applicability of global pairwise sequence alignment to the detection of functional analogues in virtual screening. This variant of sequence comparison was developed for the identification of homologous proteins based on amino acid or nucleotide sequences. Because of the significant differences between biopolymers and small molecules, several aspects of this approach to sequence comparison had to be adapted. All proposed concepts were implemented as the ‘Pharmacophore Alignment Search Tool’ (PhAST) and evaluated in retrospective experiments on version 6.1 of the COBRA dataset. The aim of identifying functional analogues made it necessary to identify and classify functional properties in molecular structures. This was realized by fragment-based atom-typing, in which one of nine functional properties was assigned to each non-hydrogen atom in a structure. These properties were pre-assigned to atoms in the fragments. Whenever a fragment matched a substructure in a molecule, the assigned properties were transferred from fragment atoms to structure atoms. Each functional property was represented by exactly one symbol. Unlike amino acid or nucleotide sequences, small drug-like molecules contain branches and cycles. This was a major obstacle to the application of sequence alignment in virtual screening, since this technique can only be applied to linear sequences of symbols. The best linearization technique was shown to be Minimum Volume Embedding. To the best of our knowledge, this work represents the first application of dimensionality reduction to graph linearization. Sequence alignment relies on a scoring system that rates symbol equivalences (matches) and differences (mismatches) based on the functional properties that the rated symbols represent. Existing scoring schemes are applicable only to amino acids and nucleotides.
In this work, scoring schemes for functional properties in drug-like molecules were developed based on property frequencies and isofunctionality judged from chemical experience, pairwise sequence alignments, pairwise kernel-based assignments and stochastic optimization. The scoring system based on property frequencies and isofunctionality proved to be the most powerful (measured by enrichment capability). All developed scoring systems outperformed simple scoring approaches that rate matches and mismatches uniformly. The frameworks proposed for score calculations can be used to guide modifications to the atom-typing in promising directions. The scoring system was further modified to allow for emphasis on particular symbols in a sequence. It was shown that applying weights to symbols that correspond to key interaction points in receptor-ligand interaction significantly improves the screening capabilities of PhAST. It was demonstrated that the systematic application of weights to all sequence positions in retrospective experiments can be used for pharmacophore elucidation. A scoring system based on structural instead of functional similarity was investigated and found to be suitable for similarity searches in shape-constrained datasets. Three methods for similarity assessment based on alignments were evaluated: sequence identity, alignment score and significance. PhAST achieved significantly higher enrichment with alignment scores than with sequence identity. Significance estimates in the form of p-values were calculated by a combination of Markov chain Monte Carlo simulation and importance sampling. The p-values were adapted to library size by a Bonferroni correction, yielding E-values. A significance threshold of an E-value of 1 × 10⁻⁵ was proposed for application in prospective screenings. PhAST was compared to state-of-the-art methods for virtual screening. The unweighted version was shown to exhibit comparable enrichment capabilities.
Compound rankings obtained with PhAST were shown to be complementary to those of other methods. The application to three-dimensional instead of two-dimensional molecular representations resulted in altered compound rankings without increased enrichment. PhAST was employed in two prospective applications. A screening for non-nucleoside analogue inhibitors of bacterial thymidine kinase yielded a hit with a distinct structural framework but only weak activity. The search for drugs outside the NSAID (non-steroidal anti-inflammatory drug) class as modulators of gamma-secretase resulted in a potent modulator with clear structural distinction from the reference compound. The calculation of significance estimates, the emphasis on key interactions, the pharmacophore elucidation capabilities and the unique compound rankings set PhAST apart from other screening techniques.
- PhAST: pharmacophore alignment search tool (2009)
- We developed the Pharmacophore Alignment Search Tool (PhAST), a text-based technique for rapid hit and lead structure searching in large compound databases. For each molecule, a two-dimensional graph of potential pharmacophoric points (PPPs) is created, which has the same topology as the original molecule with implicit hydrogen atoms. Each vertex is coloured by a symbol representing the corresponding PPP. The vertices of the graph are canonically labelled. The symbols associated with the vertices are combined into a so-called PhAST-Sequence, beginning with the vertex with the lowest canonical label. Due to the canonical labelling, the created PhAST-Sequence is characteristic for each molecule. For similarity assessment, PhAST-Sequences are compared using the sequence identity in their global pairwise alignment. The resulting similarity value lies between 0 (no similarity) and 1 (identical PhAST-Sequences). In order to use global pairwise sequence alignment, a score matrix for pharmacophoric symbols was developed and gap penalties were optimized. PhAST performed comparably to, and sometimes better than, other similarity search tools (CATS2D, MOE pharmacophore quadruples) in retrospective virtual screenings using the COBRA collection of drugs and lead structures. Most importantly, the PhAST alignment technique allows for the computation of significance estimates that help prioritize a virtual hit list.
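
The comparison step described in these abstracts can be illustrated with a minimal Python sketch. It is not the PhAST implementation. It assumes a textbook Needleman-Wunsch global alignment with a linear gap penalty, a uniform match/mismatch score standing in for the optimized score matrix, and hypothetical one-letter property symbols (`L`, `A`, `D`). Sequence identity is taken here as identical aligned positions divided by alignment length, and the E-value as the p-value multiplied by the library size (the usual Bonferroni form). All function names and default values are illustrative assumptions.

```python
from typing import Callable, List, Tuple


def uniform_score(x: str, y: str) -> float:
    """Assumed placeholder scoring: +1 for a match, -1 for a mismatch."""
    return 1.0 if x == y else -1.0


def global_alignment(
    a: str,
    b: str,
    score: Callable[[str, str], float] = uniform_score,
    gap: float = -1.0,
) -> Tuple[float, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (alignment score, aligned a, aligned b). The character '-'
    marks a gap and is assumed not to be used as a property symbol.
    """
    n, m = len(a), len(b)
    dp: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Border cells: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    # Fill the table with the best of match/mismatch, gap in b, gap in a.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                dp[i - 1][j] + gap,
                dp[i][j - 1] + gap,
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def sequence_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of alignment columns holding the same non-gap symbol."""
    if not aligned_a:
        return 0.0  # assumed convention for an empty alignment
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return same / len(aligned_a)


def e_value(p_value: float, library_size: int) -> float:
    """Bonferroni-style correction: p-value scaled by the number of comparisons."""
    return p_value * library_size


if __name__ == "__main__":
    s, xa, xb = global_alignment("LLAD", "LAD")
    print(s, xa, xb, sequence_identity(xa, xb))
```

Under these assumed settings, aligning `LLAD` with `LAD` places one gap, giving an alignment score of 2 and a sequence identity of 0.75. Replacing `uniform_score` with a lookup into a property-specific score matrix, and tuning `gap`, would be the natural extension points.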