We investigate the utility of modern kernel-based machine learning methods for ligand-based virtual screening. In particular, we introduce a new graph kernel based on iterative graph similarity and optimal assignments, apply kernel principal component analysis to projection error-based novelty detection, and discover a new selective agonist of the peroxisome proliferator-activated receptor gamma using Gaussian process regression.

Virtual screening, the computational ranking of compounds with respect to a predicted property, is a cheminformatics problem relevant to the hit generation phase of drug development. Its ligand-based variant relies on the similarity principle, which states that (structurally) similar compounds tend to have similar properties. We describe the kernel-based machine learning approach to ligand-based virtual screening; in this, we stress the role of molecular representations, including the (dis)similarity measures defined on them, investigate effects in high-dimensional chemical descriptor spaces and their consequences for similarity-based approaches, review literature recommendations on retrospective virtual screening, and present an example workflow.

Graph kernels are formal similarity measures that are defined directly on graphs, such as the annotated molecular structure graph, and correspond to inner products. We review graph kernels, in particular those based on random walks, subgraphs, and optimal vertex assignments. Combining the latter with an iterative graph similarity scheme, we develop the iterative similarity optimal assignment graph kernel, give an iterative algorithm for its computation, prove convergence of the algorithm and the uniqueness of the solution, and provide an upper bound on the number of iterations necessary to achieve a desired precision. In a retrospective virtual screening study, our kernel consistently improved performance over chemical descriptors as well as other optimal assignment graph kernels.

Chemical data sets often lie on manifolds of lower dimensionality than the embedding chemical descriptor space. Dimensionality reduction methods try to identify these manifolds, effectively providing descriptive models of the data. For spectral methods based on kernel principal component analysis, the projection error is a quantitative measure of how well new samples are described by such models. This can be used for the identification of compounds structurally dissimilar to the training samples, leading to projection error-based novelty detection for virtual screening using only positive samples. We provide proof of principle by using principal component analysis to learn the concept of fatty acids.

The peroxisome proliferator-activated receptor (PPAR) is a nuclear transcription factor that regulates lipid and glucose metabolism, playing a crucial role in the development of type 2 diabetes and dyslipidemia. We establish a Gaussian process regression model for PPAR gamma agonists using a combination of chemical descriptors and the iterative similarity optimal assignment kernel via multiple kernel learning. Screening of a vendor library and subsequent testing of 15 selected compounds in a cell-based transactivation assay resulted in 4 active compounds. One compound, a natural product with a cyclobutane scaffold, is a full selective PPAR gamma agonist (EC50 = 10 ± 0.2 µM, inactive on PPAR alpha and PPAR beta/delta at 10 µM). The study delivered a novel PPAR gamma agonist, de-orphanized a natural bioactive product, and hints at the natural product origins of pharmacophore patterns in synthetic ligands.
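The iterative similarity idea behind the graph kernel can be sketched in a few lines. The following is a simplified illustration, not the thesis's exact recursion: vertex similarities are refined by repeatedly matching neighbourhoods, and two graphs are finally scored by an optimal assignment over the converged similarities. The damping weight `alpha` and the normalization by the larger neighbourhood are assumptions of this sketch; for alpha < 1 the update is a contraction, which is what yields the convergence, uniqueness, and iteration-bound properties the abstract claims.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iterative_similarity(adj1, adj2, label_sim, alpha=0.5, tol=1e-4, max_iter=100):
    """Refine pairwise vertex similarities between two labelled graphs.

    adj1, adj2 -- neighbour index lists, one list per vertex
    label_sim  -- (n1, n2) array of vertex label similarities in [0, 1]
    alpha      -- damping weight of the neighbourhood term (0 <= alpha < 1)
    """
    L = np.asarray(label_sim, dtype=float)
    X = L.copy()
    for _ in range(max_iter):
        X_new = np.empty_like(X)
        for u, nu in enumerate(adj1):
            for v, nv in enumerate(adj2):
                if nu and nv:
                    # Best matching of u's neighbours to v's neighbours
                    # under the current similarity estimate.
                    S = X[np.ix_(nu, nv)]
                    r, c = linear_sum_assignment(-S)  # maximise total similarity
                    nbr = S[r, c].sum() / max(len(nu), len(nv))
                else:
                    nbr = 0.0
                X_new[u, v] = (1 - alpha) * L[u, v] + alpha * nbr
        # Contraction for alpha < 1: geometric convergence to a unique fixed point.
        if np.abs(X_new - X).max() < tol:
            return X_new
        X = X_new
    return X

def assignment_kernel(adj1, adj2, label_sim, alpha=0.5):
    """Score two graphs by an optimal assignment over converged similarities."""
    X = iterative_similarity(adj1, adj2, label_sim, alpha)
    r, c = linear_sum_assignment(-X)
    return X[r, c].sum() / max(len(adj1), len(adj2))
```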
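Projection error-based novelty detection can likewise be illustrated with a plain NumPy sketch of kernel PCA. This is a generic construction in the spirit of the abstract, with an assumed RBF kernel and component count; the thesis's exact setup may differ. A test compound that the learned manifold describes poorly receives a large projection error and is flagged as structurally novel.

```python
import numpy as np

def rbf(A, B, gamma=0.1):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KPCANovelty:
    """Novelty score = squared projection error of a sample onto the
    leading kernel principal components of the training set."""

    def fit(self, X, n_components=5, gamma=0.1):
        self.X, self.gamma = X, gamma
        n = len(X)
        K = rbf(X, X, gamma)
        J = np.full((n, n), 1.0 / n)
        Kc = K - J @ K - K @ J + J @ K @ J        # double centring in feature space
        w, V = np.linalg.eigh(Kc)
        idx = np.argsort(w)[::-1][:n_components]  # assumes these eigenvalues are > 0
        self.alpha = V[:, idx] / np.sqrt(w[idx])  # unit-norm principal directions
        self.col_mean, self.all_mean = K.mean(axis=0), K.mean()
        return self

    def score(self, Z):
        kz = rbf(Z, self.X, self.gamma)                      # (m, n)
        kzc = (kz - kz.mean(axis=1, keepdims=True)
               - self.col_mean + self.all_mean)              # centred test kernel
        norm2 = 1.0 - 2.0 * kz.mean(axis=1) + self.all_mean  # ||centred phi(z)||^2, k(z,z)=1 for RBF
        proj = kzc @ self.alpha                              # component projections
        return norm2 - (proj ** 2).sum(axis=1)               # projection error
```

Fitting on positive samples only (e.g. known fatty acids) and thresholding the score reproduces the one-class setting described above.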
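Finally, the prospective screening step amounts to fitting a Gaussian process regressor on labelled compounds and ranking an unlabelled vendor library by predicted activity. Below is a minimal scikit-learn sketch with synthetic placeholder data; the thesis combined chemical descriptors with the graph kernel via multiple kernel learning, which this generic RBF setup does not reproduce.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical placeholder data: descriptor vectors with measured
# activities, plus an unlabelled vendor library to be ranked.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 16))
y_train = rng.normal(size=50)
X_library = rng.normal(size=(1000, 16))

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X_train, y_train)

mean, std = gpr.predict(X_library, return_std=True)  # predictive mean and uncertainty
ranking = np.argsort(mean)[::-1]                      # rank library by predicted activity
shortlist = ranking[:15]                              # e.g. 15 compounds for assay testing
```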
This practice-oriented introduction presents the possibilities and limitations of using multivariate statistical methods in field ornithology. Principal component analysis, discriminant analysis, and cluster analysis are among the most important multivariate methods in ecological research. This article provides the theoretical foundations and at the same time serves as a guide for applying these methods. In addition, for each method, indicators of the quality of the analysis as well as options for interpretation are discussed and demonstrated using a case study.
Classically, the encoding of images by only a few important components is done by Principal Component Analysis (PCA). Recently, a data analysis tool called Independent Component Analysis (ICA), for the separation of independent influences in signals, has attracted strong interest in the neural network community. This approach has also been applied to images. Whereas ICA assumes continuous source channels mixed into the same number of channels by a mixing matrix, we assume that images are composed of only a few image primitives; that is, for images we have fewer sources than pixels. Additionally, in order to discard unimportant information, we aim only for the most important source patterns, those with the highest occurrence probabilities or largest information content, called "Principal Independent Components" (PIC). For the example of a synthetic picture composed of characters, this idea yields the most important ones. Nevertheless, for natural images, where no a priori probabilities can be computed, this does not lead to an acceptable reproduction error. Combining the traditional principal component criterion of PCA with the independence property of ICA, we obtain a better encoding. It turns out that this definition of PIC implements the classical demand of Shannon's rate distortion theory.
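One way to make the PIC idea concrete is sketched below. This is an assumption of the sketch, not the paper's exact algorithm: reduce image patches with PCA first, unmix the reduced representation with ICA, and rank the independent components by the reconstruction energy of their mixing-matrix columns, keeping only the "principal" ones.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Hypothetical data: 500 flattened 8x8 image patches.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))

# Step 1: PCA keeps only the k highest-variance directions
# (fewer sources than pixels).
k = 8
X_red = PCA(n_components=k).fit_transform(X)

# Step 2: ICA rotates the reduced space into statistically independent sources.
ica = FastICA(n_components=k, random_state=0)
S = ica.fit_transform(X_red)

# Step 3: rank the independent components by the reconstruction energy of
# their mixing-matrix columns and keep only the most important ones.
energy = (ica.mixing_ ** 2).sum(axis=0)
order = np.argsort(energy)[::-1]
pic = S[:, order[:4]]   # "principal" independent components
```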
This work presents a contribution to the literature on methods in search of low-dimensional models that yield insight into the equilibrium and kinetic behavior of peptides and small proteins. A deep understanding is developed of various methods for projecting the sampled configurations of molecular dynamics simulations to obtain a low-dimensional free energy landscape. Furthermore, low-dimensional dynamic models for the conformational dynamics of biomolecules in reduced dimensionality are presented. As exemplary systems, mainly short alanine chains are studied. Due to their size, they allow for long simulations. They are simple yet nontrivial systems: owing to their flexibility, they rapidly interconvert between conformers. Understanding these polypeptide chains in great detail is of considerable interest for gaining insight into the process of protein folding. For example, K. Dill et al. conclude in their review [28] about the protein folding problem that "the once intractable Levinthal puzzle now seems to have a very simple answer: a protein can fold quickly and solve its large global optimization puzzle simply through piecewise solutions of smaller component puzzles".
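A standard instance of such a projection, given here as a hedged sketch with placeholder data (the thesis compares several projection methods), is to histogram the trajectory on two collective variables, e.g. backbone dihedral angles, and convert the sampled probability into a free energy surface via F = -kT ln P.

```python
import numpy as np

kT = 2.494  # kJ/mol at 300 K

# Placeholder trajectory data; in practice phi and psi would be backbone
# dihedral time series extracted from a molecular dynamics simulation.
rng = np.random.default_rng(0)
phi = rng.uniform(-np.pi, np.pi, size=100_000)
psi = rng.uniform(-np.pi, np.pi, size=100_000)

# Project onto the two collective variables and estimate the density.
H, xedges, yedges = np.histogram2d(phi, psi, bins=60, density=True)

P = np.where(H > 0, H, np.nan)  # avoid log(0) in unsampled bins
F = -kT * np.log(P)             # free energy landscape, F = -kT ln P
F -= np.nanmin(F)               # set the global minimum to zero
```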
Gradient capital allocation, also known as Euler allocation, is a technique used to redistribute diversified capital requirements among different segments of a portfolio. The method is commonly employed to identify dominant risks, assess the risk-adjusted profitability of segments, and install limit systems. However, capital allocation can be misleading in all these applications because it only accounts for the current portfolio composition and ignores how diversification effects may change with a portfolio restructuring. This paper proposes enhancing the gradient capital allocation by adding “orthogonal convexity scenarios” (OCS). OCS identify risk concentrations that potentially drive portfolio risk and become relevant after restructuring. OCS have strong ties with principal component analysis (PCA), but they are a more general concept, compatible with the common empirical patterns of risk drivers being fat-tailed and increasingly dependent in market downturns. We illustrate possible applications of OCS in terms of risk communication and risk limits.
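For a positively homogeneous risk measure, gradient (Euler) allocation is fully determined by Euler's theorem: the exposure-weighted partial derivatives of the risk measure sum exactly to the diversified total. A minimal sketch follows, with a hypothetical covariance matrix and portfolio volatility as the risk measure; the paper's setting is more general.

```python
import numpy as np

# Euler allocation for rho(x) = sqrt(x' Sigma x) (portfolio volatility).
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])  # hypothetical segment covariances
x = np.array([3.0, 2.0, 1.0])           # segment exposures

rho = np.sqrt(x @ Sigma @ x)            # diversified capital requirement
grad = Sigma @ x / rho                  # d rho / d x_i per segment
contrib = x * grad                      # Euler allocation per segment

# By Euler's theorem for homogeneous functions, the allocations
# add up exactly to the total diversified capital.
assert np.isclose(contrib.sum(), rho)
```

Note that the allocation is a first-order (gradient) quantity evaluated at the current exposures x, which is precisely why it ignores how diversification changes under restructuring, the gap the proposed OCS are meant to fill.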