We investigate the utility of modern kernel-based machine learning methods for ligand-based virtual screening. In particular, we introduce a new graph kernel based on iterative graph similarity and optimal assignments, apply kernel principal component analysis to projection error-based novelty detection, and discover a new selective agonist of the peroxisome proliferator-activated receptor gamma using Gaussian process regression.
Virtual screening, the computational ranking of compounds with respect to a predicted property, is a cheminformatics problem relevant to the hit generation phase of drug development. Its ligand-based variant relies on the similarity principle, which states that (structurally) similar compounds tend to have similar properties. We describe the kernel-based machine learning approach to ligand-based virtual screening; in this, we stress the role of molecular representations, including the (dis)similarity measures defined on them, investigate effects in high-dimensional chemical descriptor spaces and their consequences for similarity-based approaches, review literature recommendations on retrospective virtual screening, and present an example workflow.
Graph kernels are formal similarity measures that are defined directly on graphs, such as the annotated molecular structure graph, and correspond to inner products. We review graph kernels, in particular those based on random walks, subgraphs, and optimal vertex assignments. Combining the latter with an iterative graph similarity scheme, we develop the iterative similarity optimal assignment graph kernel, give an iterative algorithm for its computation, prove convergence of the algorithm and the uniqueness of the solution, and provide an upper bound on the number of iterations necessary to achieve a desired precision. In a retrospective virtual screening study, our kernel consistently improved performance over chemical descriptors as well as other optimal assignment graph kernels.
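As a rough illustration of the optimal vertex assignment idea mentioned above, the following sketch scores two small molecules by the best one-to-one matching of their atoms. It is not the iterative similarity optimal assignment kernel itself: the vertex similarity (`vertex_similarity`, a simple element-label match), the brute-force search over assignments, and the cosine-style normalization are all assumptions chosen to keep the example short.

```python
from itertools import permutations


def vertex_similarity(a, b):
    """Hypothetical vertex similarity: 1.0 for equal element labels, else 0.0."""
    return 1.0 if a == b else 0.0


def optimal_assignment_score(labels_x, labels_y):
    """Maximum total similarity over all one-to-one assignments of the
    smaller vertex set into the larger one (brute force, tiny graphs only)."""
    small, large = sorted((labels_x, labels_y), key=len)
    best = 0.0
    for perm in permutations(range(len(large)), len(small)):
        score = sum(vertex_similarity(small[i], large[j])
                    for i, j in enumerate(perm))
        best = max(best, score)
    return best


def normalized_score(labels_x, labels_y):
    """Cosine-style normalization so that a molecule scores 1.0 against itself."""
    kxy = optimal_assignment_score(labels_x, labels_y)
    kxx = optimal_assignment_score(labels_x, labels_x)
    kyy = optimal_assignment_score(labels_y, labels_y)
    return kxy / (kxx * kyy) ** 0.5


# Heavy atoms only; bonds are ignored in this toy vertex similarity.
ethanol = ["C", "C", "O"]
methylamine = ["C", "N"]

print(optimal_assignment_score(ethanol, methylamine))
print(normalized_score(ethanol, methylamine))
```

A practical implementation would replace the permutation search with a polynomial-time assignment solver and use a vertex similarity that takes neighbourhoods into account, which is the role the iterative graph similarity scheme plays in the thesis.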
Chemical data sets often lie on manifolds of lower dimensionality than the embedding chemical descriptor space. Dimensionality reduction methods try to identify these manifolds, effectively providing descriptive models of the data. For spectral methods based on kernel principal component analysis, the projection error is a quantitative measure of how well new samples are described by such models. This can be used for the identification of compounds structurally dissimilar to the training samples, leading to projection error-based novelty detection for virtual screening using only positive samples. We provide proof of principle by using principal component analysis to learn the concept of fatty acids.
The peroxisome proliferator-activated receptor (PPAR) is a nuclear transcription factor that regulates lipid and glucose metabolism, playing a crucial role in the development of type 2 diabetes and dyslipidemia. We establish a Gaussian process regression model for PPAR gamma agonists using a combination of chemical descriptors and the iterative similarity optimal assignment kernel via multiple kernel learning. Screening of a vendor library and subsequent testing of 15 selected compounds in a cell-based transactivation assay resulted in 4 active compounds. One compound, a natural product with cyclobutane scaffold, is a full selective PPAR gamma agonist (EC50 = 10 ± 0.2 µM, inactive on PPAR alpha and PPAR beta/delta at 10 µM). The study delivered a novel PPAR gamma agonist, de-orphanized a natural bioactive product, and hints at the natural product origins of pharmacophore patterns in synthetic ligands.
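The projection error-based novelty detection described above can be sketched with plain (linear) principal component analysis and a single component. This is a minimal illustration under stated assumptions, not the thesis's kernel PCA setup: the two-dimensional training points are invented toy data, the leading component is found by power iteration, and the function names are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of the training samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu):
    """Sample covariance matrix (normalized by n) of the centered data."""
    d, n = len(mu), len(rows)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        c = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return cov


def leading_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def projection_error(x, mu, v):
    """Squared distance between x and its projection onto the 1-D PCA model."""
    c = [a - m for a, m in zip(x, mu)]
    t = sum(a * b for a, b in zip(c, v))
    return sum((a - t * b) ** 2 for a, b in zip(c, v))


# Toy "positive" samples lying close to a line (a 1-D manifold in 2-D space).
training = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0), (4.0, 8.1), (5.0, 9.9)]
mu = mean_vector(training)
v = leading_component(covariance(training, mu))

familiar = projection_error((6.0, 12.0), mu, v)  # follows the training pattern
novel = projection_error((6.0, 2.0), mu, v)      # far from the learned model
print(familiar, novel)
```

Samples with a large projection error are poorly described by the model learned from the positives and would be flagged as novel; the threshold separating the two cases is a modelling choice not shown here.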
This paper proposes a new approach for modeling investor fear after rare disasters. The key element is to take into account that investors' information about fundamentals driving rare downward jumps in the dividend process is not perfect. Bayesian learning implies that beliefs about the likelihood of rare disasters drop to a much more pessimistic level once a disaster has occurred. Such a shift in beliefs can trigger massive declines in price-dividend ratios. Pessimistic beliefs persist for some time. Thus, belief dynamics are a source of apparent excess volatility relative to a rational expectations benchmark. Due to the low frequency of disasters, even an infinitely lived investor will remain uncertain about the exact probability. Our analysis is conducted in continuous time and offers closed-form solutions for asset prices. We distinguish between rational and adaptive Bayesian learning. Rational learners account for the possibility of future changes in beliefs in determining their demand for risky assets, while adaptive learners take beliefs as given. Thus, for identical priors, risky assets tend to be lower-valued and price-dividend ratios vary less under adaptive learning than under rational learning.
Keywords: beliefs, Bayesian learning, controlled diffusions and jump processes, learning about jumps, adaptive learning, rational learning.
JEL classification: D83, G11, C11, D91, E21, D81, C61
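The belief dynamics described in this abstract can be illustrated with a deliberately simplified discrete-time analogue. The paper works in continuous time with jump processes; the Beta-Bernoulli model, the prior parameters, and the event sequence below are assumptions made only for illustration.

```python
def posterior_mean_path(events, alpha=1.0, beta=49.0):
    """Posterior mean of the per-period disaster probability under a
    Beta(alpha, beta) prior, updated after each observed period."""
    path = []
    for disaster in events:
        if disaster:
            alpha += 1.0   # one more disaster observed
        else:
            beta += 1.0    # one more quiet period observed
        path.append(alpha / (alpha + beta))
    return path


# Ten quiet periods, one disaster, then ten more quiet periods.
events = [False] * 10 + [True] + [False] * 10
path = posterior_mean_path(events)

before, after, later = path[9], path[10], path[-1]
print(before, after, later)
```

In this toy setting the estimated disaster probability roughly doubles when the disaster is observed and then declines only slowly, which mirrors the jump to pessimistic beliefs and their persistence described in the abstract.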