In the recent past, we have made huge progress in the field of Artificial Intelligence. Since the rise of neural networks, astonishing new frontiers are continuously being discovered. Development is so fast that no major technical limits are in sight. Hence, digitization has expanded beyond academia and industry to such an extent that it is now prevalent in politics, the mass media, and even popular art. The DFG-funded project Specialized Information Service for Biodiversity Research and the BMBF-funded project Linked Open Tafsir fit squarely into this overall development. Both projects aim to build an intelligent, modern research infrastructure for biodiversity and theological studies, serving scholars who work in these respective fields of historical science. Starting from digitized German and Arabic historical literature that contains valuable knowledge on biodiversity and theological studies that has so far been unavailable, our dissertation aims, at its core, to apply state-of-the-art Machine Learning methods to the analysis of natural language texts in low-resource languages and to enable foundational Natural Language Processing tasks on them, such as Sentence Boundary Detection, Named Entity Recognition, and Topic Modeling. This ultimately paves the way for new scientific discoveries in the historical disciplines of the natural sciences and the humanities. By enriching the landscape of historical low-resource languages with valuable annotation data, our work becomes part of the broader movement toward digitizing society, allowing people to focus on what really matters in science and industry.
Machine Learning (ML) is so pervasive in our daily lives that, more often than we might expect, we use systems based on it without even realizing it. It is also evolving faster than ever before. When deploying ML systems that make decisions on their own, we need to consider their potential ignorance of the uncertainty in our world. This uncertainty may arise from scarce data, biased data, or even a mismatch between the real world and the ML model. Given all these uncertainties, we need to think about how to build systems that are not entirely ignorant of them. Bayesian ML can, to some extent, address these problems: specifying the model in terms of probabilities provides a convenient way to quantify uncertainties, which can then be incorporated into the decision-making process.
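As a toy illustration (not taken from the thesis) of how a probabilistic specification quantifies uncertainty, the sketch below uses a Beta-Bernoulli model, whose posterior is available in closed form. With a uniform Beta(1, 1) prior, the posterior standard deviation reflects how much data has been observed, which is one concrete form of the data-scarcity uncertainty described in the paragraph. The function name `beta_posterior` is hypothetical.

```python
import math

def beta_posterior(successes: int, trials: int, a: float = 1.0, b: float = 1.0):
    """Conjugate update of a Beta(a, b) prior on a Bernoulli success probability.

    Returns the posterior mean and posterior standard deviation.
    """
    alpha = a + successes
    beta = b + (trials - successes)
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total**2 * (total + 1.0))
    return mean, math.sqrt(var)

# Same observed success rate (60%), increasing amounts of data:
# the posterior mean stays near 0.6 while the uncertainty (sd) shrinks.
for n in (10, 100, 1000):
    mean, sd = beta_posterior(successes=6 * n // 10, trials=n)
    print(f"n={n:5d}  mean={mean:.3f}  sd={sd:.3f}")
```

A downstream decision rule could, for example, act only when the posterior standard deviation falls below some tolerance; such a rule is a hypothetical illustration and not part of the thesis.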
In this thesis, we introduce the Bayesian approach to modeling and apply Bayesian ML models in finance and economics. In particular, we examine Gaussian processes (GPs) and the Gaussian process latent variable model (GPLVM) in depth. Applied to the returns of several assets, the GPLVM provides both their covariance structure and a latent-space embedding of the assets. Several financial applications can be built on the output of the GPLVM. To demonstrate this, we build an automated asset allocation system and a predictor for missing asset prices, and we identify further structure in financial data.
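To make the GPLVM objective concrete, here is a minimal NumPy sketch. It assumes an RBF kernel, a fixed noise level, and a layout in which each asset is one data point and its daily returns are the observed dimensions; these choices, the synthetic data, and the function names are our assumptions, not necessarily the setup used in the thesis. The matrix `cov` plays the role of the asset covariance structure and the rows of `X` are the latent embedding. In practice `X` and the kernel hyperparameters would be optimized or inferred; the sketch only evaluates the objective at a PCA-style initialization.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between all pairs of latent points (rows of X).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gplvm_log_likelihood(Y, X, noise=0.5, lengthscale=1.0, variance=1.0):
    """Log marginal likelihood of a GPLVM and the implied asset covariance.

    Y: (N, D) array, N assets observed on D days (one column per day).
    X: (N, Q) latent coordinates, one point per asset.
    Each column is modeled as y_d ~ N(0, K(X, X) + noise^2 I).
    """
    N, D = Y.shape
    cov = rbf_kernel(X, lengthscale, variance) + noise**2 * np.eye(N)
    L = np.linalg.cholesky(cov)            # cov = L @ L.T
    A = np.linalg.solve(L, Y)              # sum(A**2) = sum_d y_d^T cov^{-1} y_d
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    ll = -0.5 * (np.sum(A**2) + D * log_det + N * D * np.log(2.0 * np.pi))
    return ll, cov

# Hypothetical data: standardized daily returns of 8 assets over 250 days.
rng = np.random.default_rng(0)
Y = rng.normal(size=(8, 250))
Y = (Y - Y.mean(axis=1, keepdims=True)) / Y.std(axis=1, keepdims=True)

# PCA-style initialization of a 2-D latent space (one point per asset).
U, S, _ = np.linalg.svd(Y, full_matrices=False)
X = U[:, :2] * S[:2] / np.sqrt(Y.shape[1])

ll, cov = gplvm_log_likelihood(Y, X)
```

Using a Cholesky factor rather than an explicit inverse keeps the log-determinant and the quadratic form numerically stable, which matters once the objective is maximized over `X`.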
It turns out that the GPLVM exhibits a rotational symmetry in the latent space, which makes it harder to fit. Our second publication reports how to deal with this symmetry. We propose an alternative parameterization of the model using Householder transformations, which breaks the symmetry. A reparameterization changes a Bayesian model unless the prior is transformed accordingly. We therefore provide the correct prior distribution of the new parameters, such that the model, i.e., the data density, is unchanged under the reparameterization. After applying the reparameterization to Bayesian PCA, we show that the symmetry of nonlinear models can be broken in the same way.
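The symmetry and the idea behind the fix can be sketched for (Bayesian) PCA in NumPy. The first part checks that replacing the loading matrix `W` by `W @ R` for an orthogonal `R` leaves the marginal covariance, and hence the data density, unchanged. The second part builds a matrix with orthonormal columns from a sequence of Householder reflections acting on shrinking subspaces. The helper names are hypothetical, and the sign and ordering conventions as well as the corrected prior on the Householder vectors derived in the thesis are not reproduced here.

```python
import numpy as np

def ppca_cov(W, noise):
    # Marginal covariance of (Bayesian) PCA: y ~ N(0, W W^T + noise^2 I).
    return W @ W.T + noise**2 * np.eye(W.shape[0])

def householder(v):
    # Householder reflection H = I - 2 v v^T / (v^T v); H is symmetric and orthogonal.
    v = np.asarray(v, dtype=float)
    return np.eye(v.size) - 2.0 * np.outer(v, v) / np.dot(v, v)

def orthonormal_from_householder(vs, D):
    """Build a D x Q matrix with orthonormal columns as H_1 H_2 ... H_Q [I_Q; 0].

    vs[k] has length D - k: the k-th reflection only acts on the last D - k coordinates.
    """
    Q = len(vs)
    U = np.eye(D)[:, :Q]
    for k in reversed(range(Q)):
        H = np.eye(D)
        H[k:, k:] = householder(vs[k])
        U = H @ U
    return U

rng = np.random.default_rng(1)
D, Q, noise = 5, 2, 0.1

# 1) Rotational symmetry: W and W @ R give the same data density for orthogonal R.
W = rng.normal(size=(D, Q))
R, _ = np.linalg.qr(rng.normal(size=(Q, Q)))
rotation_invariant = np.allclose(ppca_cov(W, noise), ppca_cov(W @ R, noise))

# 2) Householder parameterization: unconstrained vectors -> orthonormal columns.
vs = [rng.normal(size=D - k) for k in range(Q)]
U = orthonormal_from_householder(vs, D)
W_param = U @ np.diag([2.0, 1.0])   # hypothetical scales, sorted in decreasing order
```

Writing W = U diag(s) with such a `U` restricts the loadings to matrices with orthogonal columns, i.e. one representative per rotation class up to sign and ordering conventions.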
In our last project, we propose a new method for matching quantile observations based on order statistics. Using order statistics as the likelihood, instead of a Gaussian likelihood, has several advantages; we compare the two models and highlight the strengths and weaknesses of each. To demonstrate our method, we fit quantile salary data from several European countries. Given several candidate models, our method also provides a metric for choosing the best one.
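The order-statistics likelihood can be sketched with the Python standard library alone. Assume the reported quantiles come from an i.i.d. sample of known size n and that, for level p, the k-th smallest value is reported with k = round(p·n). The joint density of order statistics x_1 < ... < x_m at ranks k_1 < ... < k_m is then n! ∏ f(x_i) ∏ (F(x_i) − F(x_{i−1}))^(k_i − k_{i−1} − 1) / (k_i − k_{i−1} − 1)!, with F(x_0) = 0, F(x_{m+1}) = 1, k_0 = 0 and k_{m+1} = n + 1. The lognormal salary model, the rank convention and all function names are illustrative assumptions; the Gaussian-likelihood alternative and the model-selection metric from the thesis are not shown.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lognormal_cdf_pdf(x, mu, sigma):
    # CDF and density of a lognormal distribution (hypothetical salary model).
    z = (math.log(x) - mu) / sigma
    return STD_NORMAL.cdf(z), STD_NORMAL.pdf(z) / (x * sigma)

def _gap_term(mass, count):
    # log(mass**count / count!): `count` unobserved samples fall in a gap of probability `mass`.
    if count == 0:
        return 0.0
    if mass <= 0.0:
        return -math.inf
    return count * math.log(mass) - math.lgamma(count + 1)

def order_stat_loglik(quantiles, probs, n, mu, sigma):
    """Joint log-density of reported quantiles treated as order statistics.

    quantiles: increasing reported values x_1 < ... < x_m
    probs:     their quantile levels p_1 < ... < p_m
    n:         assumed size of the underlying sample (ranks must be distinct)
    """
    ranks = [max(1, round(p * n)) for p in probs]   # simple rank convention (assumption)
    ll = math.lgamma(n + 1)                          # log n!
    prev_F, prev_k = 0.0, 0
    for x, k in zip(quantiles, ranks):
        F, f = lognormal_cdf_pdf(x, mu, sigma)
        ll += _gap_term(F - prev_F, k - prev_k - 1) + math.log(f)
        prev_F, prev_k = F, k
    ll += _gap_term(1.0 - prev_F, n - prev_k)       # samples above the last quantile
    return ll

probs = [0.1, 0.25, 0.5, 0.75, 0.9]
true_mu, true_sigma, n = 10.0, 0.5, 1000
# Hypothetical "reported" quantiles: exact lognormal quantiles, no sampling noise.
qs = [math.exp(true_mu + true_sigma * STD_NORMAL.inv_cdf(p)) for p in probs]
ll_good = order_stat_loglik(qs, probs, n, true_mu, true_sigma)
ll_bad = order_stat_loglik(qs, probs, n, 11.0, true_sigma)
```

Because each gap term counts the unobserved samples between two reported quantiles, this likelihood uses the spacing information that a likelihood with independent Gaussian errors per quantile would ignore.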
We hope that this thesis illustrates some benefits of Bayesian modeling (especially Gaussian processes) in finance and economics, and its usefulness whenever uncertainties need to be quantified.