The aim of this bachelor thesis is to test empirically whether classification can improve the topic models Latent Dirichlet Allocation (LDA) and Author Topic Modeling (ATM) in the context of the social media platform Twitter. For this purpose, a corpus was classified with the Dewey Decimal Classification (DDC) and then used to train the topic models. A second dataset, the unclassified corpus, was used for comparison. The assumption that classification would improve the topic models did not hold for LDA, where no sufficient improvement was achieved. The ATM model, on the other hand, was improved by the classification. In general, the ATM model performed significantly better than the LDA model. In the context of Twitter, the ATM model thus proves superior to the LDA model and can be improved further by classifying the data.
Art-related non-fungible tokens (NFTs) took the digital art space by storm in 2021, generating massive trading volume and attracting a large number of users to a previously obscure part of blockchain technology. Still, very little is known about the attributes that influence the price of these digital assets. This paper attempts to evaluate the level of speculation associated with art NFTs, to understand the characteristics that confer value on them, and to design a profitable trading strategy based on our findings. We analyze 860,067 art NFTs that have been deployed on the Ethereum blockchain and have been involved in 317,950 sales, using machine learning methods to forecast the probability of sale, the trade frequency and the average price. We find that NFTs are highly speculative assets and that their price and recurrence of sale are heavily determined by the floor and the last sale prices, independent of any fundamental value.
Central bank intervention in the form of quantitative easing (QE) during times of low interest rates is a controversial topic. The author introduces a novel approach to study the effectiveness of such unconventional measures. Using U.S. data on six key financial and macroeconomic variables between 1990 and 2015, the economy is estimated by artificial neural networks. Historical counterfactual analyses show that real effects are less pronounced than yield effects.
Disentangling the effects of the individual asset purchase programs, impulse response functions provide evidence that QE becomes less effective the further the crisis is overcome. The peak effects of all QE interventions during the Financial Crisis amount to only 1.3 pp for GDP growth and 0.6 pp for inflation. Hence, both the timing and the volume of the interventions should be weighed carefully.
Clothing modeling is concerned with designing clothing for people who may, for example, be depicted in scenes. The design relies on information from an underlying data source. Depicting scenes that contain people is fundamentally an interplay of complex sub-aspects. How plausible a modeled scene or modeled avatar appears to the viewer is determined to a large extent by appropriately chosen clothing.
This thesis presents approaches and methods for clothing modeling based on text documents. To this end, it discusses ways of extracting information from texts and using it for modeling.
To address the task, a contextual model known from machine learning is first trained and applied to multi-class classification. Next, a dedicated knowledge resource that deals with the topic of clothing at the textual level is built and populated with extensive information from existing resources. The new resource is designed as a graph database. Relations between the individual elements are created using static models as well as a contextual model, the BERT model. Finally, building on the graph database, a program written in Python is presented that processes input texts using the information and relations in the graph database and detects items of clothing.
After the theoretical treatment of the developed approaches, the resulting findings are discussed and remaining problems in addressing the task are raised. Finally, the thesis is summarized and suggestions for further work on this topic are presented.
Industry concentration and markups in the US have been rising over the last three to four decades. However, the causes remain largely unknown. This paper uses machine learning on regulatory documents to construct a novel dataset on compliance costs to examine the effect of regulations on market power. The dataset is comprehensive and consists of all significant regulations at the 6-digit NAICS level from 1970-2018. We find that regulatory costs have increased by $1 trillion during this period. We document that an increase in regulatory costs results in lower (higher) sales, employment, markups, and profitability for small (large) firms. Regulation-driven increases in concentration are associated with lower elasticity of entry with respect to Tobin's Q, and with lower productivity and investment after the late 1990s. We estimate that increased regulations can explain 31-37% of the rise in market power. Finally, we uncover the political economy of rulemaking. While large firms are opposed to regulations in general, they push for the passage of regulations that have an adverse impact on small firms.
Part-of-Speech tagging is generally performed by Markov models, based on bigram or trigram models. While Markov models have a strong concentration on the left context of a word, many languages require the inclusion of right context for correct disambiguation. We show for German that the best results are reached by a combination of left and right context. If only left context is available, then changing the direction of analysis and going from right to left improves the results. In a version of MBT (Daelemans et al., 1996) with default parameter settings, the inclusion of the right context improved POS tagging accuracy from 94.00% to 96.08%, thus corroborating our hypothesis. The version with optimized parameters reaches 96.73%.
High impact events, political changes and new technologies are reflected in our language and lead to constant evolution of terms, expressions and names. Not knowing about names used in the past for referring to a named entity can severely decrease the performance of many computational linguistic algorithms. We propose NEER, an unsupervised method for named entity evolution recognition independent of external knowledge sources. We find time periods with high likelihood of evolution. By analyzing only these time periods using a sliding window co-occurrence method we capture evolving terms in the same context. We thus avoid comparing terms from widely different periods in time and overcome a severe limitation of existing methods for named entity evolution, as shown by the high recall of 90% on the New York Times corpus. We compare several relatedness measures for filtering to improve precision. Furthermore, using machine learning with minimal supervision improves precision to 94%.
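The sliding-window co-occurrence step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the NEER implementation: the function name, the window size and the toy sentence are assumptions made only for illustration, and the abstract does not specify how windows or term pairs are defined.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=3):
    """Count unordered term pairs whose positions fall inside one sliding window.

    `window` is the number of consecutive tokens considered together;
    this definition is an assumption, not taken from the abstract.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the following tokens in the same window.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Hypothetical toy sentence; real input would be text from a detected change period.
tokens = "old_name was renamed new_name last year".split()
counts = cooccurrence_counts(tokens, window=4)
```

Terms that repeatedly share a window within a change period become candidates for referring to the same entity; any filtering by relatedness measure would follow as a separate step.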
This paper contributes a multivariate forecasting comparison between structural models and machine-learning-based tools. Specifically, a fully connected feed-forward non-linear autoregressive neural network (ANN) is contrasted with a well-established dynamic stochastic general equilibrium (DSGE) model, a Bayesian vector autoregression (BVAR) using optimized priors, as well as Greenbook and SPF forecasts. Model estimation and forecasting are based on an expanding window scheme using quarterly U.S. real-time data (1964Q2:2020Q3) for 8 macroeconomic time series (GDP, inflation, federal funds rate, spread, consumption, investment, wage, hours worked), allowing for forecasts up to 8 quarters ahead. The results show that the BVAR improves forecasts compared to the DSGE model. However, there is evidence for an overall improvement of predictions when relying on the ANN or including it in a weighted average. In particular, ANN-based inflation forecasts improve on other predictions by up to 50%. These results indicate that nonlinear data-driven ANNs are a useful method for macroeconomic forecasting.
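An expanding window scheme of the kind named in the abstract above can be sketched as follows. The helper names and the toy series are hypothetical, and the historical-mean forecaster is only a placeholder for the ANN, DSGE and BVAR models, which are not reproduced here.

```python
def expanding_window_splits(n_obs, min_train, horizon):
    """Yield (train_end, target_index) pairs.

    The model is estimated on observations [0, train_end) and forecasts the
    observation `horizon` steps after the last training point.
    """
    for train_end in range(min_train, n_obs - horizon + 1):
        yield train_end, train_end + horizon - 1


def mean_forecast(history):
    # Placeholder forecaster (assumption): the historical mean.
    return sum(history) / len(history)


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy data, not from the paper
errors = []
for train_end, target in expanding_window_splits(len(series), min_train=3, horizon=1):
    forecast = mean_forecast(series[:train_end])
    errors.append(series[target] - forecast)
```

The training sample grows by one observation per step while its start stays fixed, so every forecast uses only data available before the target date.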
For medicine to fulfill its promise of personalized treatments based on a better understanding of disease biology, computational and statistical tools must exist to analyze the increasing amount of patient data that becomes available. A particular challenge is that several types of data are being measured to cope with the complexity of the underlying systems, enhance predictive modeling and enrich molecular understanding.
Here we review a number of recent approaches that specialize in the analysis of multimodal data in the context of predictive biomedicine. We focus on methods that combine different OMIC measurements with image or genome variation data. Our overview shows the diversity of methods that address analysis challenges and reveals new avenues for novel developments.
Goal-Conditioned Reinforcement Learning (GCRL) is a popular framework for training agents to solve multiple tasks in a single environment. It is crucial to train an agent on a diverse set of goals to ensure that it can learn to generalize to unseen downstream goals. Therefore, current algorithms try to learn to reach goals while simultaneously exploring the environment for new ones (Aubret et al., 2021; Mendonca et al., 2021). This creates a form of the prominent exploration-exploitation dilemma. To relieve the pressure of a single agent having to optimize for two competing objectives at once, this thesis proposes the novel algorithm family Goal-Conditioned Reinforcement Learning with Prior Intrinsic Exploration (GC-π), which separates exploration and goal learning into distinct phases. In the first exploration phase, an intrinsically motivated agent explores the environment and collects a rich dataset of states and actions. This dataset is then used to learn a representation space, which acts as the distance metric for the goal-conditioned reward signal. In the final phase, a goal-conditioned policy is trained with the help of the representation space, and its training goals are randomly sampled from the dataset collected during the exploration phase. Multiple variations of these three phases have been extensively evaluated in the classic AntMaze MuJoCo environment (Nachum et al., 2018). The final results show that the proposed algorithms are able to fully explore the environment and solve all downstream goals while using every dimension of the state space for the goal space. This makes the approach more flexible compared to previous GCRL work, which only ever uses a small subset of the dimensions for the goals (S. Li et al., 2021a; Pong et al., 2020).
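The way a representation space can act as the distance metric for a goal-conditioned reward, with goals sampled from the exploration dataset, can be sketched as below. This is an illustrative sketch under stated assumptions, not the GC-π implementation: `encode` is a hypothetical stand-in for the learned representation, and the negative Euclidean distance is an assumed reward form that the abstract does not specify.

```python
import math
import random


def encode(state):
    # Hypothetical stand-in for the learned representation (assumption):
    # a fixed rescaling instead of a trained encoder.
    return [0.5 * x for x in state]


def goal_reward(state, goal):
    """Negative Euclidean distance between state and goal in representation space."""
    return -math.dist(encode(state), encode(goal))


def sample_goal(dataset, rng):
    # Training goals are drawn at random from states seen during exploration.
    return rng.choice(dataset)


# Toy exploration dataset of full state vectors (not real AntMaze states).
dataset = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
rng = random.Random(0)
goal = sample_goal(dataset, rng)
reward = goal_reward([0.0, 0.0], [3.0, 4.0])
```

Because the goal is a full state from the dataset, every state dimension takes part in the distance, which mirrors the abstract's point about using the whole state space as the goal space.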