Twitter author topic modeling : comparative and classifactory topic analysis using latent Dirichlet allocation

This bachelor thesis empirically tests whether classification can improve two topic models, Latent Dirichlet Allocation (LDA) and Author Topic Modeling (ATM), on data from the social media platform Twitter. For this purpose, a corpus was classified with the Dewey Decimal Classification (DDC) and then used to train the topic models. A second dataset, the unclassified corpus, was used for comparison. The assumption that classification would improve the topic models did not hold for LDA, where no sufficient improvement was achieved. The ATM model, by contrast, did improve with classification. Overall, the ATM model performed significantly better than the LDA model. For Twitter data, the results thus indicate that the ATM model is superior to the LDA model and can be improved further by classifying the data.


Author:Natalie Förster
Place of publication:Frankfurt am Main
Referee:Alexander Mehler
Advisor:Alexander Mehler, Giuseppe Abrami
Document Type:Bachelor Thesis
Date of Publication (online):2021/11/16
Year of first Publication:2021
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Granting Institution:Johann Wolfgang Goethe-Universität
Date of final exam:2021/07/22
Release Date:2021/11/19
Tag:Machine Learning; Topic Model
Page Number:69
Institutes:Informatik und Mathematik / Mathematik
Dewey Decimal Classification:5 Natural sciences and mathematics / 51 Mathematics / 510 Mathematics
Licence (German):Deutsches Urheberrecht