University Publications
In the recent past, huge progress has been made in the field of Artificial Intelligence. Since the rise of neural networks, astonishing new frontiers have continuously been discovered, and development is so fast that no major technical limits are in sight. Digitization has consequently expanded from its base in academia and industry to such an extent that it is now prevalent in politics, the mass media, and even the popular arts. The DFG-funded project Specialized Information Service for Biodiversity Research and the BMBF-funded project Linked Open Tafsir belong squarely to this development. Both projects aim to build an intelligent, up-to-date research infrastructure for scholars working in the historical study of biodiversity and of theology, respectively. Starting from digitized German and Arabic historical literature that contains valuable, previously inaccessible knowledge on biodiversity and theology, the core aim of our dissertation is to apply state-of-the-art Machine Learning methods to natural language texts in low-resource languages and thereby enable foundational Natural Language Processing tasks on them, such as Sentence Boundary Detection, Named Entity Recognition, and Topic Modeling. This ultimately paves the way for new scientific discoveries in the historical disciplines of the natural sciences and the humanities. By enriching the landscape of historical low-resource languages with valuable annotation data, our work becomes part of the broader movement to digitize society, allowing people to focus on what really matters in science and industry.
Human readers can infer knowledge from text even when that information is not explicitly stated. In this thesis, we address the phenomenon of text-level implicit information and present novel automated methods for recovering it.
The main focus of this work is on two types of unexpressed content: content that arises between sentences (implicit discourse relations) and content that arises within sentences (implicit semantic roles).
Traditional approaches mostly rely on costly, rich linguistic features, e.g., sentiment or frame-based lexicons, and require heuristics or manual feature engineering.
As an improvement, we propose a collection of generic resource-lean methods, implemented in the form of statistical background knowledge or by means of neural architectures.
Our models are largely language-independent and achieve state-of-the-art performance, e.g., in classifying Chinese implicit discourse relations and in detecting locally covert predicative arguments in free text.
In novel experiments, we quantitatively demonstrate that both types of implicit information are mutually dependent insofar as, for instance, some implicit roles directly correlate with implicit discourse relations of similar properties.
We show that implicit information processing further benefits downstream applications and demonstrate its applicability to the higher-level task of narrative story understanding.
In the conclusion, we argue that implicit information processing is necessary to achieve the goal of true natural language understanding.
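The abstract reports that some implicit roles correlate with implicit discourse relations but does not name the statistic behind that finding. As a hedged illustration only, one standard way to quantify such an association is pointwise mutual information (PMI) over annotated (implicit role, implicit discourse relation) label pairs. This is not necessarily the measure used in the thesis, and the labels `ROLE_A`, `REL_X`, etc. are placeholders, not data from it.

```python
import math
from collections import Counter
from typing import Iterable


def pmi_table(pairs: Iterable[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Pointwise mutual information for each observed (role, relation) pair.

    PMI(r, d) = log2( P(r, d) / (P(r) * P(d)) ). Positive values mean the two
    labels co-occur more often than expected under independence; 0 means
    exactly as often as expected.
    """
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)                       # counts of (role, relation)
    roles = Counter(r for r, _ in pairs)         # marginal counts of roles
    relations = Counter(d for _, d in pairs)     # marginal counts of relations
    return {
        (r, d): math.log2((c / n) / ((roles[r] / n) * (relations[d] / n)))
        for (r, d), c in joint.items()
    }


if __name__ == "__main__":
    # Placeholder labels: each role only ever co-occurs with one relation.
    toy = [("ROLE_A", "REL_X")] * 3 + [("ROLE_B", "REL_Y")] * 3
    print(pmi_table(toy))
    # {('ROLE_A', 'REL_X'): 1.0, ('ROLE_B', 'REL_Y'): 1.0}
```

PMI only scores pairs that were actually observed; with sparse annotations, a smoothed or normalized variant (or a significance test) would usually be preferable before drawing conclusions.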
As an information medium, the Internet is a platform for an unprecedented amount of information that no single person can survey any longer.
Modern web search engines draw on the methods of Information Retrieval to offer users tools for finding the documents on the Internet that are relevant to their information need. Visualizations can help users process this set of documents more effectively. Formulating a complex search query or filtering a search result by specific criteria, however, is still reserved for those who are willing to learn the advanced features of search engines.
The approach presented in this thesis aims to enable simple filtering of search result sets through a Direct Manipulation Interface, by combining a visualization that gives an effective overview of the search result space with the powerful filtering capabilities of modern search engines.
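The abstract does not specify how visual interactions are translated into the search engine's filters. A minimal sketch, assuming each click in the result visualization toggles a filter that is then compiled into Google-style advanced operators (`site:`, `filetype:`, `-`, `OR`). The names `FilterState` and `build_query` are hypothetical, and whether a particular engine honors parenthesized `OR` groups is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FilterState:
    """Hypothetical filter state collected from clicks in a result visualization."""
    include_sites: set[str] = field(default_factory=set)
    exclude_sites: set[str] = field(default_factory=set)
    file_types: set[str] = field(default_factory=set)

    def toggle_site(self, site: str, exclude: bool = False) -> None:
        # Clicking a domain cluster adds it to the chosen set; clicking again removes it.
        target = self.exclude_sites if exclude else self.include_sites
        if site in target:
            target.remove(site)
        else:
            target.add(site)


def _or_group(terms: list[str]) -> str:
    # Parenthesize an OR-combination only when there is more than one alternative.
    joined = " OR ".join(terms)
    return f"({joined})" if len(terms) > 1 else joined


def build_query(keywords: str, state: FilterState) -> str:
    """Compile the visual filter state into one query string (assumed operator syntax)."""
    parts = [keywords]
    if state.include_sites:
        parts.append(_or_group([f"site:{s}" for s in sorted(state.include_sites)]))
    parts.extend(f"-site:{s}" for s in sorted(state.exclude_sites))
    if state.file_types:
        parts.append(_or_group([f"filetype:{t}" for t in sorted(state.file_types)]))
    return " ".join(parts)


if __name__ == "__main__":
    state = FilterState()
    state.toggle_site("wikipedia.org")
    state.toggle_site("example.edu")
    state.toggle_site("spam.example.com", exclude=True)
    state.file_types.add("pdf")
    print(build_query("information retrieval", state))
    # information retrieval (site:example.edu OR site:wikipedia.org) -site:spam.example.com filetype:pdf
```

Keeping the filter state separate from the query string means the user never edits operator syntax directly: every manipulation updates the state, and the query is regenerated from it, so the visualization and the issued query cannot drift apart.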