
HospLetExtractor: a pipeline for automated analysis of German hospital letters

  • This bachelor thesis developed HospLetExtractor, a pipeline for the automatic processing of scanned hospital letters. Hospital letters can contain valuable information about potential adverse drug reactions, as well as case information relevant to pharmacovigilance. To make this data accessible, the thesis presents a pipeline consisting of image pre-processing, optical character recognition (OCR) and post-processing. Pre-processing deskews the images, removes lines and rectangles, reduces noise and applies super-resolution. For post-processing, a spell-checking system was set up, including a newly built word-frequency dictionary for German medical terms derived from a newly created corpus of German medical texts. Furthermore, classical and deep learning models for classifying hospital letters were compared, with transformer-based models performing best. To train and test these models, a new gold standard was created. By making these medical documents accessible to automatic analysis, the thesis aims to contribute to expanding the scope of pharmacovigilance.
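The spell-checking step of the post-processing can be illustrated with a frequency-ranked corrector. The following is a minimal sketch, assuming a Norvig-style approach: candidates at edit distance 1 (then 2) are generated and the one most frequent in a word-frequency dictionary built from a corpus is chosen. The toy corpus, the function names (build_frequency_dictionary, edits1, correct) and the ranking strategy are hypothetical illustrations, not the thesis's actual implementation.

```python
import re
from collections import Counter

# Hypothetical toy corpus standing in for a corpus of German medical texts.
CORPUS = """
Der Patient erhielt Metoprolol und Ramipril. Unter Metoprolol trat eine
Bradykardie auf. Ramipril wurde wegen Husten abgesetzt. Diagnose: Hypertonie.
"""

# Letters used to generate candidate edits, including German umlauts and ß.
ALPHABET = "abcdefghijklmnopqrstuvwxyzäöüß"


def build_frequency_dictionary(text: str) -> Counter:
    """Count lowercase letter-only tokens in the corpus."""
    return Counter(re.findall(r"[a-zäöüß]+", text.lower()))


def edits1(word: str) -> set[str]:
    """All strings one edit away: delete, adjacent transpose, replace, insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str, freq: Counter) -> str:
    """Return a lowercase correction: known word first, then the most
    frequent known candidate at distance 1, then at distance 2."""
    w = word.lower()
    if w in freq:
        return w
    known1 = {c for c in edits1(w) if c in freq}
    if known1:
        return max(known1, key=lambda c: freq[c])
    known2 = {e2 for e1 in edits1(w) for e2 in edits1(e1) if e2 in freq}
    if known2:
        return max(known2, key=lambda c: freq[c])
    return w  # no known candidate: leave the OCR token unchanged


if __name__ == "__main__":
    freq = build_frequency_dictionary(CORPUS)
    print(correct("Metoprol0l", freq))  # OCR confused "o" with "0"
    print(correct("bradikadie", freq))  # two edits away from "bradykardie"
```

Ranking by corpus frequency is why a domain-specific dictionary matters here: a general-purpose German word list would typically lack drug names such as "metoprolol", so OCR errors in exactly the terms relevant to pharmacovigilance would go uncorrected.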

Metadata
Author: Roman Christof
URN: urn:nbn:de:hebis:30:3-862421
Place of publication: Frankfurt am Main
Referee: Alexander Mehler
Document Type: Bachelor Thesis
Language: English
Date of Publication (online): 2024/07/29
Year of first Publication: 2023
Publishing Institution: Universitätsbibliothek Johann Christian Senckenberg
Granting Institution: Johann Wolfgang Goethe-Universität
Date of final exam: 2023/03/08
Release Date: 2024/07/29
Tag: Classification; NLP; OCR
Number of pages: 49
HeBIS-PPN: 520199502
Institutes: Informatik und Mathematik (Computer Science and Mathematics)
Dewey Decimal Classification: 0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Collections: University publications (Universitätspublikationen)
Licence (German): German copyright law (Deutsches Urheberrecht)