Refine
Year of publication
- 2023 (1)
Document Type
- Bachelor Thesis (1)
Language
- English (1)
Has Fulltext
- yes (1)
Is part of the Bibliography
- no (1)
Keywords
- Classification (1)
- NLP (1)
- OCR (1)
Institute
This bachelor thesis developed a pipeline for automatic processing of scanned hospital letters: HospLetExtractor. Hospital letters can contain valuable information about potential adverse drug reactions and useful case information relevant to pharmacovigilance. To make this data accessible, this thesis presents a pipeline consisting of image pre-processing, optical character recognition and post-processing. Pre-processing deskews the images, removes
lines and rectangles, reduces noise and applies super-resolution. For the post-processing a spell checking system was set up including a newly built word frequency dictionary for german medical terms based on a created corpus of german medical texts. Furthermore, classical and deep learning models for the classification of hospital letters were compared, in which the transformer-based models performed best. In order to train and test the models, a new
gold standard was created. By making these medical documents accessible for automatic analysis, hopefully a contribution can be made to expand the scope of pharmacovigilance.