
Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology

Biodiversity information is contained in countless digitized and unprocessed scholarly texts. Although automated extraction of these data has been gaining momentum for years, innumerable text sources remain poorly accessible and require more advanced methods to extract relevant information. To improve access to semantic biodiversity information, we launched the BIOfid project (www.biofid.de) and developed a portal to access the semantics of German-language biodiversity texts, mainly from the 19th and 20th centuries. To make such a portal work, however, several methods first had to be developed or adapted. In particular, text-technological information extraction methods were needed to extract the required information from the texts. Such methods draw on machine learning techniques, which in turn are trained on training data. To this end, among other things, we gathered the BIOfid text corpus, a cooperatively built resource developed by biologists, text technologists, and linguists. A special feature of BIOfid is its multiple annotation approach, which takes into account both general and biology-specific classifications and thereby goes beyond previous, typically taxon- or ontology-driven, proper name detection. We describe the design decisions and the dedicated Annotation Hub Framework underlying the BIOfid annotations and present agreement results. The tools used to create the annotations are introduced, and the use of the data in the semantic portal is described. Finally, we draw some general lessons, in particular for multiple annotation projects.


Metadata
Authors: Andy Lücking, Christine Driller, Manuel Stoeckel, Giuseppe Abrami, Adrian Pachzelt, Alexander Mehler
URN: urn:nbn:de:hebis:30:3-581023
DOI: https://doi.org/10.1007/s10579-021-09553-5
Title of parent work (English): Language resources and evaluation
Publisher: Springer
Document type: Scientific article
Language: English
Date of publication (online): 4 August 2021
Date of first publication: 4 August 2021
Publishing institution: Universitätsbibliothek Johann Christian Senckenberg
Date made available in repository: 11 May 2022
Keywords / tags: Annotation; BIOfid; Biodiversity; Inter-annotator agreement; Named entity recognition; Semantic portal; Specialized information service; Taxon
Year: 2021
Number of pages: 49
First page: 1
Last page: 49
HeBIS-PPN: 495919926
Institutes: Computer Science and Mathematics
Central Facilities / University Library
Affiliated and Cooperating Institutions / Senckenbergische Naturforschende Gesellschaft
DDC classification: 0 Computer science, information and general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
4 Language / 40 Language / 400 Language
5 Natural sciences and mathematics / 57 Life sciences; biology / 570 Life sciences; biology
Collections: University publications
License: Creative Commons Attribution 4.0 (CC BY 4.0)