Voting for POS tagging of Latin texts: using the flair of FLAIR to better ensemble classifiers by example of Latin (2020)

Stoeckel, Manuel ; Henlein, Alexander ; Hemati, Wahed ; Mehler, Alexander

Despite the great importance of the Latin language in the past, there are relatively few resources available today to develop modern NLP tools for this language. Therefore, the EvaLatin Shared Task for Lemmatization and Part-of-Speech (POS) tagging was published in the LT4HALA workshop. In our work, we dealt with the second EvaLatin task, that is, POS tagging. Since most of the available Latin word embeddings were trained on either few or inaccurate data, we trained several embeddings on better data in the first step. Based on these embeddings, we trained several state-of-the-art taggers and used them as input for an ensemble classifier called LSTMVoter. We were able to achieve the best results for both the cross-genre and the cross-time task (90.64% and 87.00%) without using additional annotated data (closed modality). In the meantime, we further improved the system and achieved even better results (96.91% on classical, 90.87% on cross-genre and 87.35% on cross-time).
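The ensemble idea in this abstract can be illustrated with a much simpler stand-in. The sketch below is not LSTMVoter, which learns how to combine tagger outputs; it is a plain per-token majority vote, shown only to make the notion of combining several taggers concrete. The function name `majority_vote`, the tie-breaking rule and the example tags are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tags from several taggers by simple majority.

    `predictions` holds one tag sequence per tagger, all of equal length.
    Ties are broken in favour of the tagger listed first (an assumption of
    this sketch, not something taken from the paper).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(tags) != length for tags in predictions):
        raise ValueError("all taggers must label the same number of tokens")

    voted = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # Pick the first tag, in tagger order, that reaches the top count.
        voted.append(next(tag for tag in token_tags if counts[tag] == best))
    return voted


# Hypothetical outputs of three taggers for a three-token sentence.
tagger_outputs = [
    ["NOUN", "VERB", "ADJ"],
    ["NOUN", "NOUN", "ADV"],
    ["PROPN", "VERB", "ADV"],
]
print(majority_vote(tagger_outputs))
```

A learned voter replaces the fixed counting rule with a model trained on the taggers' outputs, so it can learn which tagger to trust in which context.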

When specialization helps: using pooled contextualized embeddings to detect chemical and biomedical entities in Spanish (2019)

Stoeckel, Manuel ; Hemati, Wahed ; Mehler, Alexander

The recognition of pharmacological substances, compounds and proteins is an essential preliminary work for the recognition of relations between chemicals and other biomedically relevant units. In this paper, we describe an approach to Task 1 of the PharmaCoNER Challenge, which involves the recognition of mentions of chemicals and drugs in Spanish medical texts. We train a state-of-the-art BiLSTM-CRF sequence tagger with stacked Pooled Contextualized Embeddings, word and sub-word embeddings using the open-source framework FLAIR. We present a new corpus composed of articles and papers from Spanish health science journals, termed the Spanish Health Corpus, and use it to train domain-specific embeddings which we incorporate in our model training. We achieve a result of 89.76% F1-score using pre-trained embeddings and are able to improve these results to 90.52% F1-score using specialized embeddings.

TextAnnotator: a UIMA based tool for the simultaneous and collaborative annotation of texts (2020)

Abrami, Giuseppe ; Stoeckel, Manuel ; Mehler, Alexander

The annotation of texts and other material in the field of digital humanities and Natural Language Processing (NLP) is a common task of research projects. At the same time, the annotation of corpora is certainly the most time- and cost-intensive component in research projects and often requires a high level of expertise according to the research interest. However, for the annotation of texts, a wide range of tools is available, both for automatic and manual annotation. Since the automatic pre-processing methods are not error-free and there is an increasing demand for the generation of training data, also with regard to machine learning, suitable annotation tools are required. This paper defines criteria of flexibility and efficiency of complex annotations for the assessment of existing annotation tools. To extend this list of tools, the paper describes TextAnnotator, a browser-based, multi-annotation system, which has been developed to perform platform-independent multimodal annotations and annotate complex textual structures. The paper illustrates the current state of development of TextAnnotator and demonstrates its ability to evaluate annotation quality (inter-annotator agreement) at runtime. In addition, it will be shown how annotations of different users can be performed simultaneously and collaboratively on the same document from different platforms using UIMA as the basis for annotation.
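To make the notion of inter-annotator agreement concrete, here is a minimal sketch that computes Cohen's kappa for two annotators who labelled the same items. The abstract does not say which agreement measure TextAnnotator uses, so the choice of Cohen's kappa, the function name `cohen_kappa` and the example labels are assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators for four text spans.
annotator_1 = ["PER", "LOC", "PER", "ORG"]
annotator_2 = ["PER", "LOC", "ORG", "ORG"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually preferred over a plain percentage when label distributions are skewed.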

Public report and proceedings of the third European Privacy Open Space held at the Europahaus in Vienna 26th and 27th October 2009 in conjunction with the Austrian Big Brother Awards / Schallaböck, Jan [Hrsg.] (2009)

The Project European Privacy Open Space (PrivacyOS) aims at bringing together industry, SMEs, Government, Academia and Civil Society to foster development and deployment of privacy infrastructures for Europe. The general objectives of PrivacyOS are to create a long-term collaboration in the thematic network and establish collective interfaces with other EU projects. Participants exchange research and best practices, as well as develop strategies and joint projects following four core policy goals: Awareness-raising, enabling privacy on the Web, fostering privacy-friendly Identity Management, and stipulating research. ... This report focuses on the 3rd PrivacyOS conference, which was held in Vienna, October 26th and 27th 2009, co-located with the Austrian Big Brother Awards. 50 participants attended the conference and devised the agenda with 21 presentations in two parallel tracks. The topics of the presentations included, amongst others: data protection awareness, data protection in healthcare, data protection in the Web 2.0, privacy-related technologies such as EnCoRe, TOR or Microformats, as well as regulatory, cultural and sociological implications of data protection. Also at the 3rd PrivacyOS conference, the software product “KiwiSecurity” was awarded the EuroPriSe Seal (European Privacy Seal, www.european-privacy-seal.eu). EuroPriSe is an initiative of the data protection authority Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein (ULD), Germany. It was started as a European Project under the eTEN programme.
