OPUS 4 | 004 Datenverarbeitung; Informatik

On gender specific perception of data sharing in Japan (2016)

Tschersich, Markus ; Kiyomoto, Shinsaku ; Pape, Sebastian ; Nakamura, Toru ; Bal, Gökhan ; Takasaki, Haruo ; Rannenberg, Kai

Privacy and its protection is an important part of the culture in the USA and Europe. Literature in this field lacks empirical data from Japan. Thus, it is difficult– especially for foreign researchers – to understand the situation in Japan. To get a deeper understanding we examined the perception of a topic that is closely related to privacy: the perceived benefits of sharing data and the willingness to share in respect to the benefits for oneself, others and companies. We found a significant impact of the gender to each of the six analysed constructs.

Improving data quality by rules: a numismatic example (2017)

Tolle, Karsten ; Wigg-Wolf, David

The archaeological data dealt with in our database solution Antike Fundmünzen in Europa (AFE), which records finds of ancient coins, is entered by humans. Based on the Linked Open Data (LOD) approach, we link our data to Nomisma.org concepts, as well as to other resources like Online Coins of the Roman Empire (OCRE). Since information such as denomination, material, etc. is recorded for each single coin, this information should be identical for coins of the same type. Unfortunately, this is not always the case, mostly due to human errors. Based on rules that we implemented, we were able to make use of this redundant information in order to detect possible errors within AFE, and were even able to correct errors in Nomimsa.org. However, the approach had the weakness that it was necessary to transform the data into an internal data model. In a second step, we therefore developed our rules within the Linked Open Data world. The rules can now be applied to datasets following the Nomisma. org modelling approach, as we demonstrated with data held by Corpus Nummorum Thracorum (CNT). We believe that the use of methods like this to increase the data quality of individual databases, as well as across different data sources and up to the higher levels of OCRE and Nomisma.org, is mandatory in order to increase trust in them.

Terminology evolution in web archiving: Open issues (2008)

Tahmasebi, Nina ; Iofciu, Tereza ; Risse, Thomas ; Niederée, Claudia ; Siberski, Wolf

The correspondence between the terminology used for querying and the one used in content objects to be retrieved, is a crucial prerequisite for effective retrieval technology. However, as terminology is evolving over time, a growing gap opens up between older documents in (long-term) archives and the active language used for querying such archives. Thus, technologies for detecting and systematically handling terminology evolution are required to ensure "semantic" accessibility of (Web) archive content on the long run. As a starting point for dealing with terminology evolution this paper formalizes the problem and discusses issues, first ideas and relevant technologies.

Resolving anaphoric references on deficient syntactic descriptions (1997)

Stuckardt, Roland

Syntactic coindexing restrictions are by now known to be of central importance to practical anaphor resolution approaches. Since, in particular due to structural ambiguity, the assumption of the availability of a unique syntactic reading proves to be unrealistic, robust anaphor resolution relies on techniques to overcome this deficiency. In this paper, two approaches are presented which generalize the verification of coindexing constraints to de cient descriptions. At first, a partly heuristic method is described, which has been implemented. Secondly, a provable complete method is specified. It provides the means to exploit the results of anaphor resolution for a further structural disambiguation. By rendering possible a parallel processing model, this method exhibits, in a general sense, a higher degree of robustness. As a practically optimal solution, a combination of the two approaches is suggested.

Anaphor resolution and the scope of syntactic constraints (1996)

Stuckardt, Roland

An anaphor resolution algorithm is presented which relies on a combination of strategies for narrowing down and selecting from antecedent sets for re exive pronouns, nonre exive pronouns, and common nouns. The work focuses on syntactic restrictions which are derived from Chomsky's Binding Theory. It is discussed how these constraints can be incorporated adequately in an anaphor resolution algorithm. Moreover, by showing that pragmatic inferences may be necessary, the limits of syntactic restrictions are elucidated.

Coreference-Based Summarization and Question Answering: a Case for High Precision Anaphor Resolution (2003)

Stuckardt, Roland

Approaches to Text Summarization and Question Answering are known to benefit from the availability of coreference information. Based on an analysis of its contributions, a more detailed look at coreference processing for these applications will be proposed: it should be considered as a task of anaphor resolution rather than coreference resolution. It will be further argued that high precision approaches to anaphor resolution optimally match the specific requirements. Three such approaches will be described and empirically evaluated, and the implications for Text Summarization and Question Answering will be discussed.

Machine-learning-based vs. manually designed approaches to anaphor resolution: the best of two worlds (2002)

Stuckardt, Roland

In the last years, much effort went into the design of robust anaphor resolution algorithms. Many algorithms are based on antecedent filtering and preference strategies that are manually designed. Along a different line of research, corpus-based approaches have been investigated that employ machine-learning techniques for deriving strategies automatically. Since the knowledge-engineering effort for designing and optimizing the strategies is reduced, the latter approaches are considered particularly attractive. Since, however, the hand-coding of robust antecedent filtering strategies such as syntactic disjoint reference and agreement in person, number, and gender constitutes a once-for-all effort, the question arises whether at all they should be derived automatically. In this paper, it is investigated what might be gained by combining the best of two worlds: designing the universally valid antecedent filtering strategies manually, in a once-for-all fashion, and deriving the (potentially genre-specific) antecedent selection strategies automatically by applying machine-learning techniques. An anaphor resolution system ROSANA-ML, which follows this paradigm, is designed and implemented. Through a series of formal evaluations, it is shown that, while exhibiting additional advantages, ROSANAML reaches a performance level that compares with the performance of its manually designed ancestor ROSANA.

Three Algorithms for Competence-Oriented Anaphor Resolution (2004)

Stuckardt, Roland

In the last decade, much effort went into the design of robust third-person pronominal anaphor resolution algorithms. Typical approaches are reported to achieve an accuracy of 60-85%. Recent research addresses the question of how to deal with the remaining difficult-toresolve anaphors. Lappin (2004) proposes a sequenced model of anaphor resolution according to which a cascade of processing modules employing knowledge and inferencing techniques of increasing complexity should be applied. The individual modules should only deal with and, hence, recognize the subset of anaphors for which they are competent. It will be shown that the problem of focusing on the competence cases is equivalent to the problem of giving precision precedence over recall. Three systems for high precision robust knowledge-poor anaphor resolution will be designed and compared: a ruleset-based approach, a salience threshold approach, and a machine-learning-based approach. According to corpus-based evaluation, there is no unique best approach. Which approach scores highest depends upon type of pronominal anaphor as well as upon text genre.

Voting for POS tagging of latin texts: using the flair of FLAIR to better ensemble classifiers by example of latin (2020)

Stoeckel, Manuel ; Henlein, Alexander ; Hemati, Wahed ; Mehler, Alexander

Despite the great importance of the Latin language in the past, there are relatively few resources available today to develop modern NLP tools for this language. Therefore, the EvaLatin Shared Task for Lemmatization and Part-of-Speech (POS) tagging was published in the LT4HALA workshop. In our work, we dealt with the second EvaLatin task, that is, POS tagging. Since most of the available Latin word embeddings were trained on either few or inaccurate data, we trained several embeddings on better data in the first step. Based on these embeddings, we trained several state-of-the-art taggers and used them as input for an ensemble classifier called LSTMVoter. We were able to achieve the best results for both the cross-genre and the cross-time task (90.64% and 87.00%) without using additional annotated data (closed modality). In the meantime, we further improved the system and achieved even better results (96.91% on classical, 90.87% on cross-genre and 87.35% on cross-time).

When specialization helps: using pooled contextualized embeddings to detect chemical and biomedical entities in Spanish (2019)

Stoeckel, Manuel ; Hemati, Wahed ; Mehler, Alexander

The recognition of pharmacological substances, compounds and proteins is an essential preliminary work for the recognition of relations between chemicals and other biomedically relevant units. In this paper, we describe an approach to Task 1 of the PharmaCoNER Challenge, which involves the recognition of mentions of chemicals and drugs in Spanish medical texts. We train a state-of-the-art BiLSTM-CRF sequence tagger with stacked Pooled Contextualized Embeddings, word and sub-word embeddings using the open-source framework FLAIR. We present a new corpus composed of articles and papers from Spanish health science journals, termed the Spanish Health Corpus, and use it to train domain-specific embeddings which we incorporate in our model training. We achieve a result of 89.76% F1-score using pre-trained embeddings and are able to improve these results to 90.52% F1-score using specialized embeddings.

Lower bounds for multi-pass processing of multiple data streams (2009)

Schweikardt, Nicole

This paper gives a brief overview of computation models for data stream processing, and it introduces a new model for multi-pass processing of multiple streams, the so-called mp2s-automata. Two algorithms for solving the set disjointness problem with these automata are presented. The main technical contribution of this paper is the proof of a lower bound on the size of memory and the number of heads that are required for solving the set disjointness problem with mp2s-automata.

Open Educational Resources and Digital Higher Education (2022)

Schulze, Uwe

Modeling saltwater intrusion scenarios for a coastal aquifer at the German North Sea (2018)

Schneider, Anke ; Zhao, Hong ; Wolf, Jens ; Logashenko, Dmitry ; Reiter, Sebastian ; Howahr, M. ; Eley, Malte ; Gelleszun, Marlene ; Wiederhold, Helga

A 3d regional density-driven flow model of a heterogeneous aquifer system at the German North Sea Coast is set up within the joint project NAWAK (“Development of sustainable adaption strategies for the water supply and distribution infrastructure on condition of climatic and demographic change”). The development of the freshwater-saltwater interface is simulated for three climate and demographic scenarios. Groundwater flow simulations are performed with the finite volume code d3f++ (distributed density driven flow) that has been developed with a view to the modelling of large, complex, strongly density-influenced aquifer systems over long time periods.

Simulation in the call-by-need lambda-calculus with letrec (2010)

Schmidt-Schauß, Manfred ; Sabel, David ; Machkasova, Elena

This paper shows the equivalence of applicative similarity and contextual approximation, and hence also of bisimilarity and contextual equivalence, in the deterministic call-by-need lambda calculus with letrec. Bisimilarity simplifies equivalence proofs in the calculus and opens a way for more convenient correctness proofs for program transformations. Although this property may be a natural one to expect, to the best of our knowledge, this paper is the first one providing a proof. The proof technique is to transfer the contextual approximation into Abramsky’s lazy lambda calculus by a fully abstract and surjective translation. This also shows that the natural embedding of Abramsky’s lazy lambda calculus into the call-by-need lambda calculus with letrec is an isomorphism between the respective term-models. We show that the equivalence property proven in this paper transfers to a call-by-need letrec calculus developed by Ariola and Felleisen. 1998 ACM Subject Classification: F.4.2, F.3.2, F.3.3, F.4.1. Key words and phrases: semantics, contextual equivalence, bisimulation, lambda calculus, call-by-need, letrec.

Optimizing space of parallel processes (2019)

Schmidt-Schauß, Manfred ; Dallmeyer, Nils

This paper is a contribution to exploring and analyzing space-improvements in concurrent programming languages, in particular in the functional process-calculus CHF. Space-improvements are defined as a generalization of the corresponding notion in deterministic pure functional languages. The main part of the paper is the O(n ·logn) algorithm SPOPTN for offline space optimization of several parallel independent processes. Applications of this algorithm are: (i) affirmation of space improving transformations for particular classes of program transformations; (ii) support of an interpreter-based method for refuting space-improvements; and (iii) as a stand-alone offline-optimizer for space (or similar resources) of parallel processes.

Mobile qualified electronic signatures for secure mobile brokerage (2004)

Roßnagel, Heiko

Despite a legal framework being in place for several years, the market share of qualified electronic signatures is disappointingly low. Mobile Signatures provide a new and promising opportunity for the deployment of an infrastructure for qualified electronic signatures. We that SIM-based signatures are the most secure and convenient solution. However, using the SIM-card as a secure signature creation device (SSCD) raises new challenges, because it would contain the user’s private key as well as the subscriber identification. Combining both functions in one card raises the question who will have the control over the keys and certificates. We propose a protocol called Certification on Demand (COD) that separates certification services from subscriber identification information and allows consumers to choose their appropriate certification services and service providers based on their needs. This infrastructure could be used to enable secure mobile brokerage services that can ommit the necessity of TAN lists and therefore allow a better integration of information and transaction services.

German Studies in the United States : Vortrag am 6th Frankfurt Scientific Symposium, GNARP und wie sie die Welt sieht: Aussichten transatlantischer Partnerschaft im digitalen Zeitalter: 5.10.2006 - 7.10.2006 (2006)

Remak-Honnef, Elisabeth

To stimulate further discussion, I would like to briefly tackle the following questions: * How can one become informed about what is going on in German Studies in the US? * What kinds of American guides to German resources are available? * What kinds of German Studies resources are being produced in the US? * What do we know about how scholars are using (or not) these guides and resources?

Encoding induction in correctness proofs of program transformations as a termination problem (2012)

Rau, Conrad ; Sabel, David ; Schmidt-Schauß, Manfred

The diagram-based method to prove correctness of program transformations consists of computing complete set of (forking and commuting) diagrams, acting on sequences of standard reductions and program transformations. In many cases, the only missing step for proving correctness of a program transformation is to show the termination of the rearrangement of the sequences. Therefore we encode complete sets of diagrams as term rewriting systems and use an automated tool to show termination, which provides a further step in the automation of the inductive step in correctness proofs.

Generating practical random hyperbolic graphs in near-linear time and with sub-linear memory (2017)

Penschuck, Manuel

Random graph models, originally conceived to study the structure of networks and the emergence of their properties, have become an indispensable tool for experimental algorithmics. Amongst them, hyperbolic random graphs form a well-accepted family, yielding realistic complex networks while being both mathematically and algorithmically tractable. We introduce two generators MemGen and HyperGen for the G_{alpha,C}(n) model, which distributes n random points within a hyperbolic plane and produces m=n*d/2 undirected edges for all point pairs close by; the expected average degree d and exponent 2*alpha+1 of the power-law degree distribution are controlled by alpha>1/2 and C. Both algorithms emit a stream of edges which they do not have to store. MemGen keeps O(n) items in internal memory and has a time complexity of O(n*log(log n) + m), which is optimal for networks with an average degree of d=Omega(log(log n)). For realistic values of d=o(n / log^{1/alpha}(n)), HyperGen reduces the memory footprint to O([n^{1-alpha}*d^alpha + log(n)]*log(n)). In an experimental evaluation, we compare HyperGen with four generators among which it is consistently the fastest. For small d=10 we measure a speed-up of 4.0 compared to the fastest publicly available generator increasing to 29.6 for d=1000. On commodity hardware, HyperGen produces 3.7e8 edges per second for graphs with 1e6 < m < 1e12 and alpha=1, utilising less than 600MB of RAM. We demonstrate nearly linear scalability on an Intel Xeon Phi.

Assessing privacy policies of internet of things services (2018)

Paul, Niklas ; Tesfay, Welderufael Berhane ; Kipker, Dennis-Kenji ; Stelter, Mattea ; Pape, Sebastian

This paper provides an assessment framework for privacy policies of Internet of Things Services which is based on particular GDPR requirements. The objective of the framework is to serve as supportive tool for users to take privacy-related informed decisions. For example when buying a new fitness tracker, users could compare different models in respect to privacy friendliness or more particular aspects of the framework such as if data is given to a third party. The framework consists of 16 parameters with one to four yes-or-no-questions each and allows the users to bring in their own weights for the different parameters. We assessed 110 devices which had 94 different policies. Furthermore, we did a legal assessment for the parameters to deal with the case that there is no statement at all regarding a certain parameter. The results of this comparative study show that most of the examined privacy policies of IoT devices/services are insufficient to address particular GDPR requirements and beyond. We also found a correlation between the length of the policy and the privacy transparency score, respectively.

Open Access

004 Datenverarbeitung; Informatik

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Institute

48 search hits