The value of plant ecological datasets with hundreds or thousands of species is principally determined by the taxonomic accuracy of their plant names. However, combining existing lists of species to assemble a harmonized dataset that is free of taxonomic errors can be a difficult task for non-taxonomists. Here, we describe the range of taxonomic difficulties likely to be encountered during dataset assembly and present an easy-to-use taxonomic cleaning protocol aimed at assisting researchers not familiar with the finer details of taxonomic cleaning. The protocol produces a final dataset (FD) linked to a companion dataset (CD), providing clear details of the path from existing lists to the FD taken by each cleaned taxon. Taxa are checked off against ten categories in the CD that succinctly summarize all taxonomic modifications required. Two older, publicly available lists of naturalized Asteraceae in Australia were merged into a harmonized dataset as a case study to quantify the impacts of ignoring the critical process of taxonomic cleaning in invasion ecology. Our FD of naturalized Asteraceae contained 257 species and infra-species. Without implementation of the full cleaning protocol, the dataset would have contained 328 taxa, overestimating taxon richness by 71 taxa (28%). Our naturalized Asteraceae CD described the exclusion of 88 names due to nomenclatural issues (e.g. synonymy), the inclusion of 26 updated currently accepted names and four taxa newly naturalized since the production of the source datasets, and the exclusion of 13 taxa that were either found not to be in Australia or were in fact doubtfully naturalized. This study also supports the notion that automated processes alone will not be enough to ensure taxonomically clean datasets, and that manual scrutiny of data is essential. In the long term, this will best be supported by increased investment in taxonomy and botany in university curricula.
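To make the FD/CD bookkeeping concrete, the following minimal Python sketch merges two name lists into a final dataset while logging every decision in a companion dataset. The species names, the synonym table and the category labels are invented for illustration and are far simpler than the ten categories of the actual protocol.

```python
# Minimal sketch of the FD/CD bookkeeping. All names, the synonym
# table and the category labels below are invented for illustration;
# the real protocol uses ten categories and manual taxonomic checks.

# Two hypothetical source lists of naturalized taxa.
list_a = ["Aster novi-belgii", "Bidens pilosa", "Dubiosa exemplaris"]
list_b = ["Bidens pilosa", "Symphyotrichum novi-belgii"]

# Hypothetical mapping of outdated names to currently accepted names,
# compiled from manual checks against a taxonomic authority.
synonyms = {"Aster novi-belgii": "Symphyotrichum novi-belgii"}

# Hypothetical names found to be absent or doubtfully naturalized.
excluded = {"Dubiosa exemplaris"}

companion = []  # companion dataset (CD): one record per input name
final = set()   # final dataset (FD): cleaned, accepted names

for name in dict.fromkeys(list_a + list_b):  # merge, drop duplicates
    if name in excluded:
        companion.append((name, "excluded: doubtfully naturalized", None))
    elif name in synonyms:
        accepted = synonyms[name]
        companion.append((name, "synonym of accepted name", accepted))
        final.add(accepted)
    else:
        companion.append((name, "accepted name unchanged", name))
        final.add(name)

print(f"{len(companion)} input names -> {len(final)} taxa in the FD")
for record in companion:
    print(record)
```

Even in this toy example the two effects the paper quantifies are visible: the naive merge of the two lists yields four names, while the cleaned FD contains only two taxa, and the CD records why each name was kept, remapped or excluded.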
Despite advances in myocardial reperfusion therapies, acute myocardial ischaemia/reperfusion injury and consequent ischaemic heart failure represent the number one cause of morbidity and mortality in industrialized societies. Although different therapeutic interventions have been shown beneficial in preclinical settings, an effective cardioprotective or regenerative therapy has yet to be successfully introduced in the clinical arena. Given the complex pathophysiology of the ischaemic heart, large-scale, unbiased, global approaches capable of identifying multiple branches of the signalling networks activated in the ischaemic/reperfused heart might be more successful in the search for novel diagnostic or therapeutic targets. High-throughput techniques allow high-resolution, genome-wide investigation of genetic variants, epigenetic modifications, and associated gene expression profiles. Platforms such as proteomics and metabolomics (not described here in detail) also offer simultaneous readouts of hundreds of proteins and metabolites. Isolated omics analyses usually provide Big Data requiring large data storage, advanced computational resources and complex bioinformatics tools. The possibility of integrating different omics approaches offers new hope of better understanding the molecular circuitry activated by myocardial ischaemia, putting it in the context of the human ‘diseasome’. Since modifications of cardiac gene expression have been consistently linked to the pathophysiology of the ischaemic heart, the integration of epigenomic and transcriptomic data seems a promising approach to identify crucial disease networks. Thus, the scope of this Position Paper will be to highlight potentials and limitations of these approaches, and to provide recommendations to optimize the search for novel diagnostic or therapeutic targets for acute ischaemia/reperfusion injury and ischaemic heart failure in the post-genomic era.
Free public opinion formation is at the heart of democracy. Yet digital communication and the data-driven curation of content are changing democracy's inherent concept of the public sphere and call for a new legal framework. In this edited volume, experts from law, political science, sociology and data science introduce the subject and point out ways to strengthen democracy in the course of digitalization.
Background: The digital transformation of the healthcare system is changing the medical profession. Data literacy is regarded as one of the key competencies for the future in this context, yet it currently receives no attention either in the implemented curricula of medical studies or in the ongoing reform processes (Masterplan Medizinstudium 2020 and the Nationaler Kompetenzbasierter Lernzielkatalog).
Objective: This article first examines the aspects that the term data literacy bundles together in the medical context. Second, it presents a teaching concept that, for the first time, integrates data literacy into medical studies in light of the digital transformation.
Materials and methods: The blended-learning curriculum "Medizin im digitalen Zeitalter" (Medicine in the Digital Age) addresses the multifaceted transformation of medicine in five modules, from digital communication, smart devices and medical apps, telemedicine, and virtual/augmented and robotic surgery to individualized medicine and Big Data. This article presents the concept of, and the experience gained from, the first implementation of the fifth module, which covers data literacy in a transdisciplinary and integrative way.
Results: The course concept was evaluated both qualitatively and quantitatively. The evaluation demonstrates a gain in competence in the areas of knowledge and skills, as well as a more differentiated attitude after completion of the course.
Conclusions: The curricular integration of data literacy is a transdisciplinary and longitudinal task. When developing such curricula, the high speed of change in the digital transformation should be taken into account, and curricular adaptation should already be addressed at the design stage in the sense of agility by design.
Mapping a public discourse with the tools of computational text analysis comes with many contingencies in the areas of corpus curation, data processing and analysis, and visualisation. However, the complexity of algorithmic assemblies and the beauty of resulting images give the impression of ‘objectivity’. Instead of concealing uncertainties and artefacts in order to tell a coherent and all-encompassing story, retaining the variety of alternative assemblies may actually strengthen the method. By utilising the mobility of digital devices, we could create mutable mobiles that allow access to our laboratories and enable challenging rearrangements and interpretations.
Research in the field of Digital Humanities, also known as Humanities Computing, has increased steadily in recent years. Situated at the intersection of computing science and the humanities, present efforts focus on making resources such as texts, images, musical pieces and other semiotic artifacts digitally available, searchable and analysable. To this end, computational tools enabling textual search, visual analytics, data mining, statistics and natural language processing are harnessed to support the humanities researcher. The processing of large data sets with appropriate software opens up novel and fruitful approaches to questions in the traditional humanities. This report summarizes the Dagstuhl seminar 14301 on “Computational Humanities - bridging the gap between Computer Science and Digital Humanities”.
1998 ACM Subject Classification I.2.7 Natural Language Processing, J.5 Arts and Humanities
The main contribution of the thesis is in helping to understand which software system parameters most affect the performance of Big Data platforms under realistic workloads. In detail, the main research contributions of the thesis are:
1. Definition of the new concept of heterogeneity for Big Data Architectures (Chapter 2);
2. Investigation of the performance of Big Data systems (e.g. Hadoop) in virtualized environments (Section 3.1);
3. Investigation of the performance of NoSQL databases versus Hadoop distributions (Section 3.2);
4. Execution and evaluation of the TPCx-HS benchmark (Section 3.3);
5. Evaluation and comparison of Hive and Spark SQL engines using benchmark queries (Section 3.4);
6. Evaluation of the impact of compression techniques on SQL-on-Hadoop engine performance (Section 3.5);
7. Extensions of the standardized Big Data benchmark BigBench (TPCx-BB) (Sections 4.1 and 4.3);
8. Definition of a new benchmark, called ABench (Big Data Architecture Stack Benchmark), that takes into account the heterogeneity of Big Data architectures (Section 4.5).
The thesis is an attempt to redefine system benchmarking, taking into account the new requirements posed by Big Data applications. With the explosion of Artificial Intelligence (AI) and new hardware computing power, it is a first step towards a more holistic approach to benchmarking.
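As a rough illustration of the kind of measurement the contributions above rely on, here is a short Python sketch that times a set of benchmark queries against one engine and reports wall-clock latencies. It is not the harness used in the thesis: the engine adapter, the query set and the repetition count are placeholders, and in a real run `execute` would wrap an actual engine client.

```python
import time
import statistics

def run_benchmark(engine_name, execute, queries, repetitions=3):
    """Time each query on one engine and report wall-clock latencies.

    `execute` must submit one SQL string to the engine and block until
    the result set is fully materialized, so that the measured time
    covers the complete query execution.
    """
    for qid, sql in queries.items():
        times = []
        for _ in range(repetitions):
            start = time.perf_counter()
            execute(sql)
            times.append(time.perf_counter() - start)
        print(f"{engine_name} {qid}: "
              f"min={min(times):.2f}s "
              f"mean={statistics.mean(times):.2f}s "
              f"max={max(times):.2f}s")

# Placeholder usage with a stub engine. Against a real SQL-on-Hadoop
# engine, `execute` would instead wrap e.g. a PyHive cursor (Hive)
# or spark.sql(sql).collect() (Spark SQL).
queries = {"Q1": "SELECT COUNT(*) FROM lineitem"}
run_benchmark("stub-engine", lambda sql: time.sleep(0.01), queries)
```

Running the same harness with identical queries and repetition counts against two engines yields the kind of paired comparison described in Sections 3.4 and 3.5, e.g. Hive versus Spark SQL, or one engine with and without compression.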
Technological progress makes it possible to analyse large volumes of data with algorithms in order to uncover previously unknown correlations. Such analyses are commonly summarized under the catchword Big Data. Personal data are frequently the subject of Big Data applications, whether as the input or as the result of an analysis. In these cases, data protection law must be observed.
The purpose limitation principle requires that a processing purpose be specified as early as the collection of the data and that any further handling of the data be bound to that purpose. This creates a tension with Big Data applications, which at the start of processing can state their purpose only unspecifically, if at all. On the basis of the old Bundesdatenschutzgesetz (German Federal Data Protection Act), with individual outlooks on the Datenschutzgrundverordnung (General Data Protection Regulation), the thesis examines the data protection requirements for specifying the purpose and the obligations that follow from it. The author also considers possible solutions to the conflict between Big Data applications and the purpose limitation principle.
Lack of privacy due to surveillance of personal data, which is becoming ubiquitous around the world, induces persistent conformity to the norms prevalent under the surveillance regime. We document this channel in a unique laboratory: the widespread surveillance of private citizens in East Germany. Exploiting localized variation in the intensity of surveillance before the fall of the Berlin Wall, we show that, at the present day, individuals who lived in high-surveillance counties are more likely to recall that they were spied upon, display more conformist beliefs about society and individual interactions, and are hesitant about institutional and social change. Social conformity is accompanied by conformist economic choices: individuals in high-surveillance counties save more and are less likely to take out credit, consistent with norms of frugality. The absence of differences in risk aversion and in binding financial constraints across levels of exposure to surveillance supports a beliefs channel.
With Big Data, decisions made by machine learning algorithms depend on training data generated by many individuals. In an experiment, we identify the effect of varying individual responsibility for the moral choices of an artificially intelligent algorithm. Across treatments, we manipulated the sources of training data and thus the impact of each individual’s decisions on the algorithm. Diffusing such individual pivotality for algorithmic choices increased the share of selfish decisions and weakened revealed prosocial preferences. This does not result from a change in the structure of incentives. Rather, our results show that Big Data offers an excuse for selfish behavior through lower responsibility for one’s and others’ fate.