Filtern
Dokumenttyp
- Wissenschaftlicher Artikel (2) (entfernen)
Sprache
- Englisch (2) (entfernen)
Volltext vorhanden
- ja (2)
Gehört zur Bibliographie
- nein (2)
Schlagworte
- big data (2) (entfernen)
Data structures and advanced models of computation on big data : report from Dagstuhl seminar 14091
(2014)
This report documents the program and the outcomes of Dagstuhl Seminar 14091 "Data Structures and Advanced Models of Computation on Big Data". In today's computing environment vast amounts of data are processed, exchanged and analyzed. The manner in which information is stored profoundly influences the efficiency of these operations over the data. In spite of the maturity of the field many data structuring problems are still open, while new ones arise due to technological advances.
The seminar covered both recent advances in the "classical" data structuring topics as well as new models of computation adapted to modern architectures, scientific studies that reveal the need for such models, applications where large data sets play a central role, modern computing platforms for very large data, and new data structures for large data in modern architectures.
The extended abstracts included in this report contain both recent state of the art advances and lay the foundation for new directions within data structures research.
Publicly available compound and bioactivity databases provide an essential basis for data-driven applications in life-science research and drug design. By analyzing several bioactivity repositories, we discovered differences in compound and target coverage advocating the combined use of data from multiple sources. Using data from ChEMBL, PubChem, IUPHAR/BPS, BindingDB, and Probes & Drugs, we assembled a consensus dataset focusing on small molecules with bioactivity on human macromolecular targets. This allowed an improved coverage of compound space and targets, and an automated comparison and curation of structural and bioactivity data to reveal potentially erroneous entries and increase confidence. The consensus dataset comprised of more than 1.1 million compounds with over 10.9 million bioactivity data points with annotations on assay type and bioactivity confidence, providing a useful ensemble for computational applications in drug design and chemogenomics.