Electroencephalography (EEG) is a widely established method for assessing altered and typically developing brain function. However, systematic studies on EEG data quality, its correlates, and its consequences are scarce. To address this research gap, the current study used the percentage of artifact-free segments after standard EEG pre-processing as a data quality index. We analyzed participant-related and methodological influences on this index and assessed its validity by replicating landmark EEG effects. Further, we explored effects of data quality on spectral power analyses beyond participant-related characteristics. EEG data from a multicenter ADHD cohort (age range 6 to 45 years) and a non-ADHD school-age control group were analyzed (total N = 305). Resting-state data during eyes-open and eyes-closed conditions and task-related data during a cued Continuous Performance Task (CPT) were collected. After pre-processing, general linear models and stepwise regression models were fitted to the data. We found that EEG data quality was strongly related to demographic characteristics but not to methodological factors. We were able to replicate the maturational, task, and ADHD effects reported in the EEG literature, establishing a link with landmark EEG effects. Furthermore, we showed that poor data quality significantly increases spectral power beyond the effects of maturation and symptom severity. Taken together, the current results indicate that, with careful design and systematic quality control, informative large-scale multicenter trials characterizing neurophysiological mechanisms in neurodevelopmental disorders across the lifespan are feasible. Nevertheless, the results are subject to the limitations reported; future work will clarify the predictive value of the proposed quality index.
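To make the quality index and the group-level models concrete, here is a minimal Python sketch. The synthetic data, variable names, simulated effect sizes, and the use of pandas/statsmodels are illustrative assumptions, not the authors' pipeline; it shows the general shape of computing a percentage of artifact-free segments and fitting a general linear model to it:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def artifact_free_percentage(artifact_mask):
    """Quality index: percentage of pre-processed EEG segments that
    were NOT rejected as artifactual (mask is True where rejected)."""
    artifact_mask = np.asarray(artifact_mask, dtype=bool)
    return 100.0 * (~artifact_mask).mean()

print(artifact_free_percentage([False, False, True, False]))  # 75.0

# Synthetic per-participant table, purely for illustration:
# one row per recording, quality index plus covariates.
rng = np.random.default_rng(0)
n = 305                                 # matches the reported sample size
age = rng.uniform(6, 45, n)
group = rng.choice(["ADHD", "control"], n)
quality = 60 + 0.5 * age + rng.normal(0, 10, n)  # made-up relationship
df = pd.DataFrame({"quality": quality, "age": age, "group": group})

# General linear model: data quality as a function of demographic
# characteristics, in the spirit of the analyses described above.
model = smf.ols("quality ~ age + C(group)", data=df).fit()
print(model.summary())
```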
The archaeological data in our database solution Antike Fundmünzen in Europa (AFE), which records finds of ancient coins, is entered by humans. Following the Linked Open Data (LOD) approach, we link our data to Nomisma.org concepts as well as to other resources such as Online Coins of the Roman Empire (OCRE). Since information such as denomination, material, etc. is recorded for each single coin, this information should be identical for coins of the same type. Unfortunately, this is not always the case, mostly due to human error. Based on rules that we implemented, we were able to use this redundant information to detect possible errors within AFE, and we were even able to correct errors in Nomisma.org. However, this approach had the weakness that the data first had to be transformed into an internal data model. In a second step, we therefore reimplemented our rules within the Linked Open Data world. The rules can now be applied to any dataset following the Nomisma.org modelling approach, as we demonstrated with data held by Corpus Nummorum Thracorum (CNT). We believe that the use of such methods to increase the data quality of individual databases, as well as across different data sources and up to the higher levels of OCRE and Nomisma.org, is essential to increase trust in them.
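As an illustration of what such a rule can look like once it operates directly on LOD, the following Python/rdflib sketch flags coins that share a type-series item but disagree on denomination. The file name is hypothetical, and the nmo: property names reflect our reading of the Nomisma ontology; this is a sketch of the pattern, not the authors' published rule set:

```python
from rdflib import Graph

# Load a (hypothetical) Turtle dump of coin records modelled
# after the Nomisma ontology, e.g. an AFE or CNT export.
g = Graph()
g.parse("coins.ttl", format="turtle")

# Rule: coins that share a type-series item should share a
# denomination; report every type for which two coins disagree.
QUERY = """
PREFIX nmo: <http://nomisma.org/ontology#>
SELECT DISTINCT ?type ?coin1 ?den1 ?coin2 ?den2 WHERE {
    ?coin1 nmo:hasTypeSeriesItem ?type ;
           nmo:hasDenomination   ?den1 .
    ?coin2 nmo:hasTypeSeriesItem ?type ;
           nmo:hasDenomination   ?den2 .
    FILTER (?coin1 != ?coin2 && ?den1 != ?den2)
}
"""

for row in g.query(QUERY):
    print(f"inconsistent denomination for type {row.type}: "
          f"{row.coin1} -> {row.den1} vs {row.coin2} -> {row.den2}")
```

The same pattern extends to material and the other attributes that should be constant within a coin type.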
In the mid-2000s, molecular phylogenetics turned into phylogenomics, a development that improved the resolution of phylogenetic trees through a dramatic reduction in stochastic error. While some then predicted “the end of incongruence”, it soon appeared that analysing large amounts of sequence data without an adequate model of sequence evolution amplifies systematic error and leads to phylogenetic artefacts. With the increasing flood of (sometimes low-quality) genomic data resulting from the rise of high-throughput sequencing, a new type of error has emerged. Termed here “data errors”, this category lumps together several kinds of issues affecting the construction of phylogenomic supermatrices (e.g., sequencing and annotation errors, contaminant sequences). While easy to deal with at the single-gene scale, such errors become very difficult to avoid at the genomic scale, both because hand-curating thousands of sequences is prohibitively time-consuming and because suitable automated bioinformatics tools are still in their infancy. In this paper, we first review the pitfalls affecting the construction of supermatrices and the strategies to limit their adverse effects on phylogenomic inference. Then, after discussing the relative non-issue of missing data in supermatrices, we briefly present the approaches commonly used to reduce systematic error.
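As a toy example of the routine screening such supermatrices call for, the Python sketch below reports taxa with extreme proportions of missing sites, which are worth inspecting before inference. The file name, missing-character convention, and threshold are assumptions, not taken from the paper, and Biopython is assumed available:

```python
from Bio import SeqIO

# Characters conventionally treated as missing in a supermatrix.
MISSING = set("-?Xx")

# Fraction of missing sites per taxon in a (hypothetical) concatenated
# alignment; a high fraction is not a problem per se, but extreme
# outliers often point to annotation or assembly issues.
for record in SeqIO.parse("supermatrix.fasta", "fasta"):
    seq = str(record.seq)
    frac = sum(c in MISSING for c in seq) / len(seq)
    if frac > 0.8:  # illustrative threshold
        print(f"{record.id}: {frac:.1%} missing")
```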