A large body of evidence suggests that the 11+ warm-up programme is effective in preventing football-related musculoskeletal injuries. However, despite considerable efforts to promote and disseminate the programme, it is unclear whether team head coaches are familiar with the 11+ and how they rate its feasibility. The present study aimed to gather information on awareness and usage among German amateur-level football coaches. A questionnaire was administered to 7893 individuals in charge of youth and adult non-professional teams. Descriptive and inferential statistics were used to analyse the data. A total of 1223 coaches (16%) returned the questionnaire. There was no indication of non-response bias (p > .05). At the time of the survey, nearly half of the participants (42.6%) knew the 11+. Among the coaches familiar with the programme, three out of four reported applying it regularly (at least once per week). Holding a license (φ = .28, p < .0001), a high competitive level (Cramér's V = .13, p = .007), and coaching a youth team (φ = .1, p = .001) were associated with usage of the 11+. Feasibility and suitability of the 11+ were rated similarly by aware and unaware coaches. Although a substantial share of German amateur-level coaches is familiar with the 11+, more than half of the surveyed participants did not know the programme. As non-usage does not appear to stem from a lack of perceived feasibility and suitability, existing communication strategies may need to be revised.
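As a side note on the association measures reported above, the following Python sketch shows how φ and Cramér's V are derived from the chi-square statistic of a contingency table. The counts are hypothetical and not taken from the study.

```python
# Toy illustration (not the study's data): computing the association
# measures reported above (phi, Cramer's V) from contingency tables.
import numpy as np
from scipy.stats import chi2_contingency

def phi_coefficient(table):
    """Phi coefficient for a 2x2 table: sqrt(chi2 / n)."""
    chi2, p, _, _ = chi2_contingency(table, correction=False)
    return np.sqrt(chi2 / table.sum()), p

def cramers_v(table):
    """Cramer's V for an r x c table: sqrt(chi2 / (n * (min(r, c) - 1)))."""
    chi2, p, _, _ = chi2_contingency(table, correction=False)
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (table.sum() * k)), p

# Hypothetical counts: rows = holds a license (yes/no),
# columns = uses the 11+ (yes/no).
license_table = np.array([[120, 180],
                          [ 60, 340]])
print("phi = %.2f, p = %.4g" % phi_coefficient(license_table))

# Hypothetical counts: rows = competitive level (low/medium/high),
# columns = uses the 11+ (yes/no).
level_table = np.array([[30, 170],
                        [60, 240],
                        [90, 110]])
print("Cramer's V = %.2f, p = %.4g" % cramers_v(level_table))
```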
Most studies in the life sciences and other disciplines involve generating and analyzing numerical data of some type as the foundation for scientific findings. Working with numerical data involves multiple challenges. These include reproducible data acquisition, appropriate data storage, computationally correct data analysis, appropriate reporting and presentation of the results, and suitable data interpretation.
Finding and correcting mistakes when analyzing and interpreting data can be frustrating and time-consuming. Presenting or publishing incorrect results is embarrassing but not uncommon. Particular sources of error are the inappropriate use of statistical methods and the misinterpretation of software output. To detect mistakes as early as possible, one should frequently check intermediate and final results for plausibility. Clearly documenting how quantities and results were obtained makes it easier to correct mistakes. Properly understanding data is indispensable for reaching well-founded conclusions from experimental results. Units are needed to make sense of numbers, and uncertainty should be estimated to know how meaningful the results are. Descriptive statistics and significance testing are useful tools for interpreting numerical results if applied correctly. However, blindly trusting computed numbers can be misleading, so it is worth thinking about how data should be summarized quantitatively to properly answer the question at hand. Finally, a suitable form of presentation is needed so that the data can properly support the interpretation and findings. By additionally sharing the relevant data, others can access, understand, and ultimately make use of the results.
These quick tips are intended to provide guidelines for correctly interpreting, efficiently analyzing, and presenting numerical data in a useful way.
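As a minimal sketch of these recommendations, the following Python snippet (with invented measurements; the units, group names, and thresholds are illustrative assumptions) summarizes a quantity with descriptive statistics and an uncertainty estimate, runs a plausibility check, and applies a significance test.

```python
# Sketch: descriptive statistics, uncertainty, plausibility checks, and a
# significance test on invented data (values are illustrative only).
import numpy as np
from scipy import stats

# Hypothetical replicate measurements of a concentration, in micromolar (uM);
# carrying the unit alongside the numbers keeps the result interpretable.
control   = np.array([4.8, 5.1, 5.0, 4.9, 5.3, 5.2])   # uM
treatment = np.array([5.9, 6.4, 6.1, 6.3, 5.8, 6.2])   # uM

for name, x in [("control", control), ("treatment", treatment)]:
    mean = x.mean()
    sem = stats.sem(x)                    # standard error of the mean
    ci = stats.t.interval(0.95, len(x) - 1, loc=mean, scale=sem)
    print(f"{name}: {mean:.2f} uM, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}] uM")

# Plausibility check before testing: concentrations must be positive.
assert (control > 0).all() and (treatment > 0).all()

# Two-sample t-test (Welch's variant, no equal-variance assumption).
t, p = stats.ttest_ind(treatment, control, equal_var=False)
print(f"Welch's t-test: t = {t:.2f}, p = {p:.4f}")
```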
Optimal distribution-preserving downsampling of large biomedical data sets (opdisDownsampling)
(2021)
Motivation: The size of today's biomedical data sets pushes computer equipment to its limits, even for seemingly standard analysis tasks such as data projection or clustering. Reducing large biomedical data sets by downsampling is therefore a common early step in data processing, often performed as random uniform class-proportional downsampling. In this report, we hypothesized that this step can be optimized to obtain samples that better reflect the entire data set than those obtained using the current standard method. Results: By repeating the random sampling and comparing the distribution of each drawn sample with the distribution of the original data, it was possible to establish a method for obtaining subsets that reflect the entire data set better than simply taking the first randomly selected subsample, as is the current standard. Experiments on artificial and real biomedical data sets showed that reconstructing the remaining data of the original data set from the downsampled subset improved significantly. This was observed with both principal component analysis and autoencoding neural networks. The fidelity depended on both the number of cases drawn from the original data set and the number of repeated samples drawn. Conclusions: Optimal distribution-preserving class-proportional downsampling yields data subsets that reflect the structure of the entire data set better than those obtained with the standard method. Because distributional similarity is the only selection criterion, the proposed method does not bias the results of a subsequently planned analysis.
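The following Python sketch illustrates the core idea under stated assumptions; it is not the authors' opdisDownsampling implementation, and the fraction, trial count, and KS-based score are illustrative choices. It draws several class-proportional random subsamples and keeps the one whose per-variable distributions best match the full data.

```python
# Sketch of distribution-preserving downsampling: repeat stratified random
# sampling and keep the most representative draw (illustrative, not the
# authors' implementation).
import numpy as np
from scipy.stats import ks_2samp

def opdis_downsample(X, y, frac=0.1, n_trials=20, rng=None):
    rng = np.random.default_rng(rng)
    best_idx, best_score = None, np.inf
    for _ in range(n_trials):
        # Class-proportional (stratified) random subsample.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c),
                       size=max(1, int(frac * np.sum(y == c))),
                       replace=False)
            for c in np.unique(y)
        ])
        # Worst-case KS distance between subsample and full data,
        # taken over all variables.
        score = max(ks_2samp(X[idx, j], X[:, j]).statistic
                    for j in range(X.shape[1]))
        if score < best_score:            # keep the most representative draw
            best_idx, best_score = idx, score
    return best_idx

# Example on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
y = rng.integers(0, 2, size=5000)
idx = opdis_downsample(X, y, frac=0.05, n_trials=25, rng=1)
print(len(idx), "cases retained")
```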
Finding subgroups in biomedical data is a key task in biomedical research and precision medicine. Even one-dimensional data, such as many different readouts from cell experiments, preclinical or human laboratory experiments, or clinical signs, often reveal a more complex distribution than a single mode. Gaussian mixture models play an important role in describing multimodally distributed one-dimensional data. However, although fitting of Gaussian mixture models (GMM) is often aimed at obtaining the separate modes composing the mixture, current technical implementations, often based on the Expectation Maximization (EM) algorithm, are not optimized for this task. This occasionally results in poorly separated modes that are unsuitable for determining a distinguishable group structure in the data. Here, we introduce "Distribution Optimization", an evolutionary algorithm for GMM fitting that uses an adjustable error function based on chi-square statistics and the probability density. The algorithm can be directly targeted at separating the modes of the mixture by employing an additional criterion for the degree to which single modes overlap. The obtained GMM fits were comparable with classical EM-based fits, except for data sets where the EM algorithm produced unsatisfactory results with overlapping Gaussian modes. There, the proposed algorithm successfully separated the modes, providing a basis for meaningful group separation while still fitting the data satisfactorily. Through its optimization toward mode separation, the evolutionary algorithm proved to be a particularly suitable basis for group separation in multimodally distributed data, outperforming alternative EM-based methods.
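A minimal sketch of the idea, not the authors' implementation: the snippet below fits a two-component one-dimensional Gaussian mixture with an evolutionary optimizer (scipy's differential evolution, standing in for the authors' algorithm), using a chi-square error against histogram counts plus a penalty on the overlap of the weighted component densities. The bin count, parameter bounds, and penalty weight are illustrative assumptions.

```python
# Sketch: evolutionary GMM fitting with a chi-square error and a
# mode-overlap penalty (illustrative stand-in for the authors' method).
import numpy as np
from scipy.optimize import differential_evolution
from scipy.stats import norm

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.8, 400), rng.normal(3, 1.0, 600)])

counts, edges = np.histogram(data, bins=30)
centers = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]
grid = np.linspace(data.min(), data.max(), 400)   # for the overlap penalty

def objective(theta, alpha=50.0):
    w, m1, s1, m2, s2 = theta
    mix = w * norm.pdf(centers, m1, s1) + (1 - w) * norm.pdf(centers, m2, s2)
    expected = len(data) * width * mix
    # Chi-square statistic between observed and expected bin counts.
    chi2 = np.sum((counts - expected) ** 2 / np.maximum(expected, 1e-9))
    # Overlap penalty: area under the pointwise minimum of the two weighted
    # component densities (0 when the modes are fully separated).
    overlap = np.sum(np.minimum(w * norm.pdf(grid, m1, s1),
                                (1 - w) * norm.pdf(grid, m2, s2)))
    overlap *= grid[1] - grid[0]
    return chi2 + alpha * overlap

bounds = [(0.05, 0.95), (-6, 6), (0.1, 3), (-6, 6), (0.1, 3)]
res = differential_evolution(objective, bounds, seed=1)
print("w, m1, s1, m2, s2 =", np.round(res.x, 2))
```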
Investigating three-dimensional (3D) structures of proteins in living cells by in-cell nuclear magnetic resonance (NMR) spectroscopy opens an avenue towards understanding the structural basis of their functions and physical properties under physiological conditions inside cells. In-cell NMR provides data at atomic resolution non-invasively and has been used to characterize protein-protein interactions, the thermodynamics of protein stability, the behavior of intrinsically disordered proteins, and other phenomena in cells. However, so far only a single de novo 3D protein structure has been determined from data derived solely from in-cell NMR. Here we introduce methods that enable in-cell NMR protein structure determination for a larger number of proteins at concentrations approaching physiological ones. The new methods comprise (1) advances in the processing of non-uniformly sampled NMR data, which reduce the measurement time for the intrinsically short-lived in-cell NMR samples, (2) automatic chemical shift assignment for obtaining an optimal resonance assignment, and (3) structure refinement with Bayesian inference, which makes it possible to calculate accurate 3D protein structures from sparse sets of conformational restraints. As an example application, we determined the structure of the B1 domain of protein G at a concentration of about 250 μM in living E. coli cells.
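As a toy illustration of the Bayesian-inference ingredient only, far simpler than actual structure refinement, the following sketch uses a Metropolis sampler to infer a single hypothetical distance parameter from a sparse set of noisy restraints, yielding a posterior estimate with quantified uncertainty. All values and the noise model are invented for illustration.

```python
# Toy Bayesian inference from sparse restraints via a Metropolis sampler
# (illustrative only; real structure refinement is far more involved).
import numpy as np

rng = np.random.default_rng(0)
true_distance = 4.2                                   # Angstrom, hypothetical
restraints = true_distance + rng.normal(0, 0.5, 5)    # sparse, noisy data

def log_posterior(d, sigma=0.5):
    if d <= 0:
        return -np.inf                    # flat prior on d > 0
    return -0.5 * np.sum((restraints - d) ** 2) / sigma**2

samples, d = [], 3.0
for _ in range(20000):
    proposal = d + rng.normal(0, 0.2)     # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(d):
        d = proposal                      # Metropolis acceptance step
    samples.append(d)

post = np.array(samples[5000:])           # discard burn-in
print(f"posterior distance: {post.mean():.2f} +/- {post.std():.2f} A")
```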
Specialized de novo assemblers for diverse data types have been developed and are in widespread use for the analysis of single-cell genomics, metagenomics, and RNA-seq data. However, assembly of the large sequencing datasets produced by modern technologies is challenging and computationally intensive. In-silico read normalization has been suggested as a computational strategy to reduce redundancy in read datasets, which leads to significant speedups and memory savings in assembly pipelines. Previously, we presented a set multi-cover optimization-based approach, ORNA, in which reads are reduced without losing the important k-mer connectivity information used in assembly graphs. Here we propose two extensions to ORNA, named ORNA-Q and ORNA-K, which consider a weighted set multi-cover optimization formulation of the in-silico read normalization problem. These novel formulations use the base quality scores obtained from sequencers (ORNA-Q) or the k-mer abundances of reads (ORNA-K) to further improve normalization. We devise efficient heuristic algorithms for solving both formulations. In applications to human RNA-seq data, ORNA-Q and ORNA-K are shown to assemble as many or more full-length transcripts than other normalization methods at similar or higher read reduction values. The algorithms are implemented in the latest version of ORNA (v2.0, https://github.com/SchulzLab/ORNA).
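A minimal sketch in the spirit of the set multi-cover formulation, not the ORNA code: a read is retained only if it still contributes a k-mer that fewer than m retained reads have covered so far. The values of k, m, and the toy reads are illustrative assumptions.

```python
# Sketch of greedy in-silico read normalization: keep a read only if it
# adds needed k-mer coverage (illustrative, not the ORNA implementation).
from collections import defaultdict

def normalize_reads(reads, k=5, m=2):
    coverage = defaultdict(int)           # k-mer -> times covered so far
    kept = []
    for read in reads:
        kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
        if any(coverage[km] < m for km in kmers):
            kept.append(read)             # read contributes needed coverage
            for km in kmers:
                coverage[km] += 1
    return kept

# Tiny example with redundant reads: the third copy is dropped.
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "TTGCAGGCTA"]
print(normalize_reads(reads, k=5, m=2))
```

A weighted variant in the spirit of ORNA-Q or ORNA-K could, for instance, process reads in decreasing order of mean base quality or k-mer abundance, so that higher-weight reads are preferred when coverage slots are filled.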