The predictive likelihood is of particular relevance in a Bayesian setting when the purpose is to rank models in a forecast comparison exercise. This paper discusses how the predictive likelihood can be estimated with Bayesian methods for any subset of the observable variables in linear Gaussian state-space models, and proposes using a Kalman filter that is consistent with missing observations to this end. As an empirical application, we analyze euro area data and compare the density forecast performance of a DSGE model with that of DSGE-VARs and reduced-form linear Gaussian models.
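A minimal sketch of the idea, under our own assumptions rather than the paper's exact estimator: a time-invariant model with system matrices `Z`, `H`, `T`, `Q` and initial state moments `a0`, `P0` (our notation), with `NaN` entries of the data treated as missing. For fixed parameters, the predictive likelihood of a chosen subset of variables is obtained as a ratio of two marginal likelihoods, with the unselected variables set to missing over the forecast horizon. In a Bayesian application this would still have to be combined with the posterior, for example by averaging the predictive density over posterior draws.

```python
import numpy as np


def kalman_loglik_missing(y, Z, H, T, Q, a0, P0):
    """Gaussian log-likelihood of y (n x p) for the state-space model
        y_t     = Z a_t + eps_t,    eps_t ~ N(0, H)
        a_{t+1} = T a_t + eta_t,    eta_t ~ N(0, Q),   a_1 ~ N(a0, P0)
    NaN entries of y are treated as missing: the update step uses only
    the rows of Z and H that belong to the variables observed at time t."""
    a, P = np.array(a0, dtype=float), np.array(P0, dtype=float)
    loglik = 0.0
    for yt in np.asarray(y, dtype=float):
        obs = ~np.isnan(yt)                      # observed variables at time t
        if obs.any():
            Zt, Ht = Z[obs], H[np.ix_(obs, obs)]
            v = yt[obs] - Zt @ a                 # one-step-ahead forecast error
            F = Zt @ P @ Zt.T + Ht               # covariance of the forecast error
            Finv = np.linalg.inv(F)
            _, logdet = np.linalg.slogdet(F)
            loglik -= 0.5 * (obs.sum() * np.log(2 * np.pi) + logdet + v @ Finv @ v)
            K = P @ Zt.T @ Finv                  # gain for the updated state
            a = a + K @ v
            P = P - K @ Zt @ P
        # Prediction step; when every variable is missing, only this step runs.
        a = T @ a
        P = T @ P @ T.T + Q
    return loglik


def log_predictive_likelihood(y_hist, y_future, subset, **model):
    """log p(y_future[:, subset] | y_hist) for fixed parameters, computed as
    log p(y_hist, y_future[:, subset]) - log p(y_hist). Variables outside
    `subset` are set to missing over the forecast horizon."""
    y_future = np.asarray(y_future, dtype=float)
    y_fut = np.full(y_future.shape, np.nan)
    y_fut[:, subset] = y_future[:, subset]
    joint = np.vstack([np.asarray(y_hist, dtype=float), y_fut])
    return kalman_loglik_missing(joint, **model) - kalman_loglik_missing(y_hist, **model)
```

The `**model` dictionary must use the keyword names of `kalman_loglik_missing` (`Z`, `H`, `T`, `Q`, `a0`, `P0`). Because both marginal likelihoods are computed with the same filter, the values of variables excluded from `subset` in the forecast period have no influence on the result.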
Premise: Both universal and family-specific targeted sequencing probe kits are becoming widely used for the reconstruction of phylogenetic relationships in angiosperms. Within the pantropical Ochnaceae, we show that, with careful data filtering, universal kits are as capable of resolving intergeneric relationships as custom probe kits. Furthermore, we show the strength of combining data from both kits to mitigate bias and obtain a more robust resolution of evolutionary relationships.
Methods: We sampled 23 Ochnaceae genera and used targeted sequencing with two probe kits, the universal Angiosperms353 kit and a family-specific kit. We used maximum likelihood inference on a concatenated matrix of loci and multispecies coalescent approaches to infer relationships in the family. We explored phylogenetic informativeness and the impact of missing data on resolution and tree support.
Results: For the Angiosperms353 data set, the concatenation approach provided results more congruent with those of the Ochnaceae-specific data set. Filtering missing data had the greatest impact on the Angiosperms353 data set, with a relaxed threshold giving the best results. The Ochnaceae-specific data set resolved consistent topologies with both inference methods, and no major improvements were obtained after data filtering. Merging the data obtained with the two kits resulted in a well-supported phylogenetic tree.
Conclusions: The Angiosperms353 data set improved with data filtering, and missing data played an important role in phylogenetic reconstruction. The Angiosperms353 data set resolved the phylogenetic backbone of Ochnaceae as well as the family-specific data set did. All analyses indicated that both Sauvagesia L. and Campylospermum Tiegh., as currently circumscribed, are polyphyletic and require revised delimitation.
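The missing-data filtering step can be sketched as a per-locus occupancy threshold. This is an illustration under our own assumptions, not the authors' pipeline: each locus is a dict mapping taxon names to aligned sequences, a taxon counts as present if its sequence contains at least one character other than `-`, `?` or `N`, and `min_occupancy` is a hypothetical parameter standing in for the strict or relaxed thresholds compared in the study.

```python
from typing import Dict, List

Locus = Dict[str, str]  # taxon name -> aligned sequence

MISSING_CHARS = set("-?Nn")  # assumption: gaps, '?' and ambiguous N count as missing


def has_data(seq: str) -> bool:
    """True if the sequence contains at least one non-missing character."""
    return any(c not in MISSING_CHARS for c in seq)


def taxon_occupancy(locus: Locus, taxa: List[str]) -> float:
    """Fraction of the sampled taxa that have data for this locus."""
    present = sum(1 for t in taxa if has_data(locus.get(t, "")))
    return present / len(taxa)


def filter_loci(loci: Dict[str, Locus], taxa: List[str], min_occupancy: float) -> Dict[str, Locus]:
    """Keep only loci in which at least `min_occupancy` of the taxa have data."""
    return {name: aln for name, aln in loci.items()
            if taxon_occupancy(aln, taxa) >= min_occupancy}
```

Comparing, say, `filter_loci(loci, taxa, 0.9)` with `filter_loci(loci, taxa, 0.5)` contrasts a strict and a relaxed scenario; these threshold values are illustrative only, and each filtered set would then be passed to the concatenation and coalescent analyses.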
The paper reports an investigation into whether valid results can be obtained when analyzing the structure of datasets in which a large percentage of the data is missing and not imputed. Two types of confirmatory factor analysis (CFA) models were employed for this purpose: the missing-data CFA model, which includes an additional latent variable representing the missing data, and the semi-hierarchical CFA model, which also includes this additional latent variable and reflects the hierarchical structure assumed to underlie the data. Whereas the missing-data CFA model assumes that the model is equally valid for all participants, the semi-hierarchical CFA model is implicitly specified differently for subgroups of participants with and without omissions. Comparing these models with the regular one-factor model on simulated binary data revealed that modeling the missing data prevented negative effects of missing data on model fit. With respect to the accuracy of the estimated factor loadings, the semi-hierarchical CFA model yielded the best results: the average estimated factor loadings for items with and without omissions showed the expected equal sizes. However, even this model tended to underestimate the expected values.
Phylogenetic relationships of the primarily wingless insects are still considered unresolved. Even the most comprehensive phylogenomic studies that addressed this question did not yield congruent results. To get a grip on these problems, we analyzed the sources of incongruence in these phylogenomic studies using an extended transcriptome dataset. Our analyses showed that unevenly distributed missing data can be severely misleading by inflating node support despite the absence of phylogenetic signal. Consequently, only decisive datasets should be used, that is, datasets that exclusively comprise data blocks containing all taxa whose relationships are addressed. Additionally, we employed Four-cluster Likelihood-Mapping (FcLM) to measure the degree of congruence among the genes of a dataset, as an alternative to bootstrap support. FcLM revealed incongruent signal among genes, which in our case is correlated neither with the functional class assignment of these genes nor with model misspecification due to unpartitioned analyses. The dataset analyzed here is currently the largest covering the primarily wingless insects, but it failed to elucidate their interordinal phylogenetic relationships. While this is unsatisfying from a phylogenetic perspective, we try to show that analyzing structure and signal within phylogenomic data can protect us from biased phylogenetic inferences due to analytical artefacts.
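The decisiveness criterion can be expressed as a simple filter over data blocks. In this sketch, which uses hypothetical names and toy data, each block (for example a gene partition) is represented by the set of taxa it contains, and each focal group whose relationships are addressed is a set of member taxa; a block is kept only if every focal group is represented by at least one taxon. When the focal units are individual terminals, each group is simply a one-element set.

```python
from typing import Dict, Set


def is_decisive(block_taxa: Set[str], groups: Dict[str, Set[str]]) -> bool:
    """True if every focal group has at least one member present in the block."""
    return all(block_taxa & members for members in groups.values())


def decisive_blocks(blocks: Dict[str, Set[str]], groups: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Drop every data block that lacks one or more of the focal groups."""
    return {name: taxa for name, taxa in blocks.items() if is_decisive(taxa, groups)}


# Hypothetical example: four focal groups, as in a four-cluster comparison.
groups = {"G1": {"a1", "a2"}, "G2": {"b1"}, "G3": {"c1", "c2"}, "G4": {"d1"}}
blocks = {
    "gene1": {"a1", "b1", "c2", "d1"},   # every group represented
    "gene2": {"a1", "a2", "c1", "d1"},   # G2 missing
}
print(sorted(decisive_blocks(blocks, groups)))  # ['gene1']
```

Restricting the analysis to such blocks ensures that every retained gene can, in principle, carry signal about the relationships among all focal groups, rather than leaving some splits supported only by the pattern of missing data.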