- Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance (2011)
- Background: Until recently, read lengths on the Solexa/Illumina system were too short to reliably assemble transcriptomes without a reference sequence, especially for non-model organisms. However, with read lengths of up to 100 nucleotides available in the current version, assembly without a reference genome should be possible. For this study we created an EST data set for the common pond snail Radix balthica by Illumina sequencing of a normalized transcriptome. The performance of three different short read assemblers was compared with respect to the number of contigs, contig length, depth of coverage, contig quality in various BLAST searches, and alignment to mitochondrial (mt) genes. Results: A single sequencing run of a normalized RNA pool resulted in 16,923,850 paired-end reads with a median read length of 61 bases. The assemblies generated by VELVET, OASES, and SeqMan NGEN differed in the total number of contigs, contig length, the number and quality of gene hits obtained by BLAST searches against various databases, and contig performance in the mt genome comparison. While VELVET produced the highest overall number of contigs, a large fraction of these were short (< 200 bp) and gave redundant hits in BLAST searches and in the mt genome alignment. The best overall contig performance resulted from the NGEN assembly. It produced the second largest number of contigs, which on average were comparable to the OASES contigs, but it gave the highest number of gene hits in two out of four BLAST searches against different reference databases. A subsequent meta-assembly of the four contig sets resulted in larger contigs, less redundancy and a higher number of BLAST hits. Conclusion: Our results document the first de novo transcriptome assembly of a non-model species using Illumina sequencing data.
We show that de novo transcriptome assembly using this approach yields results useful for downstream applications, in particular if a meta-assembly of contig sets is used to increase contig quality. These results highlight the ongoing need for improvements in assembly methodology. Keywords: next generation sequencing; short read assembly; Mollusca
- Genomic basis of ecological niche divergence among cryptic sister species of non-biting midges (2013)
- Background: There is a lack of understanding of the evolutionary forces driving niche segregation of closely related organisms. In addition, pinpointing the genes driving ecological divergence is a key goal in molecular ecology. Here, larval transcriptome sequences obtained by next-generation sequencing are used to address these issues in a morphologically cryptic sister species pair of non-biting midges (Chironomus riparius and C. piger). Results: More than eight thousand orthologous open reading frames were screened for interspecific divergence and intraspecific polymorphisms. Despite a small mean sequence divergence of 1.53% between the sister species, 25.1% of 18,115 observed amino acid substitutions were inferred by α statistics to be driven by positive selection. Applying McDonald-Kreitman tests to 715 alignments of gene orthologues identified eleven genes (1.5%) as driven by positive selection. Conclusions: Three candidate genes were identified as potentially responsible for the observed niche segregation with respect to nitrite concentration, habitat temperature and water conductivity. Additionally, signs of positive selection in the hydrogen sulfide detoxification pathway were detected, providing a new plausible hypothesis for the species' ecological differentiation. Finally, a divergently selected, nuclear-encoded mitochondrial ribosomal protein may contribute to reproductive isolation through cytonuclear coevolution.
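Both the α statistic and the McDonald-Kreitman (MK) test work on four counts per gene: fixed nonsynonymous (Dn) and synonymous (Ds) differences between species, and nonsynonymous (Pn) and synonymous (Ps) polymorphisms within species. The abstract does not say which estimator or significance test was used, so the Python sketch below assumes the widely used form α = 1 - (Ds·Pn)/(Dn·Ps) and a two-sided Fisher's exact test on the 2x2 MK table. The function names and the example counts are hypothetical.

```python
from math import comb


def alpha(dn, ds, pn, ps):
    """Estimated proportion of nonsynonymous substitutions fixed by positive
    selection: alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


def mk_test(dn, ds, pn, ps):
    """MK table: rows = fixed differences / polymorphisms,
    columns = nonsynonymous / synonymous."""
    return {
        "alpha": alpha(dn, ds, pn, ps),
        "p_value": fisher_exact_two_sided(dn, ds, pn, ps),
    }


# Hypothetical counts for a single gene
result = mk_test(dn=20, ds=10, pn=5, ps=15)
print(f"alpha = {result['alpha']:.3f}, p = {result['p_value']:.4f}")
```

With these made-up counts α is about 0.83, meaning roughly 83% of the nonsynonymous fixed differences would be attributed to positive selection. The p-value indicates whether the excess of nonsynonymous divergence over polymorphism is significant for that gene.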
- Factors and processes shaping the population structure and distribution of genetic variation across the species range of the freshwater snail Radix balthica (Pulmonata, Basommatophora) (2011)
- Background: Factors and processes shaping the population structure and spatial distribution of genetic diversity across a species' distribution range are important in determining range limits. We comprehensively analysed the influence of recurrent and historic factors and processes on the population genetic structure, mating system and distribution of genetic variability of the pulmonate freshwater snail Radix balthica. This analysis was based on microsatellite variation and mitochondrial haplotypes, using generalised linear statistical modelling within a model selection framework. Results: Populations of R. balthica were found throughout north-western Europe, with range margins marked either by dispersal barriers or by the presence of other Radix taxa. Overall, the population structure was characterised by distance-independent passive dispersal, mainly along a southwest-northeast axis, and by the absence of isolation-by-distance, together with rather isolated populations that, owing to strong local drift, were genetically depauperate compared with the variation present in the species as a whole. A recent, climate-driven range expansion explained most of the variance in genetic variation and reduced, at least temporarily, the genetic variability in the area of expansion. Other factors, such as geographic marginality and dispersal barriers, played only a minor role. Conclusions: To our knowledge, such a population structure has rarely been reported before. It might nevertheless be typical of passively dispersed, patchily distributed taxa (e.g. freshwater invertebrates). The strong local drift implied by such a structure is expected to erode genetic variation at both neutral and coding loci and thus probably to diminish evolutionary potential. This study shows that analysing multiple factors is crucial for inferring the processes that shape the distribution of genetic variation throughout species ranges.
Additional files
Additional file 1: Distribution of Radix taxa.
Spatial distribution of the Radix MOTUs (molecular operational taxonomic units) as defined in Pfenninger et al. 2006, plus an additional, newly discovered taxon. This map is the basis for inferring the species range of R. balthica. Additional file 2: Sampling site table and spatial distribution of diversity indices, selfing estimates and inferred population bottlenecks for R. balthica. The table lists the sampling site code, geographical position in decimal degrees of latitude and longitude, number of individuals analysed with microsatellites (Nnuc), expected heterozygosity (HE) and its standard deviation across loci, mean rarefied number of alleles per microsatellite locus (A) and its standard deviation, number of individuals analysed for mitochondrial variation (Nmt), rarefied number of mitochondrial COI haplotypes (Hmt), and number of individuals measured for body size (Nsize). Figures A1-A3 show the spatial distribution of HE, Hmt and s (the selfing estimate), respectively. Additional file 3: Assessment of environmental marginality. Principal component analysis (PCA) of 35 climatic parameters for the period 1960-2000 from publicly available WorldClim data. Additional file 4: Inference of a recent climate-driven range expansion in R. balthica. Canonical correspondence analysis of long-term freshwater benthos monitoring data from the Swedish national monitoring databases at the Swedish University of Agricultural Sciences (SLU).
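Additional file 2 reports expected heterozygosity (HE) with its standard deviation across microsatellite loci. The estimator is not specified here. A common choice is Nei's unbiased form, HE = n/(n-1)·(1 - Σ p_i²), where p_i are the allele frequencies and n is the number of sampled gene copies. The Python sketch below assumes that estimator, and the function names and toy allele data are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def expected_heterozygosity(alleles):
    """Nei's unbiased expected heterozygosity at one locus (assumed estimator).

    `alleles` holds every sampled gene copy (two per diploid individual),
    e.g. microsatellite allele sizes."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")
    # Allele frequencies p_i
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def he_across_loci(loci):
    """Mean HE and its standard deviation across loci (needs >= 2 loci)."""
    values = [expected_heterozygosity(copies) for copies in loci.values()]
    return mean(values), stdev(values)


# Hypothetical allele sizes at two loci for one sampling site
site = {
    "locusA": [100, 100, 102, 102, 104, 104],
    "locusB": [210, 210, 210, 212, 212, 212],
}
he_mean, he_sd = he_across_loci(site)
print(f"HE = {he_mean:.3f} +/- {he_sd:.3f}")
```

The n/(n-1) factor corrects the downward bias of the plain 1 - Σ p_i² at small sample sizes. This matters when only a few individuals are genotyped per site.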