Fri, 11/29/2013 at 8:29 AM
Customer is a tree biologist, and she wants to study hybridization/introgression among a few putatively closely related tree species in Europe using RAD-seq. A draft genome of one of the species is available. An existing RAD-based study identified 4 loci as specific to one of the species because they were homozygous in all sampled individuals of that species but absent from all sequenced individuals of the other species. Customer 2013.11.29.08.29 wants to extend this effort to further clarify historical hybridization/introgression events across the species.
Fri, 11/29/2013 at 5:05 PM
AccuraScience LB: The existing study reported 114 million reads from 15 individual plants, or about 7.6 million reads per plant. Assuming you use the same restriction enzyme, we would expect about 266 million reads for the 35 plants in your design. That's a good number of reads for two lanes of HiSeq sequencing.
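The 266 million figure comes from scaling the existing study's per-plant depth to the new sample size. A minimal sketch of that arithmetic, assuming reads per plant stay roughly constant with the same enzyme and that the design has 35 plants (5 from the focal species and 30 from other species):

```python
# Figures reported by the existing study.
existing_reads = 114e6
existing_plants = 15

# Planned design: 5 individuals of the focal species + 30 of other species.
planned_plants = 5 + 30

# Assumption: depth per plant stays about the same when the same restriction
# enzyme (and hence a similar number of RAD tags per plant) is used.
reads_per_plant = existing_reads / existing_plants
expected_reads = reads_per_plant * planned_plants

print(f"{reads_per_plant / 1e6:.1f} M reads per plant")
print(f"{expected_reads / 1e6:.0f} M reads expected in total")
```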
A question about your project design: the existing study identified only 4 species-specific loci using 2 individuals within the species and 13 individuals from other species. With 5 individuals within the species and 30 individuals from other species in your design, you should expect an even smaller number of fully homozygous loci to be discovered, because every additional individual is another chance for a locus to fail the criterion. This prospect could jeopardize your project.
Sat, 11/30/2013 at 8:58 AM
Customer: If it is not possible to find 4 homozygous diagnostic loci for each species, that is also a result. It may suggest that hybridization is substantial and very recent. It will, of course, complicate further analyses. I can, of course, relax the project conditions and adopt a criterion of at least 95% frequency for the most common allele. I think such a solution will even be necessary given the lack of full homozygosity.
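A relaxed criterion like this can be checked one locus at a time. The sketch below is hypothetical: `is_diagnostic` is an illustrative helper, alleles are pooled as two calls per individual, and the rule that the focal major allele must also stay at or below 5% in the other species is an assumption about how the 95% cutoff would be applied between species.

```python
from collections import Counter

def major_allele(alleles):
    """Return (allele, frequency) of the most common allele in a list of calls."""
    allele, n = Counter(alleles).most_common(1)[0]
    return allele, n / len(alleles)

def is_diagnostic(focal_alleles, other_alleles, threshold=0.95):
    """Relaxed diagnostic test for one SNP locus (hypothetical helper).

    focal_alleles / other_alleles: allele calls pooled over all individuals of
    the focal species / of the other species, two calls per individual.
    Assumed rule: the focal major allele reaches `threshold` in the focal
    species and has frequency at most 1 - threshold in the other species.
    """
    allele, freq = major_allele(focal_alleles)
    if freq < threshold:
        return False
    other_freq = other_alleles.count(allele) / len(other_alleles)
    return other_freq <= 1 - threshold

# 5 focal individuals (10 calls) and 30 other-species individuals (60 calls).
focal = ["A"] * 10
others = ["G"] * 58 + ["A"] * 2
print(is_diagnostic(focal, others))  # True: only 2 of 60 other-species calls carry A
```

With 10 focal calls, a single mismatching call already drops the frequency to 90%, so on the focal side a 95% cutoff behaves like full homozygosity. It only starts tolerating mismatches once 20 or more calls are pooled, as on the other-species side, where up to 3 of 60 calls may carry the focal allele.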
Sat, 11/30/2013 at 10:01 AM
AccuraScience LB: Requiring full homozygosity is an overly simplistic way to define species-specific loci. There is variation within the same species and even within the same population. The SNV level across human individuals is ~0.1%, and that across individuals of some tree species could be an order of magnitude higher; this is one of the difficulties in studying tree genomes. In Wang et al. 2012, the RAD markers were 300-600 nt long. Assuming an SNV level of 0.7%, by chance alone we would expect 2-4 single-nucleotide differences (0.007 × 300 ≈ 2 to 0.007 × 600 ≈ 4) between the copies inherited from the two parents in a fragment of this size. Simple math allows me to say with good confidence that with 5 individuals within the same species and 30 individuals of other species, applying the method described in this previous study, you would likely not get a single fully homozygous locus.
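The 2-4 difference estimate is simply the SNV rate times the fragment length. The sketch below reproduces it and, under a deliberately crude model (Poisson-distributed differences per comparison, each extra chromosome copy compared independently with the first), shows how quickly the chance of a fully identical fragment shrinks as copies are added. The 0.7% rate is the assumption stated above. The independence assumption is a simplification added here: it ignores shared ancestry, so it exaggerates the decline and should be read as showing the trend only.

```python
import math

SNV_RATE = 0.007  # assumed per-site difference rate between two chromosome copies

def expected_differences(length_nt, rate=SNV_RATE):
    """Expected number of single-nucleotide differences between two copies."""
    return rate * length_nt

def p_identical(length_nt, n_copies, rate=SNV_RATE):
    """Crude probability that n_copies of a fragment are all identical.

    Differences per comparison are modelled as Poisson(rate * length), and
    each extra copy is compared independently with the first one.
    """
    p_pair = math.exp(-expected_differences(length_nt, rate))
    return p_pair ** (n_copies - 1)

for length in (300, 600):
    print(f"{length} nt: {expected_differences(length):.1f} expected differences, "
          f"P(2 copies identical) = {p_identical(length, 2):.3f}, "
          f"P(10 copies identical) = {p_identical(length, 10):.1e}")
```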
The "correct" way to define species-specific loci is to take the SNV levels within and between species into account and work out a threshold of allowed differing bases across the 10 chromosome copies of the 5 individuals. This is more accurate, and will likely yield orders of magnitude more species-specific loci.
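One way to turn the threshold approach described above into code is sketched below. It is a hypothetical illustration, not the method used in the project: the threshold `k` is picked as a Poisson upper-tail cutoff under an assumed within-species rate, each focal copy must lie within `k` differences of the focal consensus, and every other-species copy must lie beyond `k`. The function names, the per-copy distance rule and `alpha` are all assumptions.

```python
import math
from collections import Counter

def poisson_threshold(length_nt, within_rate, alpha=0.01):
    """Smallest k such that P(Poisson(within_rate * length_nt) > k) <= alpha.

    Hypothetical rule: a copy with more than k differences is unlikely to be
    explained by within-species variation alone.
    """
    lam = within_rate * length_nt
    k, term = 0, math.exp(-lam)
    cdf = term
    while 1 - cdf > alpha and k < length_nt:  # k can never exceed the length
        k += 1
        term *= lam / k
        cdf += term
    return k

def consensus(seqs):
    """Most common base at each position of equal-length aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def is_species_specific(focal_copies, other_copies, k):
    """All focal copies within k of the focal consensus, all others beyond k."""
    ref = consensus(focal_copies)
    within_ok = all(hamming(s, ref) <= k for s in focal_copies)
    between_ok = all(hamming(s, ref) > k for s in other_copies)
    return within_ok and between_ok

print(poisson_threshold(450, within_rate=0.007, alpha=0.01))
focal = ["ACGTACGTAC"] * 9 + ["ACGTACGTTC"]  # 10 copies, one carrying an SNV
others = ["ACGAACCTTG"] * 60                  # 4 differences from the focal consensus
print(is_species_specific(focal, others, k=1))
```

If a tag is absent from the other species (as with the 4 loci in the earlier study), an empty `other_copies` list passes the between-species check. A real analysis would also need to handle missing data and set `within_rate` from the estimated within-species SNV level rather than the 0.7% assumed here.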
Sat, 11/30/2013 at 10:59 AM
Customer: You are right that strictly homozygous loci are very rare. I was also suspicious of this result. Information about SNV levels within and between birch species will be obtained during the analysis of my sequenced material.
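Once the sequence data are in hand, SNV levels within and between species can be summarized as mean per-site pairwise differences: nucleotide diversity (π) within a species and d_xy between two species. A minimal sketch, assuming aligned, equal-length haplotype sequences for a single RAD locus, at least two copies per species and no missing data; the function names are illustrative:

```python
from itertools import combinations, product

def per_site_diff(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def within_level(seqs):
    """Mean pairwise per-site difference among copies of one species (pi).

    Requires at least two sequences.
    """
    pairs = list(combinations(seqs, 2))
    return sum(per_site_diff(a, b) for a, b in pairs) / len(pairs)

def between_level(seqs_x, seqs_y):
    """Mean per-site difference between copies of two species (d_xy)."""
    pairs = list(product(seqs_x, seqs_y))
    return sum(per_site_diff(a, b) for a, b in pairs) / len(pairs)

# Toy locus: two copies from each of two species.
sp1 = ["AAAAAAAAAA", "AAAAAAAAAT"]
sp2 = ["CAAAAAAAAA", "CAAAAAAAAT"]
print(within_level(sp1), between_level(sp1, sp2))
```

Averaged across loci, these two rates are the inputs needed to set a within-species threshold for species-specific loci.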
Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.
Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and AccuraScience data analysis team at specified dates and times. The editing was made to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.