Bulked Segregant Analysis for Weed Avirulence with Sequencing Data (11/27/2013)


Thu, 11/21/2013 at 3:16 AM

Customer is a plant biologist. His team is working on a project characterizing the avirulence gene in a parasitic weed of a major crop. Molecular marker-based bulked segregant analysis (BSA) will not work due to the very low level of polymorphism between the parent strains, so his team is looking into NGS-based solutions. There is practically no molecular information for the species of interest, and no existing NGS data. They therefore plan to sequence the whole genomes of the pooled F2 samples and then ask AccuraScience to perform the BSA for them.

Thu, 11/21/2013 at 6:06 PM

AccuraScience LB: Do I understand correctly that both parent strains and all F2 strains have exactly the same genotype everywhere except at the gene of interest? I am puzzled, because if this is the case, you could have performed targeted sequencing of a small region surrounding that single gene rather than sequencing the whole genome.

A logical approach to achieving your objective, given the available data, is to conduct genome assembly, followed by SNV detection and then identification of SNVs highly correlated with the phenotypes. But genome assembly for a plant species is a major endeavor with a low probability of success: only a handful of plant species have a close-to-complete assembly.

If it is possible to get the gene sequence, or the sequence of a homologous gene in a related species, we might try to map the reads to this gene sequence directly, then identify the segregating SNVs in the mappable regions. This might or might not work, depending on the quality of the gene sequence - but it is much easier to try out than the genome assembly strategy.
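
The step of identifying SNVs correlated with the phenotypes can be sketched as follows. This is only a minimal illustration, not a pipeline confirmed in this correspondence: it assumes that reference and alternate allele read counts per SNV have already been obtained for the two F2 pools, and it uses a two-sided Fisher's exact test as one possible measure of association. The function names, SNV identifiers and read counts are all hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def rank_snvs(counts):
    """Rank SNVs by how strongly allele counts differ between the two pools.

    counts maps an SNV id to (ref_pool_a, alt_pool_a, ref_pool_b, alt_pool_b).
    Returns (snv_id, alt_freq_a - alt_freq_b, p_value) tuples, smallest p first.
    """
    results = []
    for snv_id, (ref_a, alt_a, ref_b, alt_b) in counts.items():
        depth_a, depth_b = ref_a + alt_a, ref_b + alt_b
        if depth_a == 0 or depth_b == 0:
            continue  # no reads in one pool: the SNV cannot be compared
        delta = alt_a / depth_a - alt_b / depth_b
        p_value = fisher_exact_two_sided(ref_a, alt_a, ref_b, alt_b)
        results.append((snv_id, delta, p_value))
    return sorted(results, key=lambda r: r[2])


# Hypothetical read counts, for illustration only.
example_counts = {
    "contig12:340": (30, 0, 2, 28),
    "contig7:88": (15, 15, 16, 14),
    "contig3:1201": (20, 10, 18, 12),
}

for snv_id, delta, p_value in rank_snvs(example_counts):
    print(f"{snv_id}\tdelta_alt_freq={delta:+.2f}\tp={p_value:.3g}")
```

In practice the counts would come from a variant caller run on the mapped reads, and the p-values would need correction for multiple testing across all SNVs.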

Fri, 11/22/2013 at 3:15 AM

Customer: According to phenotypic evaluation, there is only one gene determining phenotypic differences. Targeted sequencing of a small region surrounding the gene of interest would be the best approach. The problem is that we do not have any information about the position of the gene, and so I think we do not have any tool for targeting the region.

Fri, 11/22/2013 at 5:17 PM

AccuraScience LB: If the whole-genome sequencing experiments have not been performed, you might consider whether RNA-seq would be a good alternative. It is possible to assemble RNA-seq data into transcripts without a reference genome, and this is far less challenging than assembling the whole genome. After transcriptome assembly, we could attempt to map the reads for the two F2 pools back to the assembled reference transcriptome, then carry out BSA. In order to get the transcripts of the right gene, you might need to apply certain stimuli to the plants (I am guessing it will be some kind of exposure to a proper pathogen). An advantage of the RNA-seq approach is that, if you could induce the right response, the representation of the gene of interest in the sequencing data could be substantially boosted: in the genome, each gene is represented by exactly 2 copies per cell and this does not vary across genes, but in the transcriptome, a highly expressed gene could be represented at a level up to 100,000-fold higher than that of a lowly expressed gene.

If the whole-genome sequencing experiments have already been performed, or if there are difficulties that we are unaware of that make RNA-seq a bad option, and if there is no way to narrow down the sequence pattern in the region of interest, we will have to try to assemble the genome. As we discussed, the risk is high. The good news is that BSA does not require very long contigs to be assembled. That is, even if we do not get an assembly that would be regarded as a "success" by current genome assembly standards, we might still be able to go ahead and do BSA successfully - given that it is indeed a base-level mutation that we are looking for.
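
The point that BSA can tolerate a fragmented assembly can be illustrated with a small sketch: each SNV is compared between the two pools on its own, so short contigs can simply be ranked by the SNVs they carry. This is an assumption-laden illustration rather than the method discussed here; the function name, the 0.5 allele-frequency threshold, the contig names and the frequencies are hypothetical.

```python
from collections import defaultdict


def summarize_contigs(snvs, min_delta=0.5):
    """Summarize per-contig allele-frequency differences between two pools.

    snvs is an iterable of (contig, position, alt_freq_pool_a, alt_freq_pool_b).
    min_delta is an arbitrary, assumed cutoff for a "strong" difference.
    Returns (contig, n_snvs, n_strong, max_delta) tuples, best candidates first.
    """
    per_contig = defaultdict(list)
    for contig, _position, freq_a, freq_b in snvs:
        per_contig[contig].append(abs(freq_a - freq_b))

    summary = []
    for contig, deltas in per_contig.items():
        n_strong = sum(d >= min_delta for d in deltas)
        summary.append((contig, len(deltas), n_strong, max(deltas)))
    # Contigs with the most strongly differing SNVs come first.
    return sorted(summary, key=lambda r: (r[2], r[3]), reverse=True)


# Hypothetical per-SNV alternate allele frequencies, for illustration only.
example_snvs = [
    ("contig12", 340, 0.00, 0.93),
    ("contig12", 512, 0.05, 0.90),
    ("contig7", 88, 0.50, 0.47),
    ("contig3", 1201, 0.33, 0.40),
]

for contig, n_snvs, n_strong, max_delta in summarize_contigs(example_snvs):
    print(f"{contig}\tsnvs={n_snvs}\tstrong={n_strong}\tmax_delta={max_delta:.2f}")
```

Contig length does not enter the calculation at all, which is why a short contig can be flagged as long as it carries at least one informative SNV.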

Mon, 11/25/2013 at 3:26 AM

Customer sends two papers, one published by his group describing genetic analysis of the gene of interest, the other describing a computational pipeline for RNA-seq based BSA analysis. He says his group has not performed whole-genome sequencing experiments, and asks AccuraScience for comment on the computational pipeline cited in the second paper.

Wed, 11/27/2013 at 10:26 AM

AccuraScience LB: This paper from your group convincingly shows that the factor determining the difference in avirulence between the two strains is of a Mendelian nature. However, the available data cannot further distinguish the type of this factor - whether it is a SNV, a short insertion/deletion, or a copy-number change due to transposon activity or another mechanism. Reading the description of how the new strain came into existence, I personally would give at least an equal bet on a CNV/transposon theory as on a SNV theory.

So NGS-based approaches are a good way to go - they at least offer the possibility of resolving the type and identity of the variation. But CNVs would be much more difficult to resolve than SNVs, even with properly generated NGS data.

The pipeline described in the second paper was developed for the 454 platform, which is being discontinued by Roche. Unless you have access to a 454 sequencer and would like to take the risk of using a discontinued product, I would recommend Illumina instead (which is also substantially less expensive). Please see the following two papers: http://www.ncbi.nlm.nih.gov/pubmed/23299976 and http://www.ncbi.nlm.nih.gov/pubmed/23299975. Also see this comment article in Nature Reviews Genetics: http://www.nature.com/nrg/journal/v14/n3/full/nrg3432.html. If you are going to use Illumina sequencing, the pipeline designed for 454 would not be appropriate, because SNV calling for Illumina data is considerably more complicated. Something similar to what is described in the two Genome Research papers above would need to be developed for analyzing the data.


Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and the AccuraScience data analysis team at the specified dates and times. The editing was done to protect the customer’s privacy and for brevity. The edited text has been reviewed and approved by the customer.