Filling Gaps in Genome Assembled with Ion PGM and PacBio Data (11/21/2013)


Thu, 11/21/2013 at 9:19 AM

Customer is a USDA researcher. He describes difficulty filling gaps in a fungal genome assembled with Ion PGM and PacBio data.

Thu, 11/21/2013 at 2:41 PM

AccuraScience LB: We would like to know some parameters of the genome and assembly, e.g., the size of the genome (fungal genomes are typically about 30 Mb), the current contig N50 and average contig size, and the coverage of the existing sequencing data.
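
For readers unfamiliar with these parameters, the following is a minimal sketch of how contig N50, average contig size, and coverage can be computed. The function names and the toy contig lengths are hypothetical and are not taken from the customer's data. It assumes you already have a list of contig lengths and the total number of sequenced bases.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together hold at least half of the assembled bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_length(lengths):
    """Average contig size in bases."""
    return sum(lengths) / len(lengths)


def coverage(total_read_bases, genome_size_bp):
    """Average sequencing depth: total sequenced bases / genome size."""
    return total_read_bases / genome_size_bp


# Toy contig lengths, made up for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))          # 70
print(mean_length(toy_contigs))  # about 42.86
print(coverage(400_000_000, 40_000_000))  # 10.0
```

N50 is less sensitive than the average to a large number of very short contigs, which is why both are usually reported together.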

If the coverage is low, then you might need to perform more sequencing experiments. If there are too many contigs, we could try to re-assemble it using different software/pipelines. If both contig parameters and coverage look good, then we could try to figure out if there are better ways to fill the gaps.

The options you have also depend on what you are looking for. If you just want the gene or protein sequences, then you may not need very long contigs. Gene prediction could go ahead with the shorter contigs.

Wed, 12/04/2013 at 4:59 PM

Customer: Our genome is around 40Mb and our N50 is only 37Kb. Our main focus is to connect contigs to make positional cloning of genetically mapped genes possible. We are just trying to figure out what the best approach would be.

Thu, 12/05/2013 at 11:33 AM

AccuraScience LB: Is the 37Kb N50 for contigs or for scaffolds?

It would help us to know how many scaffolds you have in total, and what the maximal scaffold size is.

If the maximal scaffold size is at the Mb level, then chances are that existing bioinformatics methods will not improve much on what you already have. Some species are rich in repeats, e.g., transposable elements, which limits what bioinformatics can do.

Did you consider performing mate-pair experiments for the samples?

Thu, 12/05/2013 at 11:38 AM

Customer: The 37Kb N50 is for contigs, and it is based on a single assembly done using the CLC Genomics Workbench de novo assembler. Our largest contig is 222Kb. Yes, we would like to do some paired-end experiments in the future, but we were hoping that the PacBio data we have (10X coverage) could also help us pull contigs together.

Thu, 12/05/2013 at 12:05 PM

AccuraScience LB: Because your main difficulty is with making larger-scale connections, we would recommend mate-pair experiments (which are different from paired-end experiments). A contig N50 of 37Kb is already good.

The PacBio data you have could indeed help, and 10X coverage is quite good. But it would likely require some customized tweaking through a trial-and-error process. Although PacBio offers longer reads, their error rate is high.
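
As a rough illustration of what 10X coverage of a roughly 40 Mb genome implies, the sketch below computes the total number of sequenced bases and the expected fraction of the genome left uncovered. The second figure relies on the idealized Lander-Waterman model (reads placed uniformly at random, no sequencing errors, no repeats). That model is an assumption introduced here and was not part of the exchange, and the function names are hypothetical. Real PacBio data with high error rates and a repeat-rich genome will behave less favorably.

```python
import math


def total_read_bases(coverage, genome_size_bp):
    """Total sequenced bases implied by an average coverage."""
    return coverage * genome_size_bp


def expected_uncovered_fraction(coverage):
    """Lander-Waterman estimate: probability that a given base is
    covered by no read, assuming uniformly random read placement."""
    return math.exp(-coverage)


genome_size_bp = 40_000_000  # "around 40Mb", as stated by the customer
pacbio_coverage = 10         # "10X coverage", as stated by the customer

print(total_read_bases(pacbio_coverage, genome_size_bp))  # 400000000

# Expected number of uncovered bases under the idealized model only.
print(expected_uncovered_fraction(pacbio_coverage) * genome_size_bp)
```

Under these idealized assumptions almost every base is covered at 10X, which suggests that the practical difficulty lies in read errors and repeats rather than in raw depth.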


Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and an AccuraScience data analysis team at the specified dates and times. The editing was done to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.