Analyzing Metagenomics Data for Taxonomic Composition Determination (9/8/2014)


Sun, 09/07/2014

Sun, 09/07/2014 at 2:56 PM

Customer: Does your group process metagenomics data sets? I am looking to have a single lane of metagenomics data from a HiSeq 2000 100 bp PE run processed. I need human reads filtered out; bacterial/viral/fungal identification with the number of hits at the family and genus level (and at the species level if sequence diversity allows); rarefaction curves to guide future sequencing depth; and assembly of contigs for known human pathogens (up to 10).

Mon, 09/08/2014 at 4:45 PM

AccuraScience LB: Based on the information you have provided about this project, there are two general strategies we might try: (1) focus only on the 16S rDNA portion of the data and run it through the SILVA/ARB pipeline; or (2) perform de novo assembly and then run the assembled contigs through a comparative genomics system. MetaVelvet, Genovo or Meta-IDBA could be used for the assembly step, and either MG-RAST or IMG/M could serve as the comparative analysis system. It is hard to predict beforehand which strategy will work out best, or which combination of tools will work best for the second strategy.

Mon, 09/08/2014 at 4:59 PM

Customer: I don’t think the 16S strategy would work as the reads are very scattered and rare.

Mon, 09/08/2014 at 5:19 PM

AccuraScience LB: 16S rDNA accounts for roughly one in every few thousand genes in a typical shotgun metagenomic sample. This translates to between 10,000 and 100,000 reads for one HiSeq lane's worth of sequencing data, which is not bad. Illumina's deep coverage helps in this case, but the short read length could work against this strategy. It may not work, as you suggest, but it might be worth trying.
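
As a rough check on the read-count estimate above, the arithmetic can be sketched in a few lines of Python. The inputs here are assumptions and do not come from the original exchange: a lane yield of 100 to 200 million reads, a reading of "a few thousand" as anywhere from 2,000 to 10,000, and the simplification that the share of reads from 16S genes equals their share among genes. The function name is hypothetical.

```python
def expected_16s_reads(total_reads: int, genes_per_16s: int) -> int:
    """Estimate how many reads derive from 16S rDNA.

    Simplifying assumption: reads are spread evenly across genes, so if one
    gene in `genes_per_16s` is a 16S gene, the same fraction of reads is 16S.
    """
    return total_reads // genes_per_16s


# Assumed lane yield (illustrative, not stated in the exchange).
READS_PER_LANE_LOW = 100_000_000
READS_PER_LANE_HIGH = 200_000_000

# Assumed bounds for "one in every few thousand genes".
GENES_PER_16S_FEW = 2_000    # 16S relatively common
GENES_PER_16S_MANY = 10_000  # 16S relatively rare

# Pessimistic case: lowest yield, rarest 16S.
low_estimate = expected_16s_reads(READS_PER_LANE_LOW, GENES_PER_16S_MANY)
# Optimistic case: highest yield, most common 16S.
high_estimate = expected_16s_reads(READS_PER_LANE_HIGH, GENES_PER_16S_FEW)

print(f"Expected 16S reads per lane: {low_estimate:,} to {high_estimate:,}")
```

Under these assumed inputs the bounds come out at 10,000 and 100,000 reads, matching the range quoted above. Different yield or abundance assumptions would shift the result proportionally.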


Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and an AccuraScience data analysis team at the specified dates and times. The editing was done to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.