Analysis of Bacterial Isolates from Hospital Patients (9/5/2014)


Sat, 08/30/2014

Sat, 08/30/2014 at 5:34 PM

Customer: I have a graduate student working on a PhD on a bacterial species. The project involves sequencing (NGS) and doing bioinformatics on approx. 200 isolates. The isolates are from patients in the intensive care unit. The isolates are serially collected from patients. Thus about 4-5 isolates will be from a single patient collected over a period of weeks to months. We are looking to study genetic changes in these serial isolates.

Mon, 09/01/2014 at 11:30 AM

AccuraScience LB: Genome sequences are available for a large number of strains of this species. The basic bioinformatics analysis would therefore include identifying a suitable strain whose genome can serve as the reference, mapping the reads to that reference genome, and then carrying out SNP and indel calling. These steps are relatively straightforward.

The truly challenging part of this project is the functional genomics analysis, which might include (1) identification and characterization of virulence factors, drug resistance factors and adaptation mechanisms specific to each isolate, with a cross-comparison of these factors and mechanisms across isolates and against previously characterized strains (comparisons with other related species might also be necessary to characterize horizontal transfer events), and (2) phylogenetic analysis across isolates and with existing strains, for purposes such as identifying the path of transmission.
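As a rough illustration of how the per-isolate SNP calls could feed into the comparison of serial isolates and into a phylogenetic analysis, the sketch below computes variants gained between two time points and pairwise SNP distances. The isolate names, the variant calls and the function names are all hypothetical toy values, and a real analysis would use dedicated phylogenetics tools rather than raw set differences.

```python
from itertools import combinations

def snp_distance(calls_a, calls_b):
    """Number of variant calls present in exactly one of the two isolates."""
    # Simplification: a site with different alternate alleles counts twice.
    return len(calls_a ^ calls_b)

def pairwise_distances(variants):
    """Map each unordered pair of isolate names to its SNP distance."""
    return {
        (x, y): snp_distance(variants[x], variants[y])
        for x, y in combinations(sorted(variants), 2)
    }

# Hypothetical toy data: (position, alternate allele) calls against one
# shared reference genome. These are invented values, not real results.
variants = {
    "patientA_t1": {(1200, "T"), (58010, "G")},
    "patientA_t2": {(1200, "T"), (58010, "G"), (230455, "A")},
    "patientB_t1": {(1200, "T"), (77120, "C"), (901332, "G")},
}

# Variants that appear in the later isolate of patient A but not the earlier one.
gained_in_patient_a = variants["patientA_t2"] - variants["patientA_t1"]

# Distance table that could seed a tree-building or clustering step.
distances = pairwise_distances(variants)
```

Small distances between isolates from the same patient and larger distances between patients are what one would typically expect to see, but that pattern has to be checked on the real data.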

Customer and AccuraScience LB discuss the project over the phone. Customer asks AccuraScience LB to provide a cost estimate for the sequencing experiments.

Fri, 09/05/2014 at 10:39 AM

AccuraScience LB: The genome size of this species is about 4 million bp. At least 50X coverage is recommended for reliable SNP/indel calling, so the total amount of sequencing required is 4M × 50 × 200 = 4E10 bp. One HiSeq lane produces ~100M reads, which with a typical 100 bp paired-end setting corresponds to 2E10 bp per lane. Thus it will take at least 2 lanes of HiSeq sequencing.

Each lane costs ~$3500, plus the barcoding or multiplexing cost (library preparation), which is typically charged at $300 per sample. The total barcoding or multiplexing cost would thus be $300 × 200 = $60K, while the sequencing itself would cost $3500 × 2 = $7K. As it turns out, the barcoding or multiplexing cost would exceed the sequencing cost, simply because of the large number of samples...
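The estimate above can be reproduced, and re-run for other coverage targets or sample counts, with a short calculation. This is a minimal sketch: the function and parameter names are invented for illustration, and it assumes that "100M reads" means 100 million read pairs per lane, each contributing two 100 bp reads.

```python
import math

def sequencing_estimate(genome_size_bp, coverage, n_samples,
                        clusters_per_lane, read_length_bp, paired_end=True,
                        cost_per_lane=3500, library_cost_per_sample=300):
    """Estimate lanes and costs for a multiplexed sequencing run."""
    # Total bases needed to reach the target coverage on every sample.
    total_bp = genome_size_bp * coverage * n_samples
    # Bases delivered by one lane (two reads per cluster if paired-end).
    bp_per_lane = clusters_per_lane * read_length_bp * (2 if paired_end else 1)
    # Lanes are bought whole, so round up.
    lanes = math.ceil(total_bp / bp_per_lane)
    sequencing_cost = lanes * cost_per_lane
    library_cost = n_samples * library_cost_per_sample
    return {
        "total_bp": total_bp,
        "bp_per_lane": bp_per_lane,
        "lanes": lanes,
        "sequencing_cost": sequencing_cost,
        "library_cost": library_cost,
        "total_cost": sequencing_cost + library_cost,
    }

# Inputs taken from the message: 4 Mbp genome, 50X coverage, 200 isolates,
# ~100M clusters per lane (assumed to be read pairs), 100 bp paired-end reads.
estimate = sequencing_estimate(4_000_000, 50, 200, 100_000_000, 100)
for key, value in estimate.items():
    print(f"{key}: {value:,}")
```

Because the library preparation cost scales with the number of samples while the lane count scales with total bases, raising the coverage target changes only the sequencing part of the total.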


Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and AccuraScience data analysis team at specified dates and times. The editing was made to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.