Integration Site Identification (1/15/2014)

Wed, 01/15/2014

Customer wants to identify the integration sites of an episome in the host genome.

Wed, 01/15/2014 at 10:19 AM

AccuraScience LB: Our strategy for identifying the episome integration sites is the following pipeline:

1) Map the reads to a combined reference genome (host + episome) with BWA.
2) From the BWA output, select the read pairs in which one read maps to the host genome and the other maps to the episome.
3) Cluster these read pairs by their locations on the host and on the episome. Each cluster defines an integration region, but at this step we cannot yet determine the exact integration site.
4) For each cluster (integration region), generate the longest candidate sequences on the host-genome side, with the length determined by the mean and standard deviation of the insert size estimated from the other read pairs. These sequences form the junction sequence sets.
5) Some of the reads left unmapped in step 1) are clipped reads that span the junctions of the integration sites. Map all unmapped reads to the junction sequence sets with BWA. We expect some of the unmapped reads to map to the junction sequence sets and the rest not to (true garbage). The reads that do map let us determine the exact integration sites.
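Steps 2) and 3) of the pipeline above can be illustrated with a minimal Python sketch. It is not the AccuraScience implementation. It assumes the BWA output has already been reduced to one record per read pair, and the `Pair` record, the contig name `"episome"`, and the `max_gap` threshold are all hypothetical choices made for illustration. For brevity it clusters on host coordinates only and simply records the episome positions in each cluster, whereas the description above clusters on both.

```python
from collections import namedtuple

# Hypothetical record: one entry per read pair (not a real BWA/SAM structure).
Pair = namedtuple("Pair", ["name", "chrom1", "pos1", "chrom2", "pos2"])

EPISOME = "episome"   # assumed name of the episome contig in the combined reference
UNMAPPED = "*"        # SAM convention for a read with no reference name


def chimeric_pairs(pairs, episome=EPISOME):
    """Step 2: yield (host_chrom, host_pos, episome_pos) for pairs with
    one mate on the host genome and the other on the episome."""
    for p in pairs:
        if UNMAPPED in (p.chrom1, p.chrom2):
            continue  # a pair with an unmapped mate cannot link host and episome
        on_epi1 = p.chrom1 == episome
        on_epi2 = p.chrom2 == episome
        if on_epi1 == on_epi2:
            continue  # both mates on the host, or both on the episome
        if on_epi1:
            yield (p.chrom2, p.pos2, p.pos1)
        else:
            yield (p.chrom1, p.pos1, p.pos2)


def cluster_by_host_position(hits, max_gap=500):
    """Step 3 (simplified): merge hits on the same host chromosome whose
    sorted host positions lie within max_gap of the previous hit."""
    clusters = []
    for chrom, host_pos, epi_pos in sorted(hits):
        last = clusters[-1] if clusters else None
        if (last is not None and last["chrom"] == chrom
                and host_pos - last["end"] <= max_gap):
            last["end"] = host_pos
            last["episome_positions"].append(epi_pos)
        else:
            clusters.append({"chrom": chrom, "start": host_pos,
                             "end": host_pos, "episome_positions": [epi_pos]})
    return clusters


# Made-up example data, for illustration only.
example_pairs = [
    Pair("r1", "chr1", 1000, "episome", 50),
    Pair("r2", "episome", 80, "chr1", 1200),
    Pair("r3", "chr1", 5000, "chr1", 5300),    # ordinary host-host pair
    Pair("r4", "chr2", 300, "episome", 900),
    Pair("r5", "chr1", 90000, "episome", 60),
    Pair("r6", "*", 0, "episome", 10),         # unmapped mate
]
example_clusters = cluster_by_host_position(chimeric_pairs(example_pairs))
```

Each resulting cluster is a candidate integration region. In practice `max_gap` would more plausibly be derived from the insert-size distribution than fixed at an arbitrary value.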


Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited from genuine communications that took place between a customer and the AccuraScience data analysis team on the specified dates and times. The editing was done to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.
