Fri, 11/29/2013
Customer is an HIV researcher who has HIV integration sites sequenced on the 454 platform and would like us to help analyze chromatin features around these integration sites.
Fri, 11/29/2013 at 7:22 AM
AccuraScience LB: What kind of data do you have about the chromatin features, and which features are you particularly interested in?
Wed, 12/11/2013 at 2:50 AM
Customer: I have 2 samples sequenced by 454. They are Jurkat cells, wild type (WT) and knock-down (KD) for a cellular factor, both infected with HIV. We obtained >4000 integration sites for the WT and <1000 for the KD.
Now I need to map the sequences to the human genome to check whether these sites are inside genes (introns or exons) or outside them (upstream or downstream of a gene), within +/- 50 kb.
I also need to know the chromatin features associated with these sequences, such as CpG islands, DNaseI hypersensitive sites, H3K36me3, H3K27me3 and H3K4me3. Please note that the comparison with our sequences has to be done using the aforementioned chromatin features derived from Jurkat cells (I think the data are available online in some database).
I will also need to know whether there is a difference in gene density between the sequences of the 2 samples.
Can all analyses be followed by a statistical test, such as Student's t-test or others?
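As a rough illustration of the gene-context question above, the sketch below labels a single integration site as intragenic, upstream, downstream, or intergenic relative to the nearest gene within 50 kb. The function name `classify_site`, the tuple layout of `genes`, and the example coordinates are all hypothetical and are not taken from the customer's data; distinguishing introns from exons would additionally need exon coordinates, which this sketch does not model.

```python
def classify_site(chrom, pos, genes, window=50_000):
    """Label one integration site relative to annotated genes.

    genes: iterable of (chrom, start, end, strand, name) with start <= end.
    Returns (label, gene_name, distance_in_bp).
    """
    best = None
    for g_chrom, start, end, strand, name in genes:
        if g_chrom != chrom:
            continue
        if start <= pos <= end:
            # Site lies inside the gene body.
            return ("intragenic", name, 0)
        if pos < start:
            dist = start - pos
            # Left of a plus-strand gene is upstream; left of a minus-strand gene is downstream.
            side = "upstream" if strand == "+" else "downstream"
        else:
            dist = pos - end
            side = "upstream" if strand == "-" else "downstream"
        # Keep the closest gene that is within the window.
        if dist <= window and (best is None or dist < best[2]):
            best = (side, name, dist)
    return best or ("intergenic", None, None)


# Hypothetical gene models, for illustration only.
example_genes = [
    ("chr1", 1_000, 2_000, "+", "GENE_A"),
    ("chr1", 5_000, 6_000, "-", "GENE_B"),
]
print(classify_site("chr1", 1_500, example_genes))
print(classify_site("chr1", 4_900, example_genes))
```

For thousands of sites against a full gene annotation, an interval-based tool would be a more practical choice than this linear scan.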
Wed, 12/11/2013 at 8:04 PM
AccuraScience LB: We looked up public data for the histone and other marks for Jurkat cells. Out of CpG islands, DNaseI hypersensitive sites, H3K36me3, H3K27me3 and H3K4me3, DNaseI and H3K4me3 are available in ENCODE, and CpG islands can be obtained from the UCSC Genome Browser. However, there are no public data for H3K36me3 and H3K27me3 for this cell line.
A t-test would not be appropriate because we would not be comparing two populations for differences in means. Instead, we could perform Fisher's exact test, which is based on the hypergeometric distribution and produces a p-value for whether gene density differs significantly between the two sets of integration sites.
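A minimal sketch of such a test is shown here, assuming each integration site has first been classified as lying in a gene-dense region or not (a hypothetical dichotomization; the actual definition of gene density was not specified). The function `fisher_exact_two_sided` is written for illustration using only the Python standard library, and the counts in the usage lines are made up, not the customer's results.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Hypothetical counts, for illustration only:
# rows = WT, KD; columns = sites in gene-dense regions, sites elsewhere.
wt_dense, wt_other = 2600, 1400
kd_dense, kd_other = 450, 450
print(fisher_exact_two_sided(wt_dense, wt_other, kd_dense, kd_other))
```

The same 2x2 layout could be reused for each chromatin feature (for example, sites overlapping a DNaseI hypersensitive site versus not), with a multiple-testing correction applied across features.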
Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.
Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and AccuraScience data analysis team at specified dates and times. The editing was made to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.