Integrated Analysis of RNA-seq and DNA Methylation Data (9/17/2014)


Wed, 09/10/2014

Customer explains his stem cell research and would like AccuraScience to analyze his RNA-seq and DNA methylation data in an integrated manner.

Wed, 09/10/2014 at 4:27 PM

AccuraScience LB: Could you tell me which experimental approach you took to measure the DNA methylation profiles - was it bisulphite sequencing, bisulphite arrays, or an enrichment-based method (either antibody-based or restriction enzyme-based)? The analysis methods vary greatly across these experimental approaches.

Wed, 09/10/2014 at 8:48 PM

Customer: We did whole genome methylation analysis (Illumina) (http://cgs.hku.hk/portal/index.php/iscan) for my DNA samples.

Fri, 09/12/2014 at 10:56 PM

AccuraScience LB: One of our Lead Bioinformaticians is an expert on the Infinium 450k arrays. In a nutshell, based on your expectations and the available data, we would propose the following:

(1) Differential expression analysis across the 3 cell lines, followed by pathway enrichment analysis of the identified differentially expressed genes: multiple p-value/FDR cut-offs should be tried in order to find the one that gives the optimal pathway analysis result. The expected result is that the Wnt pathway is the most significantly enriched pathway among the differentially expressed genes. Other pathways may also show up as significant (e.g., de-differentiation, or iPSC development), which might help you identify the next direction in your mechanistic studies.

(2) Identify the differentially methylated regions, and possibly differentially methylated sites, between two of the cell lines, followed by pathway enrichment analysis similar to that described in (1). I would expect the Wnt pathway to show up at the top of the significant pathways here too, but I am less certain of this than for the RNA-seq analysis. Once again, we have to try multiple stringency levels and identify the one that offers the optimal outcome. Other interesting pathways might also show up as significantly enriched - this is hard to predict, and this information might point the way for your next-step mechanistic analysis, just as the RNA-seq analysis result does.

(3) Integrated analysis of the RNA-seq and the DNA methylation array data. We would attempt to identify inverse correlations between the expression of a gene and differentially methylated regions around the promoter of that gene. Chances are that the genes showing such inverse relationships are also enriched for the Wnt pathway (and perhaps other pathways of potential interest).

(4) It is a little hard to predict what other results might come out, e.g., some of the highly differentially methylated regions or sites might ring a bell as a master regulator key to epithelial cell development; maybe one of the genes in the Yamanaka cocktail is strongly changed in one of the cell lines while the others are not; or maybe different splice isoforms are found in one of the cell lines (which can be seen through RNA-seq analysis) - any of which would point to new research directions. These will have to be tried one by one, and if anything promising comes out, we will design follow-up analyses on the fly.

One technical question: could you tell me how many replicates you used for the RNA-seq and Infinium array experiments?
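
The pathway enrichment step in (1) and (2) above can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric (over-representation) test, which is a common choice but is not stated in the conversation, and it repeats the test at several FDR cut-offs as the LB suggests. The function names, the cut-off values, and the toy gene list with its FDR values are all hypothetical and invented for illustration only.

```python
from math import comb


def hypergeom_sf(k, universe_size, pathway_size, n_hits):
    """P(X >= k): chance of seeing at least k pathway genes among n_hits
    genes drawn without replacement from the universe."""
    upper = min(pathway_size, n_hits)
    total = comb(universe_size, n_hits)
    favourable = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, n_hits - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def enrichment_by_cutoff(fdr_by_gene, pathway_genes, cutoffs=(0.01, 0.05, 0.1)):
    """Run the over-representation test once per FDR cut-off.

    Returns {cutoff: (number of selected genes, overlap with pathway, p-value)}.
    """
    universe = set(fdr_by_gene)
    pathway = set(pathway_genes) & universe
    results = {}
    for cutoff in cutoffs:
        hits = {gene for gene, q in fdr_by_gene.items() if q <= cutoff}
        overlap = len(hits & pathway)
        if hits:
            p = hypergeom_sf(overlap, len(universe), len(pathway), len(hits))
        else:
            p = 1.0  # nothing selected at this cut-off, so nothing to test
        results[cutoff] = (len(hits), overlap, p)
    return results


# Toy input: FDR values are invented, not real results.
toy_fdr = {
    "WNT3A": 0.001, "CTNNB1": 0.02, "AXIN2": 0.04, "LEF1": 0.2,
    "G5": 0.03, "G6": 0.5, "G7": 0.6, "G8": 0.7, "G9": 0.08, "G10": 0.9,
}
toy_pathway = {"WNT3A", "CTNNB1", "AXIN2", "LEF1"}

for cutoff, (n_hits, overlap, p) in enrichment_by_cutoff(toy_fdr, toy_pathway).items():
    print(f"FDR <= {cutoff}: {overlap}/{n_hits} selected genes in pathway, p = {p:.4f}")
```

Comparing the rows across cut-offs shows how the enrichment p-value for one pathway shifts with stringency, which is the comparison the LB describes when proposing to try multiple cut-offs.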

Tue, 09/16/2014 at 10:06 PM

Customer: We strictly control the passages of the tested cell lines, because we believe that culture conditions may influence expression profiles. Actually, we sent RNA and DNA from 2-3 different passages of each cell line for seq. analyses. They split the RNA and DNA samples for seq. analyses (unlike tissue samples, we expanded a lot of cells). Our Seq. Center only gave us one report for one passage. I hope this answers your question.

Wed, 09/17/2014 at 5:29 PM

AccuraScience LB: Replicate number is like "sample number" or "N number" in your experiments. For microarray-based work (including the Infinium arrays for DNA methylation analysis), established standards require a replicate number of at least 3 for the data to be considered acceptable. The replicate requirements for sequencing-based approaches (such as the RNA-seq work you have) are a little ambiguous, but some replicates would be needed: for pair-wise comparison, I would suggest a replicate number of 3. Without these replicates, it would be hard to evaluate the statistical significance of any conclusions we make, and it would be hard for the work to be accepted for publication in a decent journal. These requirements are not much different between cell line samples and tissue samples.

Could you tell me whether you have multiple identical samples for the Infinium array and RNA-seq experiments?
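
A minimal sketch of why the replicate number matters: a test statistic needs a variance estimate within each group, and a single sample provides none. The example uses Welch's t-statistic purely as an illustration; it is not the method proposed in the conversation, and dedicated count-based models are normally used for RNA-seq data. The function name and the input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two groups of measurements of one gene.

    Returns None when either group has fewer than 2 replicates, because the
    within-group variance cannot be estimated from a single value.
    """
    if len(group_a) < 2 or len(group_b) < 2:
        return None
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    if standard_error == 0:
        return None  # identical replicates: the statistic is undefined
    return (mean(group_b) - mean(group_a)) / standard_error


# Invented numbers: three replicates per cell line versus one.
print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # a usable statistic
print(welch_t([5.0], [9.0]))                      # None: no replicates
```

With one sample per cell line, only the size of the difference can be reported, not how it compares with sample-to-sample variability.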

Thu, 09/18/2014 at 2:32 AM

Customer: We do not have results from multiple identical samples for each cell line, although we did send several different passages of cells for seq. It seems it will not be possible to show statistical significance at the single-gene level. The major purpose of our plan was to show overall changes in pathways controlled by Wnt. Also, we have no plan to publish these seq. data alone; some other functional studies will be added.

Thu, 09/18/2014 at 10:23 AM

AccuraScience LB: When there is only one sample for each cell line, we can use fold change to denote the difference between cell lines (in either RNA expression or DNA methylation level). We can do the analysis this way, but some reviewers might give you a hard time, even when this analysis is only a part of the results included in a manuscript. I just wish to make sure that you are aware of this risk, and have assessed the option of running the RNA-seq and DNA methylation array experiments on more replicates, before letting us go ahead with the analysis.
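
The fold-change approach described above, combined with the inverse expression/methylation relationship from item (3) of the earlier proposal, can be sketched as follows. This is only an illustration under stated assumptions: expression is given as one normalized value per gene per cell line, promoter methylation as one beta value (0 to 1) per gene per cell line, and the pseudocount and both thresholds are arbitrary choices not taken from the conversation. All names and numbers are hypothetical.

```python
from math import log2


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """log2 ratio of cell line B over cell line A; the pseudocount (an
    assumption) avoids division by zero for unexpressed genes."""
    return log2((expr_b + pseudocount) / (expr_a + pseudocount))


def inverse_candidates(expr_a, expr_b, beta_a, beta_b,
                       min_abs_lfc=1.0, min_abs_dbeta=0.2):
    """List genes whose expression and promoter methylation change in
    opposite directions between two cell lines, using one sample each.

    Thresholds are illustrative; no p-value can be attached without replicates.
    """
    shared = set(expr_a) & set(expr_b) & set(beta_a) & set(beta_b)
    candidates = []
    for gene in sorted(shared):
        lfc = log2_fold_change(expr_a[gene], expr_b[gene])
        delta_beta = beta_b[gene] - beta_a[gene]
        large_enough = abs(lfc) >= min_abs_lfc and abs(delta_beta) >= min_abs_dbeta
        if large_enough and lfc * delta_beta < 0:  # opposite signs
            candidates.append((gene, lfc, delta_beta))
    return candidates


# Invented values for three placeholder genes.
expr_line_a = {"G1": 7.0, "G2": 10.0, "G3": 3.0}
expr_line_b = {"G1": 31.0, "G2": 11.0, "G3": 15.0}
beta_line_a = {"G1": 0.8, "G2": 0.5, "G3": 0.2}
beta_line_b = {"G1": 0.3, "G2": 0.1, "G3": 0.7}

for gene, lfc, delta_beta in inverse_candidates(
        expr_line_a, expr_line_b, beta_line_a, beta_line_b):
    print(f"{gene}: log2FC = {lfc:+.2f}, delta beta = {delta_beta:+.2f}")
```

In this toy input only G1 is reported: its expression rises while its promoter methylation falls. G2 changes too little in expression, and G3 changes in the same direction in both data types.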


Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.

Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and AccuraScience data analysis team at specified dates and times. The editing was made to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.