Tue, 09/09/2014
Tue, 09/09/2014 at 12:05 PM
Customer: We have used MethylMiner to enrich methylated genomic DNA from two related species. Both can be referenced to a known, published crop genome sequence. We also have RNA-seq data from the same leaf tissue. We would like to do a comparative analysis of methylated DNA sequences between the two species. We would also like to examine whether there is a relationship between transcriptional activity and DNA methylation.
Here is what I would like to examine in the methylation analysis:
1) Characterize methylation patterns over the following regions:
   a) Determine criteria for methylation (FPKM level?)
   b) Protein-coding genes, including 1 kb upstream and downstream of the transcribed region
   c) Small RNAs
   d) Transposons
   e) Repetitive DNA
   f) Determine the % of the genome meeting the criteria for methylation in each category for each species
2) Identify differentially methylated regions (DMRs) between the species. Is it possible to use the replicated data in a Cuffdiff-type statistical analysis? There are 4 reps for each of the two species to be examined.
   a) Determine criteria for DMRs (2-fold difference?)
   b) Protein-coding genes, including 1 kb upstream and downstream of the transcribed region
   c) Small RNAs
   d) Transposons
   e) Repetitive DNA
   f) Functional analysis of protein-coding genes
   g) Association of DMRs with differential gene expression
   h) Association of DMRs with differential expression of small RNAs
The hypothesis being tested is that differential methylation is associated with differential gene expression and/or differential expression of small RNAs.
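On the question of using the 4 replicates per species in a Cuffdiff-type test: the following is a minimal sketch, not Cuffdiff itself, of the simplest kind of DMR test, a per-window two-group comparison combined with a 2-fold cutoff. It assumes that each genomic window already has a normalized enrichment signal per replicate. The function names, the pseudocount of 1 and the thresholds (|log2FC| >= 1, p <= 0.05) are hypothetical choices. The Python standard library has no t distribution, so the p-value comes from an exhaustive label-permutation test on Welch's t statistic rather than from the textbook t-test.

```python
import math
from itertools import combinations
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        # Degenerate case: no spread within either group.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_p(a, b):
    """Two-sided p-value over every relabeling of the pooled replicates.

    With 4 vs 4 replicates there are C(8, 4) = 70 relabelings, so the
    smallest achievable p-value is 2/70 (the observed split and its mirror).
    """
    pooled = list(a) + list(b)
    observed = abs(welch_t(a, b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(welch_t(group_a, group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


def log2_fold_change(a, b, pseudocount=1.0):
    """log2 ratio of the group means; the pseudocount avoids division by zero."""
    return math.log2((mean(a) + pseudocount) / (mean(b) + pseudocount))


def is_dmr(a, b, min_abs_lfc=1.0, max_p=0.05):
    """Hypothetical rule: at least a 2-fold change (|log2FC| >= 1) and p <= max_p."""
    return abs(log2_fold_change(a, b)) >= min_abs_lfc and permutation_p(a, b) <= max_p
```

For example, `is_dmr([10, 11, 12, 13], [1, 2, 3, 4])` flags a window whose signal is consistently higher in the first species. With only 4 replicates per species, the smallest attainable p-value is 2/70 (about 0.029), which limits how stringent a genome-wide cutoff can be.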
Wed, 09/10/2014 at 5:38 PM
AccuraScience LB: (1) Following MethylMiner enrichment, did you do bisulphite conversion and amplification before sequencing, or did you sequence the MethylMiner-enriched samples directly?
(2) Could you tell me whether any controls were used? Are you more interested in comparing the two species (4 replicates each), or in looking at the correlation across the 4 samples within each species?
Now coming to the questions you raised. (1) The level of methylation is typically expressed as a Beta value (the absolute proportion of methylated CpG bases, between 0 and 1) or an M-value (the log-transformed ratio derived from the Beta value, M = log2(Beta / (1 - Beta))). These are quite different from the RPKM/FPKM concept known in the RNA-seq field. Bisulphite-based analysis often involves negative controls (samples without bisulphite treatment), so Beta values can be calculated in a relatively straightforward manner. In contrast, enrichment-based analysis requires an additional normalization step that accounts for variation in CpG density. Other important considerations include (a) which mapper to use (if no bisulphite treatment was done, BWA or Bowtie would work fine, but if bisulphite treatment was used, a specialized mapper such as BSMAP, RMAP or Bismark is required), (b) which variant caller to use (Bis-SNP generally works fairly well), and (c) whether additional bias-correcting procedures should be applied, e.g., the Bayesian method implemented in Batman or the logistic regression model implemented in MEDME.
(2) There is a whole set of methods to identify DMRs, ranging from the simple t-test to more sophisticated methods such as logistic M values, Shannon entropy and mixture models. As with the methylation quantification step, there are optional steps that can improve the accuracy of DMR calling, which I will skip for now.
(3) Following methylation quantification and DMR analysis, an integrated examination of the DMRs against the items you specified (protein-coding genes, small RNAs, repeat elements, etc.) can be done, though some details need to be discussed further once we have answers to the above questions about the DNA methylation data.
(4) Please be aware that blindly testing for correlation between DNA methylation/DMRs and another item (e.g., protein-coding genes) will likely produce an unmanageable number of pairs for downstream processing.
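The Beta/M-value relationship in item (1) can be made concrete with a minimal sketch. It assumes bisulphite-style counts of methylated and unmethylated reads at a single CpG site. MethylMiner enrichment alone does not provide such counts, which is why enrichment data need the extra CpG-density normalization step described in item (1). The clamping constant `eps` is a hypothetical guard against infinite M-values at Beta = 0 or 1.

```python
import math


def beta_value(methylated, unmethylated):
    """Fraction of reads at a CpG site that report it as methylated (0..1)."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no coverage at this site")
    return methylated / total


def m_value(beta, eps=1e-6):
    """M = log2(Beta / (1 - Beta)); Beta is clamped to avoid +/- infinity."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))
```

For example, 3 methylated and 1 unmethylated read give Beta = 0.75 and M = log2(3), about 1.58. M-values are often preferred for statistical testing because the log transform spreads out values near 0 and 1, while Beta values are easier to interpret biologically.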
Mon, 09/15/2014 at 4:28 PM
Customer: 1) No, we did not do bisulphite conversion. It is just MethylMiner-enriched sequencing. 2) We are interested in comparing the 2 species (4 reps each). We do have whole-genome re-sequencing data for the two species, one with Illumina and the other with SOLiD.
We could limit the DMR analysis to differentially expressed protein-coding genes and small RNAs. Here is a paper with an analysis of gene and small RNA expression and DMRs that I thought was good: http://www.pnas.org/content/109/32/E2183.full.pdf+html.
Tue, 09/16/2014 at 4:39 PM
AccuraScience LB: Since no bisulphite conversion was done, the analysis would involve a ChIP-seq-like, peak-calling type of pipeline. There might be issues with using the SOLiD-based resequencing data as a control, because the mappability characteristics of Illumina and SOLiD data differ, and the main reason for using the control is to cancel out mappability bias.
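The ChIP-seq-like idea can be sketched as a per-window comparison of the MethylMiner library against depth-scaled resequencing (control) reads under a Poisson background. This is a simplified fixed-window stand-in for what dedicated peak callers such as MACS do (they estimate local background rates), not the pipeline that would actually be used. The window layout, the `min_lambda` floor and the `alpha` cutoff are hypothetical. The control is only a fair baseline when its mappability profile matches that of the enriched library, which is the concern raised about the SOLiD control.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Terms are evaluated in log space so large lam does not underflow;
    tail probabilities below about 1e-16 are reported as 0.
    """
    if k <= 0:
        return 1.0
    log_lam = math.log(lam)
    cdf = sum(math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)


def enriched_windows(treat, control, treat_total, control_total,
                     min_lambda=1.0, alpha=1e-5):
    """Flag windows where the enriched-library read count exceeds the
    depth-scaled control count under a Poisson model.

    treat/control: read counts per window, in the same window order.
    treat_total/control_total: library sizes used for depth scaling.
    Returns (window_index, treat_count, expected_count, p_value) tuples.
    """
    scale = treat_total / control_total
    hits = []
    for i, (t, c) in enumerate(zip(treat, control)):
        # Floor the expected count so that windows with no control reads
        # do not get a zero background rate.
        lam = max(c * scale, min_lambda)
        p = poisson_sf(t, lam)
        if p < alpha:
            hits.append((i, t, lam, p))
    return hits
```

For example, with equal library sizes, a window with 50 enriched reads against 5 control reads is flagged, while windows whose counts roughly match the control are not.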
Let me describe the list of tasks that we would propose to do for this project:
(1) Process the MethylMiner-enriched sequencing data for the 8 samples, using the corresponding resequencing data (SOLiD- or Illumina-based) as controls, to quantify DNA methylation levels, and develop and carry out methods to identify differentially methylated regions between the two species.
(2) Process the RNA-seq data using the published potato genome as the reference (with the Cufflinks pipeline), to quantify gene expression and identify differentially expressed genes between the two species.
(3) Develop methods to analyze the correlation between DNA methylation levels and genomic features, including (a) genic regions (regions where protein-coding genes reside) and non-genic regions, (b) promoter regions (e.g., defined as the 1-kb region upstream of the transcription start site of a protein-coding gene), (c) regions coding 21-nt and 24-nt sRNAs, (d) transposable elements, and (e) other repeat regions, to determine which genomic features are significantly associated with highly (or lowly) methylated regions.
(4) Develop methods to analyze the correlation between differentially methylated regions (between the two species) and the genomic features described in (3), to determine which genomic features are significantly associated with differentially methylated regions (two-way analysis).
(5) Develop methods to analyze the correlation between DNA methylation levels and gene expression, and between differentially methylated regions and differentially expressed regions, between the two species.
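Items (3) to (5) depend on two basic operations: building feature intervals (e.g., the 1-kb promoter upstream of the TSS) and correlating a methylation summary with expression. The following is a minimal sketch under these assumptions: 0-based half-open coordinates on a single chromosome, genes given as (start, end, strand) tuples, and methylation given as (start, end, value) windows. All names are hypothetical. Spearman's rank correlation is used as one reasonable choice because it is insensitive to the skewed distributions typical of FPKM values. It is not necessarily the method the team would settle on.

```python
import math
from statistics import mean


def promoter(gene, length=1000):
    """Promoter = `length` bp immediately upstream of the TSS.

    The TSS is `start` on the + strand and `end` on the - strand.
    Clipping at the chromosome end is omitted for brevity.
    """
    start, end, strand = gene
    if strand == "+":
        return (max(0, start - length), start)
    if strand == "-":
        return (end, end + length)
    raise ValueError(f"unknown strand: {strand!r}")


def overlap_bp(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def region_signal(region, windows):
    """Overlap-weighted mean methylation signal; None if nothing overlaps."""
    weighted = total = 0
    for w_start, w_end, value in windows:
        bp = overlap_bp(region, (w_start, w_end))
        weighted += bp * value
        total += bp
    return weighted / total if total else None


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)
```

For item (5), one would compute `region_signal(promoter(g), windows)` for each gene `g`, drop genes that return None, and pass the remaining signals together with the genes' FPKM values to `spearman`.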
Two additional notes: (1) Direct sequencing (without bisulphite conversion) following MethylMiner enrichment has a resolution limit of ~100 nt, so we won't be able to quantify DNA methylation at the single-base level. (2) Because MBD2 binds only methylated CpG, MethylMiner does not capture non-CpG methylation. We wanted to make sure that you are aware of these limitations.
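Both limitations can be seen at the sequence level. MBD-based capture only reports on CpG dinucleotides, and at ~100-nt resolution the natural unit of measurement is a window rather than a base. The following is a minimal sketch that counts CpGs per window, using a hypothetical 100-nt window size taken from the resolution figure above. A window with zero CpGs contains no site that MBD2 itself can bind. Per-window CpG counts are also roughly the kind of CpG-density information that enrichment-based normalization (as in Batman or MEDME) takes into account.

```python
def cpg_counts(seq, window=100):
    """Count CpG dinucleotides in consecutive fixed-size windows.

    A CG pair that straddles a window boundary is counted in the window
    containing the C. The final window may be shorter than `window`.
    """
    seq = seq.upper()
    n_windows = (len(seq) + window - 1) // window
    counts = [0] * n_windows
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            counts[i // window] += 1
    return counts
```

For example, `cpg_counts("CG" * 5 + "A" * 90 + "ACGT" * 25)` reports 5 CpGs in the first 100-nt window and 25 in the second.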
Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.
Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and AccuraScience data analysis team at specified dates and times. The editing was made to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.