
Ten Critical Mistakes in EWAS Analysis - And How Experienced Bioinformaticians Handle Them

Introduction

Epigenome-wide association studies (EWAS) have become an important tool in human disease and exposure research. From cancer to neurodevelopmental disorders to environmental exposure assessment, many researchers now rely on EWAS to detect CpG sites or regions whose DNA methylation changes with phenotype or environment. But despite the simplicity of the design - methylation profiling across samples, followed by statistical association - the actual analysis is surprisingly fragile.

Over the years, we’ve helped recover many EWAS projects that looked fine on the surface: clean beta distributions, neat PCA plots, even a few significant hits. But when we went deeper - checking technical covariates, batch structures, probe reliability, or statistical assumptions - serious problems appeared. Worse, many such issues are not detected by standard pipelines. The danger is subtle: a few false discoveries may mislead the entire biological story, and reviewers now know where to look.

This article is not a tutorial for minfi or limma. Instead, we summarize ten major problems we see often in real-world EWAS data analysis - from preprocessing to interpretation. Each section explains the problem, why it happens, a real anonymized example, and what we do differently to avoid or correct it. We hope this can help others protect their EWAS results from technical artifacts and overconfidence.

Table of Contents

1. Confounding by Cell Type Composition
2. Batch Effects Not Properly Modeled
3. Misinterpretation of Beta vs M-values
4. Inadequate Correction for Multiple Testing
5. Low-Quality or Cross-Hybridizing Probes Included
6. Misuse of Imputation or Normalization Methods
7. Overconfident Interpretation of Non-Genic CpGs
8. Circular Use of Surrogate Variables
9. Neglecting Sex, Age, or Cell-Type Interaction Effects
10. No External Validation or Biological Confirmation
Final Remarks

EWAS analysis looks simple - but it's fragile. We catch confounding, batch effects, and probe issues before they mislead your biology. Request a free consultation →

1. Confounding by Cell Type Composition

The Problem

Observed methylation differences between cases and controls may reflect changes in underlying blood cell proportions, not true disease-related epigenetic changes.

Why It Happens

Most EWAS are performed on whole blood, which is a mixture of immune cells. Many diseases or exposures change cell type composition. If cell type is not accounted for properly, even perfect normalization cannot save the analysis.

Real Example

In one asthma EWAS using Illumina 450k, over 200 CpGs showed strong association with case status. But most were in granulocyte-specific regions. When we adjusted for estimated cell counts using the Houseman method, almost all signals disappeared.

What We Do Differently

We always estimate and include cell type proportions in the model. For blood, we apply reference-based deconvolution (e.g., minfi or EpiDISH). For cord blood or saliva, we use appropriate references. We also test residuals to confirm that confounding is removed.
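
As a rough sketch of how this looks in R (object names like rgSet, pheno, and mvals are placeholders, and the reference panel must match the tissue):

    library(minfi)
    library(limma)

    # Reference-based (Houseman) deconvolution for whole blood;
    # rgSet is assumed to be an RGChannelSet from read.metharray.exp()
    cellCounts <- estimateCellCounts(rgSet, compositeCellType = "Blood")

    # Include estimated proportions as covariates; one cell type (Gran)
    # is dropped because the proportions sum to ~1 (collinearity)
    covars <- cbind(pheno, as.data.frame(cellCounts))
    design <- model.matrix(~ case_status + CD8T + CD4T + NK + Bcell + Mono,
                           data = covars)
    fit <- eBayes(lmFit(mvals, design))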

2. Batch Effects Not Properly Modeled

The Problem

Hidden technical batch effects - from array slide, date, technician - introduce methylation differences unrelated to biology, creating false associations.

Why It Happens

Many labs block-randomize samples by phenotype but forget to adjust for chip row, slide ID, or processing batch. Standard normalization may reduce some variance, but not all. Without explicit modeling or surrogate variables, batch can dominate.

Real Example

A study on depression and methylation in elderly subjects showed 50 significant CpGs, mostly near imprinted genes. But they all came from two slides processed one month apart. No batch variable was included. With ComBat adjustment, the hits vanished.

What We Do Differently

We explicitly model known technical covariates (slide, array, position) and apply ComBat or remove unwanted variation (RUV). If batch is confounded with the outcome, we report it clearly and limit interpretation. We also color PCA plots by batch variables rather than relying on batch-blind visualizations.
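
A minimal ComBat sketch, assuming mvals is an M-value matrix (probes x samples) and slide is the known batch variable; passing the phenotype in mod protects biological signal from being removed along with batch:

    library(sva)

    # Protect the outcome of interest while removing slide-level batch effects
    mod <- model.matrix(~ case_status, data = pheno)
    mvals_adj <- ComBat(dat = mvals, batch = pheno$slide, mod = mod)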

Your “significant CpGs” may be technical artifacts. We rigorously model batch, cell composition, and multiple testing to ensure robustness. Request a free consultation →

3. Misinterpretation of Beta vs M-values

The Problem

Statistical models are run on beta values (0–1), which are bounded and heteroscedastic, leading to biased results, especially for CpGs with methylation near 0 or 1.

Why It Happens

Beta values are intuitive and easy to visualize, so many analysts just use them directly. But they violate assumptions of linear modeling. M-values (logit-transformed betas) are more appropriate for modeling but harder to interpret.

Real Example

One team analyzed ~800K CpGs on the beta scale, finding many hits with tiny differences (~0.02). But those sites had extremely low variance and high skewness. Switching to M-value modeling reduced their significance and revealed more robust CpGs elsewhere.

What We Do Differently

We run association testing using M-values and convert top hits back to beta scale for interpretation. We also report delta-beta for effect size understanding, but statistical inference is always based on M-values to control variance issues.
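
The conversion itself is a simple logit2 transform; a minimal sketch (betas, cases, and controls are placeholders):

    # beta -> M for statistical modeling (stabilizes variance at the extremes)
    beta2m <- function(b) log2(b / (1 - b))
    # M -> beta for reporting on the interpretable 0-1 scale
    m2beta <- function(m) 2^m / (2^m + 1)

    mvals <- beta2m(betas)
    # Effect sizes are still reported as delta-beta for interpretability
    delta_beta <- rowMeans(betas[, cases]) - rowMeans(betas[, controls])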

4. Inadequate Correction for Multiple Testing

The Problem

Thousands of CpGs are tested, but FDR or Bonferroni correction is not applied properly. This results in spurious “significant” hits.

Why It Happens

Some teams rank by unadjusted p-values or use ad hoc thresholds like p < 0.001. Others apply FDR but ignore genomic inflation or violations of modeling assumptions.

Real Example

An EWAS on childhood obesity listed 37 CpGs with “suggestive” p < 1e-4, but after proper FDR correction using a limma model with empirical Bayes shrinkage, none passed the 5% threshold. The signal was weak, and the top hits did not replicate.

What We Do Differently

We apply limma or robust linear models with empirical Bayes variance moderation, followed by Benjamini-Hochberg FDR or permutation-based control. We also estimate genomic inflation and correct if needed before downstream pathway analysis.
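
For example, a standard limma workflow with BH correction, plus the genomic inflation check we run afterwards (design and mvals as in the earlier sketches):

    library(limma)

    fit <- eBayes(lmFit(mvals, design))             # empirical Bayes moderation
    tt  <- topTable(fit, coef = 2, number = Inf,    # coef 2 = phenotype term
                    adjust.method = "BH")           # Benjamini-Hochberg FDR

    # Genomic inflation factor; lambda >> 1 suggests residual confounding
    chisq  <- qchisq(1 - tt$P.Value, df = 1)
    lambda <- median(chisq) / qchisq(0.5, df = 1)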

5. Low-Quality or Cross-Hybridizing Probes Included

The Problem

Many probes on 450k/EPIC arrays are known to be unreliable: non-specific binding, polymorphic CpGs, SNP overlap, or poor detection p-values.

Why It Happens

Default pipelines sometimes skip these filters, or users think “more probes” is better. But poor-quality probes inflate noise and generate false hits.

Real Example

In a cardiovascular EWAS, the top CpG was cg05736175. But that probe is known to cross-hybridize with multiple chromosomes. It failed in multiple validation studies. When removed, none of the top 10 CpGs remained.

What We Do Differently

We filter out all cross-reactive, non-CpG, and polymorphic probes based on recent annotations (e.g., Zhou et al. 2017). We also remove probes with detection p > 0.01 in more than 1% of samples. For EPIC arrays, we additionally correct for Type I/II probe design bias.
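
A sketch of the detection-p filter in minfi; crossReactive stands in for a probe blacklist compiled from published annotations (e.g., Zhou et al. 2017) and is not defined here:

    library(minfi)

    detP <- detectionP(rgSet)                  # detection p-value per probe/sample
    keep <- rowMeans(detP > 0.01) <= 0.01      # fails in at most 1% of samples

    # Also drop cross-reactive/polymorphic probes from a published blacklist
    keep <- keep & !(rownames(detP) %in% crossReactive)
    betas_filt <- betas[rownames(betas) %in% rownames(detP)[keep], ]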

Imputation and normalization can break EWAS. We test assumptions and compare methods before drawing conclusions. Request a free consultation →

6. Misuse of Imputation or Normalization Methods

The Problem

Inappropriate use of imputation, quantile normalization, or background correction can distort true signal, especially across heterogeneous samples.

Why It Happens

Many pipelines apply default normalization (e.g., SWAN, quantile) without checking assumptions. Imputation of low-confidence beta values adds false precision.

Real Example

A team used kNN imputation on 7% missing data in neonatal blood samples. After normalization, PCA showed artificial separation by gestational age, likely introduced during processing. Raw betas showed no such pattern.

What We Do Differently

We use functional normalization when appropriate and avoid imputation for high-missingness probes. We also compare different normalization strategies (Noob, Funnorm, BMIQ) and inspect their effect on data structure. Sometimes simpler is safer.
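
In practice, this comparison can be as simple as running two minfi preprocessing routes on the same RGChannelSet and inspecting whether the choice changes the sample structure:

    library(minfi)

    msetNoob <- preprocessNoob(rgSet)      # background correction + dye bias
    grsFun   <- preprocessFunnorm(rgSet)   # functional normalization

    # Does the normalization choice alter sample structure (e.g., in PCA)?
    pcaNoob <- prcomp(t(getBeta(msetNoob)))
    pcaFun  <- prcomp(t(getBeta(grsFun)))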

7. Overconfident Interpretation of Non-Genic CpGs

The Problem

Significant CpGs found in intergenic or enhancer regions are assigned to the nearest gene and interpreted as affecting its expression, without supporting evidence.

Why It Happens

Annotation tools often use distance-based assignment (nearest TSS). This can be misleading, especially for enhancers or long-range regulatory regions.

Real Example

In one environmental exposure EWAS, CpGs 60 kb upstream of the SOX9 gene were interpreted as directly affecting its function. But eQTM and chromatin data showed they belonged to a separate enhancer interacting with a different gene.

What We Do Differently

We annotate CpGs using gene context, enhancer annotation (e.g., FANTOM5, ENCODE), and known eQTM databases. For intergenic hits, we caution interpretation and recommend integrative analyses (e.g., expression-methylation correlation).
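
As an illustration with the 450k annotation package (topCpGs is a placeholder vector of probe IDs), we look beyond the nearest-gene column:

    library(minfi)
    library(IlluminaHumanMethylation450kanno.ilmn12.hg19)

    ann <- getAnnotation(IlluminaHumanMethylation450kanno.ilmn12.hg19)
    # Gene context, CpG-island relation, and enhancer flags for top hits
    ann[topCpGs, c("UCSC_RefGene_Name", "UCSC_RefGene_Group",
                   "Relation_to_Island", "Enhancer")]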

EWAS needs more than just a pipeline. We bring biological judgment, technical rigor, and experience from >100 EWAS studies. Request a free consultation →

8. Circular Use of Surrogate Variables

The Problem

Surrogate variable analysis (SVA) or RUV is used to adjust for hidden confounders - but the estimated components inadvertently remove biological signal.

Why It Happens

When the variable of interest is correlated with batch or cell type, SVA may absorb it. If analysts then test association after adjustment, real signals are lost.

Real Example

An aging EWAS used SVA with automatic selection of 7 components. After regression, no CpGs remained significant. But PCA showed that two components were almost perfectly correlated with age.

What We Do Differently

We test the correlation of SVA/RUV components with known variables and exclude components that correlate strongly with the outcome. We prefer explicit covariate modeling where possible and interpret residual structure carefully.
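
A minimal sketch of this check, with age as the outcome of interest (pheno and mvals are placeholders):

    library(sva)

    mod  <- model.matrix(~ age, data = pheno)   # full model with the outcome
    mod0 <- model.matrix(~ 1, data = pheno)     # null model
    svobj <- sva(mvals, mod, mod0)

    # Circularity check: is any surrogate variable a proxy for the outcome?
    round(cor(svobj$sv, pheno$age), 2)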

9. Neglecting Sex, Age, or Cell-Type Interaction Effects

The Problem

Important modifiers like sex, age, or specific cell types are not modeled, leading to missed associations or inflated variability.

Why It Happens

Many EWAS use simple linear models without interaction terms or stratified analysis. But some methylation effects only appear in males, or only in monocytes, etc.

Real Example

In a smoking EWAS, only weak CpG associations were found overall. But when stratified by sex, strong signals appeared in males. Ignoring this diluted the effect and obscured interpretation.

What We Do Differently

We model sex, age, and other modifiers explicitly. For known biological hypotheses, we run interaction models or stratified EWAS. We also perform cell-type–specific analysis using deconvoluted profiles if relevant.
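
For a smoking-by-sex hypothesis, for instance, the interaction model in limma is a one-line change to the design (variable names are illustrative):

    library(limma)

    design <- model.matrix(~ smoking * sex + age, data = pheno)
    fit <- eBayes(lmFit(mvals, design))

    # The interaction column tests whether the smoking effect differs by sex;
    # its exact name depends on factor coding, so we locate it by pattern
    topTable(fit, coef = grep(":", colnames(design)), number = 10)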

10. No External Validation or Biological Confirmation

The Problem

Top hits from EWAS are not validated - no replication cohort, no expression correlation, no functional follow-up.

Why It Happens

Validation is expensive and not always available. But even nominal replication or in silico checks can dramatically strengthen confidence.

Real Example

In an inflammatory disease EWAS, 12 CpGs were highlighted. But none had replication data. Later, a published dataset showed opposite directionality in 8/12 hits. The study lost credibility.

What We Do Differently

We check all top hits in public EWAS databases (e.g., EWAS Atlas, the MRC-IEU EWAS Catalog). We also compute correlations with expression (if RNA-seq is available) and prioritize CpGs that show consistent effects. For some clients, we assist in designing targeted bisulfite validation or qPCR assays for nearby genes.
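
Even a lightweight in silico check helps; here is a sketch of a direction-consistency test between a discovery and a replication set (disc and repl are placeholder data frames of per-CpG effect estimates with a logFC column):

    shared <- intersect(rownames(disc), rownames(repl))
    same_sign <- sign(disc[shared, "logFC"]) == sign(repl[shared, "logFC"])

    mean(same_sign)   # fraction of top hits with a consistent direction
    # Binomial test against chance (50%) as a crude replication signal
    binom.test(sum(same_sign), length(same_sign), p = 0.5)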

Need help validating EWAS findings? We check EWAS Atlas, perform replication, and design targeted bisulfite assays. Request a free consultation →

Final Remarks

EWAS is statistically elegant but biologically fragile. Many pipelines today can generate pretty volcano plots and significance tables - but if confounding, probe reliability, or interaction effects are not handled properly, these results may not survive even modest scrutiny.

We believe EWAS should be done with the same care as GWAS or RNA-seq analysis - with transparent assumptions, rigorous modeling, and careful interpretation. Most important is to avoid overconfidence. When a CpG is highlighted, we should ask: is it technically reliable? Is the effect biologically plausible? Could it be driven by batch, cell type, or noise?

Avoid the ten mistakes above, and your EWAS study will not only look convincing - it will stand strong against skepticism, and help uncover real biological insight.

This blog article was authored by Justin Li, Ph.D., Lead Bioinformatician. To learn more about AccuraScience's Lead Bioinformaticians, visit https://www.accurascience.com/our_team.html.