Introduction
Genome-wide association studies (GWAS) have helped researchers discover thousands of genetic variants associated with disease and phenotype. But GWAS is not the end of the story. In recent years, new association frameworks—TWAS, EWAS, PWAS, and MWAS—have emerged to connect other molecular layers to complex traits. These methods offer a broader view of biology, going beyond DNA sequence to gene expression, epigenetic changes, proteins, and metabolites.
This article is the third in a three-part series:
TWAS: Connecting Genotype to Expression to Trait
Transcriptome-wide association studies (TWAS) aim to identify genes whose expression levels are associated with a phenotype. Unlike GWAS, which tests each variant directly, TWAS tests genetically predicted expression levels, imputed from genotypes using eQTL-based models. In essence, it links the genetically regulated component of expression (GReX) to the trait, using genotype data as a bridge.
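To run TWAS properly, researchers need access to reference panels like GTEx, which contain paired genotype-expression data across multiple tissues. Predictive models, often based on elastic net regression or BLUP, are trained on these panels. Tools like PrediXcan and FUSION then use the models to estimate genetically regulated expression (GReX), either per individual or implicitly from GWAS summary statistics, and test it for association with the trait. SMR takes a related summary-based route, combining eQTL and GWAS summary statistics rather than predicting expression for each individual.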
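As a rough illustration of the predict-then-test idea, the sketch below computes GReX as a weighted sum of allele dosages and then regresses a trait on it. The weights, dosages, and trait values are made-up toy numbers, and `predict_grex` and `simple_association` are hypothetical helper names, not functions from PrediXcan or FUSION. Real pipelines fit the weights on a reference panel and adjust for covariates such as ancestry principal components.

```python
import math

def predict_grex(dosages, weights):
    """Weighted sum of allele dosages: one predicted expression value per individual."""
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]

def simple_association(x, y):
    """Ordinary least squares of y on x; returns (slope, t_statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / standard_error

# Toy inputs (assumed for illustration): three eQTL SNPs for one gene,
# six individuals with dosages coded 0, 1, or 2, and a quantitative trait.
weights = [0.5, -0.2, 0.1]
dosages = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 2, 0],
    [1, 1, 1],
    [0, 2, 2],
    [2, 0, 0],
]
trait = [1.0, 1.9, 2.1, 1.6, 0.7, 2.8]

grex = predict_grex(dosages, weights)
slope, t_stat = simple_association(grex, trait)
print(f"slope={slope:.3f}, t={t_stat:.2f}")
```

In a genome-wide run, the same two steps would be repeated for every gene with a usable prediction model, and the resulting test statistics corrected for the number of genes tested.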
However, results depend heavily on tissue relevance, ancestry matching between the reference panel and the study cohort, and the quality of the prediction model. Summary-based TWAS (like SMR) adds further assumptions and may suffer from confounding due to linkage or pleiotropy. Interpreting TWAS signals as “causal genes” without checking for colocalization can be misleading.
In practice, TWAS can be powerful but is technically nontrivial—especially when choosing between methods or modeling tissues with sparse expression data.
EWAS: Methylation and Epigenetic Regulation
Epigenome-wide association studies (EWAS) detect associations between phenotypes and epigenetic modifications, most often DNA methylation (e.g., at CpG sites). Unlike genotypes, methylation is dynamic—influenced by environment, aging, and disease states.
Standard EWAS workflows involve processing array-based data (e.g., Illumina 450K or EPIC) or bisulfite-sequencing datasets. Researchers typically fit a linear model per CpG site (e.g., in limma) to associate methylation beta values, or their logit-transformed M-values, with traits, adjusting for batch effects and covariates.
However, a major challenge is cell-type heterogeneity. In blood-based or tissue-level EWAS especially, differences in cell composition between samples can confound results. Tools like Houseman’s method, EpiDISH, and RefFreeEWAS attempt to estimate and adjust for this. Additionally, principal component adjustment, PEER factors, or surrogate variable analysis (SVA) can help account for technical and biological noise.
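The per-CpG model with a cell-composition covariate can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the limma workflow: it converts beta values to M-values with a logit transform, then adjusts for a single covariate by residualizing both methylation and trait on it, which gives the same slope as a regression that includes the covariate directly. The function names, the toy data, and the `neutrophil_fraction` covariate are all hypothetical; a real analysis would adjust for several cell-type estimates and batch variables at once.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Logit-transform a methylation beta value (between 0 and 1) to an M-value."""
    b = min(max(beta, eps), 1 - eps)  # keep the ratio finite at 0 and 1
    return math.log2(b / (1 - b))

def residualize(values, covariate):
    """Residuals of values after simple linear regression on one covariate."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    scc = sum((c - mean_c) ** 2 for c in covariate)
    slope = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values)) / scc
    return [(v - mean_v) - slope * (c - mean_c) for v, c in zip(values, covariate)]

def adjusted_effect(m_values, trait, covariate):
    """Slope of methylation on trait after removing the covariate from both."""
    m_res = residualize(m_values, covariate)
    t_res = residualize(trait, covariate)
    return sum(m * t for m, t in zip(m_res, t_res)) / sum(t * t for t in t_res)

# Toy data (assumed): one CpG site measured in six samples.
betas = [0.62, 0.71, 0.58, 0.75, 0.60, 0.73]
trait = [0, 1, 0, 1, 0, 1]                      # e.g., control vs. case
neutrophil_fraction = [0.55, 0.60, 0.50, 0.65, 0.58, 0.52]

m_values = [beta_to_m(b) for b in betas]
effect = adjusted_effect(m_values, trait, neutrophil_fraction)
print(f"adjusted M-value difference per unit of trait: {effect:.3f}")
```

Comparing `effect` with the unadjusted slope for the same site is a quick way to see how much of an apparent association tracks cell composition rather than the trait.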
Causality remains difficult to establish. Many methylation changes are downstream effects, not drivers, of disease. Integrating genetic (meQTL), expression (eQTM), or chromatin interaction data may help build mechanistic support.
EWAS can be highly informative, but rigorous preprocessing and cautious interpretation are essential.
PWAS: Linking the Proteome to Traits
Proteome-wide association studies (PWAS) investigate how protein abundance correlates with disease or phenotype. Because proteins are often closer to phenotype than transcripts, PWAS has the potential to uncover more functionally direct drivers.
PWAS is still emerging, partly due to limited availability of large-scale protein quantitative trait loci (pQTL) reference datasets. The most commonly used panels include plasma proteomics from studies like INTERVAL or SCALLOP. These provide genotype-protein associations that can be used analogously to TWAS—predicting protein levels and associating them with traits.
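When only GWAS summary statistics are available, the TWAS-style test can be carried over to proteins without individual-level data. The sketch below shows one commonly used form of the statistic: a weighted sum of GWAS z-scores divided by the standard deviation implied by the weights and the LD matrix. It assumes pQTL weights defined on standardized genotypes and an LD (SNP correlation) matrix from an ancestry-matched reference panel. The function name and all numbers are hypothetical.

```python
import math

def summary_association_z(weights, gwas_z, ld):
    """Z-score for predicted protein level vs. trait from summary statistics.

    weights: per-SNP prediction weights (standardized genotype scale)
    gwas_z:  per-SNP GWAS z-scores for the trait, same SNP order
    ld:      SNP-by-SNP correlation matrix, same SNP order
    """
    k = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)

# Toy inputs (assumed): two pQTL SNPs in moderate LD.
weights = [1.0, 1.0]
gwas_z = [2.0, 2.0]
ld = [
    [1.0, 0.5],
    [0.5, 1.0],
]

z_protein = summary_association_z(weights, gwas_z, ld)
print(f"protein-level z-score: {z_protein:.3f}")
```

The LD term matters: ignoring the correlation between the two SNPs here would overstate the evidence, because both z-scores partly reflect the same signal.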
Some tools (e.g., S-PrediXcan, PWAS-O) are being adapted for protein-based analysis. However, coverage is a major limitation: many proteins are not captured in current assays, and others may be poorly predicted from genotypes.
Another complication is post-translational modification, which can alter function without changing abundance—something current PWAS cannot account for.
Despite these challenges, PWAS is a promising path, especially in immune, metabolic, or neurodegenerative diseases where plasma proteomics data is growing.
MWAS: Associations with Metabolites and Microbiome Products
Metabolome-wide association studies (MWAS) examine correlations between phenotype and metabolite levels, often measured through NMR or mass spectrometry. These small molecules represent diverse biological pathways, including host-microbe interactions and dietary effects.
MWAS data can be targeted (focusing on known compounds) or untargeted (profiling hundreds or thousands of features). Analysis methods include linear models, sparse regression, and multivariate approaches like OPLS-DA.
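For the simplest of these approaches, a feature-by-feature scan, a minimal sketch might look like the following. Several choices here are assumptions rather than anything prescribed above: intensities are log2-transformed, each feature is tested with a Pearson correlation whose p-value comes from the Fisher z approximation, and p-values are adjusted with the Benjamini-Hochberg procedure because untargeted data involve many tests. The feature names and values are invented.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z approximation (needs n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # avoid atanh(+/-1)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest to smallest p
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy data (assumed): three mass features measured in six samples.
intensities = {
    "feature_A": [120.0, 150.0, 310.0, 420.0, 800.0, 950.0],
    "feature_B": [500.0, 480.0, 510.0, 495.0, 505.0, 490.0],
    "feature_C": [90.0, 60.0, 75.0, 40.0, 35.0, 20.0],
}
trait = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

names = list(intensities)
p_values = []
for name in names:
    log_values = [math.log2(v) for v in intensities[name]]
    r = pearson_r(log_values, trait)
    p_values.append(approx_p_value(r, len(trait)))

q_values = dict(zip(names, benjamini_hochberg(p_values)))
for name in names:
    print(name, f"q={q_values[name]:.4g}")
```

With only six samples the normal approximation is crude; the point is the shape of the workflow (transform, test each feature, adjust), not the specific test.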
However, interpretation is often limited by compound identification uncertainty, batch effects, and biological variability. Annotating unknown mass features remains difficult. Additionally, metabolite levels are sensitive to collection conditions, time of day, and diet—making reproducibility a challenge.
Still, MWAS can generate hypotheses about disrupted pathways or biomarker candidates. Integration with microbiome data (e.g., via mGWAS or bile acid profiling) is increasingly common, though still analytically demanding.
MWAS shines in exploratory analysis but benefits greatly from careful experimental design and downstream validation.
Multi-Omics Integration: The Future Direction
The real power of these “-WAS” methods emerges when integrated. For instance, a SNP associated with a disease in GWAS may also:
Several tools and frameworks are being developed to align these layers:
But integration is not plug-and-play. Challenges include matching samples across layers, harmonizing formats, controlling for confounders, and inferring directionality. Researchers often need help building statistically valid, biologically meaningful pipelines.
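The first of these challenges, matching samples, is often the most mechanical one. As a small sketch, assuming each omics layer is held as a dictionary keyed by sample ID (the layer names, IDs, and values below are invented), the layers can be restricted to their shared samples and put in a common order before any joint modeling:

```python
def shared_samples(*layers):
    """Sample IDs present in every layer, in sorted order."""
    common = set(layers[0])
    for layer in layers[1:]:
        common &= set(layer)
    return sorted(common)

def align_layers(layers):
    """Restrict each layer to the shared samples and use one common ordering.

    layers: dict mapping layer name -> {sample_id: value}
    Returns (sample_ids, {layer name: list of values in sample_ids order}).
    """
    ids = shared_samples(*layers.values())
    aligned = {name: [data[s] for s in ids] for name, data in layers.items()}
    return ids, aligned

# Toy data (assumed): one value per sample in each layer.
layers = {
    "expression":  {"S1": 5.2, "S2": 4.8, "S3": 6.1, "S4": 5.5},
    "methylation": {"S2": 0.61, "S3": 0.72, "S4": 0.58, "S5": 0.66},
    "protein":     {"S1": 10.3, "S3": 11.0, "S4": 9.7},
}

sample_ids, aligned = align_layers(layers)
print(sample_ids)   # samples measured in all three layers
print(aligned)
```

Reporting how many samples survive the intersection is worthwhile in its own right, since a small overlap limits what any downstream integration can support.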
Conclusion
GWAS laid the foundation, but modern biology is multilayered. Whether you're investigating gene regulation with TWAS, epigenetic variation with EWAS, or downstream effects with PWAS and MWAS, these newer association studies offer powerful—but technically demanding—ways to uncover the molecular basis of complex traits.
This is the third part of our GWAS series. If you're just beginning, you may want to read Unraveling GWAS: A Researcher's 15-Minute Guide. If you've already worked with GWAS but run into problems, How to Do GWAS Right: Solving Five Common Analytical Challenges may help you overcome key barriers.
For those looking to move beyond GWAS, this article offers a roadmap—but success in TWAS, EWAS, and other “-WAS” methods depends on both deep domain knowledge and technical expertise.
AccuraScience helps researchers make sense of these complex layers—and connect signals to science.
About the Author: Justin T. Li received his Ph.D. in Neurobiology from the University of Wisconsin-Madison in 2000 and a M.S. in Computer Science from the University of Houston in 2001. Between 2004 and 2009, he served as an Assistant Professor in the Medical School at the University of Minnesota Twin Cities campus. From 2009 to 2013, he was the Chief Bioinformatics Officer at LC Sciences in Houston. In June 2013, Justin joined AccuraScience as a Lead Bioinformatician. He has published over 50 research articles in bioinformatics, computational biology, and related fields. More information about Justin can be found at https://www.accurascience.com/our_team.html.