Read the previous part: Unraveling GWAS: A Researcher's 15-Minute Guide, Part 2
Recent Improvements in GWAS Analysis Methodology
Recent advances in GWAS analysis methodology have focused on improving computational efficiency and statistical power within the mixed linear model framework; see Tibbs Cortes et al. (2021) for details. An alternative framework based on Bayesian statistics has also gained attention; see Stephens and Balding (2009) for more information.
Ongoing Challenges in GWAS Analysis
Several challenges in current GWAS practice are still being debated. One is the handling of rare variants, that is, variants with a low minor allele frequency (MAF). Because only a few individuals carry the minor allele, test statistics for rare variants in linear model analysis are less stable and can produce spuriously low p-values. This has led to a debate over whether rare variants should be removed before conducting a GWAS. Some argue for excluding them upfront, yet rare variants can be truly significant and often exhibit larger effect sizes. Current practice leans towards retaining rare variants and examining them further once a GWAS has identified them as significant.
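The retain-then-examine approach to rare variants can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline. It assumes diploid genotypes coded as 0, 1, or 2 copies of the alternate allele, with None marking a missing call. The variant names, the toy genotypes, and the MAF threshold are hypothetical and chosen only for demonstration.

```python
def minor_allele_frequency(genotypes):
    """MAF of one variant from diploid genotypes coded 0/1/2 (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this variant")
    # Each diploid individual contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less common.
    return min(alt_freq, 1 - alt_freq)


def annotate_hits(significant_ids, genotypes_by_variant, maf_threshold=0.05):
    """Attach the MAF and a 'rare' flag to each significant variant.

    Variants are flagged for follow-up instead of being dropped before
    the association test. The threshold is an assumption; pick one that
    suits your sample size.
    """
    annotated = []
    for variant_id in significant_ids:
        maf = minor_allele_frequency(genotypes_by_variant[variant_id])
        annotated.append(
            {"variant": variant_id, "maf": maf, "rare": maf < maf_threshold}
        )
    return annotated


# Hypothetical toy data: 10 individuals, 3 variants.
genotypes_by_variant = {
    "snp_a": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "snp_b": [1, 1, 0, 2, 1, 0, 1, 0, 1, 2],
    "snp_c": [2, 2, None, 2, 2, 2, 1, 2, 2, 2],
}

hits = annotate_hits(["snp_a", "snp_b", "snp_c"], genotypes_by_variant,
                     maf_threshold=0.1)
for hit in hits:
    print(hit["variant"], round(hit["maf"], 4), "rare" if hit["rare"] else "common")
```

Flagged variants would then go on to closer inspection, for example a check of genotype call quality or replication in an independent sample.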
Another challenge in GWAS analysis is choosing an appropriate method to control for multiple testing. Bonferroni correction was commonly employed in earlier GWASs, but it is conservative and frequently leads to false negatives. False discovery rate (FDR) control has gained popularity because it strikes a balance between sensitivity and stringency. However, the standard Benjamini-Hochberg FDR procedure assumes independent (or positively dependent) test statistics, which is not guaranteed in GWAS, where markers in linkage disequilibrium yield correlated tests. A further alternative is a permutation test, although this approach can be computationally intensive.
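To make the difference in stringency concrete, the sketch below applies Bonferroni correction and one common FDR procedure, the Benjamini-Hochberg step-up method, to the same set of p-values. Benjamini-Hochberg is used here as an assumed representative of FDR control; the function names and the ten p-values are hypothetical, and a real GWAS would involve many more tests.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject a test if its p-value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the p-values, find the largest rank k with p_(k) <= (k / m) * alpha,
    and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_k = rank
    reject = [False] * m
    for index in order[:largest_k]:
        reject[index] = True
    return reject


# Hypothetical p-values from ten association tests.
p_values = [0.0001, 0.004, 0.012, 0.03, 0.2, 0.45, 0.6, 0.75, 0.9, 0.95]

print("Bonferroni rejections:", sum(bonferroni_reject(p_values)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg_reject(p_values)))
```

On this toy input the Bonferroni threshold is 0.05 / 10 = 0.005, so only the two smallest p-values pass, while Benjamini-Hochberg also accepts the third (0.012 <= 3/10 * 0.05 = 0.015).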
GWAS Extensions and Future Directions
Genome-wide association studies (GWAS) have revolutionized the search for disease-associated genetic variants. Building on this success, a growing family of “-WAS” (wide association study) approaches has emerged to explore layers of biological regulation beyond DNA variants.
Epigenome-wide association studies (EWAS) aim to detect associations between disease and epigenetic changes, such as DNA methylation. Transcriptome-wide association studies (TWAS) link gene expression levels to phenotypes, integrating eQTL information. Similarly, proteome-wide (PWAS) and metabolome-wide (MWAS) association studies explore protein and metabolite levels, respectively.
More recently, pan-genome-wide association studies (Pan-GWAS) have been developed to capture structural and sequence variation beyond the linear reference genome. This is particularly important for microbial genomics and for complex plant and animal systems, where reference bias can obscure true associations.
Ongoing advances in the field include:
- Machine learning–augmented GWAS, where algorithms are used to detect non-linear interactions and improve signal detection beyond linear modeling.
- Biobank-scale studies, leveraging datasets like UK Biobank or All of Us to combine GWAS with electronic health records (EHR), enabling population-level insights.
- Multi-omics integration, where GWAS is used in tandem with expression, epigenetic, proteomic, or microbiome data to refine causal inference.
- Fine-mapping and functional validation frameworks, combining statistical signals with CRISPR screens or epigenetic annotations to identify true causal variants.
These extensions reflect the evolving nature of GWAS as it adapts to the challenges of polygenic traits, rare variants, and biological complexity. Future success will increasingly depend on integrative approaches, robust modeling, and collaborative data sharing.
References
Ozaki K, Ohnishi Y, Iida A, Sekine A, Yamada R, Tsunoda T, Sato H, Sato H, Hori M, Nakamura Y, Tanaka T (2002) Functional SNPs in the lymphotoxin-alpha gene that are associated with susceptibility to myocardial infarction. Nat Genet 32:650-654.
Stephens M, Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nat Rev Genet 10:681-690.
Tibbs Cortes L, Zhang Z, Yu J (2021) Status and prospects of genome-wide association studies in plants. Plant Genome 14:e20077.
VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414-4423.
Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, Kresovich S, Buckler ES (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38:203-208.
Back to the beginning of the article: Unraveling GWAS: A Researcher's 15-Minute Guide, Part 1
About the Author: Justin T. Li received his B.S. in Biophysics from Peking University in 1991, his Ph.D. in Neurobiology from the University of Wisconsin-Madison in 2000, and an M.S. in Computer Science from the University of Houston in 2001. He served as an Assistant Professor at the University of Minnesota Medical School from 2004 to 2009, and as Chief Bioinformatics Officer at LC Sciences in Houston from 2009 to 2013. In June 2013, Justin joined AccuraScience as a Lead Bioinformatician. He has published more than 50 research articles in bioinformatics, computational biology, and related fields. From 2013 to 2022, Justin led a team of bioinformaticians at AccuraScience in the completion of more than 120 bioinformatics research and development projects, including several GWAS, EWAS and TWAS projects. More information about Justin can be found at https://www.accurascience.com/our_team.html.