
Stuck in the Peaks? Troubleshooting ATAC-seq, CUT&Tag, CUT&RUN, ChIP-seq, and More

Struggling with ATAC-seq, CUT&Tag, ChIP-seq, or single-cell chromatin data? Our experts help diagnose tricky peak calling issues, QC problems, and replicate mismatches — and design robust, publication-ready pipelines for you. Request a free consultation →

Introduction

This practical guide is for researchers looking for help with ATAC-seq analysis, CUT&Tag troubleshooting, or ChIP-seq QC, in particular peak calling problems, replicate mismatches, and quality control challenges.

These days, more and more labs are using chromatin profiling assays - like ATAC-seq, CUT&Tag, CUT&RUN, ChIP-seq, and their advanced forms such as scATAC-seq, scChIP-seq, reChIP, and Co-ChIP - to study gene regulation. These techniques are powerful, but the analysis often becomes the bottleneck.

Even when the sequencing data looks OK, the results may not match expectations. Some researchers find that peaks are missing or show up in strange locations. Others have trouble with differential analysis, or are not sure how to connect peaks to gene expression in a meaningful way.

In this article, we summarize several common problems we’ve seen when helping collaborators or clients analyze chromatin accessibility and histone modification data. We group the discussion by type of challenge, not just by assay, to avoid repeating the same advice in different places.

1. Chromatin Accessibility Assays: ATAC-seq and scATAC-seq

ATAC-seq gives a global view of open chromatin regions. Single-cell ATAC-seq (scATAC-seq) further allows cell-type-level resolution. But both share specific technical and analysis pitfalls.

⚠️ Common Issues

(I) Strange fragment size distribution. Good ATAC-seq data shows a fragment-length peak at ~50 bp (nucleosome-free), followed by peaks at ~200 bp (mono-nucleosome) and ~400 bp (di-nucleosome). If this pattern is missing, over-tagmentation or DNA degradation may have occurred.

🟨 Callout: Real Case - Over-tagmentation Trap
We once saw a dataset where the TSS enrichment was excellent, but no clear nucleosome pattern was visible. It turned out the transposition time had been extended beyond the protocol. The over-digestion masked nucleosomal features but preserved promoter signal.
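As a rough illustration of the fragment-size check described above, the sketch below bins fragment lengths into nucleosome-free, mono-nucleosome, and di-nucleosome classes. The function name and the bin boundaries are assumptions chosen for illustration, not fixed standards. The lengths are assumed to have been extracted already, for example from the insert-size field of properly paired reads.

```python
def fragment_size_fractions(lengths, nfr_max=100, mono=(180, 250), di=(315, 475)):
    """Return the fraction of fragments in three size classes.

    The bin boundaries are illustrative assumptions; adjust them to your data.
    """
    total = len(lengths)
    if total == 0:
        raise ValueError("no fragment lengths supplied")
    nfr_n = sum(1 for x in lengths if x < nfr_max)
    mono_n = sum(1 for x in lengths if mono[0] <= x <= mono[1])
    di_n = sum(1 for x in lengths if di[0] <= x <= di[1])
    return {
        "nfr": nfr_n / total,    # nucleosome-free fragments
        "mono": mono_n / total,  # mono-nucleosome fragments
        "di": di_n / total,      # di-nucleosome fragments
    }


fractions = fragment_size_fractions([50, 60, 70, 200, 210, 400, 1000, 120])
print(fractions)
```

A dataset like the one in the callout would show a high nucleosome-free fraction with mono- and di-nucleosome fractions close to zero, which is a cue to review the transposition conditions.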

(II) TSS enrichment is low. A TSS enrichment score below 6 is a warning sign. It may reflect poor signal-to-noise or uneven fragmentation. Still, the acceptable range depends on cell type.
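To make the score concrete, here is a simplified sketch of a flank-normalized TSS enrichment calculation. It assumes you already have an aggregate coverage profile (one value per base) summed over all TSSs in a window centered on the TSS. The function name, the 100-position flank, and the use of the exact center value are assumptions for illustration; production tools typically smooth the profile and differ in details.

```python
def tss_enrichment(profile, flank=100):
    """Ratio of signal at the window center to the mean signal in both flanks.

    profile: aggregate coverage per position, centered on the TSS.
    flank:   number of positions at each end used as background (assumed value).
    """
    if len(profile) < 2 * flank + 1:
        raise ValueError("profile is too short for the requested flank size")
    background_values = profile[:flank] + profile[-flank:]
    background = sum(background_values) / len(background_values)
    if background == 0:
        raise ValueError("flank background is zero; cannot normalize")
    center = len(profile) // 2
    return profile[center] / background


# Toy profile: flat background of 1, shoulders of 2, and a peak of 8 at the TSS.
toy_profile = [1.0] * 100 + [2.0] * 50 + [8.0] + [2.0] * 50 + [1.0] * 100
score = tss_enrichment(toy_profile)
print(score, "warning" if score < 6 else "ok")
```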

(III) Peak calling is unstable. MACS2 is often used, but its default settings are geared toward sharp, narrow peaks. Genrich and HMMRATAC may give better results for broader regions or when the nucleosome pattern is clean, but they can be sensitive to noise.

🟨 Tech Tip
Genrich works better when mitochondrial reads are properly removed. If they are not, peaks may be inflated near chrM-like sequences. Always run samtools idxstats and filter before calling.
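The check in the tip above can be sketched as a small parser for samtools idxstats output, which is tab-separated with four columns (reference name, sequence length, mapped reads, unmapped reads). The function name and the list of mitochondrial contig names are assumptions; use whatever name your reference genome uses.

```python
def mito_fraction(idxstats_text, mito_names=("chrM", "MT")):
    """Fraction of mapped reads assigned to mitochondrial contigs.

    idxstats_text: the text printed by `samtools idxstats sample.bam`.
    mito_names:    contig names treated as mitochondrial (assumed defaults).
    """
    mapped_total = 0
    mapped_mito = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # summary row for reads without a reference
            continue
        mapped_total += int(mapped)
        if name in mito_names:
            mapped_mito += int(mapped)
    return mapped_mito / mapped_total if mapped_total else 0.0


example = "chr1\t248956422\t700\t0\nchrM\t16569\t300\t0\n*\t0\t0\t50\n"
print(f"mitochondrial fraction: {mito_fraction(example):.2f}")
```

A high fraction is a cue to remove mitochondrial reads before peak calling rather than a pass/fail threshold in itself.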

(IV) Differential analysis does not agree with biology. Some teams use DESeq2 or edgeR on peak counts, but the results depend heavily on how the peaks were defined, on batch effects, and on replicate quality.
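Because the peak definition drives the count matrix, it helps to build one explicit consensus peak set before counting. The sketch below merges overlapping or touching peaks across samples and records how many samples support each merged region. It is one possible approach under simple assumptions (BED-style half-open intervals, hypothetical function name), not the method of any specific tool.

```python
def merge_peak_sets(peak_sets):
    """Merge peaks across samples into consensus regions.

    peak_sets: list of peak lists, one per sample; each peak is (chrom, start, end).
    Returns a list of (chrom, start, end, n_supporting_samples).
    """
    tagged = sorted(
        (chrom, start, end, idx)
        for idx, peaks in enumerate(peak_sets)
        for chrom, start, end in peaks
    )
    merged = []
    for chrom, start, end, idx in tagged:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)  # extend the current region
            last[3].add(idx)             # remember which sample contributed
        else:
            merged.append([chrom, start, end, {idx}])
    return [(c, s, e, len(samples)) for c, s, e, samples in merged]


rep1 = [("chr1", 100, 200), ("chr1", 500, 600)]
rep2 = [("chr1", 150, 250), ("chr2", 10, 20)]
for region in merge_peak_sets([rep1, rep2]):
    print(region)
```

Regions supported by only one sample can then be dropped or flagged before running DESeq2 or edgeR on the counts.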

(V) scATAC-seq data is sparse. In single-cell ATAC, each cell may have only ~10k fragments. This leads to a sparse peak matrix, so clustering and dimensionality reduction need careful tuning.

🟨 Note on Peak Strategy for scATAC
Some pipelines merge peaks from all clusters, but we prefer cluster-wise peak calling in cases where rare cell types drive distinct chromatin landscapes. Otherwise, the signal from rare populations is lost in the majority vote.

2. Targeted Enrichment Assays: CUT&Tag, CUT&RUN, ChIP-seq, reChIP, Co-ChIP

These assays enrich for specific chromatin features using antibodies. CUT&Tag and CUT&RUN have lower background than ChIP-seq, but the data is often sparse. ChIP-seq is more established, but noisier. reChIP and Co-ChIP involve double enrichment, which makes the signal even more fragile.

⚠️ Common Issues

(I) Sparse or uneven signal. Especially in CUT&Tag or CUT&RUN, the read counts may be very low in some regions. It is hard for some peak callers to handle this well.

🟨 Callout: Avoiding False Positives in CUT&Tag
Low background can be a double-edged sword - peaks called in regions with just 10–15 reads may not be real. Visual check is essential, and merging replicates often helps before calling.
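One way to turn this caution into a routine step is to flag peaks with thin read support for manual review. The sketch below assumes you already have per-replicate read counts for each peak. The function name and both thresholds are illustrative assumptions and should be tuned to sequencing depth.

```python
def flag_weak_peaks(peak_counts, min_reads=20, min_replicates=2):
    """Split peaks into well-supported ones and ones needing visual review.

    peak_counts:    dict mapping peak ID -> list of read counts, one per replicate.
    min_reads:      reads a replicate needs to count as support (assumed value).
    min_replicates: number of supporting replicates required (assumed value).
    """
    kept, flagged = [], []
    for peak_id, counts in peak_counts.items():
        supporting = sum(1 for c in counts if c >= min_reads)
        if supporting >= min_replicates:
            kept.append(peak_id)
        else:
            flagged.append(peak_id)
    return kept, flagged


counts = {
    "peak_1": [120, 95],  # strong in both replicates
    "peak_2": [12, 14],   # only 10-15 reads: review in a browser
    "peak_3": [40, 3],    # supported by one replicate only
}
kept, flagged = flag_weak_peaks(counts)
print("kept:", kept)
print("review in IGV:", flagged)
```

Flagged peaks are candidates for visual inspection, not automatic removal.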

(II) Peak calling tools give inconsistent results. SEACR is a popular choice for CUT&Tag, but may overcall weak signal. GoPeaks and MACS2 sometimes work better, but need proper tuning.

(III) Broad histone marks confuse analysis. Histone modifications like H3K27me3 show diffuse enrichment. Some tools assume narrow peaks and miss these signals.

🟨 Insight: Broad vs Narrow Mode in MACS2
Always check whether your peak caller is running in --broad mode (for H3K27me3, H3K9me3) or in its default narrow mode. The difference is not just peak width: the way enriched regions are thresholded and linked together also changes.
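To avoid leaving the mode to chance, the choice can be made explicit in the pipeline. The sketch below builds a macs2 callpeak command as an argument list and adds --broad only for marks listed as broad. The helper name, the set of broad marks (taken from the two named above), the BAMPE format (which assumes paired-end data), and the default values are assumptions for illustration.

```python
BROAD_MARKS = {"H3K27me3", "H3K9me3"}  # marks treated as broad in this sketch


def macs2_command(treatment_bam, name, mark, control_bam=None,
                  genome="hs", broad_cutoff=0.1):
    """Build a `macs2 callpeak` argument list with the mode chosen by mark."""
    cmd = ["macs2", "callpeak",
           "-t", treatment_bam,
           "-f", "BAMPE",  # assumes paired-end BAM input
           "-g", genome,
           "-n", name]
    if control_bam:
        cmd += ["-c", control_bam]
    if mark in BROAD_MARKS:
        cmd += ["--broad", "--broad-cutoff", str(broad_cutoff)]
    return cmd


print(" ".join(macs2_command("k27me3.bam", "k27me3", "H3K27me3", "igg.bam")))
print(" ".join(macs2_command("k4me3.bam", "k4me3", "H3K4me3")))
```

The list can be passed to subprocess.run once the file paths are real. Logging the full command alongside the output makes it easy to confirm later which mode produced a given peak file.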

(IV) Double IP methods (reChIP, Co-ChIP) suffer from low yield. Signal may be too weak for confident peak calling. Manual validation and checking in IGV are necessary.

(V) Replicates show poor agreement. This may happen due to variable antibody efficiency, sample prep, or PCR bias.
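A quick way to quantify replicate agreement before looking for causes is the fraction of peaks in one replicate that overlap any peak in the other. The sketch below is a minimal pure-Python version under simple assumptions (BED-style half-open intervals, hypothetical function name). For large peak sets, a dedicated interval tool would be faster.

```python
def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    """
    if not peaks_a:
        return 0.0
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in peaks_a:
        if any(start < e2 and s2 < end for s2, e2 in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks_a)


rep_a = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
rep_b = [("chr1", 150, 160), ("chr2", 50, 60)]
print(f"A in B: {overlap_fraction(rep_a, rep_b):.2f}")
print(f"B in A: {overlap_fraction(rep_b, rep_a):.2f}")
```

Computing the fraction in both directions helps separate a replicate with fewer peaks from one that truly disagrees.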

3. Single-Cell Chromatin Profiling: scATAC-seq, scCUT&Tag, scChIP-seq

These assays open up exciting possibilities, but the analysis is much harder than for bulk data. Most tools adapted from bulk analysis are not suitable without modification.

⚠️ Common Issues

(I) Data sparsity and dropout. Each cell has a low read count, and most peaks are zero in most cells. Analysis needs to rely on TF-IDF normalization combined with latent-space methods such as LSI or NMF.

🟨 Callout: The TF-IDF Trick
Term frequency-inverse document frequency (TF-IDF) normalization, borrowed from text mining, is very effective in scATAC. It balances peak-level variability with cell-level depth. ArchR and Signac both support it.
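For intuition, here is a small pure-Python sketch of one common log-scaled TF-IDF variant applied to a peaks-by-cells count matrix. Term frequency is each count divided by the total counts of its cell, and inverse document frequency is the number of cells divided by the total counts of the peak. The function name and scale factor are assumptions, and ArchR and Signac implement their own variants that differ in details.

```python
import math


def tfidf(counts, scale=1e4):
    """Log-scaled TF-IDF for a peaks x cells matrix given as a list of rows.

    counts[i][j] is the count for peak i in cell j.
    """
    n_cells = len(counts[0])
    cell_totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    peak_totals = [sum(row) for row in counts]
    result = []
    for i, row in enumerate(counts):
        idf = n_cells / peak_totals[i] if peak_totals[i] else 0.0
        new_row = []
        for j, value in enumerate(row):
            tf = value / cell_totals[j] if cell_totals[j] else 0.0
            new_row.append(math.log1p(tf * idf * scale))
        result.append(new_row)
    return result


matrix = [
    [1, 0],  # peak open in one cell only: higher IDF
    [1, 1],  # peak open in both cells: lower IDF
]
for row in tfidf(matrix):
    print([round(v, 3) for v in row])
```

In the toy matrix, the first cell has a count of 1 in both peaks, yet the peak seen in only one cell receives the larger weight, which is the point of the inverse document frequency term.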

(II) Defining peaks is not straightforward. Global peak calling often misses cell-type-specific regions.

(III) Integration with scRNA-seq is hard. Joint embedding needs a gene activity matrix or motif scores, both of which can be noisy. Spurious correlations may arise if these are used without care.

🟨 Warning: Don’t Trust Activity Scores Blindly
In many scATAC+scRNA papers, gene activity scores are shown as evidence of correlation. But these scores are usually just summed accessibility within ±2 kb of the TSS, not a direct prediction of expression.
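To show how little goes into such a score, the sketch below sums peak counts that overlap a window around each TSS. The function name, input layout, and coordinates are assumptions for illustration, and real tools differ in window definition and weighting.

```python
def gene_activity(peaks, tss, window=2000):
    """Sum accessibility counts of peaks overlapping TSS +/- window.

    peaks: list of (chrom, start, end, count) for one cell or cluster.
    tss:   dict mapping gene name -> (chrom, tss_position).
    """
    scores = {}
    for gene, (chrom, pos) in tss.items():
        lo, hi = max(0, pos - window), pos + window
        scores[gene] = sum(
            count
            for p_chrom, start, end, count in peaks
            if p_chrom == chrom and start < hi and lo < end
        )
    return scores


peaks = [("chr1", 900, 1100, 5), ("chr1", 5000, 5200, 3), ("chr2", 900, 1100, 7)]
tss = {"GENE_A": ("chr1", 1000), "GENE_B": ("chr1", 8000)}
print(gene_activity(peaks, tss))
```

Nothing in this calculation uses expression data, so a high score says the promoter region is accessible, not that the gene is transcribed.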

(IV) Motif enrichment is unstable. Because of data sparsity, motif analysis in single-cell chromatin data is less reliable than in bulk.

4. Common Pitfalls Across All Assays

- Naïvely assigning peaks to the nearest gene. This ignores chromatin looping and may mislead.

- Using inappropriate normalization. Many tools assume equal library size or noise distribution.

- Too few replicates. Comparing one sample per group is not enough for statistics.

- No visual inspection. Many false peaks can be filtered by simply viewing in IGV.
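For the first pitfall in the list above, a modest improvement over blind nearest-gene assignment is to report the distance with every assignment and mark distal ones as tentative. The sketch below does this for peak midpoints. The function name and the 50 kb cutoff are assumptions, and distance alone still cannot capture chromatin looping.

```python
def nearest_gene(peaks, tss, max_distance=50000):
    """Assign each peak to the nearest TSS and flag distal assignments.

    peaks: list of (chrom, start, end).
    tss:   dict mapping gene name -> (chrom, tss_position).
    Returns (chrom, start, end, gene, distance, is_proximal) per peak.
    """
    results = []
    for chrom, start, end in peaks:
        midpoint = (start + end) // 2
        candidates = [
            (abs(pos - midpoint), gene)
            for gene, (g_chrom, pos) in tss.items()
            if g_chrom == chrom
        ]
        if not candidates:
            results.append((chrom, start, end, None, None, False))
            continue
        distance, gene = min(candidates)
        results.append((chrom, start, end, gene, distance, distance <= max_distance))
    return results


peaks = [("chr1", 900, 1100), ("chr1", 500000, 500200), ("chr3", 0, 10)]
tss = {"GENE_A": ("chr1", 1000), "GENE_B": ("chr1", 8000)}
for row in nearest_gene(peaks, tss):
    print(row)
```

Assignments marked as not proximal are better treated as hypotheses to check against looping or expression data than as established peak-gene links.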

🟨 Final Reminder
Peak calling is statistical, but interpretation is visual. Always validate your peak list with signal track viewing in IGV or UCSC browser before drawing conclusions.

Closing Thoughts

We have worked with many epigenomic datasets from different labs. Even when the protocol is standard, the data often behaves differently. Each assay, whether ATAC-seq or reChIP, has its own style. Each tool also has its own assumptions.

It is not necessary to use every latest tool. But it is important to understand what each tool expects, and what it may fail to detect. Sometimes, the problem is not the data, but the wrong pipeline for that data.

We hope this summary provides a useful reference, especially for those facing analysis trouble after generating good data. A small adjustment in pipeline or interpretation can make a big difference.

🔑 Need more than troubleshooting tips?
Our team has supported over 180 institutions with a wide range of epigenomic projects — from troubleshooting peak calling and replicate issues to designing custom pipelines and preparing publication-ready results. If you’d like personalized help, send us an inquiry for a free data review.
About the Author: Justin T. Li received his Ph.D. in Neurobiology from the University of Wisconsin-Madison in 2000 and a M.S. in Computer Science from the University of Houston in 2001. Between 2004 and 2009, he served as an Assistant Professor in the Medical School at the University of Minnesota Twin Cities campus. From 2009 to 2013, he was the Chief Bioinformatics Officer at LC Sciences in Houston. In June 2013, Justin joined AccuraScience as a Lead Bioinformatician. He has published over 50 research articles in bioinformatics, computational biology, and related fields. More information about Justin can be found at https://www.accurascience.com/our_team.html.


Need help with your epigenomics project? Learn more about how we can help, or visit our FAQ page.

Send us an inquiry, chat with us online (during business hours 9–5 Mon–Fri U.S. Central Time), or reach us in other ways!
