Single-cell multi-modal data is very exciting - we can measure RNA and chromatin, or RNA and surface proteins, or even spatial location, in the same cells. This opens the door to a much richer and more realistic biological understanding. But in our experience across many projects in both academia and biotech, many multi-modal analysis pipelines break down - not because of coding errors, but because the assumptions were wrong or the methods did not fit the data well.
It’s very easy to make plots - Seurat, Scanpy, totalVI, Harmony - they all produce UMAPs and clusters. But do they reflect true biology? Are modalities truly aligned? Are dropout effects handled correctly? Often the answer is no.
In this post, we do not teach how to run WNN or totalVI. Instead, we summarize ten technical and biological traps we often see and explain how we detect and avoid them. These mistakes are subtle but very real, and they can change your interpretation or even undermine your biological story.
The Problem
RNA and ATAC or protein signals are combined - but one dominates. Integration is driven by RNA expression, while ATAC or protein becomes just decoration.
Why It Happens
WNN and other methods try to balance modalities, but they often use shared-neighbor graphs based on uncorrected variance. If RNA is high quality but ATAC is sparse, RNA structure takes over. The resulting “multi-modal” integration is not balanced.
Real Example
In one 10x Multiome project, clusters matched the RNA groups almost exactly. The ATAC modality contributed little, though the authors assumed joint structure. After reweighting and assessing each modality’s influence separately, we found that ATAC was too sparse to support the structure and required imputation or restriction to high-signal peaks.
What We Do Differently
We compute per-cell modality contribution and run separate PCA and UMAP for RNA and ATAC. Only when both show consistent structure do we proceed with integration. We adjust modality weights, exclude noisy features, and remove low-quality cells per modality before integration.
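A quick way to spot a dominating modality is to summarize per-cell modality weights by cluster. The sketch below assumes you have already exported a per-cell RNA weight between 0 and 1 (for example, the RNA weight column that Seurat's WNN workflow writes to cell metadata) together with cluster labels. The function name and the 0.8 cutoff are illustrative assumptions, not part of any library.

```python
from collections import defaultdict
from statistics import mean


def rna_dominated_clusters(rna_weights, clusters, cutoff=0.8):
    """Return {cluster: mean RNA weight} for clusters whose mean exceeds cutoff.

    rna_weights: per-cell RNA modality weights in [0, 1] (assumed precomputed).
    clusters:    per-cell cluster labels, same order as rna_weights.
    cutoff:      illustrative threshold; choose it from your own data.
    """
    by_cluster = defaultdict(list)
    for weight, cluster in zip(rna_weights, clusters):
        by_cluster[cluster].append(weight)
    means = {c: mean(ws) for c, ws in by_cluster.items()}
    return {c: m for c, m in means.items() if m > cutoff}


# Toy data: RNA drives the "basal" cluster, "goblet" is balanced.
rna_weights = [0.9, 0.95, 0.85, 0.5, 0.55, 0.45]
clusters = ["basal", "basal", "basal", "goblet", "goblet", "goblet"]
flagged = rna_dominated_clusters(rna_weights, clusters)
print(flagged)
```

Clusters flagged this way are candidates for a closer look at the second modality's quality, not automatic evidence of a problem.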
The Problem
Weighted Nearest Neighbor (WNN) graphs often look clean - beautiful UMAPs with crisp clusters. But one or more modalities may be low quality or poorly normalized and still be forcibly fused. Researchers trust WNN because it is popular, but they forget that its reliability depends heavily on the underlying QC of each modality.
Why It Happens
Many people preprocess RNA carefully - filtering cells with a high mitochondrial fraction, checking gene counts, and so on. But they forget to apply similar QC to ATAC or protein data. For ATAC, cells with few fragments in peaks or abnormal TSS enrichment are kept. For CITE-seq, batch-normalized protein signal is assumed to be clean. When WNN is applied to such imbalanced data, the fusion graph becomes biased.
Real Example
We helped a group working on epithelial cells using 10x Multiome. Their WNN graph looked excellent - clean separation of goblet, ciliated, and basal cells. But when we decomposed the WNN weights, we saw that RNA was contributing almost 90% in most cells. The ATAC data had low signal and high noise, but the team never filtered out the low-quality cells. In fact, TSS enrichment scores were below 2 in many clusters.
What We Do Differently
We apply rigorous per-modality QC. For ATAC, we compute nucleosome signal, TSS enrichment, FRiP, and total unique fragments. For protein data, we check signal-to-noise ratio per marker. Only after removing cells that fail QC in any modality do we proceed with WNN construction. We also inspect the modality weights across cell types, to detect unbalanced contributions.
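As a minimal sketch of "fail in any modality, drop the cell", the snippet below combines an RNA pass list with per-cell ATAC metrics. It assumes the four ATAC metrics have already been computed by your tool of choice. The class, function names, and all cutoffs are hypothetical placeholders for illustration; sensible thresholds depend on tissue and protocol.

```python
from dataclasses import dataclass


@dataclass
class AtacQC:
    tss_enrichment: float
    nucleosome_signal: float
    frip: float            # fraction of reads in peaks
    unique_fragments: int


# Illustrative cutoffs only (assumptions); tune them per dataset.
MIN_TSS = 2.0
MAX_NUCLEOSOME = 4.0
MIN_FRIP = 0.15
MIN_FRAGMENTS = 1000


def passes_atac_qc(qc: AtacQC) -> bool:
    """True if the cell meets every ATAC threshold."""
    return (
        qc.tss_enrichment >= MIN_TSS
        and qc.nucleosome_signal <= MAX_NUCLEOSOME
        and qc.frip >= MIN_FRIP
        and qc.unique_fragments >= MIN_FRAGMENTS
    )


def cells_passing_all(rna_pass: set, atac_qc: dict) -> list:
    """Keep barcodes that pass RNA QC and ATAC QC; failing either drops the cell."""
    return sorted(
        barcode
        for barcode, qc in atac_qc.items()
        if barcode in rna_pass and passes_atac_qc(qc)
    )


atac_qc = {
    "cell_a": AtacQC(6.5, 1.2, 0.40, 8000),   # passes ATAC QC
    "cell_b": AtacQC(1.4, 1.0, 0.35, 9000),   # fails TSS enrichment
    "cell_c": AtacQC(7.0, 0.9, 0.45, 12000),  # good ATAC, but fails RNA QC
}
rna_pass = {"cell_a", "cell_b"}
kept = cells_passing_all(rna_pass, atac_qc)
print(kept)
```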
The Problem
Protein signal from CITE-seq is often assumed to be perfect - no dropout, no batch effect, no background. But that’s not true. Antibody staining has batch issues, barcode swapping can happen, and ambient contamination is real. If not corrected, the protein data can mislead clustering or falsely separate subtypes.
Why It Happens
The reason is psychological - RNA has dropout and noise, but protein looks “clean” and numeric, so people assume it’s better. Tools like totalVI try to model the background noise, but people still feed unfiltered ADT data into PCA and clustering.
Real Example
In a leukemia project, CITE-seq was used to separate B, T, and NK cells. CD3 and CD56 protein signals showed crisp separation. But one batch of samples had a shifted CD56 signal, likely due to an antibody batch issue. The authors missed it and reported an NK cell expansion. After correction and background modeling, the NK cluster disappeared.
What We Do Differently
We model CITE-seq protein data using denoising tools like totalVI or dsb, and we visually inspect per-marker distributions across batches. We also assess correlation between RNA and protein for canonical markers. When discrepancies arise, we investigate further - instead of trusting protein blindly. Sometimes we even exclude certain markers if they show clear batch shifts.
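The per-marker batch inspection can be backed by a simple numeric screen. The sketch below assumes protein values are already normalized (for example with CLR or dsb) and flags markers whose per-batch medians differ by more than a chosen amount. The function names and the `max_shift` value are illustrative assumptions, and a flagged marker still needs a visual check, because a median shift can also reflect real differences in cell composition between batches.

```python
from collections import defaultdict
from statistics import median


def batch_medians(values, batches):
    """Median of a marker's values within each batch."""
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    return {b: median(vs) for b, vs in grouped.items()}


def shifted_markers(protein, batches, max_shift=1.0):
    """Return {marker: spread of batch medians} for markers exceeding max_shift.

    protein: dict mapping marker name -> per-cell normalized values.
    batches: per-cell batch labels, same cell order as the value lists.
    """
    flagged = {}
    for marker, values in protein.items():
        meds = batch_medians(values, batches)
        spread = max(meds.values()) - min(meds.values())
        if spread > max_shift:
            flagged[marker] = spread
    return flagged


# Toy data: CD3 is stable across batches, CD56 is shifted in batch b2.
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
protein = {
    "CD3": [1.0, 1.2, 0.9, 1.1, 1.0, 1.3],
    "CD56": [0.2, 0.3, 0.1, 1.9, 2.1, 2.0],
}
flagged = shifted_markers(protein, batches)
print(flagged)
```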
The Problem
In ATAC or CITE-seq data, many features are undetected in many cells - but that doesn’t mean the signal is truly zero. When dropout is not modeled, the integration treats these zeros as absence of signal, which drives misleading clustering or embedding.
Why It Happens
Many pipelines apply log-normalization or centered scaling to all features, regardless of dropout. For ATAC, binarized peak matrices are not enough. For protein, missing signals may arise from technical failure, not biology. But without explicit dropout modeling, these absences distort the weight matrix.
Real Example
In a glioblastoma study using RNA and ATAC integration, immune clusters appeared distinct, but the ATAC modality showed “absence” of peaks in some T-cell genes. Upon checking, we realized that the peak detection threshold was too high and dropout was being misinterpreted as true silence.
What We Do Differently
We model dropout explicitly. For RNA, we use tools like scVI or MAGIC for imputation when needed. For ATAC, we apply zero-inflated models or smooth over peaks in cis-regulatory modules. We never treat a missing signal as true zero unless we have strong evidence.
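To see why a zero is weak evidence in shallow cells, consider a deliberately simplified Poisson sampling model (an assumption for illustration, not the model used by any particular tool). If an open peak captures a fraction `capture_rate` of a cell's fragments, the chance of observing zero fragments there, even though the site is open, is `exp(-depth * capture_rate)`. The function names and the 0.05 cutoff are hypothetical.

```python
import math


def prob_zero_if_open(cell_depth, capture_rate):
    """Poisson P(0 counts) when the expected count is cell_depth * capture_rate."""
    return math.exp(-cell_depth * capture_rate)


def zero_is_informative(cell_depth, capture_rate, max_prob=0.05):
    """Treat an observed zero as evidence of absence only if a zero would be
    unlikely for a truly open site at this sequencing depth."""
    return prob_zero_if_open(cell_depth, capture_rate) <= max_prob


# Same peak (0.1% of fragments when open), two cells of different depth.
shallow = prob_zero_if_open(1_000, 0.001)   # exp(-1), roughly 0.37
deep = prob_zero_if_open(5_000, 0.001)      # exp(-5), under 0.01
print(round(shallow, 3), round(deep, 3))
```

Under these assumptions, a zero in the 1,000-fragment cell is nearly uninformative, while the same zero in the 5,000-fragment cell is much stronger evidence that the site is closed.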
The Problem
Joint PCA or CCA is done before proper normalization or variance adjustment across modalities. This causes the shared space to be biased - not because biology is similar, but because preprocessing made them appear so.
Why It Happens
Some Seurat tutorials suggest using SCTransform on RNA and directly applying log-normalized ADT or LSI-transformed ATAC. But variance stabilization is not harmonized across modalities, so dimensionality reduction becomes modality-biased. This results in false proximity in the joint embedding.
Real Example
In one Perturb-seq + protein experiment, joint PCA was used after SCTransform on RNA and CLR-normalized protein. The PCA clustered mostly by perturbation condition - but after reprocessing, we found the separation was due to normalization scale, not biology.
What We Do Differently
We normalize each modality independently with appropriate methods: SCTransform for RNA, LSI for ATAC, and centered log-ratio or totalVI for protein. Only after confirming variance structure and correcting batch effects do we construct the joint space - using methods like CCA, MOFA, or WNN that respect modality differences.
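For reference, the centered log-ratio transform for protein counts is small enough to write out. The sketch below shows one common variant, computed across the markers within a single cell with a pseudocount of 1. Library implementations differ in details such as the pseudocount and whether centering runs across cells or across markers, so treat this as an illustration rather than a drop-in replacement.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio of one cell's protein counts.

    Each value becomes log(count + pseudocount) minus the mean of those logs,
    so the transformed values for a cell sum to zero.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]


# One cell, three markers with very different raw counts.
values = clr([0, 9, 99])
print([round(v, 3) for v in values])
```

Because the output is centered per cell, it is on a very different scale from SCTransform residuals, which is one reason the modalities should not be concatenated without checking their variance first.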
The Problem
Some analysts apply clustering right after dimensionality reduction - even before proper alignment of modalities. They assume the clusters are biologically meaningful, but in reality, the structure reflects unaligned data space.
Why It Happens
This often happens due to impatience - or misunderstanding of the tools. Once UMAP is generated, people want to see clusters. But UMAP embedding is not guaranteed to be modality-aligned, especially if signal sparsity or batch effects are present.
Real Example
We reviewed a paper where the authors used WNN and immediately performed Louvain clustering. They got 12 clusters and labeled them with known markers. But the cluster boundaries matched only the RNA signal. The protein signal contradicted the structure in 3 clusters.
What We Do Differently
We delay clustering until after alignment is confirmed. We first visualize RNA and ATAC/protein UMAPs independently, assess modality agreement, and perform batch correction if needed. Then, we use consensus graph construction or multi-modal alignment (e.g., Harmony, LIGER, Seurat v5) before clustering.
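One way to put a number on "modality agreement" is to cluster each modality on its own and compare the two labelings with the adjusted Rand index (ARI). The sketch below is a plain-Python ARI, written out so it has no dependencies. Using ARI for this purpose is our illustrative choice here, and the label vectors are assumed to come from separate RNA-only and protein-only (or ATAC-only) clusterings of the same cells.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells.

    1.0 means identical partitions (label names may differ); values near 0
    mean agreement no better than chance.
    """
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("label vectors must have the same length")
    if n < 2:
        raise ValueError("need at least two cells")
    sum_pairs = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_pairs - expected) / (max_index - expected)


rna_clusters = [0, 0, 1, 1, 2, 2]
protein_clusters = ["x", "x", "y", "y", "z", "z"]   # same partition, renamed
agree = adjusted_rand_index(rna_clusters, protein_clusters)
disagree = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(agree, disagree)
```

A low ARI does not say which modality is right. It only signals that the modalities disagree and that joint clusters deserve a closer look before they are labeled.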
The Problem
ATAC peaks are assigned to genes purely based on proximity - e.g., closest TSS. But many peaks are distal enhancers, and simple distance fails to capture regulatory reality. The linked gene-peak network becomes unreliable.
Why It Happens
Tools like Signac, ArchR, or Cicero offer peak-to-gene linkage by distance or co-accessibility. But in practice, people use only the default: assign to nearest gene. This causes misinterpretation, especially in lineage or differentiation studies.
Real Example
A hematopoiesis paper showed increased accessibility at a distal peak near MYB and claimed MYB activation. But that peak actually regulates a lncRNA 150 kb away. The authors missed it because their annotation was proximity-based.
What We Do Differently
We combine multiple linkage strategies - proximity, co-accessibility, and chromatin conformation data - to assign peaks to genes. This ensures that gene-peak links are biologically meaningful and not just based on distance.
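To illustrate how a correlation-based link can disagree with the nearest-gene default, the sketch below scores every gene within a window around a peak by the Pearson correlation between peak accessibility and gene expression across groups of cells (for example pseudobulks). The gene names, coordinates, window size, and function names are all hypothetical, and this covers only the proximity and correlation parts of the strategy. Chromatin conformation data would be a further, separate line of evidence.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def link_peak(peak_pos, peak_access, genes, window=500_000):
    """Compare the nearest gene with the best-correlated gene within a window.

    genes: dict name -> (tss_position, expression per cell group).
    Returns None if no gene TSS falls inside the window.
    """
    candidates = {
        name: (tss, expr)
        for name, (tss, expr) in genes.items()
        if abs(tss - peak_pos) <= window
    }
    if not candidates:
        return None
    scores = {name: pearson(peak_access, expr) for name, (_, expr) in candidates.items()}
    nearest = min(candidates, key=lambda g: abs(candidates[g][0] - peak_pos))
    best = max(scores, key=scores.get)
    return {"nearest": nearest, "best_correlated": best, "r": scores[best]}


# Toy example: the nearest gene is flat, a gene 150 kb away tracks the peak.
peak_access = [0.1, 0.4, 0.8, 0.9]
genes = {
    "GENE_NEAR": (1_010_000, [5.0, 5.1, 4.9, 5.0]),
    "GENE_FAR": (1_150_000, [1.0, 2.0, 4.0, 5.0]),
}
result = link_peak(1_000_000, peak_access, genes)
print(result)
```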
The Problem
Some biological processes only manifest in one modality - for example, chromatin remodeling without immediate transcriptional change. But integration methods smooth over differences, making modality-specific regulatory transitions invisible.
Why It Happens
Joint embeddings and shared clustering force modalities into a common space, diluting modality-specific signals. This happens because integration algorithms aim to minimize modality distance, inadvertently suppressing unique patterns.
Real Example
In a T-cell activation dataset, ATAC profiling revealed chromatin accessibility changes in key enhancers before changes in gene expression. The integrated UMAP aligned cells by RNA structure, missing the early ATAC-defined activation trajectory.
What We Do Differently
We analyze modality-specific trajectories separately: first identifying patterns unique to ATAC or protein, then assessing their biological relevance before integration. If necessary, we integrate only after characterizing modality-specific signals to ensure they aren’t lost.
The Problem
When anchoring single-cell data to spatial datasets, mapping may be influenced by modality-specific batch or coverage differences, making the spatial projection misleading.
Why It Happens
Spatial RNA-seq often has lower sensitivity and fewer detected genes. Anchoring algorithms may overweight the modality with higher coverage or quality, pulling the mapping away from true biological locations.
Real Example
In a tumor–immune microenvironment study, single-cell RNA + protein data were mapped to spatial transcriptomics. RNA anchors misassigned some macrophages due to low RNA coverage, while protein modality would have mapped them correctly.
What We Do Differently
We evaluate modality match before anchoring: assessing sensitivity, coverage, and batch effects in spatial data. We also compare RNA- and protein-based mappings and validate with known histological zones to ensure accurate localization.
The Problem
Many teams apply totalVI and assume integration is complete - that the model automatically corrects all issues, regardless of input quality and hyperparameters. But totalVI doesn’t fix everything.
Why It Happens
totalVI is very popular and convenient, outputting denoised protein and RNA embeddings. However, training is stochastic and models require careful tuning; with small batches or poor QC, results can be unstable.
Real Example
A COVID PBMC study used totalVI to integrate RNA and protein. After publication, reanalysis showed the cytokine signature signal disappeared, proving the original finding was a model artifact rather than biological truth.
What We Do Differently
We treat totalVI as one tool rather than a magic fix, exploring parameter settings, inspecting model convergence, and validating results with independent QC metrics before accepting outputs. We cross-validate with alternative methods when possible.
Single-cell multi-modal integration is very promising - but failures often stem not from wrong tools, but from unprepared data or unmet assumptions. Recognizing pitfalls early saves time and resources.
We hope this article offers a clear warning and helpful insights, enabling you to produce results that stand up to peer review and lead to genuine biological discoveries.
When integration fails, it is often not obvious until downstream analyses; with careful QC, thoughtful modeling, and modality-aware workflows, multi-modal data can deliver on its promise.