The problem:
After clustering and annotation, people expect stories to emerge. But often, they don’t. There’s no biological meaning, no functional theme - just a bunch of clusters with no narrative. This is not a data problem. It's an interpretation bottleneck.
Teams spend all their time running the pipeline, and almost none thinking critically about what the clusters actually mean. When the time comes to explain them, there's silence - or hand-waving.
What actually happens:
- UMAP shows 12 clusters, but the manuscript only discusses 3.
- A reviewer asks: “Why is cluster 6 interesting?” and the authors don’t know.
- Some clusters are transcriptionally distinct but never followed up.
- The discussion is vague: “This cluster may represent an intermediate state.”
Why this happens:
- No integration with biology: Teams don’t map clusters back to their experimental system or known pathways.
- No functional analysis: Clusters are described by a few marker genes but never analyzed further.
- Poor planning: There was never a real hypothesis - just data collection and hope.
What experienced analysts do differently:
- Use enrichment tools to assign pathway or gene-module meaning to each cluster (see the sketch after this list).
- Ask: what does each cluster teach us about the biology or intervention?
- If a cluster has no biological story and does not reproduce across samples or parameter choices, consider merging or discarding it.
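For example, here is a minimal Python sketch of that enrichment step, assuming Scanpy and gseapy are installed, gene symbols as var_names, and internet access to Enrichr; the example dataset, gene-set library, and top-100 marker cutoff are illustrative choices, not recommendations:

```python
# A sketch of per-cluster functional annotation: rank marker genes for each
# cluster, then query a pathway database for enriched terms.
# Assumptions: scanpy and gseapy installed, internet access to Enrichr,
# gene symbols as var_names. Dataset, library, and cutoffs are illustrative.
import scanpy as sc
import gseapy as gp

adata = sc.datasets.pbmc3k_processed()          # example data with "louvain" clusters

# Marker genes per cluster (one-vs-rest Wilcoxon test)
sc.tl.rank_genes_groups(adata, groupby="louvain", method="wilcoxon")

enrichment = {}
for cluster in adata.obs["louvain"].cat.categories:
    markers = (sc.get.rank_genes_groups_df(adata, group=cluster)
                 .head(100)["names"].tolist())
    res = gp.enrichr(gene_list=markers,
                     gene_sets=["GO_Biological_Process_2021"],
                     organism="human",
                     outdir=None)               # keep results in memory only
    enrichment[cluster] = res.results.sort_values("Adjusted P-value").head(10)

# Each cluster now has candidate functional themes to interrogate (or reject)
for cluster, table in enrichment.items():
    print(f"\nCluster {cluster}")
    print(table[["Term", "Adjusted P-value"]].to_string(index=False))
```

The point is not this specific tool - it is that every cluster ends up with a candidate functional theme you can defend, question, or reject in the manuscript.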
Hard lesson learned:
More clusters do not mean more insight. Unless you make sense of them, you’ve just created a catalog - not a study.
The problem:
Figures drive papers. But figures also mislead. Too often, visualizations in scRNA-seq are optimized for aesthetics, not clarity. What looks clean may hide key information - or distort it entirely.
Without careful legend design, color choices, and data layering, even honest figures can deceive. And reviewers will notice.
What actually happens:
- Color bar shows “gene expression” but doesn’t say which gene.
- UMAP points overlap, making density differences invisible.
- All clusters look the same size on the plot - but one has 20 cells and another has 2,000.
- Boxplots hide key variation across donors.
Why this happens:
- Default plotting: People use ggplot2 or Scanpy with default settings and never adjust them.
- Lack of reviewer perspective: Analysts know what the figure shows, but outsiders don’t.
- No data stratification: Plots show aggregated values across cells, not per-donor variation.
What experienced analysts do differently:
- Label clearly. Always say what each color, axis, and shape represents.
- Use transparency, faceting, or point size to reflect data density.
- Include donor-level views, even if only in the supplement (see the sketch after this list).
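As a minimal sketch of what that looks like in Scanpy: explicit labels, semi-transparent points, and one small panel per donor. The file name, the "donor" column, and the gene CD14 are placeholders to swap for your own annotated object:

```python
# A sketch of a more reviewer-proof UMAP figure: explicit labels, transparency,
# and per-donor panels. The file name, the "donor" column, and the gene "CD14"
# are placeholders - adapt them to your own object.
import scanpy as sc
import matplotlib.pyplot as plt

adata = sc.read_h5ad("annotated_data.h5ad")    # hypothetical path
gene = "CD14"                                  # name the gene the color bar shows

# Overview: small, semi-transparent points so dense regions remain visible
ax = sc.pl.umap(adata, color=gene, size=10, alpha=0.5,
                title=f"{gene} expression (log-normalized)", show=False)
ax.figure.savefig("umap_overview.png", dpi=300, bbox_inches="tight")

# Donor-level view: one small panel per donor, shared axes for comparability
donors = sorted(adata.obs["donor"].unique())
fig, axes = plt.subplots(1, len(donors), figsize=(4 * len(donors), 4),
                         sharex=True, sharey=True)
for ax, donor in zip(axes, donors):
    sc.pl.umap(adata[adata.obs["donor"] == donor], color=gene,
               size=10, alpha=0.5, title=f"Donor {donor}", ax=ax, show=False)
fig.savefig("umap_per_donor.png", dpi=300, bbox_inches="tight")
```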
Hard lesson learned:
Visuals are powerful - but dangerous. Once published, misleading figures spread confusion. Clarity beats beauty every time.
The problem:
In large scRNA-seq projects, weeks or months pass between steps. Teams reprocess data. Parameters change. People leave. Soon, no one knows which version is the “real” one.
Without version control and documentation, you can’t reproduce your own results. When a reviewer asks, you can't answer. This breaks trust - and delays publication.
What actually happens:
- Old UMAP doesn’t match current clustering.
- Differential expression table cannot be regenerated.
- Someone re-ran integration and got new batch correction artifacts.
- The PI asks for a plot, but no one knows how it was made.
Why this happens:
- No pipeline tracking: Scripts are scattered, with hard-coded paths and random edits.
- Lack of notebooks or logs: People rerun steps manually, with no record.
- Team turnover: The graduate student who did the analysis has graduated - and no one else understands it.
What experienced analysts do differently:
- Use version-controlled pipelines (e.g., Snakemake, Nextflow) or well-documented R/Python notebooks.
- Store metadata: parameter settings, software versions, gene annotations (see the sketch after this list).
- Assume someone else - maybe future you - will need to repeat everything from scratch.
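One lightweight habit is to write a small "run manifest" next to every major output. The sketch below is generic Python using only the standard library; the parameter values, package list, and file names are placeholders for your own pipeline:

```python
# A sketch of a run manifest: record when, with what parameters, with which
# package versions, and on which exact input files an analysis was produced.
# Parameter values, the package list, and file names are placeholders.
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from importlib.metadata import version

params = {
    "n_top_genes": 2000,
    "n_pcs": 30,
    "leiden_resolution": 0.8,
    "gene_annotation": "GRCh38, Ensembl release 110",
}

def sha256(path: str) -> str:
    """Checksum an input so you can prove which version of the data was used."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

manifest = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "python": sys.version,
    "platform": platform.platform(),
    "packages": {p: version(p) for p in ["scanpy", "anndata", "numpy", "pandas"]},
    "parameters": params,
    "inputs": {"counts": sha256("raw_counts.h5ad")},   # hypothetical input file
}

with open("run_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```

Six months later, this one file answers "how was this made?" without anyone having to remember.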
Hard lesson learned:
The hardest part of scRNA-seq is not the tools. It’s project management. If your analysis can’t be traced, it can’t be trusted.
The problem:
Everything seems done. The figures are made. The paper is submitted. Then the reviewers come - and demand re-analysis. Sometimes it’s minor. Sometimes they ask to redo the entire pipeline with different tools or parameters. If your workflow isn’t ready, your project falls apart.
What actually happens:
- Reviewer 2 asks for SCTransform instead of log-normalization.
- Reviewer 1 wants all plots re-generated using Slingshot instead of Monocle.
- The code is missing. The person who ran it is gone.
- Response letter says: “We attempted to re-analyze but encountered issues...”
Why this happens:
- Fragile workflows: Pipelines are not modular or reproducible, making edits dangerous.
- Poor communication: PI doesn’t know how the data were processed, only the final plots.
- Time pressure: Under revision deadlines, there's no time to start over properly.
What experienced analysts do differently:
- Keep reanalysis-ready pipelines from the start. Modular code, clean structure, version control.
- Anticipate reviewer questions and test alternatives before submission (see the sketch after this list).
- Communicate openly across the team so no one becomes a single point of failure.
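As a sketch of what "modular" means in practice, the normalization step below sits behind a single switch, so a request like "use SCTransform instead of log-normalization" becomes a parameter change rather than a rewrite. This uses Scanpy, with analytic Pearson residuals standing in as a SCTransform-style alternative - an assumption about what a reviewer might accept, not a claim of equivalence; the dataset call is only for illustration:

```python
# A sketch of a swappable normalization step. Pearson residuals are used here
# as a SCTransform-style alternative available in Scanpy; whether that satisfies
# a specific reviewer is an assumption, not a guarantee.
import scanpy as sc

def normalize(adata, method: str = "lognorm"):
    """Apply the chosen normalization in place and record the choice."""
    if method == "lognorm":
        sc.pp.normalize_total(adata, target_sum=1e4)
        sc.pp.log1p(adata)
    elif method == "pearson":
        sc.experimental.pp.normalize_pearson_residuals(adata)
    else:
        raise ValueError(f"Unknown normalization method: {method}")
    adata.uns["normalization"] = method    # keep the decision with the data
    return adata

# Usage: rerun the downstream pipeline once per method and compare the results
adata = sc.datasets.pbmc3k()               # small public dataset, for illustration only
sc.pp.filter_genes(adata, min_cells=3)     # drop all-zero genes before normalizing
adata = normalize(adata, method="lognorm") # switch to "pearson" for the revision
```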
Hard lesson learned:
Your real analysis begins after submission. Smart teams prepare for that. Others just hope - and scramble later.
Every section of this blog series comes from lived experience. We’ve seen these mistakes - not once, but again and again. In many cases, we made them ourselves, years ago. The scRNA-seq field is evolving fast, but the basics of good practice stay the same: be critical, be cautious, and always test your assumptions.
If you’ve read this far, we hope you take away not just a checklist, but a mindset. The best analysts aren’t those who memorize pipelines - they are the ones who ask better questions at every step.
This blog series was co-authored by Tom Xu, Ph.D., Lead Bioinformatician; Justin Li, Ph.D., Lead Bioinformatician; and Zack Tu, Ph.D., Lead Bioinformatician. To learn more about AccuraScience's Lead Bioinformaticians, visit https://www.accurascience.com/our_team.html.