
Nine Painful Traps in TCR/BCR Immune Repertoire Data Analysis - And How We Handle Them Properly

Introduction

Immune repertoire sequencing opens a powerful window into the adaptive immune system. Whether you work on T cells or B cells, in cancer, infection, or vaccination, the ability to track clonotype diversity and lineage expansion can provide high-impact insight. However, immune repertoire analysis is not just about running MiXCR or Cell Ranger VDJ. Many teams we’ve worked with suffered major delays, retractions, or failed experiments - not because of wet-lab quality, but because the data was interpreted incorrectly.

In this article, we describe nine major traps we repeatedly observe in immune repertoire projects involving TCR or BCR sequencing - including bulk VDJ‑seq, 10x Genomics immune profiling, and MiXCR‑based workflows. We explain why they happen, give realistic examples, and describe how we prevent or fix them in professional practice. This article is not a tool tutorial - it is a diagnostic reference for experienced labs that want answers, not just pipelines.

Table of Contents

1. Overcounting Clonotypes from Low-Quality Reads
2. Ignoring Chain Pairing Problems in Single-Cell TCR/BCR Data
3. Misinterpreting Clonotype Expansion Without Statistical Context
4. Misusing MiXCR or IgBlast Defaults Without Tuning
5. Diversity Metrics Applied Without Normalization
6. UMAP or Cluster Embedding Done Without Controlling for Clonotype Overlap
7. Annotating V and J Genes Without Accounting for Polymorphism
8. Trying to Infer Antigen Specificity from Clonotypes Without Validation
9. Treating TCR and BCR Projects as the Same

Overcounting “ghost” clonotypes distorts diversity metrics. We filter by read support, contig quality, and replicate consistency to retain only real clones. Request a free consultation →

1. Overcounting Clonotypes from Low-Quality Reads

The Problem

Far more clonotypes are reported than truly exist, which inflates diversity metrics. Many of them are sequencing artifacts or partially assembled, low-confidence contigs rather than real clones.

Why It Happens

VDJ recombination produces highly diverse junctions, and short reads often fail to cover the full CDR3. MiXCR or IgBlast sometimes calls partial clones from low-quality data. Without proper filtering (e.g., based on alignment score and read support), you end up counting “ghost” clonotypes that are not real.

Real Example

One group claimed >30,000 unique BCR clonotypes from 10x data, but over 40% of them had only one supporting read. After filtering for ≥2 reads and full-length contigs, fewer than 10,000 remained. The “diversity” story collapsed - it was just noise.

What We Do Differently

We impose strict filters on read support, contig length, CDR3 completeness, and productivity status. For bulk data, we remove singleton clonotypes unless they appear across replicates. For 10x, we cross‑check with cell barcode UMI counts to distinguish real clones from sequencing dropout.
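To make this concrete, below is a minimal sketch of that filtering logic in Python. It assumes a 10x-style contig table with Cell Ranger-like column names (barcode, chain, v_gene, j_gene, cdr3, full_length, productive, umis) plus a hypothetical "sample" column added when concatenating replicates; the thresholds are illustrative, not prescriptive.

```python
import pandas as pd

# Load a 10x-style contig table; column names follow Cell Ranger's
# filtered_contig_annotations.csv and may differ in other pipelines.
contigs = pd.read_csv("filtered_contig_annotations.csv")

# Keep only contigs with real support: full-length, productive, a called CDR3,
# and at least 2 UMIs (thresholds here are illustrative).
MIN_UMIS = 2
keep = (
    contigs["full_length"].astype(str).eq("True")
    & contigs["productive"].astype(str).eq("True")
    & (contigs["umis"] >= MIN_UMIS)
    & contigs["cdr3"].notna()
)
filtered = contigs[keep]

# Collapse to clonotypes and drop singletons unless they recur across replicates.
# 'sample' is a hypothetical column added when merging replicate tables.
clonotypes = (
    filtered.groupby(["v_gene", "j_gene", "cdr3"])
    .agg(n_cells=("barcode", "nunique"), n_samples=("sample", "nunique"))
    .reset_index()
)
clonotypes = clonotypes[(clonotypes["n_cells"] >= 2) | (clonotypes["n_samples"] >= 2)]
print(f"{len(clonotypes)} clonotypes retained after filtering")
```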

2. Ignoring Chain Pairing Problems in Single-Cell TCR/BCR Data

The Problem

Researchers report TCR or BCR sequences from single‑cell 10x data, but many of the “cells” have multiple TRA/TRB or IGH/IGL pairs - or none at all.

Why It Happens

The 10x platform captures transcripts, not full-length DNA. If expression is low or barcode assignment is ambiguous, Cell Ranger assigns multiple contigs or leaves chains unpaired. Many pipelines blindly report these results.

Real Example

In a COVID-19 T cell study, only 35% of cells had exactly one productive TRA and one productive TRB. The rest were missing a chain or carried extra chains, adding noise to the clonotype analysis - and the original manuscript did not mention this until reviewers asked.

What We Do Differently

We classify all single cells into strict categories: 1α + 1β, multiple chains, orphan chain, or no productive chain. Only cells with a single productive αβ pair are used in downstream clonotype analysis and UMAP overlays. We also remove doublets using Scrublet or cell‑type expression profiles.
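A minimal sketch of that chain-status classification is shown below, again assuming Cell Ranger-like column names (barcode, chain, productive); the category names are ours, and doublet removal with Scrublet happens separately on the gene-expression data.

```python
import pandas as pd

# Classify each barcode by its TCR chain configuration, using a 10x-style
# contig table (column names are assumptions; adjust to your export).
contigs = pd.read_csv("filtered_contig_annotations.csv")
contigs["is_productive"] = contigs["productive"].astype(str).eq("True")

def classify(cell: pd.DataFrame) -> str:
    """Return a pairing category for one barcode's contigs."""
    n_tra = ((cell["chain"] == "TRA") & cell["is_productive"]).sum()
    n_trb = ((cell["chain"] == "TRB") & cell["is_productive"]).sum()
    if n_tra == 1 and n_trb == 1:
        return "single_pair"          # one productive alpha + one productive beta
    if n_tra + n_trb == 0:
        return "no_productive_chain"
    if n_tra > 1 or n_trb > 1:
        return "multi_chain"          # possible doublet or dual-TRA cell
    return "orphan_chain"             # only one chain recovered

status = contigs.groupby("barcode").apply(classify).rename("pairing_status")
print(status.value_counts())

# Only 'single_pair' barcodes enter downstream clonotype and UMAP analyses.
usable_barcodes = status[status == "single_pair"].index
```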

Single-cell chain pairing errors corrupt clonotype analysis. We classify cells by chain status and remove doublets before embedding or expansion analysis. Request a free consultation →

3. Misinterpreting Clonotype Expansion Without Statistical Context

The Problem

Expanded clones are presented as biologically meaningful - but there’s no statistical test or control group. It’s impossible to tell if expansion is real or random.

Why It Happens

Researchers often sort by clonotype frequency and highlight the top ones. But in many tissues (e.g., tumor, PBMC), clonal expansion can happen by chance, especially in low-complexity libraries. Without proper background or permutation tests, such claims are weak.

Real Example

A team found a dominant BCR clone in tumor tissue and claimed local antigen-driven expansion. But the same IGH chain appeared in unrelated samples. It was a public clone - not tumor‑specific.

What We Do Differently

We use diversity metrics (e.g., Shannon, Simpson), rarefaction curves, and cross-sample clonotype overlaps to define statistical enrichment. When possible, we include matched normal or pre/post treatment samples to confirm real expansion. We also test against public clone databases.
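For illustration, the toy sketch below computes Shannon and Simpson indices from per-sample clone counts and checks clonotype overlap between samples; the CDR3 keys and counts are made-up example values.

```python
import numpy as np
import pandas as pd

# Toy clone-count tables per sample, keyed by CDR3 (values are invented).
samples = {
    "tumor": pd.Series({"CASSLGQGYEQYF": 120, "CASSPDRGNTEAFF": 5, "CASSQETQYF": 3}),
    "pbmc":  pd.Series({"CASSLGQGYEQYF": 2,   "CASSIRSSYEQYF": 40, "CASSQETQYF": 1}),
}

def shannon(counts: pd.Series) -> float:
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def simpson(counts: pd.Series) -> float:
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

for name, counts in samples.items():
    print(name, "Shannon:", round(shannon(counts), 3), "Simpson:", round(simpson(counts), 3))

# Cross-sample overlap: a clone shared with unrelated samples is more likely
# a public clone than evidence of local, antigen-driven expansion.
shared = set(samples["tumor"].index) & set(samples["pbmc"].index)
print("shared clonotypes:", shared)
```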

4. Misusing MiXCR or IgBlast Defaults Without Tuning

The Problem

Researchers run MiXCR or IgBlast with default settings and trust the output, overlooking parameters that drastically affect the results.

Why It Happens

Tools like MiXCR are flexible - but complex. Different read lengths, organism (human/mouse), or chain types (TRA/TRB vs IGH/IGL) require different settings. Defaults may miss correct CDR3 boundaries or fail on trimmed reads.

Real Example

One team analyzed mouse BCR with MiXCR using human settings. The IGHV gene calls were all wrong, and the CDR3s were truncated. Downstream lineage trees made no sense.

What We Do Differently

We always verify species, read length, and chain type. We benchmark MiXCR vs IgBlast vs TRUST4 on the same data. We also inspect contig alignments visually to confirm CDR3 length, frame status, and VDJ gene usage consistency.
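Rather than reproduce specific MiXCR command lines here, the sketch below illustrates the kind of post-run sanity check we apply to any exported clone table; the column names (cdr3_nt, cdr3_aa, v_gene) are placeholders to rename to match your MiXCR, IgBlast, or TRUST4 export.

```python
import pandas as pd

# Post-run sanity check on an exported clone table (column names are placeholders).
clones = pd.read_csv("clones.tsv", sep="\t")

# 1. CDR3 nucleotide length should be a multiple of 3 for in-frame clones.
out_of_frame = clones[clones["cdr3_nt"].str.len() % 3 != 0]
print(f"{len(out_of_frame)} clones with out-of-frame CDR3")

# 2. CDR3 amino-acid length distribution: extreme lengths often indicate
#    mis-called CDR3 boundaries or the wrong reference library.
lengths = clones["cdr3_aa"].str.len()
print(lengths.describe())
suspicious = clones[(lengths < 5) | (lengths > 35)]
print(f"{len(suspicious)} clones with implausible CDR3 length")

# 3. V gene prefixes should match the chain and species you intended
#    (e.g., mouse IGH calls should not look like human TRBV genes).
print(clones["v_gene"].str.extract(r"^([A-Z]+)")[0].value_counts())
```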

Clonal expansion claims often lack statistical support. We use diversity indices, background models, and public clone databases to validate enrichment. Request a free consultation →

5. Diversity Metrics Applied Without Normalization

The Problem

Diversity indices (e.g., Shannon, Chao1) are used to compare conditions - but without normalizing for read depth or clonotype count.

Why It Happens

Raw diversity values are sensitive to sequencing depth and filtering thresholds. If one sample has twice the reads, it naturally looks more diverse - even if it is not biologically more diverse.

Real Example

In a vaccine response study, the “boosted” sample showed higher diversity than baseline. But it also had twice the reads. After downsampling, the diversity actually decreased.

What We Do Differently

We normalize by total read count or unique clonotype count before calculating diversity. We also use rarefaction curves and bootstrap confidence intervals to assess real differences. For BCR data, we control for isotype class too (e.g., IgM vs IgG).
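As a rough sketch, the helper below downsamples each library to a common read depth and reports a bootstrapped Shannon index with a 95% interval; the function name and the toy clone counts are ours, not measured data.

```python
import numpy as np

rng = np.random.default_rng(0)

def downsampled_shannon(clone_counts: np.ndarray, depth: int, n_boot: int = 200):
    """Shannon diversity at a fixed read depth, with a bootstrap interval.

    clone_counts: reads per clonotype for one sample.
    depth: common depth to subsample to (e.g., the smallest library).
    """
    reads = np.repeat(np.arange(len(clone_counts)), clone_counts)  # one entry per read
    values = []
    for _ in range(n_boot):
        sub = rng.choice(reads, size=depth, replace=False)
        counts = np.bincount(sub, minlength=len(clone_counts))
        p = counts[counts > 0] / depth
        values.append(-(p * np.log(p)).sum())
    values = np.array(values)
    return values.mean(), np.percentile(values, [2.5, 97.5])

# Toy example: compare two samples at the depth of the smaller library.
baseline = np.array([500, 300, 100, 50, 50])        # reads per clonotype
boosted  = np.array([1400, 300, 200, 60, 30, 10])   # deeper library
depth = min(baseline.sum(), boosted.sum())
print("baseline:", downsampled_shannon(baseline, depth))
print("boosted: ", downsampled_shannon(boosted, depth))
```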

6. UMAP or Cluster Embedding Done Without Controlling for Clonotype Overlap

The Problem

UMAP plots of single-cell data show apparent clusters of expanded clones - but these reflect technical or batch artifacts, not biology.

Why It Happens

If a large clone dominates a tissue, its transcriptomic profile skews clustering. Without regressing out clonotype identity or controlling for clone size, dimensionality reduction misleads interpretation.

Real Example

In a TIL (tumor‑infiltrating lymphocyte) dataset, one large TCR clone showed up as a tight UMAP cluster and was interpreted as a new T cell subtype. In reality, the cluster was simply an artifact of clonal dominance, not a distinct cell state.

What We Do Differently

We test whether transcriptomic clustering is driven by TCR identity using permutation analysis. We also build separate UMAPs for clonally expanded and non-expanded cells. If clone bias dominates the embedding, we clearly mark it and avoid overinterpretation.
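A simplified version of that permutation test is sketched below: it asks whether cells of the same clone sit closer together in the embedding than expected when clone labels are shuffled. The function name and the choice of mean within-clone distance as the test statistic are ours, not a fixed standard.

```python
import numpy as np

rng = np.random.default_rng(0)

def clone_cohesion_pvalue(embedding: np.ndarray, clone_ids: np.ndarray, n_perm: int = 1000) -> float:
    """Permutation test: are cells of the same clone closer together in an
    embedding (e.g., UMAP or PCA coordinates) than expected by chance?

    embedding: (n_cells, n_dims) coordinates
    clone_ids: clone label per cell (cells of an expanded clone share a label)
    """
    def mean_within_clone_distance(labels):
        dists = []
        for clone in np.unique(labels):
            pts = embedding[labels == clone]
            if len(pts) < 2:
                continue  # singleton clones contribute no within-clone distance
            diff = pts[:, None, :] - pts[None, :, :]
            d = np.sqrt((diff ** 2).sum(-1))
            dists.append(d[np.triu_indices(len(pts), 1)].mean())
        return np.mean(dists)

    observed = mean_within_clone_distance(clone_ids)
    perm_stats = np.array([
        mean_within_clone_distance(rng.permutation(clone_ids)) for _ in range(n_perm)
    ])
    # Small observed distance vs. permutations = clone identity drives the embedding.
    return float((perm_stats <= observed).mean())

# Usage (toy): 50 cells in 2D, one expanded clone of 10 cells plus singletons.
# emb = rng.normal(size=(50, 2)); labels = np.array([0] * 10 + list(range(1, 41)))
# print(clone_cohesion_pvalue(emb, labels))
```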

MiXCR defaults can give completely wrong CDR3s. We verify species, chain type, and read length before tuning alignment and clone calling settings. Request a free consultation →

7. Annotating V and J Genes Without Accounting for Polymorphism

The Problem

VDJ gene usage is reported - but many of the gene calls are incorrect due to unaccounted allelic variation or germline gaps.

Why It Happens

Tools like IgBlast use reference databases (e.g., IMGT), which may not include all alleles for a given sample. Especially in non‑European cohorts, V gene calls can be biased or wrong.

Real Example

In a Brazilian cohort, TCR V gene usage showed skewed results. After local IG/TR gene genotyping, we found that MiXCR was calling pseudo‑genes due to missing alleles.

What We Do Differently

We annotate with multiple reference databases and compare across tools. For high‑resolution work, we recommend AIRR‑seq compliant annotation and local genotyping of IG/TR loci. We also collapse gene families to reduce overinterpretation.
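Gene-family collapsing is simple to implement; the sketch below reduces IMGT-style calls from allele to gene or family level (the regular expression assumes standard IMGT naming such as TRBV12-3*01 or IGHV3-23*04).

```python
import re

def collapse_gene_call(call: str, level: str = "gene") -> str:
    """Collapse an IMGT-style V/J gene call to gene or family level.

    'TRBV12-3*01' -> gene 'TRBV12-3' -> family 'TRBV12'
    'IGHV3-23*04' -> gene 'IGHV3-23' -> family 'IGHV3'
    """
    gene = call.split("*")[0]                 # drop the allele suffix
    if level == "gene":
        return gene
    if level == "family":
        m = re.match(r"^([A-Z]+\d+)", gene)   # keep locus + family number only
        return m.group(1) if m else gene
    raise ValueError("level must be 'gene' or 'family'")

print(collapse_gene_call("TRBV12-3*01", "gene"))    # TRBV12-3
print(collapse_gene_call("IGHV3-23*04", "family"))  # IGHV3
```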

8. Trying to Infer Antigen Specificity from Clonotypes Without Validation

The Problem

Researchers claim that a certain TCR clone is “antigen‑specific” - based on its expansion or CDR3 sequence - but no validation is provided.

Why It Happens

Some teams use public TCR databases (e.g., VDJdb) to infer antigen specificity. But these databases are sparse and context‑dependent. Clonotypes may be cross‑reactive or appear due to homeostatic expansion.

Real Example

One lab linked a dominant clone to EBV‑specific response using database lookup. But when validated with tetramer staining, it didn’t bind EBV antigen at all.

What We Do Differently

We treat antigen specificity as a hypothesis, not a conclusion. We use TCRdist or GLIPH2 to identify motif‑based clusters, but always label them as “candidate” specificity. Where possible, we recommend experimental validation using peptide–MHC assays or single‑cell cytokine profiling.
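As a minimal illustration of the “candidate, not conclusion” labeling, the sketch below performs an exact CDR3β lookup against a locally downloaded public database export and tags hits as candidate specificity only; the file and column names are assumptions to adapt to whatever export you actually use.

```python
import pandas as pd

# Exact-match lookup of CDR3beta sequences against a local export of a public
# TCR-antigen database (e.g., a VDJdb download). Column names are assumptions.
db = pd.read_csv("vdjdb_export.tsv", sep="\t")        # assumed: cdr3, antigen_epitope, antigen_species
clones = pd.read_csv("my_clonotypes.tsv", sep="\t")   # assumed: cdr3_aa, count

hits = clones.merge(db, left_on="cdr3_aa", right_on="cdr3", how="left")

# A database hit is only a hypothesis: label matched clones as 'candidate' and
# leave confirmation to tetramer / peptide-MHC assays.
hits["specificity"] = hits["antigen_epitope"].apply(
    lambda x: "unknown" if pd.isna(x) else "candidate"
)
print(hits["specificity"].value_counts())
```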

UMAP clustering often reflects clonal dominance - not cell state. We test for clonotype-driven bias and mark or adjust UMAPs accordingly. Request a free consultation →

9. Treating TCR and BCR Projects as the Same

The Problem

Some groups use the same pipeline for both TCR and BCR data - ignoring critical biological and technical differences.

Why It Happens

At first glance, TCR and BCR analysis looks similar: VDJ recombination, clonotype calling, diversity metrics. But BCRs undergo somatic hypermutation and class-switch recombination, which require different analysis logic.

Real Example

A lymphoma study applied TCR‑style clonotype clustering to BCR data. Every mutated variant was treated as a separate lineage, no phylogenetic trees were built, and SHM patterns were ignored.

What We Do Differently

We treat BCR data with specialized tools (e.g., partis, BRILIA) for SHM‑aware lineage tree construction. We track isotype usage and mutation load over time. For TCR, we emphasize chain pairing, clonal tracking, and cluster embedding. These are separate worlds.
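A bare-bones sketch of SHM-aware clonal grouping is shown below (in the spirit of Change-O-style pipelines, not a replacement for partis or BRILIA): sequences sharing V gene, J gene, and CDR3 length are linked into one lineage when their CDR3s differ by less than a distance threshold. Column names and the 0.15 threshold are assumptions.

```python
import pandas as pd
from itertools import combinations

def hamming_fraction(a: str, b: str) -> float:
    """Normalized Hamming distance between two equal-length CDR3 sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def group_bcr_lineages(df: pd.DataFrame, threshold: float = 0.15) -> pd.Series:
    """Assign a lineage ID per sequence: sequences sharing V gene, J gene and
    CDR3 length are linked (single-linkage) when their CDR3 nucleotide
    distance is below `threshold`. Columns 'v_gene', 'j_gene', 'cdr3_nt'
    are assumptions - rename to match your own table.
    """
    lineage = pd.Series(index=df.index, dtype=object)
    next_id = 0
    cdr3_len = df["cdr3_nt"].str.len().rename("cdr3_len")
    for _, idx in df.groupby(["v_gene", "j_gene", cdr3_len]).groups.items():
        idx = list(idx)
        parent = {i: i for i in idx}                      # union-find over sequence indices
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i
        for i, j in combinations(idx, 2):
            if hamming_fraction(df.loc[i, "cdr3_nt"], df.loc[j, "cdr3_nt"]) < threshold:
                parent[find(i)] = find(j)
        roots = {}
        for i in idx:
            r = find(i)
            if r not in roots:
                roots[r] = next_id
                next_id += 1
            lineage[i] = f"lineage_{roots[r]}"
    return lineage
```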

Final Remarks

Immune repertoire analysis can offer deep biological insight - or complete misdirection. Tools like MiXCR, Cell Ranger, and IgBlast are powerful - but must be applied carefully. One wrong assumption, one missed filter, or one default parameter can silently distort everything.

Whether you’re profiling TCRs in tumor‑infiltrating lymphocytes or tracking BCR evolution after vaccination, the analysis needs both technical rigor and biological judgment. We’ve helped many labs fix immune repertoire results after they ran into peer-review trouble or failed to replicate findings. Avoid the traps above, and your project will not only survive scrutiny - it may lead to real discovery.

If your team is working on TCR or BCR repertoire analysis and wants to be absolutely confident in the results, we’re here to help.

This blog article was co-authored by Justin T. Li, Ph.D., Lead Bioinformatician and William Gong, Ph.D., Lead Bioinformatician. To learn more about AccuraScience's Lead Bioinformaticians, visit https://www.accurascience.com/our_team.html.

TCR/BCR repertoire analysis demands more than just running MiXCR or Cell Ranger. Silent errors - from chain pairing issues to false diversity inflation - can derail even high-quality datasets. Our team combines tool-level mastery with deep biological judgment to help you extract meaningful, publication-ready insight from complex immune profiling data. Request a free consultation →
