Benchmarking Copy Number Aberrations Inference Tools Using Single-Cell Multi-Omics Datasets

1. Research Background and Significance

In oncology and genomics, chromosomal copy number alterations (CNAs) are a key class of genetic variation driving cancer initiation and progression. CNAs are a major source of tumor heterogeneity and are central to early tumor detection, analysis of subclonal evolution, and studies of drug-resistance mechanisms. At single-cell resolution, CNA detection has traditionally relied on single-cell DNA sequencing (scDNA-seq), which, despite its high resolution, is limited by high cost and low sequencing coverage. This makes it difficult to apply widely in large-scale, high-throughput studies.

As single-cell RNA sequencing (scRNA-seq) data have become widely available, researchers have found that, under certain conditions, genomic copy number changes can also be inferred from scRNA-seq data. This greatly expands the possibilities for mining genomic structural variation from existing transcriptomic data. Consequently, a number of computational tools for inferring CNAs from scRNA-seq data have emerged in recent years, including inferCNV, CopyKAT, SCEVAN, Numbat, and CASPER. These tools use different algorithms to infer the CNA features of tumor cells from signals such as gene expression levels and allele frequencies.

However, these tools differ in algorithmic principles, parameter settings, input requirements, and application scenarios, and no independent, systematic benchmark has so far objectively compared their performance, strengths, and weaknesses or offered recommendations for their use. This makes tool selection and result interpretation difficult in studies of tumor heterogeneity and in single-cell and spatial transcriptomics. A systematic, comprehensive benchmark of the major existing tools using real paired single-cell DNA/RNA multi-omics data is therefore of significant scientific and practical value for standardizing the field and improving the overall quality of research.

2. Source of the Paper and Author Information

The paper, titled “Benchmarking copy number aberrations inference tools using single-cell multi-omics datasets,” was authored by Minfang Song, Shuai Ma, Gong Wang, Yukun Wang, Zhenzhen Yang, Bin Xie, Tongkun Guo, Xingxu Huang, and corresponding author Liye Zhang. The authors are affiliated primarily with Zhejiang Lab; the School of Life Science and Technology, ShanghaiTech University; the Shanghai Clinical Research and Trial Center; and Yazhou Bay National Laboratory, among other institutions. The paper was published in Briefings in Bioinformatics, volume 26, issue 2 (2025).

3. Study Design and Workflow Details

1. Overall Study Workflow

This benchmark uses single-cell multi-omics datasets in which DNA and RNA are profiled from the same cell (i.e., parallel scRNA-seq and scDNA-seq). CNAs defined by scDNA-seq serve as the “gold standard” for comparison. The study systematically evaluated five mainstream scRNA-seq CNA inference tools across multiple performance dimensions through a core workflow comprising:

  • Integration of multi-omics datasets and sample selection;
  • Full pipeline execution and parameter optimization for five tools (inferCNV, CopyKAT, SCEVAN, Numbat, CASPER);
  • Quantitative comparison of each tool’s performance under varying conditions on four tasks: “tumor/normal cell classification,” “CNA profile accuracy,” “tumor subclone identification,” and “aneuploidy detection in non-malignant cells.”

2. Data Sources and Processing

The team assembled real single-cell multi-omics datasets from several public projects and collaborating authors, including:

  • 8 colorectal cancer (CRC) samples (from the study by Zhou et al.);
  • 2 acute lymphoblastic leukemia (ALL) samples;
  • 1 glioma, 1 neuroendocrine tumor, 1 NPC43 cell line, and 1 HUVEC cell line (from related work by Yu et al. or Cui et al.).

Each sample included paired RNA and DNA sequencing data from the same cells; scDNA-seq results were used to define the true CNAs, and scRNA-seq data served as input to each tool. Dataset details are given in Supplementary Table S1 of the paper.

3. Tools and Their Principles

The five tools evaluated fall into two categories:

  • Expression-matrix-based tools: inferCNV, CopyKAT, and SCEVAN. The underlying principle is that when a chromosomal region in a tumor cell is amplified or deleted, the average expression of the genes in that region rises or falls accordingly. These algorithms use approaches such as moving-average smoothing, Bayesian segmentation, and segmentation optimization to capture patterns in the expression signal along the genome.
  • Tools that also use allelic information from heterozygous sites: Numbat and CASPER. In addition to the expression matrix, these tools analyze shifts in B-allele frequency (BAF), which allows them to identify subtler CNAs such as copy-number-neutral loss of heterozygosity (CNLOH). Numbat uses a haplotype-based hidden Markov model (HMM), while CASPER implements a multiscale signal analysis framework.
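To make the expression-based principle concrete, the following Python sketch computes a per-gene log2 ratio against a reference and smooths it with a centered moving average along the genome. It is a minimal illustration under simplifying assumptions (a single chromosome, genes already sorted by position, a pseudocount of 1); the function names are hypothetical, and it is not the actual implementation of inferCNV, CopyKAT, or SCEVAN, which add further normalization, denoising, and segmentation steps.

```python
import math


def relative_expression(cell_expr, reference_mean, pseudocount=1.0):
    """Per-gene log2 ratio of one cell's expression to a reference mean.

    Both inputs are lists of normalized expression values for the same genes,
    assumed to be sorted by genomic position along one chromosome.
    """
    return [
        math.log2((c + pseudocount) / (r + pseudocount))
        for c, r in zip(cell_expr, reference_mean)
    ]


def moving_average(values, window):
    """Centered moving average; windows are truncated at the chromosome ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Toy chromosome: 10 genes with reference mean 3; the cell has elevated
# expression in genes 5-9, mimicking a copy gain of that region.
reference = [3.0] * 10
cell = [3.0] * 5 + [7.0] * 5
signal = relative_expression(cell, reference)  # 0.0 for genes 0-4, 1.0 for genes 5-9
smoothed = moving_average(signal, window=3)
```

In the toy example the smoothed signal rises gradually from 0 to 1 across the boundary at gene 5, where a segmentation step would then place a breakpoint. Wider windows suppress more noise but blur breakpoints, which is one reason window size matters.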

Each tool was run according to its official documentation, with parameters tuned based on empirical experience. For example, inferCNV was run with a “two-pass” approach to optimize the normalization baseline, and Numbat and CASPER required careful selection of reference cell types.
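The idea behind a two-pass run can be sketched as follows: score every cell against an initial reference, treat cells with little CNA signal as additional normal cells, and rescore all cells against the enlarged reference. This Python sketch is a hypothetical illustration of that flow, not inferCNV's actual code; the scoring function (mean squared log2 ratio), the threshold, and all names are assumptions.

```python
import math


def log2_ratio(cell, reference, pseudocount=1.0):
    """Per-gene log2 ratio of one cell against a reference profile."""
    return [math.log2((c + pseudocount) / (r + pseudocount)) for c, r in zip(cell, reference)]


def mean_profile(cells):
    """Gene-wise mean over a list of cells (each a list of expression values)."""
    return [sum(values) / len(cells) for values in zip(*cells)]


def cna_score(profile):
    """Mean squared log2 ratio: close to 0 for copy-neutral cells (assumed score)."""
    return sum(v * v for v in profile) / len(profile)


def two_pass(cells, initial_reference_idx, threshold):
    """Hypothetical two-pass reference refinement.

    Pass 1 scores every cell against an initial, possibly small, reference set.
    Cells scoring below `threshold` become the putative normal set, and pass 2
    rescores all cells against the mean of that expanded reference set.
    """
    ref1 = mean_profile([cells[i] for i in initial_reference_idx])
    scores1 = [cna_score(log2_ratio(c, ref1)) for c in cells]
    refined_idx = [i for i, s in enumerate(scores1) if s < threshold]
    if not refined_idx:
        raise ValueError("no cells fell below the threshold in pass 1")
    ref2 = mean_profile([cells[i] for i in refined_idx])
    scores2 = [cna_score(log2_ratio(c, ref2)) for c in cells]
    return refined_idx, scores2


# Toy data: cells 0-2 are copy-neutral; cells 3-4 carry a gain in genes 0-1.
cells = [[3, 3, 3, 3], [3, 3, 3, 3], [3, 3, 3, 3], [7, 7, 3, 3], [7, 7, 3, 3]]
refined_idx, scores = two_pass(cells, initial_reference_idx=[0], threshold=0.1)
```

Here the single annotated reference cell is expanded to all three copy-neutral cells, while the two cells with the gain keep a clear score in the second pass. With real data, the benefit of the second pass would come from averaging the baseline over more normal cells, which reduces its noise.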

4. Evaluation Process and Metrics

  • Tumor/normal cell classification accuracy:

    • Using scDNA-seq-based cluster annotation as ground truth, each tool’s cell classification accuracy and F1 score were calculated.
    • The impact of tumor purity (the proportion of tumor cells), the inclusion of microenvironment cells, and sequencing depth on performance was examined.
  • CNA profile inference consistency:

    • The inferred single-cell (or population-level) CNA segments were compared with the ground truth, using metrics such as the Pearson correlation coefficient to quantify signal concordance along the genome.
    • The evaluation emphasized each tool’s ability to detect both large-scale and subtle variations, as well as the effect of optimizing parameters and procedures (e.g., the two-pass method).
  • Breakpoint and subclone structure recognition:

    • The three tools that report breakpoints (inferCNV, SCEVAN, Numbat) were compared for accuracy (F1 score, recall, etc.) in detecting the major chromosomal breakpoints of tumor subclones.
    • Tumor subclones were analyzed using hierarchical clustering and similarity analysis to assess the consistency of inferred subclones with the DNA-based ground truth.
  • Aneuploidy detection in non-malignant cells:

    • Non-malignant populations with a relatively high frequency of aneuploid cells (such as fibroblasts, T/B cells, and endothelial cells) were used to test each tool’s sensitivity to single-chromosome gains and losses.
  • Computational efficiency and applicability analysis:

    • Each tool’s memory and CPU consumption and runtime were measured on datasets of thousands of cells to assess practical scalability.
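The classification, concordance, and breakpoint metrics above can be computed as in the following Python sketch. Accuracy, F1, and the Pearson correlation follow their standard definitions; the breakpoint matching rule (greedy one-to-one matching within a fixed tolerance on the same chromosome) and the tolerance value are assumptions, since the exact matching criterion used in the paper is not described here.

```python
import math


def classification_metrics(truth, pred, positive="tumor"):
    """Accuracy and F1 for tumor/normal calls, with `positive` as the positive class."""
    pairs = list(zip(truth, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def breakpoint_scores(true_bps, pred_bps, tolerance):
    """Precision, recall and F1 for breakpoints given as (chromosome, position).

    A predicted breakpoint counts as correct if it lies within `tolerance` bp
    of a not-yet-matched true breakpoint on the same chromosome (assumed rule).
    """
    unmatched = list(true_bps)
    tp = 0
    for chrom, pos in sorted(pred_bps):
        for i, (t_chrom, t_pos) in enumerate(unmatched):
            if t_chrom == chrom and abs(t_pos - pos) <= tolerance:
                tp += 1
                del unmatched[i]  # each true breakpoint can be matched once
                break
    precision = tp / len(pred_bps) if pred_bps else 0.0
    recall = tp / len(true_bps) if true_bps else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A stricter or looser tolerance changes breakpoint F1 directly, so any reimplementation should report the tolerance it used.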

4. Major Results and Data Details

1. Automatic Tumor/Normal Cell Classification

  • Overall performance: Numbat achieved the best tumor/normal classification when allelic information was available in addition to expression; when limited to expression-matrix input, CopyKAT showed the best and most stable performance and was robust to low sequencing depth.
  • The effect of tumor purity: At high tumor purity, inferCNV tends to use the tumor-dominated background as its expression reference, producing “incorrect centering”: the tumor cells’ CNA signals are treated as the baseline, and normal cells are misclassified as tumor. Conversely, SCEVAN performed poorly at low tumor purity. Including microenvironment cells significantly improved both classification and CNA inference for both tools.
  • Simulation experiments: Downsampling experiments simulating tumor:normal ratios from 1:100 to 100:1 further tested the tools’ robustness. Numbat consistently maintained high accuracy, while inferCNV showed label reversal (tumor and normal calls swapped) at extreme purity ratios.
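The “incorrect centering” effect can be reproduced with toy numbers. In this Python sketch (a single genomic region, hypothetical expression values, a pseudocount of 1), using the mean of all cells as the reference at 90% tumor purity nearly cancels the tumor cells’ gain and makes the lone normal cell look as if it carries a loss; using the normal cell as the reference restores the expected pattern. It illustrates the general effect only and does not model inferCNV’s internals.

```python
import math


def centered_log2(values, reference_values, pseudocount=1.0):
    """log2 ratio of each cell's value to the mean of the chosen reference cells."""
    ref = sum(reference_values) / len(reference_values)
    return [math.log2((v + pseudocount) / (ref + pseudocount)) for v in values]


# One genomic region: 9 tumor cells carry a gain (expression 6), 1 normal cell has 3.
tumor = [6.0] * 9
normal = [3.0]
cells = tumor + normal

naive = centered_log2(cells, cells)    # reference = all cells (tumor-dominated)
proper = centered_log2(cells, normal)  # reference = normal cells only

print(f"all-cell reference:    tumor {naive[0]:+.2f}, normal {naive[-1]:+.2f}")
print(f"normal-cell reference: tumor {proper[0]:+.2f}, normal {proper[-1]:+.2f}")
```

The printed values are +0.06 and -0.74 for the all-cell reference, versus +0.81 and +0.00 for the normal-cell reference: with a tumor-dominated reference the normal cell, not the tumor, appears aberrant.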

2. Accuracy of CNA Profile Inference

  • Optimization of baseline setting: For inferCNV, a two-step process (first identify normal cells to use as the reference, then rerun the analysis) significantly improved segment-level concordance with the DNA ground truth (higher Pearson correlation).
  • Inter-tool differences: Numbat and CASPER output discrete CNA states, which are cleaner and easier to compare with DNA data, while the other tools output continuous signals. No single tool performed best across all samples, and performance was best when tumor and normal cell numbers were balanced.
  • Breakpoint, abnormal segmentation, and LOH detection: SCEVAN achieved the best sensitivity and F1 score for breakpoint detection in subclone structures (i.e., recognizing complex chromosomal rearrangements). Numbat’s integration of B-allele information made it highly sensitive to CNLOH, but also prone to misclassification (e.g., large regions of copy gain miscalled as LOH).
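Why allelic information exposes CNLOH, and why allelic imbalance alone can be ambiguous, can be seen from the expected signals of a few copy-number states. The Python sketch below assumes idealized conditions (expression proportional to total copy number, a pure cell population, perfectly phased alleles) and reports the expected expression ratio relative to diploid and the expected minor-allele fraction; the function name is hypothetical and this is an illustration, not Numbat’s model.

```python
from fractions import Fraction


def expected_signals(major, minor, baseline_copies=2):
    """Expected expression ratio vs. diploid and expected minor-allele fraction.

    Idealized: expression is proportional to total copy number, the cell
    population is pure, and alleles are perfectly phased.
    """
    total = major + minor
    expression_ratio = Fraction(total, baseline_copies)
    minor_fraction = Fraction(minor, total) if total else None  # undefined for 0 copies
    return expression_ratio, minor_fraction


states = {
    "diploid (1:1)": (1, 1),
    "CNLOH (2:0)": (2, 0),
    "single-copy gain (2:1)": (2, 1),
    "high-level gain (4:1)": (4, 1),
    "single-copy loss (1:0)": (1, 0),
}
for name, (major, minor) in states.items():
    ratio, fraction = expected_signals(major, minor)
    print(f"{name:24s} expression x{float(ratio):.2f}  minor-allele fraction {float(fraction):.2f}")
```

CNLOH leaves expression unchanged (ratio 1.00) while the minor-allele fraction drops from 0.50 to 0.00, so it is invisible to expression-only tools. A high-level gain of one allele (4:1) also produces strong allelic imbalance (0.20); if its expression increase is underestimated, such a region could plausibly be mistaken for LOH. This is one possible intuition for the gain-as-LOH errors, not a mechanism reported in the paper.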

3. Subclone Structure Inference

  • Given accurate tumor cell classification, all tools reliably recovered the subclonal structure defined by the DNA data. In the glioma and CRC cases, most methods’ subclone assignments agreed well with the DNA ground truth (ARI > 0.8), although some samples with a skewed tumor/normal composition required the inclusion of microenvironment cells. SCEVAN and inferCNV performed especially well.
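Agreement between RNA-inferred and DNA-defined subclones, as in the ARI figure above, is usually quantified with the adjusted Rand index (ARI). The following Python sketch implements the standard Hubert and Arabie formula from pair counts; it is not the authors’ evaluation code. Label names need not match between the two partitions, only the grouping of cells.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie) between two partitions of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: trivially identical partitions
    # Pairs of cells grouped together in both partitions, and in each one separately.
    pair_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pair_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pair_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pair_a * pair_b / total_pairs
    max_index = (pair_a + pair_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pair_both - expected) / (max_index - expected)


dna_subclones = ["A", "A", "B", "B"]
rna_subclones = [1, 1, 2, 2]  # same grouping, different label names
print(adjusted_rand_index(dna_subclones, rna_subclones))  # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))    # about 0.571 (= 4/7)
```

An ARI of 1 means identical groupings and values near 0 mean agreement no better than chance, which is why it is preferred over raw overlap for comparing clusterings with different numbers of subclones.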

4. Aneuploidy Detection in Non-malignant Cells

  • Because CNAs in non-malignant cells are typically single-chromosome gains or losses, all tools performed consistently poorly at detecting these low-burden events. Contributing factors include lower UMI and gene counts per cell than in tumor cells and smaller overall expression perturbations, highlighting the need for algorithms designed specifically for low-burden CNA detection.
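A back-of-the-envelope calculation shows why low UMI counts hurt the detection of single-chromosome events. Under an idealized Poisson model (an assumption that ignores overdispersion, normalization, and reference noise, so real power is lower), the z-score of a single-copy gain or loss grows with the square root of the UMIs captured from the affected chromosome. The function name and UMI values below are hypothetical.

```python
import math


def poisson_z(expected_umis, fold_change):
    """Idealized z-score for a copy-number shift in one cell.

    Assumes the UMI total over a chromosome's genes in a diploid cell is
    Poisson with mean `expected_umis` (so its SD is sqrt(expected_umis)),
    and that the CNA multiplies that mean by `fold_change`
    (3/2 = 1.5 for a single-copy gain, 1/2 = 0.5 for a single-copy loss).
    """
    shift = (fold_change - 1.0) * expected_umis
    return shift / math.sqrt(expected_umis)


for umis in (16, 100, 400):
    print(umis, round(poisson_z(umis, 1.5), 2), round(poisson_z(umis, 0.5), 2))
```

Quartering the UMI count from 400 to 100 halves the z-score from 10 to 5, and at 16 UMIs the shift is only two standard deviations. When a cell carries just one such event, there are no other aberrant regions from which to pool evidence.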

5. Computational Resources and Practicality

  • CopyKAT and SCEVAN were the most computationally efficient and are well suited to analyzing datasets of thousands of cells on consumer-grade computers; the heavier computational demands of Numbat and inferCNV make a server environment advisable for datasets of several thousand cells or more.
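Runtime, CPU time, and peak memory of a tool run can be recorded with a small wrapper such as the following Python sketch. It assumes a Unix system (the standard-library resource module is not available on Windows) and that each tool is launched as an external command, for example an R script; the script name in the comment is hypothetical, and the smoke test at the end runs a trivial Python command instead.

```python
import resource
import subprocess
import sys
import time


def profile_command(cmd):
    """Run an external command; report wall time, child CPU time and peak child RSS."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "returncode": completed.returncode,
        "wall_seconds": elapsed,
        "cpu_seconds": cpu,
        # Largest RSS among all children reaped so far (kB on Linux, bytes on macOS).
        "peak_rss": after.ru_maxrss,
    }


# e.g. profile_command(["Rscript", "run_copykat.R"])  # hypothetical script name
result = profile_command([sys.executable, "-c", "pass"])
print(result)
```

Because RUSAGE_CHILDREN reports the maximum over all child processes that have finished so far, running each tool’s benchmark in a fresh Python process gives cleaner peak-memory figures.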

5. Main Conclusions and Practical Value

This study is the first independent, systematic benchmark of scRNA-seq CNA inference tools. It reveals the advantages and limitations of each tool in different scenarios and provides practical recommendations:

  • Tool selection should match experimental conditions: When B-allele information is available, combining Numbat with SCEVAN or inferCNV is recommended; for expression-only data, combining CopyKAT with SCEVAN or inferCNV is recommended. Cross-validating results across multiple tools improves their reliability.
  • Parameter optimization and appropriate reference selection are critical: Settings such as inferCNV’s two-pass workflow and Numbat’s gamma parameter should be tested iteratively, taking sample characteristics into account, to achieve optimal performance.
  • LOH and other special events require cautious interpretation: These results should be corroborated with independent DNA data.
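The recommendation to cross-validate with multiple tools can be put into practice with a simple per-cell majority vote. The following Python sketch is a hypothetical illustration, not a procedure from the paper: it assumes each tool’s output has already been harmonized to the labels “tumor”/“normal” and to the same cell order, and it marks cells without enough agreement as “uncertain” for manual review.

```python
from collections import Counter


def consensus_calls(calls_by_tool, min_agree=2):
    """Majority vote of per-cell tumor/normal calls across tools.

    calls_by_tool: dict mapping a tool name to a list of labels, all aligned
    to the same cell order. Cells where no label reaches `min_agree` votes
    are reported as "uncertain".
    """
    if len({len(labels) for labels in calls_by_tool.values()}) != 1:
        raise ValueError("all tools must label the same cells")
    consensus = []
    for labels in zip(*calls_by_tool.values()):
        label, votes = Counter(labels).most_common(1)[0]
        consensus.append(label if votes >= min_agree else "uncertain")
    return consensus


calls = {
    "CopyKAT": ["tumor", "tumor", "normal"],
    "SCEVAN": ["tumor", "normal", "normal"],
    "inferCNV": ["tumor", "normal", "tumor"],
}
print(consensus_calls(calls))  # ['tumor', 'normal', 'normal']
```

With only two tools and `min_agree=2`, any disagreement yields “uncertain”, which is a conservative default when tool outputs conflict.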

The study’s scientific value lies in providing a standardized, empirically validated guide to tool selection for diverse tumor single-cell and spatial transcriptomics studies, thereby improving the reliability and consistency of data interpretation in the field. It also highlights current bottlenecks, namely low-burden CNA detection and limited resolution when expression is low, suggesting that future algorithm development should focus on sensitivity and broader applicability.

6. Research Highlights and Innovations

  • Used genuine “same-cell” multi-omics data, maximizing the biological relevance of the performance evaluation;
  • Performed a comprehensive evaluation covering classification, breakpoint and subclone detection, resource consumption, and parameter tuning;
  • Clearly revealed the tendencies and pitfalls of different algorithms under specific conditions, providing a model for tool screening in emerging settings such as spatial transcriptomics.

7. Other Important Information

The authors have open-sourced all analysis code and scripts for reproducibility, and all data used are publicly accessible. The research was funded by the National Natural Science Foundation of China, the “Leading Goose” R&D Program of Zhejiang Province, Zhejiang Lab, and others. The author team has extensive experience and accumulated data in single-cell multi-omics and tumor heterogeneity research in China, and the work was carried out jointly by several academic research institutions.

This study lays an important empirical foundation and provides an evaluation benchmark for the development and application of CNA inference from scRNA-seq data, and it is broadly relevant to frontier areas such as cancer bioinformatics.