Abstract
Recent advances in single-cell transcriptome sequencing and computational analysis have improved our understanding of cellular heterogeneity. However, associating cell subsets with phenotypes remains challenging. Ren et al. recently introduced PENCIL, a supervised learning framework that incorporates gene selection to identify phenotype-relevant cells. To assess PENCIL’s reproducibility and transferability, we evaluated it across 12 single-cell RNA sequencing datasets representing four distinct phenotypes. We identified a few caveats in the original version of PENCIL, such as sensitivity to input perturbation, and correcting them improved its reproducibility. We further show that combining PENCIL’s cell-subset identification with gene set variation analysis yields a cytotoxic T cell immunotherapy response signature (CyTIR) that predicts immune checkpoint blockade response in skin cancer across multiple datasets, with an area under the curve >0.75 and accuracy >0.71. Overall, our assessment improves PENCIL’s reproducibility and utility and extends its potential for identifying phenotype-relevant cell subsets in diverse biomedical applications.
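For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how an area under the ROC curve and an accuracy can be computed for a per-patient signature score. It is not the authors' evaluation code: the labels, scores and the 0.45 cutoff are invented for illustration, and the function names are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen responder
    (label 1) scores higher than a randomly chosen non-responder (label 0),
    counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Fraction of patients classified correctly when a score at or above
    `threshold` is called a responder."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(predicted, labels)) / len(labels)


# Hypothetical toy data (NOT from the study): 1 = responder, 0 = non-responder.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.2, 0.1, 0.8, 0.3]

auc = roc_auc(labels, scores)
acc = accuracy(labels, scores, threshold=0.45)  # threshold is an arbitrary assumption
print(f"AUC = {auc:.4f}, accuracy = {acc:.2f}")
```

The AUC does not depend on a cutoff, whereas accuracy does, so the two numbers need not move together when the threshold changes.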
Data availability
All data used in this study are publicly available from the Gene Expression Omnibus (GEO) under the following accession numbers: GSE120575 (ref. 9), GSE123813 (ref. 14), GSE166181 (ref. 15), GSE115978 (ref. 30), GSE144236 (ref. 31), GSE145328 (ref. 32), GSE139324 (ref. 23), GSE164690 (ref. 24), GSE162025 (ref. 25), GSE180268 (ref. 26), GSE182227 (ref. 27) and GSE200996 (ref. 28).
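As a convenience, the accession numbers above can be turned into GEO record links. The sketch below assumes the standard GEO accession display endpoint (`acc.cgi?acc=...`); the helper name `geo_url` is hypothetical, and the snippet only builds URLs without downloading anything.

```python
from urllib.parse import urlencode

# GEO accession display endpoint (assumed stable; check the GEO site if it changes).
GEO_BASE = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

# The 12 series accessions listed in the Data availability statement.
ACCESSIONS = [
    "GSE120575", "GSE123813", "GSE166181", "GSE115978",
    "GSE144236", "GSE145328", "GSE139324", "GSE164690",
    "GSE162025", "GSE180268", "GSE182227", "GSE200996",
]


def geo_url(accession: str) -> str:
    """Return the GEO record URL for a series accession such as 'GSE120575'."""
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    return f"{GEO_BASE}?{urlencode({'acc': accession})}"


for acc in ACCESSIONS:
    print(acc, geo_url(acc))
```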
Code availability
All code necessary to replicate these analyses is freely available on GitHub (https://github.com/rootchang/PENCIL_reusability_report) and archived on Zenodo (ref. 39; https://doi.org/10.5281/zenodo.10121113).
References
1. Gavish, A. et al. Hallmarks of transcriptional intratumour heterogeneity across a thousand tumours. Nature 618, 598–606 (2023).
2. Cao, J. Y. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
3. Eraslan, G. et al. Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science 376, 712 (2022).
4. van de Sande, B. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat. Protoc. 15, 2247–2276 (2020).
5. Ren, T. et al. Supervised learning of high-confidence phenotypic subpopulations from single-cell data. Nat. Mach. Intell. 5, 528–541 (2023).
6. Dann, E., Henderson, N. C., Teichmann, S. A., Morgan, M. D. & Marioni, J. C. Differential abundance testing on single-cell data using k-nearest neighbor graphs. Nat. Biotechnol. 40, 245–253 (2022).
7. Zhao, J. et al. Detection of differentially abundant cell subpopulations in scRNA-seq data. Proc. Natl Acad. Sci. USA 118, e2100293118 (2021).
8. Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).
9. Sade-Feldman, M. et al. Defining T cell states associated with response to checkpoint immunotherapy in melanoma. Cell 175, 998 (2018).
10. Conde, C. D. et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 376, 713 (2022).
11. Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20, 163–172 (2019).
12. Ianevski, A., Giri, A. K. & Aittokallio, T. Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data. Nat. Commun. 13, 1246 (2022).
13. Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).
14. Yost, K. E. et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nat. Med. 25, 1251–1259 (2019).
15. De Biasi, S. et al. Circulating mucosal-associated invariant T cells identify patients responding to anti-PD-1 therapy. Nat. Commun. 12, 1669 (2021).
16. Wherry, E. J. et al. Molecular signature of CD8(+) T cell exhaustion during chronic viral infection. Immunity 27, 670–684 (2007).
17. Rooney, M. S., Shukla, S. A., Wu, C. J., Getz, G. & Hacohen, N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell 160, 48–61 (2015).
18. Damotte, D. et al. The tumor inflammation signature (TIS) is associated with anti-PD-1 treatment benefit in the CERTIM pan-cancer cohort. J. Transl. Med. 17, 357 (2019).
19. Ayers, M. et al. IFN-γ-related mRNA profile predicts clinical response to PD-1 blockade. J. Clin. Invest. 127, 2930–2940 (2017).
20. Thommen, D. S. et al. A transcriptionally and functionally distinct PD-1(+) CD8(+) T cell pool with predictive potential in non-small-cell lung cancer treated with PD-1 blockade. Nat. Med. 24, 994–1004 (2018).
21. Chow, A. et al. The ectonucleotidase CD39 identifies tumor-reactive CD8+ T cells predictive of immune checkpoint blockade efficacy in human lung cancer. Immunity 56, 93–106.e6 (2023).
22. Duhen, T. et al. Co-expression of CD39 and CD103 identifies tumor-reactive CD8 T cells in human solid tumors. Nat. Commun. 9, 2724 (2018).
23. Cillo, A. R. et al. Immune landscape of viral- and carcinogen-driven head and neck cancer. Immunity 52, 183–199.e9 (2020).
24. Kürten, C. H. L. et al. Investigating immune and non-immune cell interactions in head and neck tumors by single-cell RNA sequencing. Nat. Commun. 12, 7388 (2021).
25. Liu, Y. et al. Tumour heterogeneity and intercellular networks of nasopharyngeal carcinoma at single cell resolution. Nat. Commun. 12, 741 (2021).
26. Eberhardt, C. S. et al. Functional HPV-specific PD-1(+) stem-like CD8 T cells in head and neck cancer. Nature 597, 279–284 (2021).
27. Puram, S. V. et al. Cellular states are coupled to genomic and viral heterogeneity in HPV-related oropharyngeal carcinoma. Nat. Genet. 55, 640–650 (2023).
28. Luoma, A. M. et al. Tissue-resident memory and circulating T cells are early responders to pre-surgical cancer immunotherapy. Cell 185, 2918–2935.e29 (2022).
29. Cabrita, R. et al. Tertiary lymphoid structures improve immunotherapy and survival in melanoma (vol 577, 561, 2020). Nature 580, E1 (2020).
30. Jerby-Arnon, L. et al. A cancer cell program promotes T cell exclusion and resistance to checkpoint blockade. Cell 175, 984–997 (2018).
31. Ji, A. L. et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell 182, 497–514 (2020).
32. Frazzette, N. et al. Decreased cytotoxic T cells and TCR clonality in organ transplant recipients with squamous cell carcinoma. npj Precis. Oncol. 4, 13 (2020).
33. Sun, D. Q. et al. TISCH: a comprehensive web resource enabling interactive single-cell transcriptome visualization of tumor microenvironment. Nucleic Acids Res. 49, D1420–D1430 (2021).
34. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics 14, 7 (2013).
35. Yu, G. C. & He, Q. Y. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol. Biosyst. 12, 477–479 (2016).
36. Kumar, B. V. et al. Human tissue-resident memory T cells are defined by core transcriptional and functional signatures in lymphoid and mucosal sites. Cell Rep. 20, 2921–2934 (2017).
37. Ayers, M. et al. IFN-γ-related mRNA profile predicts clinical response to PD-1 blockade. J. Clin. Invest. 127, 2930–2940 (2017).
38. Andreatta, M. & Carmona, S. J. UCell: robust and scalable single-cell gene signature scoring. Comput. Struct. Biotechnol. J. 19, 3796–3798 (2021).
39. Cao, Y., Chang, T.-G., Sahni, S. & Ruppin, E. PENCIL reusability report v1.0.0. Zenodo https://doi.org/10.5281/zenodo.10121113 (2023).
Acknowledgements
This research was supported in part by the NIH Intramural Research Program, National Cancer Institute. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).
Author information
Contributions
E.R., Y.C. and T.-G.C. conceived and designed the study. Y.C., T.-G.C. and S.S. collected and managed the data. T.-G.C. and Y.C. performed the analyses. T.-G.C., Y.C. and E.R. wrote the paper. All authors critically revised the manuscript for important intellectual content.
Ethics declarations
Competing interests
E.R. is a co-founder of MedAware, Metabomed and Pangea Biomed (divested) and an unpaid member of Pangea Biomed’s scientific advisory board. The other authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Figs. 1–4 and Table 1.
About this article
Cite this article
Cao, Y., Chang, T.-G., Sahni, S. et al. Reusability report: Leveraging supervised learning to uncover phenotype-relevant biology from single-cell RNA sequencing data. Nat. Mach. Intell. 6, 307–314 (2024). https://doi.org/10.1038/s42256-024-00804-y
DOI: https://doi.org/10.1038/s42256-024-00804-y