Abstract
Deciphering cellular responses to genetic perturbations is fundamental for a wide array of biomedical applications. However, three main challenges remain: predicting the outcomes of single genetic perturbations, predicting the outcomes of multiple genetic perturbations and predicting genetic perturbation outcomes across cell lines. Here we introduce Subtask Decomposition Modeling for Genetic Perturbation Prediction (STAMP), a flexible artificial intelligence strategy for genetic perturbation outcome prediction and downstream applications. STAMP formulates genetic perturbation prediction as a subtask decomposition problem and resolves three progressive subtasks: identifying postperturbation differentially expressed genes, determining the direction of expression change for those genes and, finally, estimating the magnitude of the expression changes. STAMP exhibits a substantial improvement over existing approaches on the three subtasks and beyond, including the ability to identify key regulatory genes and pathways from small samples and to reveal precise genetic interactions of diverse types.
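To make the three-subtask decomposition concrete, the following minimal sketch shows one way the outputs of the subtasks could be combined into a signed expression change per gene. It is an illustration under stated assumptions, not the STAMP implementation: the function name compose_prediction, its parameters and the 0.5 thresholds are hypothetical, and the per-gene scores are assumed to come from three separately trained predictors.

```python
from typing import List


def compose_prediction(
    p_de: List[float],       # subtask 1: probability that each gene is differentially expressed
    p_up: List[float],       # subtask 2: probability that a DE gene is upregulated
    magnitude: List[float],  # subtask 3: estimated size of the expression change
    de_threshold: float = 0.5,  # hypothetical cut-off, chosen for illustration only
) -> List[float]:
    """Combine three subtask outputs into one signed expression change per gene."""
    if not (len(p_de) == len(p_up) == len(magnitude)):
        raise ValueError("all subtask outputs must cover the same genes")

    changes = []
    for de, up, mag in zip(p_de, p_up, magnitude):
        if de < de_threshold:
            # Gene is not predicted to be differentially expressed: no change.
            changes.append(0.0)
            continue
        # Direction is resolved only for genes that passed subtask 1.
        sign = 1.0 if up >= 0.5 else -1.0
        # Magnitude is treated as non-negative and signed by the direction.
        changes.append(sign * abs(mag))
    return changes


example = compose_prediction(
    p_de=[0.9, 0.2, 0.8],
    p_up=[0.7, 0.9, 0.1],
    magnitude=[1.5, 2.0, 0.5],
)
print(example)
```

In this sketch each later subtask is consulted only for genes that passed the earlier one, which mirrors the progressive ordering described in the abstract; how STAMP itself couples the subtasks is documented in the paper and the released code.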
Data availability
For the training, testing and application demonstrations involving STAMP, the required scCRISPR datasets, gene embedding representations and other relevant data were sourced from publicly available databases and software platforms. Specifically, the K562_GW, K562_essential and RPE1_essential scCRISPR datasets were obtained from https://gwps.wi.mit.edu/. The TFatlas, PRJNA551220, PRJNA787633, PRJNA641125a (CRISPRa dataset in PRJNA641125), PRJNA609688 and PRJNA641353 scCRISPR datasets were retrieved from the National Center for Biotechnology Information (NCBI) at https://www.ncbi.nlm.nih.gov/, corresponding to BioProject IDs PRJNA893678, PRJNA551220, PRJNA787633, PRJNA641125, PRJNA609688 and PRJNA641353, respectively. The Perturb-CITE-seq scCRISPR dataset was downloaded from the Broad Institute’s Single Cell Portal and is available at https://singlecell.broadinstitute.org/single_cell/data/public/SCP1064/multi-modal-pooled-perturb-cite-seq-screens-in-patient-models-define-novel-mechanisms-of-cancer-immune-evasion?filename=RNA_expression.csv.gz. These benchmark datasets can be obtained from https://zenodo.org/records/12779567 (ref. 53). The gene embedding representations can be found in Supplementary Information. Furthermore, the genetic interaction (GI) data used in the section “Uncovering diverse types of GI” were sourced from https://www.science.org/doi/10.1126/science.aax4438. Source data are provided with this paper.
Code availability
STAMP is available on GitHub (https://github.com/bm2-lab/STAMP) and Zenodo (https://zenodo.org/records/12779567; ref. 53), together with usage documentation and comprehensive example testing datasets.
References
Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
Bock, C. et al. High-content CRISPR screening. Nat. Rev. Methods Primers 2, 8 (2022).
Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866.e17 (2016).
Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301 (2017).
Cheng, J. et al. Massively parallel CRISPR‐based genetic perturbation screening at single‐cell resolution. Adv. Sci. 10, 2204484 (2023).
Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882.e21 (2016).
Jaitin, D. A. et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-seq. Cell 167, 1883–1896.e15 (2016).
Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42, 927–935 (2024).
Lotfollahi, M. et al. Predicting cellular responses to complex perturbations in high‐throughput screens. Mol. Syst. Biol. 19, e11517 (2023).
Ji, Y., Lotfollahi, M., Wolf, F. A. & Theis, F. J. Machine learning for perturbational single-cell omics. Cell Syst. 12, 522–537 (2021).
Gavriilidis, G. I., Vasileiou, V., Orfanou, A., Ishaque, N. & Psomopoulos, F. A mini-review on perturbation modelling across single-cell omic modalities. Comput. Struct. Biotechnol. J. 23, 1886–1896 (2024).
Dann, E., Henderson, N. C., Teichmann, S. A., Morgan, M. D. & Marioni, J. C. Differential abundance testing on single-cell data using k-nearest neighbor graphs. Nat. Biotechnol. 40, 245–253 (2022).
Dong, M. et al. Causal identification of single-cell experimental perturbation effects with CINEMA-OT. Nat. Methods 20, 1769–1779 (2023).
Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).
Duan, B. et al. Model-based understanding of single-cell CRISPR screening. Nat. Commun. 10, 2233 (2019).
Papalexi, E. et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat. Genet. 53, 322–331 (2021).
Kamimoto, K. et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature 614, 742–751 (2023).
Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
Hetzel, L., Boehm, S., Kilbertus, N., Günnemann, S. & Theis, F. Predicting cellular responses to novel drug perturbations at a single-cell resolution. Adv. Neural Inf. Process. Syst. 35, 26711–26722 (2022).
Inecik, K., Uhlmann, A., Lotfollahi, M. & Theis, F. MultiCPA: multimodal compositional perturbation autoencoder. Preprint at bioRxiv https://doi.org/10.1101/2022.07.08.499049 (2022).
Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Cui, H. et al. scGPT: towards building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).
Ma, J. et al. Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients. Nat. Cancer 2, 233–244 (2021).
Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).
Chang, O., Flokas, L., Lipson, H. & Spranger, M. Assessing SATNet’s ability to solve the symbol grounding problem. Adv. Neural Inf. Process. Syst. 33, 1428–1439 (2020).
Replogle, J. M. et al. Mapping information-rich genotype–phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559–2575.e28 (2022).
Joung, J. et al. A transcription factor atlas of directed differentiation. Cell 186, 209–229.e26 (2023).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Squair, J. W. et al. Confronting false discoveries in single-cell differential expression. Nat. Commun. 12, 5692 (2021).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Barry, T., Wang, X., Morris, J. A., Roeder, K. & Katsevich, E. SCEPTRE improves calibration and sensitivity in single-cell CRISPR screen analysis. Genome Biol. 22, 344 (2021).
Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).
Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).
Schmidt, R. et al. CRISPR activation and interference screens decode stimulation responses in primary human T cells. Science 375, eabj4008 (2022).
Gao, Y. et al. Pan-Peptide Meta Learning for T-cell receptor–antigen binding recognition. Nat. Mach. Intell. 5, 236–249 (2023).
Hospedales, T., Antoniou, A., Micaelli, P. & Storkey, A. Meta-learning in neural networks: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 5149–5169 (2021).
Wang, W., Zheng, V. W., Yu, H. & Miao, C. A survey of zero-shot learning: settings, methods, and applications. ACM Trans. Intell. Syst. Technol. 10, 13 (2019).
Wang, Y., Yao, Q., Kwok, J. T. & Ni, L. M. Generalizing from a few examples: a survey on few-shot learning. ACM Comput. Surv. 53, 63 (2020).
Mimitou, E. P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat. Methods 16, 409–412 (2019).
Young, M. D., Wakefield, M. J., Smyth, G. K. & Oshlack, A. Gene Ontology analysis for RNA-seq: accounting for selection bias. Genome Biol. 11, R14 (2010).
Moon, J. W. et al. IFNγ induces PD-L1 overexpression by JAK2/STAT1/IRF-1 signaling in EBV-positive gastric carcinoma. Sci. Rep. 7, 17810 (2017).
Garcia-Diaz, A. et al. Interferon receptor signaling pathways regulating PD-L1 and PD-L2 expression. Cell Rep. 19, 1189–1201 (2017).
De Ville, B. Decision trees. Wiley Interdiscip. Rev. Comput. Stat. 5, 448–455 (2013).
Bunne, C. et al. Learning single-cell perturbation responses using neural optimal transport. Nat. Methods 20, 1759–1768 (2023).
Yu, H. & Welch, J. D. PerturbNet predicts single-cell responses to unseen chemical and genetic perturbations. Preprint at bioRxiv https://doi.org/10.1101/2022.07.20.500854 (2022).
Piran, Z., Cohen, N., Hoshen, Y. & Nitzan, M. Disentanglement of single-cell data with biolord. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-02079-x (2024).
Song, B. et al. Decoding heterogenous single-cell perturbation responses. Preprint at bioRxiv https://doi.org/10.1101/2023.10.30.564796 (2023).
Kana, O. et al. Generative modeling of single-cell gene expression for dose-dependent chemical perturbations. Patterns 4, 100817 (2023).
Tang, X. et al. Explainable multi-task learning for multi-modality biological data analysis. Nat. Commun. 14, 2546 (2023).
Ribeiro, M. T., Singh, S. & Guestrin, C. “Why should I trust you?”: explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1135–1144 (Association for Computing Machinery, 2016).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 6785–6795 (2017).
Gao, Y. et al. STAMP: toward subtask decomposition-based learning and benchmarking for genetic perturbation outcome prediction and beyond (v0.1.2). Zenodo https://doi.org/10.5281/zenodo.12779567 (2024).
Acknowledgements
This work was supported by the National Natural Science Foundation of China (grants T24250193 and 32341008), the National Key Research and Development Program of China (grants 2021YFF1201200 and 2021YFF1200900), the Shanghai Pilot Program for Basic Research, the Shanghai Science and Technology Innovation Action Plan-Key Specialization in Computational Biology, the Shanghai Shuguang Scholars Project, the Shanghai Excellent Academic Leader Project, the Fundamental Research Funds for the Central Universities and the Shanghai Municipal Science and Technology Major Project (grant 2021SHZDZX0100).
Author information
Contributions
Q.L., Y.G. and Z.W. designed the framework of this work. Y.G., Z.W., K.D., K.C., J.Y. and G.C. performed the analyses. Y.G., Z.W. and Q.L. wrote the manuscript with the help of the other authors. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Georgios Gavriilidis, Stefan Peidli and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Notes 1–16, Discussion, Tables 1–29 and Figs. 1–23.
Source data
Source Data Fig. 2
Source data for Fig. 2.
Source Data Fig. 3
Source data for Fig. 3.
Source Data Fig. 4
Source data for Fig. 4.
Source Data Fig. 5
Source data for Fig. 5.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gao, Y., Wei, Z., Dong, K. et al. Toward subtask-decomposition-based learning and benchmarking for predicting genetic perturbation outcomes and beyond. Nat Comput Sci 4, 773–785 (2024). https://doi.org/10.1038/s43588-024-00698-1
DOI: https://doi.org/10.1038/s43588-024-00698-1