  • Article

Toward subtask-decomposition-based learning and benchmarking for predicting genetic perturbation outcomes and beyond

A preprint version of the article is available at bioRxiv.

Abstract

Deciphering cellular responses to genetic perturbations is fundamental for a wide array of biomedical applications. However, there are three main challenges: predicting single-genetic-perturbation outcomes, predicting multiple-genetic-perturbation outcomes and predicting genetic perturbation outcomes across cell lines. Here we introduce Subtask Decomposition Modeling for Genetic Perturbation Prediction (STAMP), a flexible artificial intelligence strategy for genetic perturbation outcome prediction and downstream applications. STAMP formulates genetic perturbation prediction as a subtask decomposition problem and resolves three progressive subtasks: identifying postperturbation differentially expressed genes, determining the direction of expression change for those genes and, finally, estimating the magnitude of the change. STAMP exhibits a substantial improvement over existing approaches on the three subtasks and beyond, including the ability to identify key regulatory genes and pathways on small samples and to reveal precise genetic interactions of diverse types.
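The three progressive subtasks can be illustrated with a toy decomposition of a per-gene expression change. This is a minimal sketch under stated assumptions, not the STAMP implementation: the function name, the use of a simple absolute-difference cutoff to call differentially expressed genes and the `threshold` value are all hypothetical choices made for illustration.

```python
import numpy as np


def decompose_perturbation_effect(control, perturbed, threshold=0.5):
    """Split a per-gene expression change into three progressive targets.

    The cutoff rule is a stand-in chosen for illustration; it is an
    assumption and not the differential expression test used by STAMP.
    """
    control = np.asarray(control, dtype=float)
    perturbed = np.asarray(perturbed, dtype=float)
    delta = perturbed - control

    # Subtask 1: which genes count as differentially expressed (toy rule).
    is_deg = np.abs(delta) > threshold

    # Subtask 2: direction of change, defined only for the selected genes
    # (+1 for up, -1 for down, 0 for genes that were not selected).
    direction = np.where(is_deg, np.sign(delta), 0.0).astype(int)

    # Subtask 3: magnitude of change, again only for the selected genes.
    magnitude = np.where(is_deg, np.abs(delta), 0.0)

    return is_deg, direction, magnitude


if __name__ == "__main__":
    is_deg, direction, magnitude = decompose_perturbation_effect(
        control=[1.0, 2.0, 3.0, 4.0],
        perturbed=[1.1, 3.0, 1.5, 4.0],
    )
    print(is_deg, direction, magnitude)
```

Each later target is only defined where the earlier one is positive, which is what makes the subtasks progressive: multiplying `direction` by `magnitude` recovers the original change for the selected genes and zero elsewhere.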

Fig. 1: Illustration of the STAMP framework.
Fig. 2: Challenge 1. Predicting single-genetic-perturbation outcomes in the RPE1_essential dataset.
Fig. 3: Challenge 2. Predicting multiple-genetic-perturbation outcomes in the PRJNA551220 dataset.
Fig. 4: Challenge 3. Predicting genetic perturbation outcomes across cell lines.
Fig. 5: GI subtype identification results of STAMP.

Data availability

For the training, testing and application demonstrations involving STAMP, the required scCRISPR datasets, gene embedding representations and other relevant data were sourced from publicly available databases and software platforms. Specifically, the K562_GW, K562_essential and RPE1_essential scCRISPR datasets were obtained from https://gwps.wi.mit.edu/. The TFatlas, PRJNA551220, PRJNA787633, PRJNA641125a (CRISPRa dataset in PRJNA641125), PRJNA609688 and PRJNA641353 scCRISPR datasets were retrieved from the National Center for Biotechnology Information (NCBI) at https://www.ncbi.nlm.nih.gov/, corresponding to BioProject IDs PRJNA893678, PRJNA551220, PRJNA787633, PRJNA641125, PRJNA609688 and PRJNA641353, respectively. The Perturb-CITE-seq scCRISPR dataset was downloaded from the Broad Institute’s Single Cell Portal and is available at https://singlecell.broadinstitute.org/single_cell/data/public/SCP1064/multi-modal-pooled-perturb-cite-seq-screens-in-patient-models-define-novel-mechanisms-of-cancer-immune-evasion?filename=RNA_expression.csv.gz. These benchmark datasets can be obtained from https://zenodo.org/records/12779567 (ref. 53). The gene embedding representations can be found in Supplementary Information. Furthermore, the genetic interaction (GI) data used in ‘Uncovering diverse types of GI’ were sourced from https://www.science.org/doi/10.1126/science.aax4438. Source data are provided with this paper.
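For readers scripting the retrieval, the dataset-to-BioProject correspondence listed above can be written down as a small lookup table. The dataset labels and accession IDs are copied from the statement; the variable name, the helper function and the NCBI URL pattern it builds are assumptions for illustration and are not given in the article.

```python
# Dataset label -> BioProject ID, copied from the data availability statement.
BIOPROJECT_IDS = {
    "TFatlas": "PRJNA893678",
    "PRJNA551220": "PRJNA551220",
    "PRJNA787633": "PRJNA787633",
    "PRJNA641125a": "PRJNA641125",  # CRISPRa dataset in PRJNA641125
    "PRJNA609688": "PRJNA609688",
    "PRJNA641353": "PRJNA641353",
}


def bioproject_url(dataset):
    """Return a BioProject page URL for a dataset label.

    The URL pattern is an assumption and is not stated in the article.
    """
    return "https://www.ncbi.nlm.nih.gov/bioproject/" + BIOPROJECT_IDS[dataset]


if __name__ == "__main__":
    for name in BIOPROJECT_IDS:
        print(name, bioproject_url(name))
```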

Code availability

STAMP is available on GitHub (https://github.com/bm2-lab/STAMP) and Zenodo (https://zenodo.org/records/12779567; ref. 53), together with usage documentation and comprehensive example testing datasets.

References

  1. Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).

  2. Bock, C. et al. High-content CRISPR screening. Nat. Rev. Methods Primers 2, 8 (2022).

  3. Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866.e17 (2016).

  4. Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301 (2017).

  5. Cheng, J. et al. Massively parallel CRISPR‐based genetic perturbation screening at single‐cell resolution. Adv. Sci. 10, 2204484 (2023).

  6. Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882.e21 (2016).

  7. Jaitin, D. A. et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-seq. Cell 167, 1883–1896.e15 (2016).

  8. Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42, 927–935 (2024).

  9. Lotfollahi, M. et al. Predicting cellular responses to complex perturbations in high‐throughput screens. Mol. Syst. Biol. 19, e11517 (2023).

  10. Ji, Y., Lotfollahi, M., Wolf, F. A. & Theis, F. J. Machine learning for perturbational single-cell omics. Cell Syst. 12, 522–537 (2021).

  11. Gavriilidis, G. I., Vasileiou, V., Orfanou, A., Ishaque, N. & Psomopoulos, F. A mini-review on perturbation modelling across single-cell omic modalities. Comput. Struct. Biotechnol. J. 23, 1886–1896 (2024).

  12. Dann, E., Henderson, N. C., Teichmann, S. A., Morgan, M. D. & Marioni, J. C. Differential abundance testing on single-cell data using k-nearest neighbor graphs. Nat. Biotechnol. 40, 245–253 (2022).

  13. Dong, M. et al. Causal identification of single-cell experimental perturbation effects with CINEMA-OT. Nat. Methods 20, 1769–1779 (2023).

  14. Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).

  15. Duan, B. et al. Model-based understanding of single-cell CRISPR screening. Nat. Commun. 10, 2233 (2019).

  16. Papalexi, E. et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat. Genet. 53, 322–331 (2021).

  17. Kamimoto, K. et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature 614, 742–751 (2023).

  18. Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).

  19. Hetzel, L., Boehm, S., Kilbertus, N., Günnemann, S. & Theis, F. Predicting cellular responses to novel drug perturbations at a single-cell resolution. Adv. Neural Inf. Process. Syst. 35, 26711–26722 (2022).

  20. Inecik, K., Uhlmann, A., Lotfollahi, M. & Theis, F. MultiCPA: multimodal compositional perturbation autoencoder. Preprint at bioRxiv https://doi.org/10.1101/2022.07.08.499049 (2022).

  21. Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).

  22. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).

  23. Cui, H. et al. scGPT: towards building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).

  24. Ma, J. et al. Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients. Nat. Cancer 2, 233–244 (2021).

  25. Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).

  26. Chang, O., Flokas, L., Lipson, H. & Spranger, M. Assessing SATNet’s ability to solve the symbol grounding problem. Adv. Neural Inf. Process. Syst. 33, 1428–1439 (2020).

  27. Replogle, J. M. et al. Mapping information-rich genotype–phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559–2575.e28 (2022).

  28. Joung, J. et al. A transcription factor atlas of directed differentiation. Cell 186, 209–229.e26 (2023).

  29. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  30. Squair, J. W. et al. Confronting false discoveries in single-cell differential expression. Nat. Commun. 12, 5692 (2021).

  31. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

  32. Barry, T., Wang, X., Morris, J. A., Roeder, K. & Katsevich, E. SCEPTRE improves calibration and sensitivity in single-cell CRISPR screen analysis. Genome Biol. 22, 344 (2021).

  33. Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).

  34. Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).

  35. Schmidt, R. et al. CRISPR activation and interference screens decode stimulation responses in primary human T cells. Science 375, eabj4008 (2022).

  36. Gao, Y. et al. Pan-Peptide Meta Learning for T-cell receptor–antigen binding recognition. Nat. Mach. Intell. 5, 236–249 (2023).

  37. Hospedales, T., Antoniou, A., Micaelli, P. & Storkey, A. Meta-learning in neural networks: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 5149–5169 (2021).

  38. Wang, W., Zheng, V. W., Yu, H. & Miao, C. A survey of zero-shot learning: settings, methods, and applications. ACM Trans. Intell. Syst. Technol. 10, 13 (2019).

  39. Wang, Y., Yao, Q., Kwok, J. T. & Ni, L. M. Generalizing from a few examples: a survey on few-shot learning. ACM Comput. Surv. 53, 63 (2020).

  40. Mimitou, E. P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat. Methods 16, 409–412 (2019).

  41. Young, M. D., Wakefield, M. J., Smyth, G. K. & Oshlack, A. Gene Ontology analysis for RNA-seq: accounting for selection bias. Genome Biol. 11, R14 (2010).

  42. Moon, J. W. et al. IFNγ induces PD-L1 overexpression by JAK2/STAT1/IRF-1 signaling in EBV-positive gastric carcinoma. Sci. Rep. 7, 17810 (2017).

  43. Garcia-Diaz, A. et al. Interferon receptor signaling pathways regulating PD-L1 and PD-L2 expression. Cell Rep. 19, 1189–1201 (2017).

  44. De Ville, B. Decision trees. Wiley Interdiscip. Rev. Comput. Stat. 5, 448–455 (2013).

  45. Bunne, C. et al. Learning single-cell perturbation responses using neural optimal transport. Nat. Methods 20, 1759–1768 (2023).

  46. Yu, H. & Welch, J. D. PerturbNet predicts single-cell responses to unseen chemical and genetic perturbations. Preprint at bioRxiv https://doi.org/10.1101/2022.07.20.500854 (2022).

  47. Piran, Z., Cohen, N., Hoshen, Y. & Nitzan, M. Disentanglement of single-cell data with biolord. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-02079-x (2024).

  48. Song, B. et al. Decoding heterogenous single-cell perturbation responses. Preprint at bioRxiv https://doi.org/10.1101/2023.10.30.564796 (2023).

  49. Kana, O. et al. Generative modeling of single-cell gene expression for dose-dependent chemical perturbations. Patterns 4, 100817 (2023).

  50. Tang, X. et al. Explainable multi-task learning for multi-modality biological data analysis. Nat. Commun. 14, 2546 (2023).

  51. Ribeiro, M. T., Singh, S. & Guestrin, C. “Why should I trust you?”: explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1135–1144 (Association for Computing Machinery, 2016).

  52. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 6785–6795 (2017).

  53. Gao, Y. et al. STAMP: toward subtask decomposition-based learning and benchmarking for genetic perturbation outcome prediction and beyond (v0.1.2). Zenodo https://doi.org/10.5281/zenodo.12779567 (2024).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grants T24250193 and 32341008), the National Key Research and Development Program of China (grants 2021YFF1201200 and 2021YFF1200900), the Shanghai Pilot Program for Basic Research, the Shanghai Science and Technology Innovation Action Plan-Key Specialization in Computational Biology, the Shanghai Shuguang Scholars Project, the Shanghai Excellent Academic Leader Project, the Fundamental Research Funds for the Central Universities and the Shanghai Municipal Science and Technology Major Project (grant 2021SHZDZX0100).

Author information

Contributions

Q.L., Y.G. and Z.W. designed the framework of this work. Y.G., Z.W., K.D., K.C., J.Y. and G.C. performed the analyses. Y.G., Z.W. and Q.L. wrote the manuscript with the help of the other authors. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Qi Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Georgios Gavriilidis, Stefan Peidli and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–16, Discussion, Tables 1–29 and Figs. 1–23.

Reporting Summary

Peer Review File

Source data

  • Source Data Fig. 2: source data for Fig. 2.
  • Source Data Fig. 3: source data for Fig. 3.
  • Source Data Fig. 4: source data for Fig. 4.
  • Source Data Fig. 5: source data for Fig. 5.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gao, Y., Wei, Z., Dong, K. et al. Toward subtask-decomposition-based learning and benchmarking for predicting genetic perturbation outcomes and beyond. Nat Comput Sci 4, 773–785 (2024). https://doi.org/10.1038/s43588-024-00698-1

  • DOI: https://doi.org/10.1038/s43588-024-00698-1
