Kim, Chanwoo, Kaczmarzyk, Jakub, Savant, Deepika, Zhao, Zhen, Koo, Peter K, Lee, Su-In (June 2026) Dissecting and directing pathology foundation models. bioRxiv. ISSN 2692-8205 (Submitted)
PDF: 10.64898.2026.06.12.731496.pdf (Submitted Version). Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Foundation models (FMs) are central to digital pathology, encoding histology images into dense embeddings to support diagnostic classification, molecular alteration prediction, and clinical outcome modeling. However, the opacity of these embeddings renders FM-based systems “black boxes,” limiting their trustworthiness for clinical translation and utility for scientific discovery. Here, we introduce PICASSO (Pathology Image Concept Atlas built via SparSe dictiOnary learning), a framework that makes pathology FMs interpretable and controllable. PICASSO decomposes FM embeddings into human-interpretable visual concepts using a sparse autoencoder. It is trained on more than 120 million tissue patches across 32 cancer types, producing the first pan-cancer atlas of histomorphological concepts. We demonstrate that PICASSO enables diverse downstream applications of FM embeddings by exposing interpretable structure within learned representations and supporting concept-level intervention. It enables auditing of clinical model behavior by revealing the morphological features driving predictions. Beyond transparency and validation, PICASSO enables the discovery of new biological insights; for example, it identified hobnailing epithelial morphology as a previously unrecognized biomarker of EGFR mutations in lung adenocarcinoma. By linking PICASSO-derived concepts with spatial transcriptomics, we uncover associations between morphological patterns and gene expression programs. Furthermore, PICASSO allows suppression of concepts associated with technical artifacts, thereby reducing model reliance on spurious signals. Finally, PICASSO enables controlled manipulation of learned concepts to generate counterfactual embeddings for exploratory therapeutic analysis, such as modulating tumour-infiltrating lymphocyte density to assess the impact on predicted survival outcomes.
Together, PICASSO provides a principled framework for transforming pathology FMs into platforms for mechanistic insight and discovery.
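The abstract describes three operations on FM embeddings: decomposing them into sparse concepts, suppressing a concept, and rescaling a concept to obtain a counterfactual embedding. The following is a minimal illustrative sketch of those operations, not the authors' implementation. The abstract specifies only "a sparse autoencoder", so the top-k sparsity rule, the dimensions, the random stand-in weights, and all function names here are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8     # hypothetical FM embedding width
N_CONCEPTS = 32   # hypothetical (overcomplete) dictionary size
TOP_K = 4         # hypothetical number of active concepts per patch

# Random weights stand in for a trained sparse autoencoder (assumption).
W_enc = rng.normal(scale=0.5, size=(EMBED_DIM, N_CONCEPTS))
b_enc = np.zeros(N_CONCEPTS)
W_dec = rng.normal(scale=0.5, size=(N_CONCEPTS, EMBED_DIM))
b_dec = np.zeros(EMBED_DIM)


def encode(x, k=TOP_K):
    """Map embeddings (n, EMBED_DIM) to sparse, non-negative concept activations."""
    pre = np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)  # ReLU pre-activations
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]   # indices of the k largest
    z = np.zeros_like(pre)
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return z


def decode(z):
    """Reconstruct embeddings as a linear combination of dictionary rows."""
    return z @ W_dec + b_dec


def intervene(x, concept, scale):
    """Rescale one concept and return the edited embedding.

    scale=0 suppresses the concept; scale>1 amplifies it. The reconstruction
    residual is added back so that only the chosen concept's contribution changes.
    """
    z = encode(x)
    residual = x - decode(z)
    z_edit = z.copy()
    z_edit[..., concept] *= scale
    return decode(z_edit) + residual


# Toy usage on random "patch embeddings".
x = rng.normal(size=(5, EMBED_DIM))
z = encode(x)
c = int(np.argmax(z[0]))             # most active concept for the first patch
x_suppressed = intervene(x, c, 0.0)  # e.g. removing an artifact-like concept
x_amplified = intervene(x, c, 2.0)   # e.g. a counterfactual with more of the concept
```

In this sketch the edited embeddings could be passed to an existing downstream predictor in place of the originals, which is one way to read the concept-level intervention the abstract describes. Adding the residual back is a design choice of the sketch, so that an intervention with `scale=1` returns the original embedding unchanged.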
| Item Type: | Paper |
|---|---|
| Subjects: | bioinformatics; bioinformatics > quantitative biology |
| CSHL Authors: | |
| Communities: | CSHL labs > Koo Lab; CSHL labs > Zhao lab |
| SWORD Depositor: | CSHL Elements |
| Depositing User: | CSHL Elements |
| Date: | 16 June 2026 |
| Date Deposited: | 22 Jun 2026 12:36 |
| Last Modified: | 22 Jun 2026 12:36 |
| Related URLs: | |
| URI: | https://repository.cshl.edu/id/eprint/42227 |