Dissecting and directing pathology foundation models

Kim, Chanwoo, Kaczmarzyk, Jakub, Savant, Deepika, Zhao, Zhen, Koo, Peter K, Lee, Su-In (June 2026) Dissecting and directing pathology foundation models. bioRxiv. ISSN 2692-8205 (Submitted)

10.64898.2026.06.12.731496.pdf - Submitted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Foundation models (FMs) are central to digital pathology, encoding histology images into dense embeddings that support diagnostic classification, molecular alteration prediction, and clinical outcome modeling. However, the opacity of these embeddings renders FM-based systems “black boxes,” limiting their trustworthiness for clinical translation and their utility for scientific discovery. Here, we introduce PICASSO (Pathology Image Concept Atlas built via SparSe dictiOnary learning), a framework that makes pathology FMs interpretable and controllable. PICASSO decomposes FM embeddings into human-interpretable visual concepts using a sparse autoencoder. It is trained on more than 120 million tissue patches across 32 cancer types, producing the first pan-cancer atlas of histomorphological concepts. We demonstrate that PICASSO enables diverse downstream applications of FM embeddings by exposing interpretable structure within learned representations and supporting concept-level intervention. It enables auditing of clinical model behavior by revealing the morphological features driving predictions. Beyond transparency and validation, PICASSO enables the discovery of new biological insights; for example, it identified hobnailing epithelial morphology as a previously unrecognized biomarker of EGFR mutations in lung adenocarcinoma. By linking PICASSO-derived concepts with spatial transcriptomics, we uncover associations between morphological patterns and gene expression programs. Furthermore, PICASSO allows suppression of concepts associated with technical artifacts, thereby reducing model reliance on spurious signals. Finally, PICASSO enables controlled manipulation of learned concepts to generate counterfactual embeddings for exploratory therapeutic analysis, such as modulating tumour-infiltrating lymphocyte density to assess the impact on predicted survival outcomes. Together, PICASSO provides a principled framework for transforming pathology FMs into platforms for mechanistic insight and discovery.
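
The abstract describes decomposing FM embeddings with a sparse autoencoder and then suppressing or modulating individual concepts. The following is a minimal sketch of that general idea, not the authors' implementation: the dimensions, the randomly initialised weights (standing in for a trained dictionary), and the names `encode`, `decode`, and `edit_concept` are all assumptions made for illustration. It uses a generic ReLU sparse autoencoder and edits one concept while keeping the part of the embedding the dictionary does not reconstruct.

```python
import numpy as np

rng = np.random.default_rng(0)
d_embed, n_concepts = 16, 64  # hypothetical sizes, not taken from the paper

# Random weights stand in for a trained sparse dictionary (assumption).
W_enc = rng.normal(scale=0.1, size=(n_concepts, d_embed))
b_enc = np.zeros(n_concepts)
W_dec = rng.normal(scale=0.1, size=(d_embed, n_concepts))
b_dec = np.zeros(d_embed)


def encode(x):
    """Map an embedding to non-negative concept activations (ReLU encoder)."""
    return np.maximum(0.0, W_enc @ (x - b_dec) + b_enc)


def decode(z):
    """Reconstruct an embedding as a weighted sum of dictionary directions."""
    return W_dec @ z + b_dec


def edit_concept(x, k, scale):
    """Rescale concept k and return the edited embedding.

    Only the change in the reconstruction is added back to x, so the
    residual that the dictionary fails to explain is left untouched.
    scale=0 suppresses the concept; scale>1 amplifies it.
    """
    z = encode(x)
    z_edit = z.copy()
    z_edit[k] *= scale
    return x + decode(z_edit) - decode(z)


x = rng.normal(size=d_embed)          # placeholder for a patch embedding
z = encode(x)
k = int(np.argmax(z))                 # most active concept for this patch
x_suppressed = edit_concept(x, k, 0.0)  # e.g. remove an artifact concept
x_amplified = edit_concept(x, k, 2.0)   # e.g. a counterfactual embedding
```

In this sketch, suppressing concept `k` subtracts `z[k] * W_dec[:, k]` from the embedding, and doubling it adds the same vector. A downstream model could then be evaluated on the edited embedding to see how its prediction shifts.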

Item Type: Paper
Subjects: bioinformatics
bioinformatics > quantitative biology
Communities: CSHL labs > Koo Lab
CSHL labs > Zhao lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 16 June 2026
Date Deposited: 22 Jun 2026 12:36
Last Modified: 22 Jun 2026 12:36
URI: https://repository.cshl.edu/id/eprint/42227
