Massive experimental quantification allows interpretable deep learning of protein aggregation

Thompson, Mike; Martín, Mariano; Olmo, Trinidad Sanmartín; Rajesh, Chandana; Koo, Peter K; Bolognesi, Benedetta; Lehner, Ben (May 2025) Massive experimental quantification allows interpretable deep learning of protein aggregation. Science Advances, 11 (18). eadt5111. ISSN 2375-2548 (Public Dataset)


PDF: 10.1126.sciadv.adt5111.pdf (Published Version, 4MB)
Available under a Creative Commons Attribution license.

URL: https://www.ncbi.nlm.nih.gov/pubmed/40305601

DOI: 10.1126/sciadv.adt5111

Abstract

Protein aggregation is a pathological hallmark of more than 50 human diseases and a major problem for biotechnology. Methods have been proposed to predict aggregation from sequence, but these have been trained and evaluated on small and biased experimental datasets. Here we directly address this data shortage by experimentally quantifying the aggregation of >100,000 protein sequences. This unprecedented dataset reveals the limited performance of existing computational methods and allows us to train CANYA, a convolution-attention hybrid neural network that accurately predicts aggregation from sequence. We adapt genomic neural network interpretability analyses to reveal CANYA's decision-making process and learned grammar. Our results illustrate the power of massive experimental analysis of random sequence-spaces and provide an interpretable and robust neural network model to predict aggregation.

Item Type:	Paper
Subjects:	bioinformatics; bioinformatics > computational biology; bioinformatics > computational biology > algorithms; bioinformatics > computational biology > algorithms > machine learning
CSHL Authors:	Koo, Peter K; Rajesh, Chandana
Communities:	CSHL labs > Koo Lab
SWORD Depositor:	CSHL Elements
Depositing User:	CSHL Elements
Date:	2 May 2025
Date Deposited:	01 May 2025 12:49
Last Modified:	01 Jul 2025 15:55
PMCID:	PMC12042874
Dataset ID:	GEO: GSE268261; https://pmlabstack.pythonanywhere.com/dataset_AMYPredFRL; http://amypro.net/; https://web.iitm.ac.in/bioinfo2/cpad2/index.html; http://waltzdb.switchlab.org/sequences; 10.5281/zenodo.15056516
URI:	https://repository.cshl.edu/id/eprint/41862
