Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention

Bhattacharya, Nicholas, Thomas, Neil, Rao, Roshan, Dauparas, Justas, Koo, Peter K, Baker, David, Song, Yun S, Ovchinnikov, Sergey (2022) Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention. Pacific Symposium on Biocomputing, 27. pp. 34-45. ISSN 2335-6936

2022.Koo.Simplified_attention.pdf
Available under License Creative Commons Attribution Non-commercial.

URL: https://www.ncbi.nlm.nih.gov/pubmed/34890134

Abstract

The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and showing competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which, in a certain limit, recovers a Potts model, and use it to contrast Potts and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases not captured by single-layer models. This raises the exciting possibility for the development of powerful structured models of protein family databases.
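The relationship the abstract describes between a Potts model and an attention layer can be illustrated with a minimal sketch. It assumes the standard Potts energy with single-site fields and pairwise couplings, and it assumes that factored attention builds those couplings as a sum over heads of a position-only attention map times a per-head amino-acid matrix. The function names, array shapes, symmetrization step, and zeroed diagonal are all illustrative assumptions, not details taken from this record.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def factored_couplings(Q, K, V):
    """Build Potts-style couplings from attention heads (hypothetical parameterization).

    Q, K: (H, L, d) positional queries and keys; they depend on position only.
    V:    (H, A, A) per-head amino-acid interaction matrices.
    Returns W with shape (L, L, A, A).
    """
    logits = np.einsum("hid,hjd->hij", Q, K)   # (H, L, L) position-position scores
    attn = softmax(logits, axis=-1)            # each row sums to 1
    W = np.einsum("hij,hab->ijab", attn, V)    # sum over heads
    # Assumption: symmetrize so that W[i, j, a, b] == W[j, i, b, a].
    W = 0.5 * (W + W.transpose(1, 0, 3, 2))
    # Assumption: no self-coupling.
    idx = np.arange(W.shape[0])
    W[idx, idx] = 0.0
    return W

def potts_energy(seq, fields, W):
    """Potts energy: -(sum_i h_i(x_i) + sum_{i<j} W_ij(x_i, x_j)).

    seq:    (L,) integer-encoded sequence.
    fields: (L, A) single-site terms.
    W:      (L, L, A, A) pairwise couplings.
    """
    seq = np.asarray(seq)
    pos = np.arange(len(seq))
    e_single = fields[pos, seq].sum()
    e_pair = W[pos[:, None], pos[None, :], seq[:, None], seq[None, :]]  # (L, L)
    return -(e_single + np.triu(e_pair, k=1).sum())  # count each pair once
```

Under these assumptions, giving every position pair its own head with a one-hot attention map leaves `W` as an unconstrained coupling tensor. That is one way to read the limit in which the attention layer recovers a Potts model.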

Item Type: Paper
Subjects: bioinformatics > genomics and proteomics > alignment
bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > protein structure, function, modification
bioinformatics > genomics and proteomics > alignment > sequence alignment
organism description > animal behavior
organism description > animal behavior > attention
bioinformatics > computational biology
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > protein structure, function, modification > protein structure rendering
CSHL Authors:
Communities: CSHL labs > Koo Lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 2022
Date Deposited: 31 Mar 2022 19:32
Last Modified: 11 Jan 2024 18:50
PMCID: PMC8752338
URI: https://repository.cshl.edu/id/eprint/40560
