The structure-fitness landscape of pairwise relations in generative sequence models

Marshall, Dylan, Wang, Haobo, Stiffler, Michael, Dauparas, Justas, Koo, Peter, Ovchinnikov, Sergey (November 2020) The structure-fitness landscape of pairwise relations in generative sequence models. bioRxiv. (Unpublished)

2020.Marshall.pairwise_relations.pdf
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

If disentangled properly, patterns distilled from the evolutionarily related sequences of a protein family can inform traits of that family, such as structure and function. In recent years, generative models aimed at capturing these patterns have grown increasingly complex: from sitewise to pairwise to deep and variational. In this study we evaluate how much structure and fitness information is learned by a suite of progressively complex models. We introduce pairwise saliency, a novel method for evaluating the degree of captured structural information. We also quantify the fitness information learned by these models by using them to predict the fitness of mutant sequences and then correlating these predictions with measured fitness values. We observe that models that inform structure do not necessarily inform fitness and vice versa, in contrast to recent claims in this field. Our work highlights a dearth of consistency across fitness assays and also provides a general approach for understanding the pairwise decomposable relations learned by a given generative sequence model.
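
The fitness evaluation described in the abstract (score mutant sequences with a model, then correlate the scores with measured fitness) can be illustrated with a minimal sketch. The abstract does not say which correlation statistic is used; Spearman rank correlation is assumed here because it depends only on the ordering of the scores. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def spearman(predicted, measured):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(predicted) != len(measured):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(predicted), average_ranks(measured))


if __name__ == "__main__":
    # Hypothetical model scores (e.g. log-likelihoods) for four mutants
    # and hypothetical measured fitness values for the same mutants.
    predicted = [-3.2, -0.5, -1.1, -4.0]
    measured = [0.10, 0.95, 0.60, 0.05]
    print(spearman(predicted, measured))
```

A rank-based statistic is a reasonable choice in this setting because model scores and assay readouts are generally on different, nonlinearly related scales.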

Item Type: Paper
Subjects: bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > protein structure, function, modification
bioinformatics > computational biology
CSHL Authors:
Communities: CSHL labs > Koo Lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 30 November 2020
Date Deposited: 03 Nov 2021 15:17
Last Modified: 03 Nov 2021 15:17
URI: https://repository.cshl.edu/id/eprint/40408
