Empirical variance component regression for sequence-function relationships

Zhou, Juannan, Wong, Mandy, Chen, Wei-Chia, Krainer, Adrian, Kinney, Justin, McCandlish, David (October 2020) Empirical variance component regression for sequence-function relationships. BioRxiv. (Unpublished)

DOI: 10.1101/2020.10.14.339804

Abstract

Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype-phenotype relationship typically reflects genetic interactions not only between pairs of sites, but also higher-order interactions between larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here, we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data. This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Based on these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis, and reconstruct the sequence-function mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA 5′ splice sites for which we also validate our model predictions via additional low-throughput experiments.
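The abstract's starting point, how phenotypic correlations change with mutational distance, can be illustrated with a short sketch. This is not the authors' implementation: the function name `distance_correlation` and its arguments are hypothetical. It assumes equal-length sequences, one measured phenotype per sequence, and a simple average over all observed pairs. It computes only the empirical correlation at each Hamming distance, not the variance component estimates or the Gaussian process regression built on them.

```python
import itertools
import numpy as np

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def distance_correlation(seqs, phenotypes):
    """Empirical phenotypic correlation as a function of Hamming distance.

    Returns an array rho of length L + 1 (L = sequence length), where rho[d]
    is the mean product of mean-centered phenotypes over all observed pairs
    at distance d, divided by the value at d = 0. Distances with no observed
    pairs are reported as NaN.
    """
    y = np.asarray(phenotypes, dtype=float)
    yc = y - y.mean()                      # center the phenotypes
    L = len(seqs[0])
    sums = np.zeros(L + 1)
    counts = np.zeros(L + 1, dtype=int)

    # Distance 0: each sequence paired with itself (the variance term).
    sums[0] = np.sum(yc ** 2)
    counts[0] = len(seqs)

    # All unordered pairs of distinct observed sequences.
    for i, j in itertools.combinations(range(len(seqs)), 2):
        d = hamming(seqs[i], seqs[j])
        sums[d] += yc[i] * yc[j]
        counts[d] += 1

    cov = np.full(L + 1, np.nan)
    seen = counts > 0
    cov[seen] = sums[seen] / counts[seen]
    return cov / cov[0]
```

On a complete, purely additive landscape the resulting correlations fall off linearly with distance; curvature in this function is the kind of signal that, per the abstract, reflects pairwise and higher-order interactions.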

Item Type: Paper
Subjects: bioinformatics > computational biology
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > mutations
Communities: CSHL labs > Kinney lab
CSHL labs > Krainer lab
CSHL labs > McCandlish lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 15 October 2020
Date Deposited: 07 May 2021 14:09
Last Modified: 07 May 2021 14:09
URI: https://repository.cshl.edu/id/eprint/40044
