Inference and visualization of complex genotype-phenotype maps with gpmap-tools

Martí-Gómez, Carlos, Zhou, Juannan, Chen, Wei-Chia, Kinney, Justin B, McCandlish, David M (March 2025) Inference and visualization of complex genotype-phenotype maps with gpmap-tools. bioRxiv. ISSN 2692-8205 (Submitted)

10.1101.2025.03.09.642267.pdf - Submitted Version
Available under License Creative Commons Attribution.

Abstract

Multiplex assays of variant effect (MAVEs) allow the functional characterization of an unprecedented number of sequence variants in both gene regulatory regions and protein coding sequences. This has enabled the study of nearly complete combinatorial libraries of mutational variants and revealed the widespread influence of higher-order genetic interactions that arise when multiple mutations are combined. However, the lack of appropriate tools for exploratory analysis of these high-dimensional data limits our overall understanding of the main qualitative properties of complex genotype-phenotype maps. To fill this gap, we have developed gpmap-tools (https://github.com/cmarti/gpmap-tools), a Python library that integrates Gaussian process models for inference, phenotypic imputation, and error estimation from incomplete and noisy MAVE data and collections of natural sequences, together with methods for summarizing patterns of higher-order epistasis and non-linear dimensionality reduction techniques that allow visualization of genotype-phenotype maps containing up to millions of genotypes. Here, we used gpmap-tools to study the genotype-phenotype map of the Shine-Dalgarno sequence, a motif that modulates binding of the 16S rRNA to the 5' untranslated region (UTR) of mRNAs through base pair complementarity during translation initiation in prokaryotes. We inferred full combinatorial landscapes containing 262,144 different sequences from the sequences of 5,311 5' UTRs in the E. coli genome and from experimental MAVE data. Visualizations of the inferred landscapes were largely consistent with each other, and unveiled a simple molecular mechanism underlying the highly epistatic genotype-phenotype map of the Shine-Dalgarno sequence.
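
The size of the landscape quoted in the abstract can be checked with a few lines of plain Python. The sketch below does not use gpmap-tools; it rests on two assumptions that the abstract does not state: a four-letter RNA alphabet and a sequence length of 9, chosen because 4^9 = 262,144 matches the reported count. The function names are hypothetical and serve only this illustration.

```python
from itertools import product

ALPHABET = "ACGU"  # assumption: four RNA nucleotides
LENGTH = 9         # assumption: 4**9 == 262,144 matches the count in the abstract


def enumerate_genotypes(alphabet=ALPHABET, length=LENGTH):
    """Return every sequence of the given length over the alphabet."""
    return ["".join(letters) for letters in product(alphabet, repeat=length)]


def single_mutants(seq, alphabet=ALPHABET):
    """Return all sequences that differ from seq at exactly one position."""
    return [
        seq[:i] + allele + seq[i + 1:]
        for i in range(len(seq))
        for allele in alphabet
        if allele != seq[i]
    ]


genotypes = enumerate_genotypes()
n_genotypes = len(genotypes)                      # size of the full landscape
n_neighbors = len(single_mutants(genotypes[0]))   # one-step neighbors per genotype
print(n_genotypes, n_neighbors)
```

Under these assumptions every genotype has 9 x 3 = 27 single-mutation neighbors, which gives a sense of how densely connected a landscape of this size is.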

Item Type: Paper
Subjects: bioinformatics
bioinformatics > computational biology > algorithms
bioinformatics > computational biology
Communities: CSHL labs > Kinney lab
CSHL labs > McCandlish lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 13 March 2025
Date Deposited: 24 Mar 2025 12:33
Last Modified: 24 Mar 2025 12:33
URI: https://repository.cshl.edu/id/eprint/41829
