Martí-Gómez, Carlos, Zhou, Juannan, Chen, Wei-Chia, Kinney, Justin B, McCandlish, David M (March 2025) Inference and visualization of complex genotype-phenotype maps with gpmap-tools. bioRxiv. ISSN 2692-8205 (Submitted)
PDF: 10.1101.2025.03.09.642267.pdf (Submitted Version, 6MB), available under a Creative Commons Attribution license.
Abstract
Multiplex assays of variant effect (MAVEs) allow the functional characterization of an unprecedented number of sequence variants in both gene regulatory regions and protein coding sequences. This has enabled the study of nearly complete combinatorial libraries of mutational variants and revealed the widespread influence of higher-order genetic interactions that arise when multiple mutations are combined. However, the lack of appropriate tools for exploratory analysis of this high-dimensional data limits our overall understanding of the main qualitative properties of complex genotype-phenotype maps. To fill this gap, we have developed gpmap-tools (https://github.com/cmarti/gpmap-tools), a Python library that integrates Gaussian process models for inference, phenotypic imputation, and error estimation from incomplete and noisy MAVE data and collections of natural sequences, together with methods for summarizing patterns of higher-order epistasis and non-linear dimensionality reduction techniques that allow visualization of genotype-phenotype maps containing up to millions of genotypes. Here, we used gpmap-tools to study the genotype-phenotype map of the Shine-Dalgarno sequence, a motif that modulates binding of the 16S rRNA to the 5' untranslated region (UTR) of mRNAs through base pair complementarity during translation initiation in prokaryotes. We inferred full combinatorial landscapes containing 262,144 different sequences from the sequences of 5,311 5' UTRs in the E. coli genome and from experimental MAVE data. Visualizations of the inferred landscapes were largely consistent with each other, and unveiled a simple molecular mechanism underlying the highly epistatic genotype-phenotype map of the Shine-Dalgarno sequence.
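The figure of 262,144 sequences equals 4^9, which is consistent with a full combinatorial library of every 9-nucleotide sequence. The minimal Python sketch below assumes exactly that (the length of 9 and the ACGU alphabet are an inference from the arithmetic, not stated in the abstract) and shows how such a sequence space can be enumerated and connected by single point mutations, i.e. the Hamming graph commonly used to represent a genotype-phenotype map. It does not use gpmap-tools, and all function names are hypothetical illustrations rather than that library's API.

```python
from itertools import product

# Assumption: the 262,144-genotype landscape is every RNA sequence of
# length 9 over the alphabet ACGU, since 4**9 == 262144.
ALPHABET = "ACGU"
SEQ_LENGTH = 9


def all_genotypes(alphabet: str = ALPHABET, length: int = SEQ_LENGTH):
    """Yield every sequence in the full combinatorial library."""
    for letters in product(alphabet, repeat=length):
        yield "".join(letters)


def single_mutant_neighbors(seq: str, alphabet: str = ALPHABET) -> list[str]:
    """Return all sequences that differ from `seq` at exactly one position."""
    neighbors = []
    for i, current in enumerate(seq):
        for letter in alphabet:
            if letter != current:
                neighbors.append(seq[:i] + letter + seq[i + 1:])
    return neighbors


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    n_genotypes = sum(1 for _ in all_genotypes())
    print(n_genotypes)  # 262144
    # Hypothetical example sequence containing the canonical AGGAGG motif.
    neighbors = single_mutant_neighbors("UAAGGAGGU")
    print(len(neighbors))  # 27 = 9 positions x 3 alternative bases
    # Each genotype has 27 neighbors and each edge is shared by two genotypes.
    print(n_genotypes * len(neighbors) // 2)  # 3538944 edges
```

Because every genotype in this space has the same number of single-mutant neighbors, the graph is regular, which is part of why landscapes of this size remain tractable to enumerate even though higher-order interactions make the phenotype itself hard to summarize.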
| Field | Value |
|---|---|
| Item Type | Paper |
| Subjects | bioinformatics; bioinformatics > computational biology > algorithms; bioinformatics > computational biology |
| Communities | CSHL labs > Kinney lab; CSHL labs > McCandlish lab |
| SWORD Depositor | CSHL Elements |
| Depositing User | CSHL Elements |
| Date | 13 March 2025 |
| Date Deposited | 24 Mar 2025 12:33 |
| Last Modified | 24 Mar 2025 12:33 |
| URI | https://repository.cshl.edu/id/eprint/41829 |