Designing DNA With Tunable Regulatory Activity Using Discrete Diffusion

Sarkar, Anirban; Tang, Ziqi; Zhao, Chris; Koo, Peter (May 2024) Designing DNA With Tunable Regulatory Activity Using Discrete Diffusion. bioRxiv. (Submitted)

10.1101.2024.05.23.595630.pdf - Submitted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

DOI: 10.1101/2024.05.23.595630

Abstract

Engineering regulatory DNA sequences with precise activity levels in specific cell types holds immense potential for medicine and biotechnology. However, the vast combinatorial space of possible sequences and the complex regulatory grammars governing gene regulation have proven challenging for existing approaches. Supervised deep learning models that score sequences proposed by local search algorithms ignore the global structure of functional sequence space. While diffusion-based generative models have shown promise in learning these distributions, their application to regulatory DNA has been limited. Evaluating the quality of generated sequences also remains challenging due to the lack of a unified framework that characterizes key properties of regulatory DNA. Here we introduce DNA Discrete Diffusion (D3), a generative framework for conditionally sampling regulatory sequences with targeted functional activity levels. We develop a comprehensive suite of evaluation metrics that assess the functional similarity, sequence similarity, and regulatory composition of generated sequences. Through benchmarking on three high-quality functional genomics datasets spanning human promoters and fly enhancers, we demonstrate that D3 outperforms existing methods in capturing the diversity of cis-regulatory grammars and generating sequences that more accurately reflect the properties of genomic regulatory DNA. Furthermore, we show that D3-generated sequences can effectively augment supervised models and improve their predictive performance, even in data-limited scenarios.
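
The abstract names sequence similarity between generated and genomic sequences as one evaluation axis but does not specify the metrics. As a rough illustration only, and not the authors' method, one common way to compare two sets of DNA sequences is the Jensen-Shannon divergence between their k-mer frequency distributions. The function names `kmer_distribution` and `js_divergence` and the default `k=3` are assumptions made for this sketch.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_distribution(seqs, k=3):
    """Normalized frequencies of all A/C/G/T k-mers across a set of sequences.

    Hypothetical helper for illustration; k-mers containing other
    characters (e.g. N) are ignored.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[a] for a in alphabet)
    if total == 0:
        raise ValueError("no valid k-mers found")
    return [counts[a] / total for a in alphabet]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions, in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Toy sequences, not data from the paper.
    genomic = ["ACGTACGTAC", "TTGACGTCAA"]
    generated = ["ACGTTCGTAC", "TTGACGACAA"]
    d = js_divergence(kmer_distribution(genomic), kmer_distribution(generated))
    print(f"k-mer JS divergence: {d:.4f}")
```

A value near 0 means the two sets have similar k-mer composition; a value near 1 means they share almost no k-mers. A metric of this kind only captures sequence composition and says nothing about functional activity, which the abstract treats as a separate axis.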

Item Type: Paper
Subjects: bioinformatics; bioinformatics > quantitative biology; bioinformatics > quantitative biology > quantitative genetics
Communities: CSHL labs > Koo Lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 24 May 2024
Date Deposited: 29 May 2024 15:12
Last Modified: 29 May 2024 15:12
URI: https://repository.cshl.edu/id/eprint/41570
