Probabilistic and machine-learning methods for predicting local rates of transcription elongation from nascent RNA sequencing data

Liu, Lingjie, Zhao, Yixin, Hassett, Rebecca, Toneyan, Shushan, Koo, Peter K, Siepel, Adam (February 2025) Probabilistic and machine-learning methods for predicting local rates of transcription elongation from nascent RNA sequencing data. Nucleic Acids Research (NAR), 53 (4). ISSN 1362-4962 (Public Dataset)

10.1093.nar.gkaf092.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Rates of transcription elongation vary within and across eukaryotic gene bodies. Here, we introduce new methods for predicting elongation rates from nascent RNA sequencing data. First, we devise a probabilistic model that predicts nucleotide-specific elongation rates as a generalized linear function of nearby genomic and epigenomic features. We validate this model with simulations and apply it to public PRO-seq (Precision Run-On Sequencing) and epigenomic data for four cell types, finding that reductions in local elongation rate are associated with cytosine nucleotides, DNA methylation, splice sites, RNA stem-loops, CTCF (CCCTC-binding factor) binding sites, and several histone marks, including H3K36me3 and H4K20me1. By contrast, increases in local elongation rate are associated with thymines, A+T-rich and low-complexity sequences, and H3K79me2 marks. We then introduce a convolutional neural network that improves our local rate predictions. Our analysis is the first to permit genome-wide predictions of relative nucleotide-specific elongation rates.
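
As a rough illustration of what "a generalized linear function of nearby genomic and epigenomic features" could look like in practice, the sketch below uses a log link to map a per-nucleotide feature vector to a relative elongation rate, then fits the coefficients to read counts. Several pieces are assumptions made only for this example and are not taken from the abstract: the Poisson read-count likelihood, the premise that expected read density is inversely proportional to the local rate (slower polymerase, more reads), the scale factor `chi`, the names `predict_rates`, `expected_counts`, and `fit_kappa`, and the plain gradient-ascent fit. The published model may differ in all of these respects.

```python
import numpy as np


def predict_rates(features, kappa):
    """Relative elongation rate per nucleotide under a log link.

    features: (n_positions, n_features) array of local covariates (hypothetical).
    kappa:    (n_features,) coefficient vector.
    """
    return np.exp(features @ kappa)


def expected_counts(features, kappa, chi):
    """Expected read count per nucleotide.

    Assumption for this sketch: read density is inversely proportional to the
    local elongation rate, scaled by a constant chi.
    """
    return chi / predict_rates(features, kappa)


def fit_kappa(features, counts, chi, steps=2000, lr=0.05):
    """Estimate kappa by gradient ascent on an (assumed) Poisson log-likelihood."""
    n_positions, n_features = features.shape
    kappa = np.zeros(n_features)
    for _ in range(steps):
        mu = expected_counts(features, kappa, chi)
        # log mu_i = log chi - features_i . kappa, so the gradient of the
        # Poisson log-likelihood with respect to kappa is features^T (mu - counts).
        grad = features.T @ (mu - counts) / n_positions
        kappa += lr * grad
    return kappa


# Simulated example: two standardized covariates with known coefficients.
rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 2))
true_kappa = np.array([0.3, -0.2])  # positive = faster, negative = slower
counts = rng.poisson(expected_counts(features, true_kappa, chi=5.0))
estimated_kappa = fit_kappa(features, counts, chi=5.0)
```

Under these assumptions, a positive coefficient marks a feature associated with faster local elongation (fewer reads) and a negative coefficient marks one associated with slower elongation, which mirrors how the associations above are reported.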

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > protein structure, function, modification
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > protein structure, function, modification > protein types > enzymes
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > protein structure, function, modification > protein types > histone
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > protein structure, function, modification > protein types
Communities: CSHL labs > Koo Lab
CSHL labs > Siepel lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 8 February 2025
Date Deposited: 18 Feb 2025 19:39
Last Modified: 18 Feb 2025 19:39
Dataset ID:
  • 10.5281/zenodo.14757127
  • 10.5281/zenodo.14757102
  • http://compgen.cshl.edu/elongation-rate-tracks.php
URI: https://repository.cshl.edu/id/eprint/41797
