A Unified Probabilistic Modeling Framework for Eukaryotic Transcription Based on Nascent RNA Sequencing Data

Siepel, Adam (January 2021) A Unified Probabilistic Modeling Framework for Eukaryotic Transcription Based on Nascent RNA Sequencing Data. BioRxiv. (Unpublished)

2022.Siepel.rnaseq.pdf
Available under License Creative Commons Attribution.

DOI: 10.1101/2021.01.12.426408

Abstract

Nascent RNA sequencing protocols, such as PRO-seq and NET-seq, are now widely used in the study of eukaryotic transcription, and these experimental techniques have given rise to a variety of statistical and machine-learning methods for data analysis. These computational methods, however, are generally designed to address specialized signal-processing or prediction tasks, rather than directly describing the dynamics of RNA polymerases as they move along the DNA template. Here, I introduce a general probabilistic model that describes the kinetics of transcription initiation, elongation, pause release, and termination, as well as the generation of sequencing read counts. I show that this generative model enables estimation of separate pause-release rates, termination rates, and the initiation/elongation rate ratio up to a proportionality constant. Furthermore, if applied to time-course data in a nonequilibrium setting, the model can be used to estimate elongation rates. This model leads naturally to likelihood ratio tests for differences between genes, conditions, or species in various rates of interest. If read counts are assumed to be Poisson-distributed, convenient, closed-form solutions are available for both parameter estimates and likelihood-ratio-test statistics. Straightforward extensions of the model accommodate uncertainty in the pause site and steric hindrance of initiation by paused polymerases. Additional extensions address Bayesian inference under the Poisson model and a generalized linear model that can be used to discover genomic features associated with rates of elongation. Finally, I address technicalities concerning estimation of library size, normalization and sequencing replicates. Altogether, this modeling framework enables a unified treatment of many common tasks in the analysis of nascent RNA sequencing data.
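
The abstract's remark about closed-form estimates and likelihood ratio tests under a Poisson assumption can be illustrated with a generic two-sample Poisson rate comparison. This is a minimal sketch, not the estimator derived in the paper: the function names and the `exposure` argument (assumed here to stand for something like region length multiplied by library size) are hypothetical. The test statistic is compared against a chi-square distribution with one degree of freedom.

```python
import math


def poisson_rate_mle(count, exposure):
    """Closed-form maximum-likelihood estimate of a Poisson rate.

    `exposure` is a hypothetical scaling term (for example, region length
    times library size); it is an assumption of this sketch.
    """
    return count / exposure


def poisson_lrt(count_a, exposure_a, count_b, exposure_b):
    """Likelihood ratio test for equal Poisson rates in two samples.

    Returns (statistic, p_value). Under the null hypothesis of a shared
    rate, the statistic is asymptotically chi-square with 1 degree of freedom.
    """
    # Shared rate under the null hypothesis (also closed form).
    pooled_rate = (count_a + count_b) / (exposure_a + exposure_b)

    def term(count, exposure):
        # A zero count contributes nothing (limit of x * log(x) as x -> 0).
        if count == 0:
            return 0.0
        return count * math.log(count / (exposure * pooled_rate))

    statistic = 2.0 * (term(count_a, exposure_a) + term(count_b, exposure_b))
    statistic = max(statistic, 0.0)  # guard against tiny negative round-off

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the paper.
    stat, p = poisson_lrt(50, 1000.0, 150, 1000.0)
    print(f"rate A = {poisson_rate_mle(50, 1000.0):.3f}")
    print(f"rate B = {poisson_rate_mle(150, 1000.0):.3f}")
    print(f"LRT statistic = {stat:.2f}, p = {p:.3g}")
```

Because both the per-sample and pooled rate estimates are simple ratios, no numerical optimization is needed, which is the practical appeal of the Poisson assumption mentioned in the abstract.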

Item Type: Paper
Subjects: bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > transcription
bioinformatics > computational biology
Investigative techniques and equipment > assays > RNA-seq
CSHL Authors:
Communities: CSHL labs > Siepel lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 14 January 2021
Date Deposited: 26 May 2022 15:42
Last Modified: 26 May 2022 15:42
URI: https://repository.cshl.edu/id/eprint/40636
