Siepel, Adam (January 2021) A Unified Probabilistic Modeling Framework for Eukaryotic Transcription Based on Nascent RNA Sequencing Data. bioRxiv. (Unpublished)
PDF: 2022.Siepel.rnaseq.pdf (1MB). Available under License Creative Commons Attribution.
Abstract
Nascent RNA sequencing protocols, such as PRO-seq and NET-seq, are now widely used in the study of eukaryotic transcription, and these experimental techniques have given rise to a variety of statistical and machine-learning methods for data analysis. These computational methods, however, are generally designed to address specialized signal-processing or prediction tasks, rather than directly describing the dynamics of RNA polymerases as they move along the DNA template. Here, I introduce a general probabilistic model that describes the kinetics of transcription initiation, elongation, pause release, and termination, as well as the generation of sequencing read counts. I show that this generative model enables estimation of separate pause-release rates, termination rates, and the initiation/elongation rate ratio up to a proportionality constant. Furthermore, if applied to time-course data in a nonequilibrium setting, the model can be used to estimate elongation rates. This model leads naturally to likelihood ratio tests for differences between genes, conditions, or species in various rates of interest. If read counts are assumed to be Poisson-distributed, convenient closed-form solutions are available for both parameter estimates and likelihood-ratio-test statistics. Straightforward extensions of the model accommodate uncertainty in the pause site and steric hindrance of initiation by paused polymerases. Additional extensions address Bayesian inference under the Poisson model and a generalized linear model that can be used to discover genomic features associated with rates of elongation. Finally, I address technicalities concerning estimation of library size, normalization, and sequencing replicates. Altogether, this modeling framework enables a unified treatment of many common tasks in the analysis of nascent RNA sequencing data.
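The abstract notes that Poisson-distributed read counts give closed-form parameter estimates and likelihood-ratio-test statistics. As a rough illustration of that general idea only, the sketch below runs a textbook two-sample Poisson likelihood ratio test on per-site read counts. It is not the paper's model: the function names, the `scale` parameters (stand-ins for library-size factors), and the example counts are all assumptions made for illustration.

```python
import math


def poisson_rate_mle(counts, scale=1.0):
    """Closed-form MLE of the per-site rate when each count ~ Poisson(scale * rate)."""
    return sum(counts) / (scale * len(counts))


def poisson_loglik(counts, rate, scale=1.0):
    """Log-likelihood of independent Poisson counts with mean scale * rate."""
    mu = scale * rate
    if mu == 0.0:
        # A zero mean is only compatible with all-zero counts.
        return 0.0 if sum(counts) == 0 else float("-inf")
    return sum(c * math.log(mu) - mu - math.lgamma(c + 1) for c in counts)


def poisson_lrt(counts_a, counts_b, scale_a=1.0, scale_b=1.0):
    """Test H0: both samples share one rate, against H1: separate rates.

    Returns (statistic, p_value); the statistic is referred to a
    chi-square distribution with 1 degree of freedom.
    """
    rate_a = poisson_rate_mle(counts_a, scale_a)
    rate_b = poisson_rate_mle(counts_b, scale_b)

    # Under H0 the shared-rate MLE is total counts over total exposure.
    exposure = scale_a * len(counts_a) + scale_b * len(counts_b)
    rate_0 = (sum(counts_a) + sum(counts_b)) / exposure

    ll_alt = (poisson_loglik(counts_a, rate_a, scale_a)
              + poisson_loglik(counts_b, rate_b, scale_b))
    ll_null = (poisson_loglik(counts_a, rate_0, scale_a)
               + poisson_loglik(counts_b, rate_0, scale_b))

    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical per-site read counts for one region in two conditions.
    control = [1, 0, 2, 1]
    treated = [20, 25, 22, 19]
    stat, p = poisson_lrt(control, treated)
    print(f"LRT statistic = {stat:.2f}, p-value = {p:.3g}")
```

The same pattern (maximize the likelihood under a shared-rate null and a separate-rate alternative, then compare) would apply to other rate parameters, though the specific likelihoods in the paper are not reproduced here.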
Item Type: | Paper |
---|---|
Subjects: | bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > transcription; bioinformatics > computational biology; Investigative techniques and equipment > assays > RNA-seq |
CSHL Authors: | |
Communities: | CSHL labs > Siepel lab |
SWORD Depositor: | CSHL Elements |
Depositing User: | CSHL Elements |
Date: | 14 January 2021 |
Date Deposited: | 26 May 2022 15:42 |
Last Modified: | 20 May 2024 19:40 |
URI: | https://repository.cshl.edu/id/eprint/40636 |