Synthetic spike-in standards for RNA-seq experiments

Jiang, L. C., Schlesinger, F. J., Davis, C. A., Zhang, Y., Li, R. H., Salit, M., Gingeras, T. R., Oliver, B. (September 2011) Synthetic spike-in standards for RNA-seq experiments. Genome Research, 21 (9). pp. 1543-1551. ISSN 1088-9051 (Public Dataset)


Abstract

High-throughput sequencing of cDNA (RNA-seq) is a widely deployed transcriptome profiling and annotation technique, but questions about the performance of different protocols and platforms remain. We used a newly developed pool of 96 synthetic RNAs with various lengths and GC content, spanning a 2^20-fold concentration range, as spike-in controls to measure sensitivity, accuracy, and biases in RNA-seq experiments and to derive standard curves for quantifying transcript abundance. We observed linearity between read density and RNA input over the entire detection range and excellent agreement between replicates, but significantly larger imprecision than expected under pure Poisson sampling error. We used the control RNAs to directly measure reproducible protocol-dependent biases due to GC content and transcript length, as well as stereotypic heterogeneity in coverage across transcripts that correlates with position relative to the RNA termini and with priming sequence bias. These effects bias the quantification of short transcripts and individual exons, a serious problem for measuring isoform abundances, although it can be partially corrected using appropriate models of bias. Using the control RNAs, we derive limits for the discovery and detection of rare transcripts in RNA-seq experiments. Using data collected as part of the human and model organism Encyclopedia of DNA Elements projects (ENCODE and modENCODE), we demonstrate that external RNA controls are a useful resource for evaluating the sensitivity and accuracy of RNA-seq experiments for transcriptome discovery and quantification. These quality metrics facilitate comparable analysis across different samples, protocols, and platforms.
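The abstract mentions two quantitative ideas: deriving a standard curve from spike-ins of known input to estimate transcript abundance, and comparing replicate variability against the pure Poisson expectation. The sketch below illustrates both under stated assumptions; it is not the authors' pipeline, and the function names (`fit_standard_curve`, `estimate_abundance`, `dispersion_ratio`) are hypothetical. It assumes each spike-in's known input amount and its observed read density (for example, reads per kilobase) are already tabulated, fits an ordinary least-squares line in log2-log2 space, and inverts that line for an endogenous transcript. The dispersion check is a simple variance-to-mean ratio, which equals 1 in expectation under Poisson sampling.

```python
import math
import statistics


def fit_standard_curve(input_amounts, read_densities):
    """OLS fit of log2(read density) against log2(known spike-in input).

    Returns (slope, intercept). A slope near 1 indicates that read density
    scales linearly with RNA input across the tested range.
    """
    xs = [math.log2(a) for a in input_amounts]
    ys = [math.log2(d) for d in read_densities]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct input amounts")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def estimate_abundance(read_density, slope, intercept):
    """Invert the standard curve: map an observed read density back to an input amount."""
    return 2 ** ((math.log2(read_density) - intercept) / slope)


def dispersion_ratio(replicate_counts):
    """Sample variance divided by mean of read counts across replicates.

    Under pure Poisson sampling this ratio is about 1; values well above 1
    indicate extra-Poisson variability (overdispersion).
    """
    m = statistics.mean(replicate_counts)
    if m == 0:
        raise ValueError("mean count is zero")
    return statistics.variance(replicate_counts) / m  # variance uses n - 1


# Hypothetical toy data: density exactly proportional to input.
inputs = [1, 2, 4, 8, 16]
densities = [10 * a for a in inputs]
slope, intercept = fit_standard_curve(inputs, densities)
print(round(slope, 6), round(estimate_abundance(80, slope, intercept), 6))
print(dispersion_ratio([90, 100, 110]), dispersion_ratio([50, 100, 150]))
```

With the toy data the fitted slope is 1 and a read density of 80 maps back to an input of 8; the first replicate set is consistent with Poisson noise (ratio 1.0), while the second shows a ratio of 25.0. Working in log space keeps the fit from being dominated by the most abundant spike-ins, which matters when inputs span many orders of magnitude.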

Item Type: Paper
Uncontrolled Keywords: RNA; transcriptome; article; controlled study; genetic analysis; genetic heterogeneity; human; human cell; limit of quantitation; Poisson distribution; priority journal; RNA sequence; RNA transcription; spike; Animals; Bias (Epidemiology); Gene Expression Profiling; Gene Library; High-Throughput Nucleotide Sequencing; Humans; Quality Control; Reproducibility of Results; Sensitivity and Specificity; Sequence Analysis, RNA
Subjects: bioinformatics > genomics and proteomics > annotation > sequence annotation
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > sRNA
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > transcriptomes
Communities: CSHL labs > Gingeras lab
School of Biological Sciences > Publications
CSHL Cancer Center Shared Resources > DNA Sequencing Service
Depositing User: CSHL Librarian
Date: September 2011
Date Deposited: 07 Mar 2012 21:48
Last Modified: 26 Dec 2014 20:17
PMCID: PMC3166838
URI: https://repository.cshl.edu/id/eprint/25348
