STAR: ultrafast universal RNA-seq aligner

Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., Gingeras, T. R. (January 2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29 (1). pp. 15-21. ISSN 1367-4803

URL: http://www.ncbi.nlm.nih.gov/pubmed/23104886

DOI: 10.1093/bioinformatics/bts635

Abstract

Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.

Results: To align our large (> 80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of > 50 in mapping speed, aligning 550 million 2 x 76 bp paired-end reads per hour to the human genome on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy.
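The "sequential maximum mappable seed search in uncompressed suffix arrays" described in the abstract can be illustrated with a toy sketch. It assumes a short reference string and a naively built suffix array. The function finds the maximal mappable prefix (MMP) of a read, i.e. the longest prefix that occurs exactly in the reference, and then restarts the search at the first unmapped base. All function names are hypothetical, and this is not STAR's implementation: it ignores mismatches, the reverse strand, searching from the read's other end, and the clustering and stitching step that joins seeds into a spliced alignment.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(ref: str) -> list[int]:
    """Naive suffix array: start positions of all suffixes in lexicographic order."""
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def maximal_mappable_prefix(read: str, ref: str, sa: list[int]) -> tuple[int, list[int]]:
    """Return (length, sorted genome positions) of the longest read prefix found in ref."""
    lo, hi = 0, len(sa)
    best_len = 0
    for k in range(1, len(read) + 1):
        prefix = read[:k]
        # Suffixes in sa[lo:hi] are sorted, so their first k characters are too;
        # the ones equal to `prefix` form one contiguous block.
        keys = [ref[i:i + k] for i in sa[lo:hi]]
        new_lo = lo + bisect_left(keys, prefix)
        new_hi = lo + bisect_right(keys, prefix)
        if new_lo == new_hi:
            break  # length k no longer maps, so k - 1 was maximal
        lo, hi = new_lo, new_hi
        best_len = k
    return best_len, (sorted(sa[lo:hi]) if best_len else [])


def sequential_mmp_seeds(read: str, ref: str, sa: list[int]) -> list[tuple[int, int, list[int]]]:
    """Split a read into exact-match seeds: (read_start, length, genome_positions)."""
    seeds, start = [], 0
    while start < len(read):
        length, hits = maximal_mappable_prefix(read[start:], ref, sa)
        if length == 0:
            start += 1  # base absent from the reference: skip it
            continue
        seeds.append((start, length, hits))
        start += length  # restart the search at the first unmapped base
    return seeds


if __name__ == "__main__":
    # Toy "gene": exon1 + GT...AG intron + exon2; the read spans the splice junction.
    ref = "ACGTTGCA" + "GTAAAAAAAAAG" + "CCGATCGG"
    sa = build_suffix_array(ref)
    print(sequential_mmp_seeds("ACGTTGCACCGATCGG", ref, sa))
    # -> [(0, 8, [0]), (8, 8, [20])]
```

In this example the first seed ends at reference position 8 and the second begins at position 20, so the two seeds are separated by a 12 bp gap in the reference. That gap is the intron that a stitching step would have to infer.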

Item Type:	Paper
Uncontrolled Keywords:	splice junctions alignment reads algorithms sequence genomes encode
Subjects:	bioinformatics > genomics and proteomics > alignment; bioinformatics > genomics and proteomics > analysis and processing > alignment processing; bioinformatics > genomics and proteomics > analysis and processing; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification; bioinformatics > genomics and proteomics > genetics & nucleic acid processing; bioinformatics > genomics and proteomics; bioinformatics > genomics and proteomics > alignment > sequence alignment; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > splice site
CSHL Authors:	Batut, Phillippe; Davis, Carrie A.; Dobin, Alexander; Drenkow, Jorg; Gingeras, Thomas R.; Jha, Sonali; Schlesinger, Felix J.; Zaleski, Christopher
Communities:	CSHL Cancer Center Program > Gene Regulation and Cell Proliferation; CSHL Cancer Center Shared Resources > Bioinformatics Service; CSHL labs > Gingeras lab; School of Biological Sciences > Publications; CSHL labs > Dobin Lab
Depositing User:	Matt Covey
Date:	January 2013
Date Deposited:	29 Mar 2013 19:19
Last Modified:	08 Jul 2020 18:14
PMCID:	PMC3530905
URI:	https://repository.cshl.edu/id/eprint/28071
