High-throughput sequence alignment using Graphics Processing Units

Schatz, M. C., Trapnell, C., Delcher, A. L., Varshney, A. (2007) High-throughput sequence alignment using Graphics Processing Units. BMC Bioinformatics, 8. ISSN 1471-2105

PDF (Paper)
Schatz BMC Bioinformatics 2007.pdf - Published Version

URL: http://www.ncbi.nlm.nih.gov/pubmed/18070356
DOI: 10.1186/1471-2105-8-474

Abstract

Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies.

Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies.

Conclusion: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

© 2007 Schatz et al; licensee BioMed Central Ltd.

Item Type: Paper
Uncontrolled Keywords: DNA; article; bioinformatics; computer input device; computer interface; data analysis software; DNA sequence; genetic algorithm; genetic database; genomics; high throughput screening; information processing; intermethod comparison; nonhuman; nucleotide sequence; sequence analysis; animal; Bacillus anthracis; Caenorhabditis; computer; computer graphics; contig mapping; data base; economics; gene library; genetics; instrumentation; Listeria monocytogenes; methodology; sequence alignment; Streptococcus suis; task performance; time; ultrastructure; Animals; Base Sequence; Computers; Database Management Systems; Databases, Genetic; Genomic Library; Sequence Analysis, DNA; Time Factors; Work Simplification
Subjects: bioinformatics > genomics and proteomics > alignment
bioinformatics
bioinformatics > genomics and proteomics > computers
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > alignment > sequence alignment
bioinformatics > genomics and proteomics > computers > computer hardware
bioinformatics > genomics and proteomics > computers > computer software
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
Communities: CSHL labs > Schatz lab
Depositing User: Matt Covey
Date: 2007
Date Deposited: 15 Mar 2013 17:49
Last Modified: 15 Mar 2013 17:49
PMCID: PMC2222658
URI: https://repository.cshl.edu/id/eprint/27833
