Smith, A. D., Xuan, Z. Y., Zhang, M. Q. (February 2008) Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics, 9 (128). ISSN 1471-2105
Abstract
Background: Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from ~25-50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores.

Results: To investigate whether these sources of information can be used to improve accuracy when mapping reads, we developed the RMAP tool, which can map reads having a wide range of lengths and allows base-call quality scores to determine which positions in each read are more important when mapping. We applied RMAP to analyze data re-sequenced from two human BAC regions for varying read lengths, and varying criteria for use of quality scores. RMAP is freely available for downloading at http://rulai.cshl.edu/rmap/.

Conclusion: Our results indicate that significant gains in Solexa read mapping performance can be achieved by considering the information in 3' ends of longer reads, and appropriately using the base-call quality scores. The RMAP tool we have developed will enable researchers to effectively exploit this information in targeted re-sequencing projects.
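As a rough illustration of the idea in the abstract, the sketch below shows one simple way base-call quality scores could decide which read positions count when mapping: mismatches are tallied only at positions whose quality meets a cutoff, so an error-prone 3' base with a low score does not disqualify an otherwise good placement. This is not RMAP's implementation. The function names, the `min_qual` cutoff, the brute-force scan, and the toy sequences are all assumptions made for this example.

```python
def quality_aware_mismatches(read, quals, ref_window, min_qual=20):
    """Count mismatches, ignoring positions whose quality is below min_qual.

    Hypothetical helper for illustration; not taken from RMAP.
    """
    if not (len(read) == len(quals) == len(ref_window)):
        raise ValueError("read, quals and ref_window must have equal length")
    return sum(
        1
        for base, qual, ref_base in zip(read, quals, ref_window)
        if qual >= min_qual and base != ref_base
    )


def map_read(read, quals, reference, max_mismatches=2, min_qual=20):
    """Brute-force scan of the reference.

    Returns a list of (position, mismatches) for every placement whose
    quality-aware mismatch count is at most max_mismatches.
    """
    n = len(read)
    hits = []
    for pos in range(len(reference) - n + 1):
        mm = quality_aware_mismatches(
            read, quals, reference[pos:pos + n], min_qual
        )
        if mm <= max_mismatches:
            hits.append((pos, mm))
    return hits


# Toy data (assumed): a 10-nt read drawn from position 8 of the reference,
# with a miscalled final (3') base that carries a low quality score.
reference = "ACGTACGTTAGCCGATAGGCTTAACG"
read = "TAGCCGATAT"          # true sequence at position 8 is TAGCCGATAG
quals = [30] * 9 + [5]       # last base is low quality

# Quality-aware: the low-quality 3' base is ignored.
hits_quality = map_read(read, quals, reference, max_mismatches=0, min_qual=20)

# Quality-blind: every position counts (min_qual=0).
hits_strict = map_read(read, quals, reference, max_mismatches=0, min_qual=0)
```

With these toy inputs, the quality-aware call keeps the placement at position 8, while the quality-blind call with the same mismatch budget finds no placement. A real mapper would use an index rather than a linear scan and would need a policy for reads that map equally well to several locations.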
Item Type: | Paper |
---|---|
Uncontrolled Keywords: | GENOME SEARCH |
Subjects: | bioinformatics; bioinformatics > genomics and proteomics; bioinformatics > genomics and proteomics > Mapping and Rendering |
Communities: | CSHL labs > Zhang lab |
Depositing User: | Matt Covey |
Date: | February 2008 |
Date Deposited: | 22 Feb 2013 19:35 |
Last Modified: | 22 Feb 2013 19:35 |
PMCID: | PMC2335322 |
URI: | https://repository.cshl.edu/id/eprint/27638 |