Automated correction of genome sequence errors

Gajer, P., Schatz, M., Salzberg, S. L. (2004) Automated correction of genome sequence errors. Nucleic Acids Research, 32 (2). pp. 562-569. ISSN 0305-1048

URL: http://www.ncbi.nlm.nih.gov/pubmed/14744981
DOI: 10.1093/nar/gkh216

Abstract

By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the overall accuracy of genome sequences and facilitates the use of these sequences for polymorphism discovery. We describe the algorithm and its application to a large set of recent genome sequencing projects. The number of erroneous base calls in these projects was reduced by 80%. In an analysis of over one million corrections, we found that AutoEditor made just one error per 8828 corrections. By substantially increasing the accuracy of base calling, AutoEditor can dramatically accelerate the process of finishing genomes, which involves closing all gaps and ensuring minimum quality standards for the final sequence. It also greatly improves our ability to discover single nucleotide polymorphisms (SNPs) between closely related strains and isolates of the same species. © Oxford University Press 2004; all rights reserved.
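The abstract does not spell out AutoEditor's decision rules. As a rough illustration of what "using information from an assembly" can mean, the sketch below applies a hypothetical majority-vote rule to one column of a multiple alignment of reads: a read's base call that disagrees with a strongly supported consensus is flagged as a candidate correction. The function name `propose_corrections` and the thresholds `min_support` and `min_fraction` are assumptions made for illustration. This is not AutoEditor's algorithm, which the abstract does not describe in enough detail to reproduce.

```python
from collections import Counter
from typing import List, Tuple


def propose_corrections(column: List[str], min_support: int = 3,
                        min_fraction: float = 0.75) -> List[Tuple[int, str, str]]:
    """Hypothetical majority-vote rule for one alignment column (illustration only).

    column[i] is the base call of read i at this consensus position,
    with '-' standing for a gap. Returns (read_index, old_call, new_call)
    for every read whose call disagrees with a well-supported consensus.
    """
    counts = Counter(column)
    if not counts:
        return []
    consensus, support = counts.most_common(1)[0]
    # Only trust the consensus when enough reads, and a large enough
    # fraction of the column, agree on it; otherwise propose nothing.
    if support < min_support or support / len(column) < min_fraction:
        return []
    # Every dissenting call (including a gap) becomes a candidate edit.
    return [(i, call, consensus)
            for i, call in enumerate(column) if call != consensus]
```

For example, `propose_corrections(list("AAAAC"))` returns `[(4, 'C', 'A')]`, while an evenly split column such as `list("AACC")` yields no corrections. The support thresholds are the design choice that matters: columns with weak agreement are left alone, trading missed fixes for fewer erroneous ones, which is the same precision concern the abstract quantifies with its figure of one error per 8828 corrections.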

Item Type: Paper
Uncontrolled Keywords: DNA base; acceleration; accuracy; analytical error; article; automation; computer program; controlled study; gene sequence; genetic algorithm; genetic polymorphism; genome analysis; nonhuman; priority journal; sequence analysis; single nucleotide polymorphism; species; standard; Algorithms; Animals; Base Sequence; Genome; Genomics; Molecular Sequence Data; Polymorphism, Single Nucleotide; Research Design; Sensitivity and Specificity; Software
Subjects: bioinformatics
bioinformatics > genomics and proteomics > computers
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > computers > computer software
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
Communities: CSHL labs > Schatz lab
Depositing User: Matt Covey
Date: 2004
Date Deposited: 15 Mar 2013 18:05
Last Modified: 15 Mar 2013 18:05
PMCID: PMC373340
URI: https://repository.cshl.edu/id/eprint/27821
