Finding errors in DNA sequences

Posfai, J., Roberts, R. J. (May 1992) Finding errors in DNA sequences. Proc Natl Acad Sci U S A, 89 (10). pp. 4698-702. ISSN 0027-8424 (Print); 0027-8424 (Linking)

Abstract

An algorithm is described that can detect certain errors within coding regions of DNA sequences. The algorithm is based on the idea that an insertion or deletion error within a coding sequence would interrupt the reading frame and cause the correct translation of a DNA sequence to require one or more frameshifts. If the coding sequence shows similarity to a known protein sequence then such errors can be detected by comparing the conceptual translations of DNA sequences in all six reading frames with every sequence in a protein sequence data base. We have incorporated these ideas into a computer program, called DETECT, that can serve as an aid to the experimentalist who is determining new DNA sequences so that obvious errors may be located and corrected. The program has been tested using raw experimental data and against sequences from the European Molecular Biology Laboratory data base, annotated as containing frameshifts. We have also tested it using unidentified open reading frames that flank known, annotated genes in the GenBank data base. Many potential errors are apparent and in some cases functions can be suggested for the "corrected" versions of these reading frames leading to the identification of new genes. As more sequences are determined the power of this method will increase substantially.
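The six-frame conceptual translation mentioned in the abstract can be illustrated with a short sketch. This is not the DETECT program and does not reproduce its database comparison; it only shows, under the assumption of the standard genetic code, how a single-base deletion moves the remainder of a coding sequence into a different reading frame. The function names (`reverse_complement`, `translate`, `six_frame_translations`) and the example sequence are hypothetical and chosen for illustration.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, offset: int = 0) -> str:
    """Translate seq starting at offset (0, 1 or 2); '*' marks a stop codon.

    Codons containing characters other than A, C, G, T become 'X'.
    """
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(offset, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict:
    """Conceptual translations in the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


# Hypothetical example: an intact coding sequence and a copy with one base deleted.
intact = "ATGGCTAAAGGTGAACTGCGT"        # translates to MAKGELR in frame +1
deleted = intact[:9] + intact[10:]     # drop one G from the fourth codon

frames = six_frame_translations(deleted)
print(frames["+1"])  # MAKVNC: the start matches, then the frame is lost
print(frames["+3"])  # G*SELR: the original C-terminal residues reappear here
```

In this toy case the beginning of the protein is read in frame +1 and its end in frame +3, so a comparison against the known protein would align best if one frameshift were allowed between the two frames. That switch between frames is the kind of signal the abstract describes as indicating an insertion or deletion error.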

Item Type: Paper
Uncontrolled Keywords: Adenylate Cyclase/genetics; Algorithms; Amino Acid Sequence; Animals; Bordetella pertussis/enzymology/genetics; DNA/*genetics; Databases, Factual/*standards; Genes, Bacterial; Humans; Molecular Sequence Data; Nucleotidyltransferases/genetics; Protein Biosynthesis; Proteins/*genetics; Reading Frames; Transposases
Subjects: bioinformatics > genomics and proteomics > alignment
bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification
bioinformatics > genomics and proteomics > computers > computer software
Depositing User: Matt Covey
Date: 15 May 1992
Date Deposited: 22 Sep 2015 19:07
Last Modified: 09 Nov 2017 20:15
PMCID: PMC49150
URI: https://repository.cshl.edu/id/eprint/31857
