SIMPROT: Using an empirically determined indel distribution in simulations of protein evolution

Pang, A., Smith, A. D., Nuin, P. A. S., Tillier, E. R. (September 2005) SIMPROT: Using an empirically determined indel distribution in simulations of protein evolution. BMC Bioinformatics, 6. ISSN 1471-2105

URL: http://apps.webofknowledge.com/InboundService.do?S...

Abstract

Background: General protein evolution models help determine the baseline expectations for the evolution of sequences, and they have proven widely useful in sequence analysis and in the computer simulation of artificial sequence data sets.

Results: We have developed a new method of simulating protein sequence evolution that includes insertion and deletion (indel) events in addition to amino-acid substitutions. The simulation generates both the simulated sequence family and a true sequence alignment that captures the evolutionary relationships between amino acids from different sequences. Our statistical model for indel evolution is based on the empirical indel distribution determined by Qian and Goldstein. We have parameterized this distribution so that it applies to sequences diverged by varying evolutionary times, and generalized it to provide flexibility in simulation conditions. Our method uses a Monte Carlo simulation strategy and has been implemented in a C++ program named Simprot.

Conclusion: Simprot will be useful for testing methods of analysis of protein sequence families, particularly alignment methods, phylogenetic tree building, detection of recombination and horizontal gene transfer, and homology detection, where knowing the true course of sequence evolution is essential.
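The Monte Carlo idea summarized above, evolving a sequence with substitutions and indels while recording the true alignment, can be illustrated with a minimal Python sketch. This is not Simprot's implementation. The substitution model (uniform replacement among the 20 amino acids), the truncated power-law indel length distribution used as a stand-in for the empirical Qian and Goldstein distribution, and the names `evolve_branch`, `sample_indel_length`, `true_alignment`, `sub_rate`, and `indel_rate` are all hypothetical. The sketch evolves one sequence along a single branch, drawing exponentially distributed waiting times between events, and labels each residue with the index of the ancestral residue it descends from so that the true pairwise alignment can be rebuilt afterwards.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_indel_length(rng, max_len=20, exponent=1.7):
    """Draw an indel length from a truncated power law, P(L) ~ L**-exponent.

    Hypothetical stand-in for the empirical Qian-Goldstein distribution.
    """
    lengths = range(1, max_len + 1)
    weights = [length ** -exponent for length in lengths]
    return rng.choices(lengths, weights=weights, k=1)[0]


def evolve_branch(ancestor, branch_length, rng, sub_rate=1.0, indel_rate=0.05):
    """Evolve `ancestor` for `branch_length` time units (Gillespie-style).

    Returns (descendant, origins): origins[i] is the index of the ancestral
    residue that descendant[i] descends from, or None if it was inserted.
    """
    residues = list(ancestor)
    origins = list(range(len(ancestor)))
    t = 0.0
    while residues:  # an empty sequence can no longer change
        n = len(residues)
        t += rng.expovariate((sub_rate + indel_rate) * n)  # waiting time
        if t > branch_length:
            break
        pos = rng.randrange(n)
        if rng.random() < sub_rate / (sub_rate + indel_rate):
            # Substitution: uniform over the other 19 residues (assumption).
            residues[pos] = rng.choice([a for a in AMINO_ACIDS if a != residues[pos]])
        elif rng.random() < 0.5:
            # Insertion after `pos`; inserted residues have no ancestor.
            length = sample_indel_length(rng)
            residues[pos + 1:pos + 1] = [rng.choice(AMINO_ACIDS) for _ in range(length)]
            origins[pos + 1:pos + 1] = [None] * length
        else:
            # Deletion starting at `pos`, clipped at the sequence end.
            length = sample_indel_length(rng)
            del residues[pos:pos + length]
            del origins[pos:pos + length]
    return "".join(residues), origins


def true_alignment(ancestor, descendant, origins):
    """Rebuild the true gapped pairwise alignment from the origin labels."""
    top, bottom = [], []
    next_anc = 0
    for residue, origin in zip(descendant, origins):
        if origin is None:  # inserted residue: gap in the ancestor row
            top.append("-")
            bottom.append(residue)
            continue
        while next_anc < origin:  # skipped ancestral residues were deleted
            top.append(ancestor[next_anc])
            bottom.append("-")
            next_anc += 1
        top.append(ancestor[origin])
        bottom.append(residue)
        next_anc = origin + 1
    while next_anc < len(ancestor):  # trailing deletions
        top.append(ancestor[next_anc])
        bottom.append("-")
        next_anc += 1
    return "".join(top), "".join(bottom)


if __name__ == "__main__":
    rng = random.Random(42)
    anc = "".join(rng.choice(AMINO_ACIDS) for _ in range(60))
    desc, origins = evolve_branch(anc, 0.3, rng)
    for row in true_alignment(anc, desc, origins):
        print(row)
```

Because every descendant residue carries its ancestral index (or `None` if inserted), the alignment is a by-product of the simulation rather than an inference, which is what makes simulated families useful as benchmarks for alignment and homology-detection methods. Simulating a whole family would repeat `evolve_branch` down each branch of a tree.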

Item Type: Paper
Uncontrolled Keywords: Monte Carlo Simulation; sequence evolution; DNA sequences; Insertions; Deletions; alignment; models; GEN
Subjects: bioinformatics > genomics and proteomics > annotation > phylogenetic tree annotation
bioinformatics > quantitative biology
bioinformatics > genomics and proteomics > annotation > sequence annotation
Depositing User: CSHL Librarian
Date: September 2005
Date Deposited: 09 Jan 2012 14:31
Last Modified: 09 Jan 2012 14:31
URI: https://repository.cshl.edu/id/eprint/22673
