Computational identification of evolutionarily conserved exons

Siepel, A., Haussler, D. (2004) Computational identification of evolutionarily conserved exons. Proceedings of the eighth annual international conference on Research in computational molecular biology. pp. 177-186.

URL: http://dl.acm.org/citation.cfm?id=974638&CFID=6169...
DOI: 10.1145/974614.974638

Abstract

Phylogenetic hidden Markov models (phylo-HMMs) have recently been proposed as a means for addressing a multi-species version of the ab initio gene prediction problem. These models allow sequence divergence, a phylogeny, patterns of substitution, and base composition all to be considered simultaneously, in a single unified probabilistic model. Here, we apply phylo-HMMs to a restricted version of the gene prediction problem in which individual exons are sought that are evolutionarily conserved across a diverse set of species. We discuss two new methods for improving prediction performance: (1) the use of context-dependent phylogenetic models, which capture phenomena such as a strong CpG effect in noncoding regions and a preference for synonymous rather than nonsynonymous substitutions in coding regions; and (2) a novel strategy for incorporating insertions and deletions (indels) into the state-transition structure of the model, which captures the different characteristic patterns of alignment gaps in coding and noncoding regions. We also discuss the technique, previously used in pairwise gene predictors, of explicitly modeling conserved noncoding sequence to help reduce false positive predictions. These methods have been incorporated into an exon prediction program called ExoniPhy, and tested with two large data sets. Experimental results indicate that all three methods produce significant improvements in prediction performance. In combination, they lead to prediction accuracy comparable to that of some of the best available gene predictors, despite several limitations of our current models.

Item Type: Paper
Subjects: bioinformatics > computational biology; evolution; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > exons
CSHL Authors:
Communities: CSHL labs > Siepel lab
Depositing User: Matt Covey
Date: 2004
Date Deposited: 12 Jan 2015 20:40
Last Modified: 12 Jan 2015 20:40
URI: https://repository.cshl.edu/id/eprint/31103
