Detection of nonneutral substitution rates on mammalian phylogenies

Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R., Siepel, A. (2010) Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res, 20 (1). pp. 110-21. ISSN 1088-9051

DOI: 10.1101/gr.097857.109


Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny. Here we examine the more general problem of detecting departures from the neutral rate of substitution in either direction, possibly in a clade-specific manner. We consider four statistical, phylogenetic tests for addressing this problem: a likelihood ratio test, a score test, a test based on exact distributions of numbers of substitutions, and the genomic evolutionary rate profiling (GERP) test. All four tests have been implemented in a freely available program called phyloP. Based on extensive simulation experiments, these tests are remarkably similar in statistical power. With 36 mammalian species, they all appear to be capable of fairly good sensitivity with low false-positive rates in detecting strong selection at individual nucleotides, moderate selection in 3-bp elements, and weaker or clade-specific selection in longer elements. By applying phyloP to mammalian multiple alignments from the ENCODE project, we shed light on patterns of conservation/acceleration in known and predicted functional elements, approximate fractions of sites subject to constraint, and differences in clade-specific selection in the primate and glires clades. We also describe new "Conservation" tracks in the UCSC Genome Browser that display both phyloP and phastCons scores for genome-wide alignments of 44 vertebrate species.
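As a rough illustration of the two-sided likelihood ratio idea named in the abstract, the sketch below tests whether a substitution rate departs from the neutral expectation in either direction. It is a toy simplification, not phyloP's implementation: it assumes the number of substitutions in an element is Poisson distributed with a known neutral mean, whereas the paper's tests work with phylogenetic models on a tree. The function names, the `rho` scale parameter, and the use of a chi-square reference distribution with 1 degree of freedom are assumptions made for this example.

```python
import math


def poisson_loglik(k, mean):
    """Poisson log-likelihood of k events, omitting the log(k!) term,
    which cancels in the likelihood ratio."""
    if mean <= 0:
        return 0.0 if k == 0 else float("-inf")
    return k * math.log(mean) - mean


def rate_lrt(observed_subs, expected_neutral_subs):
    """Toy likelihood ratio test for a rate scale rho (assumed model).

    Null: rho = 1 (neutral rate). Alternative: rho free, with
    observed_subs ~ Poisson(rho * expected_neutral_subs).
    Returns (rho_hat, statistic, p_value, direction).
    """
    if expected_neutral_subs <= 0:
        raise ValueError("expected_neutral_subs must be positive")

    # Maximum likelihood estimate of the scale under the alternative.
    rho_hat = observed_subs / expected_neutral_subs

    ll_null = poisson_loglik(observed_subs, expected_neutral_subs)
    ll_alt = poisson_loglik(observed_subs, rho_hat * expected_neutral_subs)
    statistic = max(0.0, 2.0 * (ll_alt - ll_null))

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))

    if rho_hat < 1.0:
        direction = "conservation"
    elif rho_hat > 1.0:
        direction = "acceleration"
    else:
        direction = "neutral"
    return rho_hat, statistic, p_value, direction


if __name__ == "__main__":
    # Hypothetical elements: (observed substitutions, neutral expectation).
    for obs, exp in [(0, 5.0), (5, 5.0), (20, 5.0)]:
        print(obs, exp, rate_lrt(obs, exp))
```

A single statistic covers both directions here, and the sign of the departure is read from the estimated scale: `rho_hat` below 1 suggests conservation and above 1 suggests acceleration.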

Uncontrolled Keywords: Animals; *Base Sequence; Computer Simulation; Conserved Sequence; *Evolution, Molecular; Humans; Likelihood Functions; Mammals/classification/*genetics; Models, Genetic; Models, Statistical; *Phylogeny; Primates/genetics; *Selection, Genetic; Sequence Alignment; Software; Species Specificity
Subjects: bioinformatics > genomics and proteomics > alignment > sequence alignment
bioinformatics > genomics and proteomics > computers > computer software
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes > genome annotation
PMCID: PMC2798823
