Inference of natural selection from interspersed genomic elements based on polymorphism and divergence

Gronau, I., Arbiza, L., Mohammed, J., Siepel, A. (May 2013) Inference of natural selection from interspersed genomic elements based on polymorphism and divergence. Mol Biol Evol, 30 (5). pp. 1159-71. ISSN 0737-4038

DOI: 10.1093/molbev/mst019


Complete genome sequences contain valuable information about natural selection, but this information is difficult to access for short, widely scattered noncoding elements such as transcription factor binding sites or small noncoding RNAs. Here, we introduce a new computational method, called Inference of Natural Selection from Interspersed Genomically coHerent elemenTs (INSIGHT), for measuring the influence of natural selection on such elements. INSIGHT uses a generative probabilistic model to contrast patterns of polymorphism and divergence in the elements of interest with those in flanking neutral sites, pooling weak information from many short elements in a manner that accounts for variation among loci in mutation rates and coalescent times. The method is able to disentangle the contributions of weak negative, strong negative, and positive selection based on their distinct effects on patterns of polymorphism and divergence. It obtains information about divergence from multiple outgroup genomes using a general statistical phylogenetic approach. The INSIGHT model is efficiently fitted to genome-wide data using an approximate expectation maximization algorithm. Using simulations, we show that the method can accurately estimate the parameters of interest even in complex demographic scenarios, and that it significantly improves on methods based on summary statistics describing polymorphism and divergence. To demonstrate the usefulness of INSIGHT, we apply it to several classes of human noncoding RNAs and to GATA2-binding sites in the human genome.
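The contrast between elements and flanking neutral sites can be illustrated with a minimal sketch of a summary-statistic baseline, the general kind of approach the abstract says INSIGHT improves on. This is not the INSIGHT model or its expectation maximization fitting procedure. The names `SiteCounts`, `polymorphism_deficit`, and `mk_alpha` are hypothetical, the counts in the usage lines are invented for illustration, and the second function uses the standard McDonald-Kreitman-style ratio with flanking sites standing in for the neutral class.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Pooled counts for one class of sites (hypothetical container)."""
    sites: int        # nucleotide positions examined
    polymorphic: int  # positions that vary within the population sample
    divergent: int    # positions fixed for a difference from the outgroup


def polymorphism_deficit(elements: SiteCounts, flanks: SiteCounts) -> float:
    """Relative shortfall of polymorphism in elements versus flanking sites.

    A crude proxy for the fraction of element sites under selection;
    it ignores variation among loci in mutation rate and coalescent time.
    """
    element_rate = elements.polymorphic / elements.sites
    flank_rate = flanks.polymorphic / flanks.sites
    return 1.0 - element_rate / flank_rate


def mk_alpha(elements: SiteCounts, flanks: SiteCounts) -> float:
    """McDonald-Kreitman-style estimate of the fraction of element
    divergences attributable to positive selection, treating the
    flanking sites as the neutral reference class."""
    return 1.0 - (flanks.divergent * elements.polymorphic) / (
        elements.divergent * flanks.polymorphic
    )


# Invented counts, for illustration only.
elements = SiteCounts(sites=1000, polymorphic=4, divergent=10)
flanks = SiteCounts(sites=10000, polymorphic=80, divergent=100)
print(polymorphism_deficit(elements, flanks))
print(mk_alpha(elements, flanks))
```

Pooled ratios like these give a single point estimate and cannot separate weak from strong negative selection, which is the gap the generative model described above is meant to address.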

Item Type: Paper
Uncontrolled Keywords: DNA/genetics; *Evolution, Molecular; Genetic Variation/genetics; Genetics, Population; Humans; Phylogeny; Polymorphism, Genetic/*genetics; Regulatory Sequences, Nucleic Acid/genetics; Selection, Genetic/*genetics
Subjects: bioinformatics
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
CSHL Authors:
Communities: CSHL labs > Siepel lab
Depositing User: Matt Covey
Date: May 2013
Date Deposited: 15 Jan 2015 17:13
Last Modified: 15 Jan 2015 17:13
PMCID: PMC3697874
