Modeling the evolution dynamics of exon-intron structure with a general random fragmentation process

Wang, L., Stein, L. D. (2013) Modeling the evolution dynamics of exon-intron structure with a general random fragmentation process. BMC Evolutionary Biology, 13, p. 57. ISSN 1471-2148 (Electronic), 1471-2148 (Linking)

Abstract

BACKGROUND: Most eukaryotic genes are interrupted by spliceosomal introns. The evolution of exon-intron structure remains mysterious despite rapid advances in genome sequencing techniques. In this work, we take a novel approach based on the assumptions that the evolution of exon-intron structure is a stochastic process, and that the characteristics of this process can be understood by examining its historical outcome, the present-day size distribution of internal translated exons (hereafter, exons). By combining simulation with modeling of the exon size distribution in different species, we propose a general random fragmentation process (GRFP) to characterize the evolution dynamics of exon-intron structure. This model accurately predicts the probability that an exon will be split by a new intron and the distribution of novel insertions along the length of the exon.

RESULTS: As the first observation from this model, we show that the chance for an exon to obtain an intron is proportional to the third power of its size. We also show that this size dependence is nearly constant across genes, with the exception of exons adjacent to the 5' UTR. As the second conclusion from the model, we show that intron insertion loci follow a normal distribution with a mean of 0.5 (the center of the exon) and a standard deviation of 0.11. Finally, we show that intron insertions within a gene are independent of each other in vertebrates, but are more negatively correlated in non-vertebrates. We use simulation to demonstrate that the negative correlation might result from significant intron loss during evolution, which could be explained by selection against multi-intron genes in these organisms.

CONCLUSIONS: The GRFP model suggests that intron gain is dynamic, with a higher chance for longer exons, and that introns are inserted into exons randomly, with the highest probability at the center of the exon. GRFP estimates that there are 78 introns in every 10 kb of coding sequence in vertebrate genomes, in agreement with empirical observations. GRFP also estimates that there has been significant intron loss in the evolution of non-vertebrate genomes, with extreme cases of around 57% intron loss in Drosophila melanogaster, 28% in Caenorhabditis elegans, and 24% in Oryza sativa.
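
The two quantitative findings in the abstract (insertion chance proportional to the cube of exon size, and insertion loci normally distributed around the exon center with a standard deviation of 0.11) can be illustrated with a toy simulation. The sketch below is not the authors' implementation. The function names, the choice to start from a single intronless coding sequence, the rounding of cut points to whole bases, and the redraw of cut points that fall outside the exon are all assumptions made for illustration.

```python
import random


def split_position(size, rng, mean=0.5, sd=0.11):
    """Draw a cut point (in bases) inside an exon of the given size.

    The relative position is drawn from Normal(mean, sd), as stated in the
    abstract. Redrawing until both pieces are non-empty is an assumption
    of this sketch.
    """
    while True:
        cut = round(rng.gauss(mean, sd) * size)
        if 0 < cut < size:
            return cut


def fragment(cds_length, n_introns, seed=0, power=3.0):
    """Fragment a coding sequence into exons by repeated intron insertion.

    Each insertion picks an exon with probability proportional to
    size**power (power=3 follows the abstract) and splits it in two.
    Returns the list of exon sizes in order along the sequence.
    """
    rng = random.Random(seed)
    exons = [cds_length]
    for _ in range(n_introns):
        # Exons shorter than 2 bases cannot be split, so they get weight 0.
        weights = [s ** power if s >= 2 else 0.0 for s in exons]
        if sum(weights) == 0:
            break
        i = rng.choices(range(len(exons)), weights=weights, k=1)[0]
        cut = split_position(exons[i], rng)
        exons[i:i + 1] = [cut, exons[i] - cut]
    return exons


if __name__ == "__main__":
    # 78 introns per 10 kb is the vertebrate density quoted in the abstract.
    sizes = fragment(10_000, 78, seed=1)
    print(len(sizes), "exons; min", min(sizes), "max", max(sizes))
```

Because this sketch models intron gain only, it cannot reproduce the intron loss that the abstract infers for non-vertebrate genomes. Adding a loss step that merges adjacent exons would be one hypothetical extension.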

Item Type: Paper
Uncontrolled Keywords: Animals; Computer Simulation; Evolution, Molecular; Exons/genetics; Introns/genetics; Invertebrates/genetics; Models, Genetic; Plants/genetics; Vertebrates/genetics
Subjects: bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
evolution
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > exons
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > introns
Depositing User: Matt Covey
Date: 2013
Date Deposited: 24 Jun 2013 20:41
Last Modified: 17 Sep 2013 19:17
PMCID: PMC3732091
URI: https://repository.cshl.edu/id/eprint/28367
