Distribution of short paired duplications in mammalian genomes

Thomas, E. E., Srebro, N., Sebat, J., Navin, N., Healy, J., Mishra, B., Wigler, M. H. (July 2004) Distribution of short paired duplications in mammalian genomes. Proc Natl Acad Sci U S A, 101 (28). pp. 10349-54. ISSN 0027-8424 (Print)

Abstract

Mammalian genomes are densely populated with long duplicated sequences. In this paper, we demonstrate the existence of doublets, short duplications between 25 and 100 bp, distinct from previously described repeats. Each doublet is a pair of exact matches, separated by some distance. The distribution of these intermatch distances is strikingly nonrandom. An unexpectedly high number of doublets have matches either within 100 bp (adjacent) or at distances tightly concentrated approximately 1,000 bp apart (nearby). We focus our study on these proximate doublets. First, they tend to have both matches on the same strand. By comparing nearby doublets shared in human and chimpanzee, we can also see that these doublets seem to arise by an insertion event that produces a copy without markedly affecting the surrounding sequence. Most doublets in humans are shared with chimpanzee, but many new pairs arose after the divergence of the species. Doublets found in human but not chimpanzee are most often composed of almost tandem matches, whereas older doublets (found in both species) are more likely to have matches spaced by approximately 1 kb, indicating that the nearly tandem doublets may be more dynamic. The spacing of doublets is highly conserved. So far, we have found clearly recognizable doublets in the following genomes: Homo sapiens, Mus musculus, Arabidopsis thaliana, and Caenorhabditis elegans, indicating that the mechanism generating these doublets is widespread. A mechanism that generates short local duplications while conserving polarity could have a profound impact on the evolution of regulatory and protein-coding sequences.
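
The definition in the abstract (a pair of exact matches of 25 to 100 bp, separated by some distance) can be illustrated with a minimal Python sketch. This is not the authors' method. It assumes a simple k-mer index restricted to the forward strand, measures the intermatch distance from the start of the first match to the start of the second, and uses a hypothetical 900 to 1,100 bp window for "nearby", since the abstract says only "approximately 1,000 bp". The function names `find_doublets` and `classify` are illustrative.

```python
from collections import defaultdict
import random


def find_doublets(seq, k=25):
    """Return sorted (first_start, second_start, distance) tuples for every
    k-mer that occurs exactly twice on the forward strand of seq.

    Assumption: distance is measured between the two start positions.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    doublets = []
    for starts in positions.values():
        if len(starts) == 2:  # exactly one pair of exact matches
            first, second = starts
            doublets.append((first, second, second - first))
    return sorted(doublets)


def classify(distance, adjacent_max=100, nearby_window=(900, 1100)):
    """Label an intermatch distance using the abstract's categories.

    adjacent_max follows the abstract ("within 100 bp"); nearby_window is an
    assumed tolerance around 1,000 bp, not a value taken from the paper.
    """
    if distance <= adjacent_max:
        return "adjacent"
    if nearby_window[0] <= distance <= nearby_window[1]:
        return "nearby"
    return "other"


if __name__ == "__main__":
    # Synthetic example: plant one 25-bp unit twice, 1,000 bp apart,
    # inside otherwise random sequence.
    rng = random.Random(0)

    def rand_dna(n):
        return "".join(rng.choice("ACGT") for _ in range(n))

    unit = rand_dna(25)
    seq = rand_dna(500) + unit + rand_dna(975) + unit + rand_dna(500)
    for first, second, dist in find_doublets(seq, k=25):
        print(first, second, dist, classify(dist))
```

A match longer than k shows up as several overlapping k-mer pairs with the same distance, so a real analysis would merge them into one doublet and would also search the reverse complement to compare same-strand and opposite-strand pairs.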

Item Type: Paper
Uncontrolled Keywords: Animals; Arabidopsis; Base Sequence; Caenorhabditis elegans; DNA Transposable Elements/genetics; Evolution, Molecular; Gene Duplication; Genome, Human; Humans; Mice; Molecular Sequence Data; Pan troglodytes
Subjects: bioinformatics > genomics and proteomics > Mapping and Rendering > DNA Structure Rendering
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > DNA expression
Communities: CSHL labs > Wigler lab
CSHL labs > Sebat lab
School of Biological Sciences > Publications
Depositing User: CSHL Librarian
Date: 13 July 2004
Date Deposited: 06 Apr 2012 13:36
Last Modified: 09 Nov 2017 17:05
PMCID: PMC478600
URI: https://repository.cshl.edu/id/eprint/25999
