Statistical analysis of the DNA sequence of human chromosome 22

Holste, D., Grosse, I., Herzel, H. (October 2001) Statistical analysis of the DNA sequence of human chromosome 22. Physical Review E, 6404 (4). ISSN 1063-651X

Abstract

We study statistical patterns in the DNA sequence of human chromosome 22, the first completely sequenced human chromosome. We find that (i) the $33.4 \times 10^6$ nucleotide long human chromosome exhibits long-range power-law correlations over more than four orders of magnitude, (ii) the entropies $H_n$ of the frequency distribution of oligonucleotides of length $n$ ($n$-mers) grow sublinearly with increasing $n$, indicating the presence of higher-order correlations for all of the studied lengths $1 \le n \le 10$, and (iii) the generalized entropies $H_n(q)$ of $n$-mers decrease monotonically with increasing $q$ and the decay of $H_n(q)$ with $q$ becomes steeper with increasing $n \le 10$, indicating that the frequency distribution of oligonucleotides becomes increasingly nonuniform as the length $n$ increases. We investigate to what degree known biological features may explain the observed statistical patterns. We find that (iv) the presence of interspersed repeats may cause the sublinear increase of $H_n$ with $n$, and that (v) the presence of monomeric tandem repeats as well as the suppression of CG dinucleotides may cause the observed decay of $H_n(q)$ with $q$.
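
To make the quantities in (ii) and (iii) concrete, the following is a minimal sketch of how n-mer entropies might be estimated from a sequence. It rests on assumptions that the abstract does not state: that the generalized entropy H_n(q) is the Rényi entropy of order q (with the limit q → 1 giving the Shannon entropy H_n), that n-mers are counted with overlapping windows, and that logarithms are taken to base 2. The function names `nmer_counts` and `generalized_entropy` are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2


def nmer_counts(seq, n):
    """Count all overlapping n-mers (substrings of length n) in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def generalized_entropy(seq, n, q=1.0):
    """Estimate the order-q entropy of the n-mer frequency distribution.

    Assumption: Renyi form, H_n(q) = log2(sum_i p_i**q) / (1 - q),
    with the Shannon entropy H_n = -sum_i p_i * log2(p_i) used at q = 1.
    """
    if n < 1 or len(seq) < n:
        raise ValueError("sequence must contain at least one n-mer")
    counts = nmer_counts(seq, n)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if abs(q - 1.0) < 1e-12:
        # Shannon limit of the Renyi entropy
        return -sum(p * log2(p) for p in probs)
    return log2(sum(p ** q for p in probs)) / (1.0 - q)
```

Under these assumptions, a sequence with statistically independent positions would give H_n = n · H_1, so a value of `generalized_entropy(seq, n)` below `n * generalized_entropy(seq, 1)` is the kind of sublinear growth described in (ii). For any nonuniform distribution the estimate decreases as q increases, which is the behavior described in (iii). Plain frequency counts like these are biased for large n on short sequences, so the sketch is illustrative only.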

Item Type: Paper
Uncontrolled Keywords: RANGE FRACTAL CORRELATIONS NONCODING DNA NUCLEOTIDE-SEQUENCES REPEATS HUMAN-CHROMOSOME-22 ENTROPIES ALU IDENTIFICATION ORGANIZATION LINGUISTICS
Subjects: bioinformatics > genomics and proteomics > analysis and processing
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > analysis and processing > Sequence Data Processing
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > chromosome
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > chromosomes, structure and function > chromosome
Communities: CSHL labs > Zhang lab
Depositing User: Matt Covey
Date: October 2001
Date Deposited: 21 Jan 2014 17:03
Last Modified: 21 Jan 2014 17:03
URI: https://repository.cshl.edu/id/eprint/29253
