Computational comparison of two mouse draft genomes and the human golden path

Xuan, Z., Wang, J., Zhang, M. Q. (2003) Computational comparison of two mouse draft genomes and the human golden path. Genome Biology, 4 (1). R1. ISSN 1474-7596

URL: http://www.ncbi.nlm.nih.gov/pubmed/12537546
DOI: 10.1186/gb-2002-4-1-r1

Abstract

BACKGROUND: The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries and different assembly methods.

RESULTS: We present here a critical comparison of the two latest mouse genome assemblies. The utility of the combined genomes is further demonstrated by comparing them with the human 'golden path' and through a subsequent analysis of a resulting conserved sequence element (CSE) database, which allows us to identify over 6,000 potential novel genes and to derive independent estimates of the number of human protein-coding genes.

CONCLUSION: The Celera and public mouse assemblies differ in about 10% of the mouse genome. Each assembly has advantages over the other: Celera has higher accuracy in base-pairs and overall higher coverage of the genome; the public assembly, however, has higher sequence quality in some newly finished bacterial artificial chromosome clone (BAC) regions and the data are freely accessible. Perhaps most important, by combining both assemblies, we can get a better annotation of the human genome; in particular, we can obtain the most complete set of CSEs, one third of which are related to known genes and some others are related to other functional genomic regions. More than half the CSEs are of unknown function. From the CSEs, we estimate the total number of human protein-coding genes to be about 40,000. This searchable, publicly available online CSEdb will expedite new discoveries through comparative genomics.

Item Type: Paper
Uncontrolled Keywords: Animals; Chromosome Mapping; Computational Biology/methods; Conserved Sequence/genetics; Databases, Nucleic Acid; Gene Duplication; Genes/genetics; Genome; Genome, Human; Humans; Mice/genetics; Promoter Regions (Genetics)/genetics; Synteny
Subjects: bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > Mapping and Rendering
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
organism description > animal > mammal > rodent > mouse
organism description > animal > mammal > rodent
Communities: CSHL labs > Zhang lab
Depositing User: Matt Covey
Date: 2003
Date Deposited: 01 Jul 2013 19:44
Last Modified: 01 Jul 2013 19:44
PMCID: PMC151282
URI: https://repository.cshl.edu/id/eprint/27875
