Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica

Schatz, M. C., Maron, L. G., Stein, J. C., Hernandez Wences, A., Gurtowski, J., Biggers, E., Lee, H., Kramer, M., Antoniou, E., Ghiban, E., Wright, M. H., Chia, J. M., Ware, D., McCouch, S. R., McCombie, W. R. (2014) Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol, 15 (11). p. 506. ISSN 1465-6906

PDF (Paper)
Schatz Ware and McCombie Genome Biology 2014.pdf - Published Version

URL: http://www.ncbi.nlm.nih.gov/pubmed/25468217
DOI: 10.1186/s13059-014-0506-z

Abstract

BACKGROUND: The use of high throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. When the genomes of different strains of a given organism are compared, whole genome resequencing data are typically aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate.

RESULTS: Here, we use rice as a model to demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared using whole genome alignment to provide an unbiased assessment. Using this approach, we are able to accurately assess the "pan-genome" of three divergent rice varieties and document several megabases of each genome absent in the other two.

CONCLUSIONS: Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard reference-mapping approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, including the S5 hybrid sterility locus, the Sub1 submergence tolerance locus, the LRK gene cluster associated with improved yield, and the Pup1 cluster associated with phosphorus deficiency, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.
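
The idea of sequence that is present in one genome but "absent in the other two" can be illustrated with simple interval arithmetic. The sketch below is not the authors' pipeline: it assumes that whole genome alignments have already been reduced to half-open coordinate intervals on one assembled sequence, and the function names (`merge_intervals`, `uncovered`, `genome_specific`) and toy coordinates are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single assembled sequence.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def uncovered(length: int, covered: List[Interval]) -> List[Interval]:
    """Return the parts of [0, length) not covered by any interval."""
    gaps: List[Interval] = []
    cursor = 0
    for start, end in merge_intervals(covered):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < length:
        gaps.append((cursor, length))
    return gaps


def genome_specific(
    length: int,
    aligned_to_b: List[Interval],
    aligned_to_c: List[Interval],
) -> List[Interval]:
    """Regions of genome A that align to neither genome B nor genome C.

    Assumption: all intervals are coordinates on genome A within [0, length).
    """
    return uncovered(length, aligned_to_b + aligned_to_c)


if __name__ == "__main__":
    # Toy example: a 100 bp sequence from genome A.
    specific = genome_specific(
        100,
        aligned_to_b=[(0, 30), (50, 60)],
        aligned_to_c=[(20, 40), (55, 80)],
    )
    total = sum(end - start for start, end in specific)
    print(specific, total)
```

In this toy case the regions aligned to either other genome merge into (0, 40) and (50, 80), so the genome-specific regions are (40, 50) and (80, 100), 30 bases in total. A real analysis would also need to handle alignment identity thresholds, repeats, and multiple sequences per assembly, none of which are modeled here.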

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes > de novo assembly
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
Investigative techniques and equipment > assays > next generation sequencing
Investigative techniques and equipment > assays > whole genome sequencing
Communities: CSHL labs > McCombie lab
CSHL labs > Schatz lab
CSHL labs > Ware lab
Depositing User: Matt Covey
Date Deposited: 16 Jan 2015 20:20
Last Modified: 28 Apr 2015 19:20
PMCID: PMC4268812
URI: http://repository.cshl.edu/id/eprint/31136
