Assembly of the 373k gene space of the polyploid sugarcane genome reveals reservoirs of functional diversity in the world's leading biomass crop

Souza, G. M., Van Sluys, M. A., Lembke, C. G., Lee, H., Margarido, G. R. A., Hotta, C. T., Gaiarsa, J. W., Diniz, A. L., Oliveira, M. M., Ferreira, S. S., Nishiyama, M. Y., Ten-Caten, F., Ragagnin, G. T., Andrade, P. M., de Souza, R. F., Nicastro, G. G., Pandya, R., Kim, C., Guo, H., Durham, A. M., Carneiro, M. S., Zhang, J., Zhang, X., Zhang, Q., Ming, R., Schatz, M. C., Davidson, B., Paterson, A. H., Heckerman, D. (December 2019) Assembly of the 373k gene space of the polyploid sugarcane genome reveals reservoirs of functional diversity in the world's leading biomass crop. Gigascience, 8 (12). ISSN 2047-217x (Public Dataset)

Schatz_Gigascience_2019.pdf - Published Version
URL: https://www.ncbi.nlm.nih.gov/pubmed/31782791
DOI: 10.1093/gigascience/giz129

Abstract

BACKGROUND: Sugarcane cultivars are polyploid interspecific hybrids of giant genomes, typically with 10-13 sets of chromosomes from 2 Saccharum species. The ploidy, hybridity, and size of the genome, estimated to have >10 Gb, pose a challenge for sequencing.

RESULTS: Here we present a gene space assembly of SP80-3280, including 373,869 putative genes and their potential regulatory regions. The alignment of single-copy genes in diploid grasses to the putative genes indicates that we could resolve 2-6 (up to 15) putative homo(eo)logs that are 99.1% identical within their coding sequences. Dissimilarities increase in their regulatory regions, and gene promoter analysis shows differences in regulatory elements within gene families that are expressed in a species-specific manner. We exemplify these differences for sucrose synthase (SuSy) and phenylalanine ammonia-lyase (PAL), 2 gene families central to carbon partitioning. SP80-3280 has particular regulatory elements involved in sucrose synthesis not found in the ancestor Saccharum spontaneum. PAL regulatory elements are found in co-expressed genes related to fiber synthesis within gene networks defined during plant growth and maturation. Comparison with sorghum reveals predominantly bi-allelic variations in sugarcane, consistent with the formation of 2 "subgenomes" after their divergence approximately 3.8-4.6 million years ago and reveals single-nucleotide variants that may underlie their differences.

CONCLUSIONS: This assembly represents a large step towards a whole-genome assembly of a commercial sugarcane cultivar. It includes a rich diversity of genes and homo(eo)logous resolution for a representative fraction of the gene space, relevant to improve biomass and food production.

Item Type: Paper
Subjects: bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function > alleles
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function > gene regulation
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
Communities: CSHL labs > Schatz lab
Depositing User: Adrian Gomez
Date: 1 December 2019
Date Deposited: 18 Dec 2019 20:22
Last Modified: 18 Dec 2019 20:22
PMCID: PMC6884061
URI: https://repository.cshl.edu/id/eprint/38788
