Evidence-based gene predictions in plant genomes

Liang, C., Mao, L., Ware, D. H., Stein, L. D. (October 2009) Evidence-based gene predictions in plant genomes. Genome Res, 19 (10). 1912-1923.

URL: http://www.ncbi.nlm.nih.gov/pubmed/19541913
DOI: 10.1101/gr.088997.108

Abstract

Automated evidence-based gene building is a rapid and cost-effective way to provide reliable gene annotations for newly sequenced genomes. One limitation of evidence-based gene builders, however, is that they require transcriptional evidence (known proteins, full-length cDNAs, or ESTs) from the species of interest. This limitation is of particular concern for plant genomes, where the rate of genome sequencing is greatly outpacing the rate of EST- and cDNA-sequencing projects. To overcome this limitation, we have developed an evidence-based gene build system (the Gramene pipeline) that can use transcriptional evidence from related species. The Gramene pipeline uses the Ensembl computing infrastructure with a novel data processing scheme. Using the previously annotated plant genomes of the dicot Arabidopsis thaliana and the monocot Oryza sativa (rice), we show that cross-species ESTs from within the monocot or dicot class are a valuable source of evidence for gene predictions. We also find that, using only EST and cross-species evidence, the Gramene pipeline can generate a plant gene set comparable in quality to the human gene set built from known proteins and full-length cDNAs. We compare the Gramene pipeline to several widely used ab initio gene prediction programs in rice; this comparison shows that the pipeline performs favorably at both the gene and exon levels when using only cross-species gene products. We discuss the results of testing the pipeline on a 22-Mb region of the newly sequenced maize genome and the potential application of the pipeline to other genomes.

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
organism description > plant
Communities: CSHL labs > Stein lab
CSHL labs > Ware lab
Depositing User: Matt Covey
Date: October 2009
Date Deposited: 21 Feb 2013 16:22
Last Modified: 06 Nov 2017 17:14
PMCID: PMC2765265
URI: https://repository.cshl.edu/id/eprint/27372
