Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline

Ou, S., Su, W., Liao, Y., Chougule, K., Agda, J. R. A., Hellinga, A. J., Lugo, C. S. B., Elliott, T. A., Ware, D., Peterson, T., Jiang, N., Hirsch, C. N., Hufford, M. B. (December 2019) Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol, 20 (1). p. 275. ISSN 1474-7596 (Public Dataset)

URL: https://www.ncbi.nlm.nih.gov/pubmed/31843001
DOI: 10.1186/s13059-019-1905-y

Abstract

BACKGROUND: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations.

RESULTS: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species.

CONCLUSIONS: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
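
The six performance metrics named in the abstract have standard definitions in terms of true/false positive and negative counts. The following is a minimal Python sketch of those textbook formulas, not the authors' benchmarking code. The function name `te_annotation_metrics` and the example counts are hypothetical, chosen only for illustration, and are not results from the paper. The sketch assumes the counts (for instance, numbers of annotated bases) have already been obtained by comparing a program's output against a curated reference annotation.

```python
def te_annotation_metrics(tp, fp, tn, fn):
    """Return standard confusion-matrix metrics.

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives, and false negatives (assumed here to be non-zero
    wherever they appear in a denominator).
    """
    return {
        "sensitivity": tp / (tp + fn),            # share of real TEs recovered
        "specificity": tn / (tn + fp),            # share of non-TE correctly excluded
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),              # share of calls that are real TEs
        "fdr": fp / (tp + fp),                    # false discovery rate = 1 - precision
        "f1": 2 * tp / (2 * tp + fp + fn),        # harmonic mean of precision and sensitivity
    }


# Hypothetical counts, for illustration only.
metrics = te_annotation_metrics(tp=80, fp=20, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision and FDR alongside sensitivity matters for this kind of comparison because a program can recover most true elements while still producing many false calls; F1 summarizes that trade-off in one number.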

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > DNA expression
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > computational biology > algorithms
bioinformatics > computational biology
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes > genome annotation
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > DNA expression > transposable elements
CSHL Authors:
Communities: CSHL labs > Ware lab
Depositing User: Adrian Gomez
Date: 16 December 2019
Date Deposited: 18 Dec 2019 20:29
Last Modified: 02 Feb 2024 19:45
PMCID: PMC6913007
Related URLs:
Dataset ID:
  • Supplement: https://doi.org/10.1186/s13059-019-1905-y
  • Code: https://github.com/oushujun/EDTA
URI: https://repository.cshl.edu/id/eprint/38774
