GenomeScope 2.0 and Smudgeplot for Reference-Free Profiling of Polyploid Genomes

Ranallo-Benavidez, T.R., Jaron, K.S., Schatz, M.C. (March 2020) GenomeScope 2.0 and Smudgeplot for Reference-Free Profiling of Polyploid Genomes. Nat Commun, 11 (1432). pp. 1-10. ISSN 2041-1723 (Public Dataset)

s41467-020-14998-3.pdf - Published Version (PDF, 1MB)
URL: https://pubmed.ncbi.nlm.nih.gov/32188846/
DOI: 10.1038/s41467-020-14998-3

Abstract

An important assessment prior to genome assembly and related analyses is genome profiling, where the k-mer frequencies within raw sequencing reads are analyzed to estimate major genome characteristics such as size, heterozygosity, and repetitiveness. Here we introduce GenomeScope 2.0 (https://github.com/tbenavi1/genomescope2.0), which applies combinatorial theory to establish a detailed mathematical model of how k-mer frequencies are distributed in heterozygous and polyploid genomes. We describe and evaluate a practical implementation of the polyploid-aware mixture model that quickly and accurately infers genome properties across thousands of simulated and several real datasets spanning a broad range of complexity. We also present a method called Smudgeplot (https://github.com/KamilSJaron/smudgeplot) to visualize and estimate the ploidy and structure of a genome by analyzing heterozygous k-mer pairs. We successfully apply the approach to systems of known variable ploidy levels in the Meloidogyne genus and to the extreme case of octoploid Fragaria × ananassa.
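Both tools start from k-mer counts taken directly from the reads. The sketch below is a minimal, hypothetical illustration of those inputs, assuming error-free uppercase reads over A, C, G, T and data small enough to hold in memory; the function names are invented for this example and are not part of the GenomeScope 2.0 or Smudgeplot code bases. `naive_genome_size` is the classic single-peak heuristic (total k-mers divided by the coverage of the main peak), not the polyploid-aware mixture model described in the abstract. `smudge_coordinates` computes the two quantities a Smudgeplot-style plot places on its axes for a heterozygous k-mer pair: the minor k-mer's share of the pair's coverage and the pair's total coverage.

```python
from collections import Counter

# Complement table for DNA bases (assumes reads contain only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k: int) -> Counter:
    """Count canonical k-mers and return {multiplicity: number of distinct k-mers}."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    # The histogram of multiplicities is the k-mer spectrum that profiling tools fit.
    return Counter(counts.values())


def naive_genome_size(spectrum, min_cov: int) -> float:
    """Classic single-peak estimate: total k-mers / coverage of the tallest peak.

    Multiplicities below min_cov are treated as sequencing errors and dropped.
    This heuristic ignores heterozygosity and ploidy (hypothetical helper).
    """
    kept = {cov: n for cov, n in spectrum.items() if cov >= min_cov}
    peak = max(kept, key=kept.get)  # multiplicity with the most distinct k-mers
    total = sum(cov * n for cov, n in kept.items())
    return total / peak


def smudge_coordinates(cov_a: int, cov_b: int):
    """Smudgeplot-style summary of a heterozygous k-mer pair (hypothetical helper).

    Returns (minor coverage / total coverage, total coverage). The minor fraction
    is expected near 1/2 for an AB pair, near 1/3 for AAB, and near 1/4 for AAAB.
    """
    total = cov_a + cov_b
    return min(cov_a, cov_b) / total, total
```

For example, a pair with coverages 10 and 20 has a minor fraction of about 1/3, the region a Smudgeplot-style analysis would associate with an AAB (triploid-like) pair. The single-peak genome-size heuristic breaks down when heterozygous k-mers form their own peak, which is the situation the GenomeScope 2.0 model is designed to handle.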

Item Type: Paper
Subjects: bioinformatics
bioinformatics > computational biology > algorithms
bioinformatics > computational biology
bioinformatics > genomics and proteomics > computers > computer software
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes > de novo assembly
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
Communities: CSHL labs > Schatz lab
Depositing User: Adrian Gomez
Date: 18 March 2020
Date Deposited: 02 Apr 2020 15:14
Last Modified: 01 Feb 2024 16:57
PMCID: PMC7080791
Related URLs:
Dataset ID:
  • https://github.com/tbenavi1/genomescope2.0
  • https://github.com/KamilSJaron/smudgeplot
  • https://doi.org/10.5281/zenodo.3657798
  • https://doi.org/10.5281/zenodo.3658220
URI: https://repository.cshl.edu/id/eprint/39214
