Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy

Larivière, Delphine, Abueg, Linelle, Brajuka, Nadolina, Gallardo-Alba, Cristóbal, Grüning, Bjorn, Ko, Byung June, Ostrovsky, Alex, Palmada-Flores, Marc, Pickett, Brandon D, Rabbani, Keon, Balacco, Jennifer R, Chaisson, Mark, Cheng, Haoyu, Collins, Joanna, Denisova, Alexandra, Fedrigo, Olivier, Gallo, Guido Roberto, Giani, Alice Maria, Gooder, Grenville MacDonald, Jain, Nivesh, Johnson, Cassidy, Kim, Heebal, Lee, Chul, Marques-Bonet, Tomas, O'Toole, Brian, Rhie, Arang, Secomandi, Simona, Sozzoni, Marcella, Tilley, Tatiana, Uliano-Silva, Marcela, van den Beek, Marius, Waterhouse, Robert M, Phillippy, Adam M, Jarvis, Erich D, Schatz, Michael C, Nekrutenko, Anton, Formenti, Giulio (June 2023) Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy. (Submitted)

2023_Lariviere_Scalable_Accessible_and_Reproducible_Reference_Genome.pdf - Submitted Version
Available under License Creative Commons Attribution.

Abstract

Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ∼500 million years. The pipeline is versatile and combines PacBio HiFi long reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhancing reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).

Item Type: Paper
Subjects: bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
bioinformatics > genomics and proteomics > analysis and processing > reference assembly
Communities: CSHL labs > Schatz lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 30 June 2023
Date Deposited: 21 Sep 2023 20:27
Last Modified: 21 Sep 2023 20:27
PMCID: PMC10327048
URI: https://repository.cshl.edu/id/eprint/40961
