Enabling large-scale next-generation sequence assembly with Blacklight

Couger, M. B., Pipes, L., Squina, F., Prade, R., Siepel, A., Palermo, R., Katze, M. G., Mason, C. E., Blood, P. D. (September 2014) Enabling large-scale next-generation sequence assembly with Blacklight. Concurrency and Computation: Practice & Experience, 26 (13). pp. 2157-2166. ISSN 1532-0626 (Print)

URL: http://www.ncbi.nlm.nih.gov/pubmed/25294974
DOI: 10.1002/cpe.3231


A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems.

Item Type: Paper
Uncontrolled Keywords: NGS; RNA-seq; bioinformatics; data-intensive computing; de novo assembly; genome; genomics; high-performance computing; large shared memory computing; metagenome; primates; transcriptome
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
Investigative techniques and equipment > assays > next generation sequencing
Investigative techniques and equipment > assays > RNA-seq
Investigative techniques and equipment > assays > whole genome sequencing
Communities: CSHL labs > Siepel lab
Depositing User: Matt Covey
Date: 10 September 2014
Date Deposited: 14 Jan 2015 21:10
Last Modified: 14 Jan 2015 21:10
PMCID: PMC4185199
URI: https://repository.cshl.edu/id/eprint/31057
