Sapling: accelerating suffix array queries with learned data models.

Kirsche, Melanie, Das, Arun, Schatz, Michael C (May 2021) Sapling: accelerating suffix array queries with learned data models. Bioinformatics, 37 (6). pp. 744-749. ISSN 1367-4803

URL: https://www.ncbi.nlm.nih.gov/pubmed/33107913
DOI: 10.1093/bioinformatics/btaa911

Abstract

MOTIVATION: As genomic data becomes more abundant, efficient algorithms and data structures for sequence alignment become increasingly important. The suffix array is a widely used data structure to accelerate alignment, but the binary search algorithm used to query it requires widespread memory accesses, causing a large number of cache misses on large datasets.

RESULTS: Here, we present Sapling, an algorithm for sequence alignment that uses a learned data model to augment the suffix array and enable faster queries. We investigate different types of data models, providing an analysis of different neural network models as well as an open-source aligner with a compact, practical piecewise linear model. We show that Sapling outperforms both an optimized binary search approach and multiple widely used read aligners on a diverse collection of genomes, including human, bacteria and plants, speeding up the algorithm by more than a factor of two while adding <1% to the suffix array's memory footprint.

AVAILABILITY AND IMPLEMENTATION: The source code and tutorial are available open-source at https://github.com/mkirsche/sapling.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
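
The idea summarized in the abstract, a small model that predicts where a query falls in the suffix array so that the binary search only has to cover a narrow window, can be illustrated with a toy sketch. This is not the Sapling implementation: the class name `LearnedSuffixIndex`, the parameters `k` and `bins`, the evenly spaced knots, and the naive suffix array construction are all assumptions chosen for brevity, and the input is assumed to contain only the characters A, C, G and T.

```python
from bisect import bisect_left

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed alphabet

def kmer_value(s, k):
    """Encode the first k characters of s as a base-4 integer.
    Shorter strings are padded with 'A' (0), which preserves sort order."""
    v = 0
    for j in range(k):
        v = v * 4 + (CODE[s[j]] if j < len(s) else 0)
    return v

class LearnedSuffixIndex:
    """Toy suffix array plus a piecewise linear position model (illustrative only)."""

    def __init__(self, text, k=3, bins=8):
        self.text, self.k = text, k
        # Naive construction; real tools use far faster algorithms.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])
        # k-mer keys in suffix array order are non-decreasing.
        keys = [kmer_value(text[i:i + k], k) for i in self.sa]
        self.width = max(1, 4 ** k // bins)
        n_knots = 4 ** k // self.width + 2
        # knots[b] = number of suffixes whose key is below b * width.
        self.knots = [bisect_left(keys, b * self.width) for b in range(n_knots)]
        # Largest prediction error over all suffixes bounds the search window.
        self.max_err = max(
            (abs(self.predict(key) - i) for i, key in enumerate(keys)), default=0
        )

    def predict(self, key):
        """Linearly interpolate a suffix array position between two knots."""
        b = key // self.width
        frac = (key - b * self.width) / self.width
        return int(self.knots[b] + frac * (self.knots[b + 1] - self.knots[b]))

    def find(self, pattern):
        """Return one text position where pattern occurs, or -1 if absent."""
        if len(pattern) < self.k:
            raise ValueError("this sketch only handles patterns of length >= k")
        pred = self.predict(kmer_value(pattern, self.k))
        lo = max(0, pred - self.max_err)
        hi = min(len(self.sa), pred + self.max_err + 1)
        # Binary search restricted to the predicted window.
        while lo < hi:
            mid = (lo + hi) // 2
            start = self.sa[mid]
            if self.text[start:start + len(pattern)] < pattern:
                lo = mid + 1
            else:
                hi = mid
        # Verify, because the error bound only holds for patterns in the text.
        if lo < len(self.sa) and self.text.startswith(pattern, self.sa[lo]):
            return self.sa[lo]
        return -1

index = LearnedSuffixIndex("ACGTACGTTGCAACGGTA")
print(index.find("CGTT"), index.find("GGG"))
```

In this sketch the model adds only the short `knots` list and one integer error bound on top of the suffix array, and the final verification step keeps the result correct even when the prediction is poor, at the cost of reporting absent patterns as -1 without a full search.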

Item Type: Paper
Subjects: bioinformatics > genomics and proteomics > alignment
bioinformatics
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > alignment > sequence alignment
bioinformatics > computational biology > algorithms
bioinformatics > computational biology
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
CSHL Authors:
Communities: CSHL labs > Schatz lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 5 May 2021
Date Deposited: 02 Jun 2021 14:12
Last Modified: 25 Jan 2024 14:25
URI: https://repository.cshl.edu/id/eprint/40188
