Theoretical and Empirical Comparisons of Approximate String Matching Algorithms

Chang, W. I., Lampe, J. (1992) Theoretical and Empirical Comparisons of Approximate String Matching Algorithms. Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching, 644. pp. 175-184. ISSN 0302-9743

URL: http://dl.acm.org/citation.cfm?id=738270

Abstract

We study in depth a model of non-exact pattern matching based on edit distance, which is the minimum number of substitutions, insertions, and deletions needed to transform one string of symbols into another. More precisely, the k differences approximate string matching problem specifies a text string of length n, a pattern string of length m, and the number k of differences (substitutions, insertions, deletions) allowed in a match, and asks for all locations in the text where a match occurs. We have carefully implemented and analyzed various O(kn) algorithms based on dynamic programming (DP), paying particular attention to dependence on b, the alphabet size. An empirical observation on the average values of the DP tabulation makes apparent each algorithm's dependence on b. A new algorithm is presented that computes far fewer entries of the DP table. In practice, its speedup over the previous fastest algorithm is 2.5X for a binary alphabet, 4X for a four-letter alphabet, and 10X for a twenty-letter alphabet. We give a probabilistic analysis of the DP table in order to prove that the expected running time of our algorithm (as well as an earlier "cut-off" algorithm due to Ukkonen) is O(kn) for random text. Furthermore, we give a heuristic argument that our algorithm is O(kn/(sqrt(b) - 1)) on the average, when alphabet size is taken into consideration.
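
As an illustration of the problem statement in the abstract, the following is a minimal sketch of the textbook column-by-column DP formulation of k differences matching. It is not the new algorithm of the paper, nor Ukkonen's cut-off algorithm: it fills every entry of the DP table and therefore runs in O(mn) time rather than O(kn). The function name k_differences and the convention of reporting 1-based end positions are assumptions made for this example.

```python
def k_differences(text, pattern, k):
    """Return 1-based end positions in text where pattern matches
    with at most k differences (substitutions, insertions, deletions).

    Illustrative full-table DP, O(m*n) time and O(m) space.
    """
    m = len(pattern)
    # col[i] holds the minimum edit distance between pattern[:i] and
    # any substring of the text ending at the current text position.
    # Before any text is read, pattern[:i] needs i edits.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # table value at (i-1, j-1)
        # col[0] stays 0 because a match may start at any text position.
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(
                prev_diag + cost,  # match or substitution
                col[i] + 1,        # extra text symbol (from column j-1)
                col[i - 1] + 1,    # unmatched pattern symbol (column j)
            )
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, k_differences("abcdefg", "cde", 1) returns [4, 5, 6]: the substrings "cd", "cde", and "cdef" ending at those positions are each within one difference of the pattern. The O(kn) algorithms discussed in the abstract avoid computing most of the entries this sketch fills in.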

Item Type: Paper
Additional Information: Meeting Abstract
Uncontrolled Keywords: COMMON ANCESTORS
Subjects: bioinformatics > computational biology > algorithms; bioinformatics > computational biology
CSHL Authors:
Communities: CSHL labs
Depositing User: Matt Covey
Date: 1992
Date Deposited: 18 Sep 2015 14:27
Last Modified: 18 Sep 2015 14:27
URI: https://repository.cshl.edu/id/eprint/31861
