ID:
202062.0,
MPI für Informatik / Algorithms and Complexity Group |
Computing the threshold for q-gram filters |
Authors: | Kärkkäinen, Juha |
Editors: | Penttonen, Martti; Meineche Schmidt, Erik |
Language: | English |
Publisher: | Springer |
Place of Publication: | Berlin, Germany |
Date of Publication (YYYY-MM-DD): | 2002 |
Title of Proceedings: | Algorithm theory, SWAT 2002 : 8th Scandinavian Workshop on Algorithm Theory |
Start Page: | 348 |
End Page: | 357 |
Title of Series: | Lecture Notes in Computer Science |
Place of Conference/Meeting: | Turku, Finland |
(Start) Date of Conference/Meeting (YYYY-MM-DD): | 2002-07-03 |
Review Status: | not specified |
Audience: | Experts Only |
Intended Educational Use: | No |
Abstract / Description: | A popular and much studied class of filters for approximate string matching is based on finding common $q$-grams, substrings of length $q$, between the pattern and the text. A variation of the basic idea uses \emph{gapped} $q$-grams and has been recently shown to provide significant improvements in practice. A major difficulty with gapped $q$-gram filters is the computation of the so-called \emph{threshold} which defines the filter criterium. We describe the first general method for computing the threshold for $q$-gram filters. The method is based on a carefully chosen precise statement of the problem which is then transformed into a constrained shortest path problem. In its generic form the method leaves certain parts open but is applicable to a large variety of $q$-gram filters and may be extensible even to other classes of filters. We also give a full algorithm for a specific subclass. For this subclass, the algorithm has been implemented and used succesfully in an experimental comparison. |
Last Change of the Resource (YYYY-MM-DD): | 2003-08-27 |
External Publication Status: | published |
Document Type: | Conference-Paper |
Communicated by: | Kurt Mehlhorn |
Affiliations: | MPI für Informatik/Algorithms and Complexity Group
|
Identifiers: | ISBN:3-540-43866-1 LOCALID:C1256428004B93B8-BDE5926620208DA2C1256CE5006C7B5E-... |
|