Please note that eDoc will be permanently shut down in the first quarter of 2021!      Home News About Us Contact Contributors Disclaimer Privacy Policy Help FAQ

Home
Search
Quick Search
Advanced
Fulltext
Browse
Collections
Persons
My eDoc
Session History
Login
Name:
Password:
Documentation
Help
Support Wiki
Direct access to
document ID:


          Institute: MPI für Informatik     Collection: Algorithms and Complexity Group     Display Documents



ID: 201801.0, MPI für Informatik / Algorithms and Complexity Group
Better Filtering with Gapped q-Grams
Authors:Burkhardt, Stefan; Kärkkäinen, Juha
Language:English
Date of Publication (YYYY-MM-DD):2003
Title of Journal:Fundamenta Informaticae
Volume:56
Start Page:51
End Page:70
Review Status:Peer-review
Audience:Experts Only
Intended Educational Use:No
Abstract / Description:A popular and well-studied class of filters for approximate string
matching compares substrings of length $q$, \emph{the $q$-grams}, in
the pattern and the text to identify text areas that contain potential
matches. A generalization of the method that uses {\em gapped}
$q$-grams instead of contiguous substrings is mentioned a few times in
literature but has never been analyzed in any depth. In this paper,
we report the first results of a study on gapped $q$-grams. We show
that gapped $q$-grams can provide orders of magnitude faster and/or
more efficient filtering than contiguous $q$-grams. To achieve these
results the arrangement of the gaps in the $q$-gram and a filter
parameter called \emph{threshold} have to be optimized. Both of these
tasks are nontrivial combinatorial optimization problems for which we
present efficient solutions. We concentrate on the $k$~mismatches
problem, i.e, approximate string matching with the Hamming distance.
Last Change of the Resource (YYYY-MM-DD):2004-06-15
External Publication Status:published
Document Type:Article
Communicated by:Kurt Mehlhorn
Affiliations:MPI für Informatik/Algorithms and Complexity Group
Identifiers:ISSN:0169-2968
LOCALID:C1256428004B93B8-97486D0D727E678AC1256D0900521D74-...
The scope and number of records on eDoc is subject to the collection policies defined by each institute - see "info" button in the collection browse view.