Longest common substrings with k mismatches

The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the length...

Bibliographic Details
Published in: Information Processing Letters, Vol. 115, no. 6-8, pp. 643-647
Main Authors: Flouri, Tomas; Giaquinta, Emanuele; Kobert, Kassian; Ukkonen, Esko
Format: Journal Article
Language: English
Published: Amsterdam Elsevier B.V 01.06.2015
Elsevier Sequoia S.A
ISSN:0020-0190, 1872-6119
Abstract The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the lengths of S1 and S2, respectively. This algorithm can also be used to compute the matching statistics with k-mismatches of S1 and S2 in O(nm) time and O(m) space. Moreover, we also present a theoretical solution for the k = 1 case which runs in O(n log m) time, assuming m ≤ n, and uses O(m) space, improving over the existing O(nm) time and O(m) space bound of Babenko and Starikovskaya [1].
• Two new algorithms for the longest common substring with k mismatches problem.
• A practical solution for arbitrary k which uses constant space.
• A theoretical solution for one mismatch which runs in quasilinear time.
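The O(nm) time and O(1) space bounds quoted in the abstract can be met by a two-pointer scan along each diagonal of the S1 × S2 alignment grid. The following is a minimal sketch under that assumption; it is not taken from the paper and may differ from the authors' exact procedure. The function name `lcs_k_mismatches` and its return convention are hypothetical.

```python
def lcs_k_mismatches(s1: str, s2: str, k: int) -> tuple[int, int, int]:
    """Return (length, start_in_s1, start_in_s2) for a longest pair of
    equal-length substrings of s1 and s2 with Hamming distance <= k.

    Illustrative sketch only: a sliding window over each diagonal.
    """
    n, m = len(s1), len(s2)
    best_len, best_i, best_j = 0, 0, 0

    # Diagonal d pairs s1[i] with s2[i - d]; there are n + m - 1 diagonals.
    for d in range(-(m - 1), n):
        i0 = max(d, 0)                 # first s1 index on this diagonal
        j0 = i0 - d                    # matching first s2 index
        length = min(n - i0, m - j0)   # number of cells on the diagonal

        left = 0                       # window start (offset on the diagonal)
        mismatches = 0                 # mismatches inside the window
        for right in range(length):
            if s1[i0 + right] != s2[j0 + right]:
                mismatches += 1
            # Shrink from the left until the window has at most k mismatches.
            while mismatches > k:
                if s1[i0 + left] != s2[j0 + left]:
                    mismatches -= 1
                left += 1
            if right - left + 1 > best_len:
                best_len = right - left + 1
                best_i, best_j = i0 + left, j0 + left

    return best_len, best_i, best_j
```

Each of the two window pointers only moves forward along a diagonal, so the total work is proportional to the number of grid cells, and only a constant number of integers is kept besides the input strings. For example, `lcs_k_mismatches("abcde", "xbcdy", 1)` reports a length of 4, since "abcd" and "xbcd" differ in one position.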
Author Flouri, Tomas
Giaquinta, Emanuele
Ukkonen, Esko
Kobert, Kassian
Author_xml – sequence: 1
  givenname: Tomas
  surname: Flouri
  fullname: Flouri, Tomas
  organization: Heidelberg Institute for Theoretical Studies, Germany
– sequence: 2
  givenname: Emanuele
  orcidid: 0000-0002-9473-3971
  surname: Giaquinta
  fullname: Giaquinta, Emanuele
  email: emanuele.giaquinta@aalto.fi
  organization: Department of Computer Science, Aalto University, Finland
– sequence: 3
  givenname: Kassian
  surname: Kobert
  fullname: Kobert, Kassian
  organization: Heidelberg Institute for Theoretical Studies, Germany
– sequence: 4
  givenname: Esko
  surname: Ukkonen
  fullname: Ukkonen, Esko
  organization: Department of Computer Science, University of Helsinki, Finland
CODEN IFPLAT
CitedBy_id crossref_primary_10_3390_a13090224
crossref_primary_10_1007_s00453_022_00934_y
crossref_primary_10_3390_electronics9101670
crossref_primary_10_1007_s00453_020_00744_0
crossref_primary_10_1007_s00453_022_01092_x
crossref_primary_10_1007_s00453_019_00548_x
crossref_primary_10_1145_3488245
crossref_primary_10_1016_j_molp_2017_05_010
crossref_primary_10_1089_cmb_2015_0235
crossref_primary_10_1007_s00453_021_00842_7
crossref_primary_10_1089_cmb_2015_0217
crossref_primary_10_1186_s13015_016_0072_x
crossref_primary_10_1016_j_tcs_2017_06_017
crossref_primary_10_1186_s12859_017_1658_0
Cites_doi 10.1016/j.tcs.2006.06.029
10.1145/322123.322127
10.1017/CBO9780511574931
10.1016/0022-2836(81)90087-5
10.1134/S0032946011010030
10.1007/BF01185431
10.1093/bioinformatics/btu331
ContentType Journal Article
Copyright 2015 The Authors
Copyright Elsevier Sequoia S.A. Jun-Aug 2015
Copyright_xml – notice: 2015 The Authors
– notice: Copyright Elsevier Sequoia S.A. Jun-Aug 2015
DOI 10.1016/j.ipl.2015.03.006
DatabaseName ScienceDirect Open Access Titles
Elsevier:ScienceDirect:Open Access
CrossRef
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Computer and Information Systems Abstracts
Technology Research Database
Computer and Information Systems Abstracts – Academic
Advanced Technologies Database with Aerospace
ProQuest Computer Science Collection
Computer and Information Systems Abstracts Professional
DatabaseTitleList
Computer and Information Systems Abstracts
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1872-6119
EndPage 647
ExternalDocumentID 3650871051
10_1016_j_ipl_2015_03_006
S0020019015000459
Genre Feature
ID FETCH-LOGICAL-c438t-8de8df010937f24036ce2f3d12cc5b55e95e29dd290edf46f080d06869b9ecce3
ISICitedReferencesCount 29
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000353742700020&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 0020-0190
IngestDate Sun Nov 09 07:35:14 EST 2025
Tue Nov 18 22:09:45 EST 2025
Sat Nov 29 03:44:21 EST 2025
Fri Feb 23 02:16:27 EST 2024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 6-8
Keywords String algorithms
Combinatorial problems
Longest common substring
Hamming distance
Language English
License http://creativecommons.org/licenses/by/4.0
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c438t-8de8df010937f24036ce2f3d12cc5b55e95e29dd290edf46f080d06869b9ecce3
ORCID 0000-0002-9473-3971
OpenAccessLink https://dx.doi.org/10.1016/j.ipl.2015.03.006
PQID 1672168560
PQPubID 45522
PageCount 5
ParticipantIDs proquest_journals_1672168560
crossref_citationtrail_10_1016_j_ipl_2015_03_006
crossref_primary_10_1016_j_ipl_2015_03_006
elsevier_sciencedirect_doi_10_1016_j_ipl_2015_03_006
PublicationCentury 2000
PublicationDate 2015-06-01
PublicationDateYYYYMMDD 2015-06-01
PublicationDate_xml – month: 06
  year: 2015
  text: 2015-06-01
  day: 01
PublicationDecade 2010
PublicationPlace Amsterdam
PublicationPlace_xml – name: Amsterdam
PublicationTitle Information processing letters
PublicationYear 2015
Publisher Elsevier B.V
Elsevier Sequoia S.A
Publisher_xml – name: Elsevier B.V
– name: Elsevier Sequoia S.A
References
[1] Babenko, Starikovskaya. Computing the longest common substring with one mismatch. Probl. Inf. Transm. 47(1), 28-33, 2011. doi:10.1134/S0032946011010030
[2] Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. 1997. doi:10.1017/CBO9780511574931
[3] Starikovskaya, Vildhøj. Time-space trade-offs for the longest common substring problem. CPM, pp. 223-234, 2013.
[4] Kociumaka, Starikovskaya, Vildhøj. Sublinear space algorithms for the longest common substring problem. ESA, pp. 605-617, 2014.
[5] Smith, Waterman. Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195-197, 1981. doi:10.1016/0022-2836(81)90087-5
[6] Chang, Lawler. Sublinear approximate string matching and biological applications. Algorithmica 12(4/5), 327-344, 1994. doi:10.1007/BF01185431
[7] Leimeister, Morgenstern. kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison. Bioinformatics 30(14), 2000-2008, 2014. doi:10.1093/bioinformatics/btu331
[8] Crochemore, Iliopoulos, Mohamed, Sagot. Longest repeats with a block of k don't cares. Theor. Comput. Sci. 362(1-3), 248-254, 2006. doi:10.1016/j.tcs.2006.06.029
[9] Bender, Farach-Colton. The LCA problem revisited. LATIN, pp. 88-94, 2000.
[10] Landau, Vishkin. Introducing efficient parallelism into approximate string matching and a new serial algorithm. STOC, pp. 220-230, 1986.
[11] Brown, Tarjan. A fast merging algorithm. J. ACM 26(2), 211-226, 1979. doi:10.1145/322123.322127
SSID ssj0006437
Score 2.293019
SourceID proquest
crossref
elsevier
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 643
SubjectTerms Algorithms
Combinatorial problems
Hamming distance
Longest common substring
Mathematical problems
String algorithms
Studies
Vector space
Title Longest common substrings with k mismatches
URI https://dx.doi.org/10.1016/j.ipl.2015.03.006
https://www.proquest.com/docview/1672168560
Volume 115
WOSCitedRecordID wos000353742700020&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVESC
  databaseName: ScienceDirect Freedom Collection - Elsevier
  customDbUrl:
  eissn: 1872-6119
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0006437
  issn: 0020-0190
  databaseCode: AIEXJ
  dateStart: 19950113
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
linkProvider Elsevier