Longest common substrings with k mismatches
The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the length...
| Published in: | Information Processing Letters, Volume 115, Issue 6-8, pp. 643-647 |
|---|---|
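| Main authors: | Flouri, Tomas; Giaquinta, Emanuele; Kobert, Kassian; Ukkonen, Esko |
| Format: | Journal Article |
| Language: | English |
| Published: | Amsterdam: Elsevier B.V; Elsevier Sequoia S.A, 1 June 2015 |
| Subjects: | String algorithms; Combinatorial problems; Longest common substring; Hamming distance |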
| ISSN: | 0020-0190, 1872-6119 |
| Abstract | The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the lengths of S1 and S2, respectively. This algorithm can also be used to compute the matching statistics with k-mismatches of S1 and S2 in O(nm) time and O(m) space. Moreover, we also present a theoretical solution for the k = 1 case which runs in O(n log m) time, assuming m ≤ n, and uses O(m) space, improving over the existing O(nm) time and O(m) space bound of Babenko and Starikovskaya [1]. Highlights: • Two new algorithms for the longest common substring with k mismatches problem. • A practical solution for arbitrary k which uses constant space. • A theoretical solution for one mismatch which runs in quasilinear time. |
|---|---|
| Author | Flouri, Tomas (Heidelberg Institute for Theoretical Studies, Germany); Giaquinta, Emanuele (Department of Computer Science, Aalto University, Finland; emanuele.giaquinta@aalto.fi; ORCID 0000-0002-9473-3971); Kobert, Kassian (Heidelberg Institute for Theoretical Studies, Germany); Ukkonen, Esko (Department of Computer Science, University of Helsinki, Finland) |
| CODEN | IFPLAT |
| CitedBy_id | crossref_primary_10_3390_a13090224 crossref_primary_10_1007_s00453_022_00934_y crossref_primary_10_3390_electronics9101670 crossref_primary_10_1007_s00453_020_00744_0 crossref_primary_10_1007_s00453_022_01092_x crossref_primary_10_1007_s00453_019_00548_x crossref_primary_10_1145_3488245 crossref_primary_10_1016_j_molp_2017_05_010 crossref_primary_10_1089_cmb_2015_0235 crossref_primary_10_1007_s00453_021_00842_7 crossref_primary_10_1089_cmb_2015_0217 crossref_primary_10_1186_s13015_016_0072_x crossref_primary_10_1016_j_tcs_2017_06_017 crossref_primary_10_1186_s12859_017_1658_0 |
| Cites_doi | 10.1016/j.tcs.2006.06.029 10.1145/322123.322127 10.1017/CBO9780511574931 10.1016/0022-2836(81)90087-5 10.1134/S0032946011010030 10.1007/BF01185431 10.1093/bioinformatics/btu331 |
| ContentType | Journal Article |
| Copyright | 2015 The Authors; Copyright Elsevier Sequoia S.A. Jun-Aug 2015 |
| DOI | 10.1016/j.ipl.2015.03.006 |
| DatabaseName | ScienceDirect Open Access Titles; Elsevier:ScienceDirect:Open Access; CrossRef; Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Computer and Information Systems Abstracts |
| Discipline | Computer Science |
| EISSN | 1872-6119 |
| EndPage | 647 |
| ExternalDocumentID | 3650871051 10_1016_j_ipl_2015_03_006 S0020019015000459 |
| Genre | Feature |
| ISICitedReferencesCount | 29 |
| ISSN | 0020-0190 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6-8 |
| Keywords | String algorithms; Combinatorial problems; Longest common substring; Hamming distance |
| Language | English |
| License | http://creativecommons.org/licenses/by/4.0 |
| ORCID | 0000-0002-9473-3971 |
| OpenAccessLink | https://dx.doi.org/10.1016/j.ipl.2015.03.006 |
| PageCount | 5 |
| PublicationCentury | 2000 |
| PublicationDate | 2015-06-01 |
| PublicationDecade | 2010 |
| PublicationPlace | Amsterdam |
| PublicationTitle | Information processing letters |
| PublicationYear | 2015 |
| Publisher | Elsevier B.V; Elsevier Sequoia S.A |
| References | [1] Babenko, Starikovskaya. Computing the longest common substring with one mismatch. Probl. Inf. Transm. 47(1), 2011, pp. 28-33. doi:10.1134/S0032946011010030 [2] Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. 1997. doi:10.1017/CBO9780511574931 [3] Starikovskaya, Vildhøj. Time-space trade-offs for the longest common substring problem. CPM, 2013, pp. 223-234. [4] Kociumaka, Starikovskaya, Vildhøj. Sublinear space algorithms for the longest common substring problem. ESA, 2014, pp. 605-617. [5] Smith, Waterman. Identification of common molecular subsequences. J. Mol. Biol. 147(1), 1981, pp. 195-197. doi:10.1016/0022-2836(81)90087-5 [6] Chang, Lawler. Sublinear approximate string matching and biological applications. Algorithmica 12(4/5), 1994, pp. 327-344. doi:10.1007/BF01185431 [7] Leimeister, Morgenstern. kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison. Bioinformatics 30(14), 2014, pp. 2000-2008. doi:10.1093/bioinformatics/btu331 [8] Crochemore, Iliopoulos, Mohamed, Sagot. Longest repeats with a block of k don't cares. Theor. Comput. Sci. 362(1-3), 2006, pp. 248-254. doi:10.1016/j.tcs.2006.06.029 [9] Bender, Farach-Colton. The LCA problem revisited. LATIN, 2000, pp. 88-94. [10] Landau, Vishkin. Introducing efficient parallelism into approximate string matching and a new serial algorithm. STOC, 1986, pp. 220-230. [11] Brown, Tarjan. A fast merging algorithm. J. ACM 26(2), 1979, pp. 211-226. doi:10.1145/322123.322127 |
| StartPage | 643 |
| SubjectTerms | Algorithms; Combinatorial problems; Hamming distance; Longest common substring; Mathematical problems; String algorithms; Studies; Vector space |
| Title | Longest common substrings with k mismatches |
| URI | https://dx.doi.org/10.1016/j.ipl.2015.03.006 https://www.proquest.com/docview/1672168560 |
| Volume | 115 |
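
The O(nm) time and O(1) extra space bounds stated in the abstract can be illustrated with a short sketch. The code below is an assumption on our part, not the authors' published procedure: it scans every diagonal (fixed offset between S1 and S2) with a two-pointer window that never holds more than k mismatches. The function name `lcs_k_mismatches` and its return convention are hypothetical and chosen only for this illustration.

```python
def lcs_k_mismatches(s1, s2, k):
    """Return (length, i, j) such that s1[i:i+length] and s2[j:j+length]
    have Hamming distance at most k and length is as large as possible.

    Illustrative sketch only: O(len(s1) * len(s2)) character comparisons
    and a constant number of integer variables besides the inputs.
    """
    n, m = len(s1), len(s2)
    best = (0, 0, 0)
    # A diagonal d pairs s1[i] with s2[i - d]; there are n + m - 1 of them.
    for d in range(-(m - 1), n):
        i0 = max(d, 0)                 # first position of the diagonal in s1
        j0 = i0 - d                    # corresponding first position in s2
        length = min(n - i0, m - j0)   # number of aligned character pairs
        left = 0                       # window start, as an offset on the diagonal
        mismatches = 0                 # mismatches inside the current window
        for right in range(length):
            if s1[i0 + right] != s2[j0 + right]:
                mismatches += 1
            # Shrink the window from the left until it is feasible again.
            while mismatches > k:
                if s1[i0 + left] != s2[j0 + left]:
                    mismatches -= 1
                left += 1
            span = right - left + 1
            if span > best[0]:
                best = (span, i0 + left, j0 + left)
    return best


if __name__ == "__main__":
    length, i, j = lcs_k_mismatches("abcde", "abxde", 1)
    print(length, "abcde"[i:i + length], "abxde"[j:j + length])
```

Each aligned pair of characters is compared at most twice per diagonal (once when the right end of the window passes it and once when the left end does), which is where the O(nm) bound in this sketch comes from. No auxiliary array is kept, since mismatches leaving the window are recomputed by re-reading the input strings.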