A parameterizable enumeration algorithm for sequence mining

In this paper, we introduce a generic framework for the mining of sequences under various constraints. More precisely, we study the enumeration of all partitions of a word w into multisets of subsequences. We show that using additional predicates, this generator can be used for frequent subsequence...

Bibliographic details
Published in: Theoretical Computer Science, Volume 468, pp. 59-68
Main authors: David, J., Nourine, L.
Format: Journal Article
Language: English
Published: Elsevier B.V., 14 January 2013
Elsevier
ISSN: 0304-3975, 1879-2294
Abstract In this paper, we introduce a generic framework for the mining of sequences under various constraints. More precisely, we study the enumeration of all partitions of a word w into multisets of subsequences. We show that, using additional predicates, this generator can be used for frequent subsequence and substring mining. We define the transition graph T_w, whose vertices are multisets of words and whose arcs are transitions between multisets. We show that T_w is a directed acyclic graph and that it admits a covering tree. We use T_w to propose a generic algorithm that enumerates, without redundancy, all multisets that satisfy a set of predicates.
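As a rough illustration of the enumeration problem stated in the abstract, the following Python sketch lists, by brute force, every multiset of subsequences that partitions a word w. It is not the paper's algorithm: it does not build the transition graph or its covering tree. Instead it generates all set partitions of the positions of w and discards duplicate multisets with a hash set, which is the kind of redundancy the paper's generator is designed to avoid. The function names and the predicate signature (a callable taking a sorted tuple of words) are assumptions made for this example only.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# A multiset of words is represented as a sorted tuple (hypothetical encoding).
Multiset = Tuple[str, ...]


def position_partitions(n: int) -> Iterator[List[List[int]]]:
    """Yield every set partition of positions 0..n-1.

    Positions are inserted in increasing order, so each block is
    already sorted and spells a subsequence of the word.
    """
    def extend(i: int, blocks: List[List[int]]) -> Iterator[List[List[int]]]:
        if i == n:
            yield [list(b) for b in blocks]
            return
        for b in blocks:            # put position i into an existing block
            b.append(i)
            yield from extend(i + 1, blocks)
            b.pop()
        blocks.append([i])          # or open a new block with position i
        yield from extend(i + 1, blocks)
        blocks.pop()

    yield from extend(0, [])


def subsequence_multisets(
    w: str,
    predicates: Iterable[Callable[[Multiset], bool]] = (),
) -> Iterator[Multiset]:
    """Yield each distinct multiset of subsequences partitioning w
    that satisfies every predicate (naive baseline with deduplication)."""
    predicates = list(predicates)
    seen = set()
    for blocks in position_partitions(len(w)):
        m = tuple(sorted("".join(w[i] for i in b) for b in blocks))
        if m in seen:
            continue                # same multiset reached via another partition
        seen.add(m)
        if all(p(m) for p in predicates):
            yield m


if __name__ == "__main__":
    for m in subsequence_multisets("aab"):
        print(m)
```

A predicate such as `lambda m: m.count("a") >= 2` keeps only the multisets in which the word "a" occurs at least twice, which is one simple way to phrase a frequency constraint in this encoding. The number of set partitions grows very quickly with the length of w, so this sketch is only usable on short words.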
Author David, J.
Nourine, L.
Author_xml – sequence: 1
  givenname: J.
  surname: David
  fullname: David, J.
  email: Julien.David@lipn.univ-paris13.fr
– sequence: 2
  givenname: L.
  surname: Nourine
  fullname: Nourine, L.
BackLink https://hal.science/hal-01765525 (View record in HAL)
Cites_doi 10.1016/j.jda.2007.06.001
10.1023/A:1009748302351
10.1016/0022-0000(84)90018-7
10.1016/0020-0190(88)90065-8
10.1016/j.dam.2008.10.010
10.1109/TCBB.2005.5
ContentType Journal Article
Copyright 2012 Elsevier B.V.
Distributed under a Creative Commons Attribution 4.0 International License
Copyright_xml – notice: 2012 Elsevier B.V.
– notice: Distributed under a Creative Commons Attribution 4.0 International License
DBID 6I.
AAFTH
AAYXX
CITATION
1XC
DOI 10.1016/j.tcs.2012.11.005
DatabaseName ScienceDirect Open Access Titles
Elsevier:ScienceDirect:Open Access
CrossRef
Hyper Article en Ligne (HAL)
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Mathematics
Computer Science
EISSN 1879-2294
EndPage 68
ExternalDocumentID oai:HAL:hal-01765525v1
10_1016_j_tcs_2012_11_005
S0304397512010353
ID FETCH-LOGICAL-c283t-a4bb820f676b8662e8507c3861d9e99e73ff64fb842ed640c29f2c538ab8be743
ISICitedReferencesCount 0
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000313917200006&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 0304-3975
IngestDate Tue Oct 14 20:34:37 EDT 2025
Sat Nov 29 05:15:12 EST 2025
Fri Feb 23 02:30:24 EST 2024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License http://www.elsevier.com/open-access/userlicense/1.0
https://www.elsevier.com/tdm/userlicense/1.0
https://www.elsevier.com/open-access/userlicense/1.0
Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c283t-a4bb820f676b8662e8507c3861d9e99e73ff64fb842ed640c29f2c538ab8be743
OpenAccessLink https://dx.doi.org/10.1016/j.tcs.2012.11.005
PageCount 10
ParticipantIDs hal_primary_oai_HAL_hal_01765525v1
crossref_primary_10_1016_j_tcs_2012_11_005
elsevier_sciencedirect_doi_10_1016_j_tcs_2012_11_005
PublicationCentury 2000
PublicationDate 2013-01-14
PublicationDateYYYYMMDD 2013-01-14
PublicationDate_xml – month: 01
  year: 2013
  text: 2013-01-14
  day: 14
PublicationDecade 2010
PublicationTitle Theoretical computer science
PublicationYear 2013
Publisher Elsevier B.V
Elsevier
Publisher_xml – name: Elsevier B.V
– name: Elsevier
SSID ssj0000576
Score 2.0018625
Snippet In this paper, we introduce a generic framework for the mining of sequences under various constraints. More precisely, we study the enumeration of all...
SourceID hal
crossref
elsevier
SourceType Open Access Repository
Index Database
Publisher
StartPage 59
SubjectTerms Computer Science
Data Structures and Algorithms
Discrete Mathematics
Title A parameterizable enumeration algorithm for sequence mining
URI https://dx.doi.org/10.1016/j.tcs.2012.11.005
https://hal.science/hal-01765525
Volume 468
WOSCitedRecordID wos000313917200006&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVESC
  databaseName: Elsevier SD Freedom Collection Journals 2021
  customDbUrl:
  eissn: 1879-2294
  dateEnd: 20180131
  omitProxy: false
  ssIdentifier: ssj0000576
  issn: 0304-3975
  databaseCode: AIEXJ
  dateStart: 19950109
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV3fb9MwELZg4wEe-DFADBiKEE9UiRLXcWzxFLFN3SjVJIrUNytxnLVTm01tqPbnc47tJNsEGg-8WJHTROndp_Pd-fMdQp_yYUYTSTQPJ-E-kaH0dRUtv8yIJEUUy4w3B4XHyWTCZjN-Zvnzm6adQFJV7PqaX_1XVcMcKFsfnf0HdbcvhQm4BqXDCGqH8V6KTwe6nPdK01w0Y2upBprtrqyms-X55XpRz1cNv9ARqQerpk9E31Od9k44Stv6YWDXyy69vV00CDkN2qSyTuqbLOk46KcUdHuHyI-6lKI769IzRUO9f8JNj5NAGVPJQLkYmxbFzpYS0yPHWkNb69usq-bOHYttkgcXQS118fQIB7qmahh3y5Pbkh-lP8TZ4bEYn0y-3bzboxSO0jGM82ypsUfjGMdbCI13cRJzsNq76cnR7LRbqOPEbGXbP-c2vRv6363v-ZPb8nDuEvCNQzJ9jp7aSMJLDQJeoAeq2kPPXJcOzxrtPfTke1uZd_MSfUm9W_DwevDwWnh4AA_PwcMz8HiFfh4fTb-OfNtAw5fgNdZ-RvIcPLySJjRnlGLFwPuXQ0ajgivOVTIsS0rKnBGsCkpCiXmJJSyBWc5yBb7la7RTXVbqDfJoBoEnY3GiFPiAEsKKnEQyLFmBi5iHfB99dvIRV6ZOinAEwgsBwhRamBBvChDmPiJOgsIC1zhwAlDxt8c-grTb1-vC6KBvoec6bb-9z4_eoccd7t-jnXr9Sx2gR3JbLzbrDxYnvwEQtnrt
linkProvider Elsevier
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+parameterizable+enumeration+algorithm+for+sequence+mining&rft.jtitle=Theoretical+computer+science&rft.au=David%2C+J.&rft.au=Nourine%2C+L.&rft.date=2013-01-14&rft.pub=Elsevier&rft.issn=0304-3975&rft.eissn=1879-2294&rft.volume=468&rft.spage=59&rft.epage=68&rft_id=info:doi/10.1016%2Fj.tcs.2012.11.005&rft.externalDBID=HAS_PDF_LINK&rft.externalDocID=oai%3AHAL%3Ahal-01765525v1