Optimization of frequent item set mining parallelization algorithm based on spark platform

Bibliographic details
Published in: Information retrieval (Boston), Volume 27, Issue 1, p. 38
Format: Journal Article
Language: English
Published: Dordrecht, Springer Nature B.V, 2 November 2024
ISSN: 1386-4564, 1573-7659
Abstract: In this paper, we propose STB_Apriori, a new method that combines the parallelism of the Spark platform with fast frequent itemset mining. Previous research has shown that traditional frequent itemset mining algorithms incur high overhead on large, high-dimensional datasets and generate a large number of candidate itemsets; in addition, diverse user requirements often produce very sparse and varied data. To mine massive data quickly, our approach builds on Spark's distributed computing capability and on common Apriori optimisations: it uses the efficient BitSet operator for transaction compression and bit-level storage, manipulates the data as Boolean matrices, parallelises the processing, and streamlines the algorithmic logic. In experiments on real-world datasets, our model consistently outperforms five widely used methods by a significant margin on very large data and performs strongly in the remaining cases, demonstrating its effectiveness on real-world tasks. Further analysis shows that performance continues to improve incrementally as the number of distributed nodes increases.
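The abstract mentions bit-level transaction storage and Boolean-matrix manipulation without giving details, so the following is only a minimal single-machine sketch of that general idea, not the authors' STB_Apriori. It assumes a column-wise Boolean matrix in which each item is stored as a Python integer used as a bitset (bit t is set when transaction t contains the item), so the support of an itemset is the popcount of the AND of its columns. The function names `build_item_bitsets`, `support`, and `frequent_itemsets` are hypothetical, and no Spark distribution is attempted here.

```python
from itertools import combinations


def build_item_bitsets(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    bitsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets


def support(itemset, bitsets):
    """AND the item columns together, then count the set bits (popcount)."""
    items = iter(itemset)
    mask = bitsets[next(items)]
    for item in items:
        mask &= bitsets[item]
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori over bitset columns; returns {itemset: support count}."""
    bitsets = build_item_bitsets(transactions)
    # Level 1: frequent single items.
    current = {}
    for item in bitsets:
        count = support([item], bitsets)
        if count >= min_support:
            current[frozenset([item])] = count
    result = {}
    k = 1
    while current:
        result.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(cand, k - 1)):
                count = support(cand, bitsets)
                if count >= min_support:
                    next_level[cand] = count
        current = next_level
    return result


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    demo = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(frequent_itemsets(demo, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), count)
```

In a distributed setting one would presumably split the transactions across workers and combine per-partition counts, but how the paper does this is not stated in the abstract, so that part is left out.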
Copyright: Springer Nature B.V. Dec 2024
DOI: 10.1007/s10791-024-09470-5
Discipline: Engineering; Library & Information Science; Computer Science
DOI open access: no
Open access: yes
Peer reviewed: yes
Scholarly: yes
Subject terms: Algorithms; Boolean; Clustering; Data compression; Data mining; Datasets; Distributed processing; Efficiency; Parallel processing; Python; Social networks; User requirements