Optimization of frequent item set mining parallelization algorithm based on spark platform
In this paper, we propose a new method that combines the parallelism of the Spark-based platform with fast frequent mining, called STB_Apriori. Previous research has shown that traditional frequent itemset mining algorithms have high overhead when faced with large datasets and high-dimensional data...
| Published in: | Information retrieval (Boston) Vol. 27; no. 1; p. 38 |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | Dordrecht: Springer Nature B.V, 02.11.2024 |
| ISSN: | 1386-4564, 1573-7659 |
| Abstract | In this paper, we propose STB_Apriori, a new method that combines the parallelism of the Spark platform with fast frequent itemset mining. Previous research has shown that traditional frequent itemset mining algorithms incur high overhead on large, high-dimensional datasets and generate a large number of candidate itemsets; moreover, diverse user requirements often produce very sparse and diverse data. To mine massive data quickly, our approach builds on Spark's distributed computing capability and common optimisation ideas in Apriori mining: it uses the efficient BitSet operator to achieve transaction compression, bit storage, and data manipulation via Boolean matrices, while parallelising the processing and optimising the algorithmic logic to achieve fast frequent itemset mining. In experiments on real-world datasets, our model consistently outperforms five widely used methods by a significant margin on very large data and maintains strong performance in the remaining cases, demonstrating its effectiveness on real-world tasks. Further analysis shows that increasing the number of distributed nodes continues to improve performance incrementally. |
|---|---|
| ContentType | Journal Article |
| Copyright | Copyright Springer Nature B.V. Dec 2024 |
| DOI | 10.1007/s10791-024-09470-5 |
| Discipline | Engineering; Library & Information Science; Computer Science |
| EISSN | 1573-7659 |
| ISICitedReferencesCount | 1 |
| ISSN | 1386-4564 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Language | English |
| OpenAccessLink | https://doi.org/10.1007/s10791-024-09470-5 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-11-02 |
| PublicationDateYYYYMMDD | 2024-11-02 |
| PublicationDate_xml | – month: 11 year: 2024 text: 2024-11-02 day: 02 |
| PublicationDecade | 2020 |
| PublicationPlace | Dordrecht |
| PublicationPlace_xml | – name: Dordrecht |
| PublicationTitle | Information retrieval (Boston) |
| PublicationYear | 2024 |
| Publisher | Springer Nature B.V |
| Publisher_xml | – name: Springer Nature B.V |
| StartPage | 38 |
| SubjectTerms | Algorithms; Boolean; Clustering; Data compression; Data mining; Datasets; Distributed processing; Efficiency; Parallel processing; Python; Social networks; User requirements |
| Title | Optimization of frequent item set mining parallelization algorithm based on spark platform |
| URI | https://www.proquest.com/docview/3123438236 |
| Volume | 27 |
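
The abstract describes compressing transactions into bit storage and counting support with Boolean operations. The sketch below illustrates that general idea only; it is not the authors' STB_Apriori implementation, and it runs in a single process rather than on Spark. It assumes a vertical layout in which each item maps to a Python integer used as a bitset (bit `i` is set when transaction `i` contains the item), so the support of an itemset is the population count of the bitwise AND of its items' bitsets. The function names `build_item_bitsets`, `support`, and `apriori_bitset` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def build_item_bitsets(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets


def support(itemset, bitsets):
    """Count transactions containing every item: AND the bitsets, then popcount."""
    items = iter(itemset)
    mask = bitsets[next(items)]
    for item in items:
        mask &= bitsets[item]
    return bin(mask).count("1")


def apriori_bitset(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets meeting min_support."""
    bitsets = build_item_bitsets(transactions)
    # Frequent 1-itemsets seed the level-wise search.
    current = {
        frozenset([item])
        for item in bitsets
        if support([item], bitsets) >= min_support
    }
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, bitsets)
        k += 1
        # Join step: merge frequent (k-1)-itemsets that differ by one item.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, then check support.
        current = {
            c
            for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c, bitsets) >= min_support
        }
    return frequent
```

Calling `apriori_bitset(transactions, min_support)` with a list of item sets and an absolute support threshold returns a dictionary keyed by frozen itemsets. In a distributed setting, the per-candidate AND-and-count step is the kind of work that could be spread across partitions, though how the paper does this is not stated in this record.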