Differential snapshot algorithms based on Hadoop MapReduce
| Published in: | 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp. 1203-1208 |
|---|---|
| Main authors: | Wei Du, Xianxia Zou |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 01.08.2015 |
| Subjects: | |
| Abstract | Change Data Capture (CDC) from the source system is the first step in the incremental maintenance of data warehouses and business intelligence, and is a key component of the ETL (Extract, Transform and Load) technique. The CDC methods currently available are time stamps, differential snapshots, triggers, and archive logs. Differential snapshots do not rely on the implementation mechanism of the information sources and therefore demonstrate better universality and adaptability. When computing resources are scarce, differential snapshots based on sort merge and hash partition are sometimes erroneous and inefficient. This paper proposes a low-cost, high-efficiency differential snapshot that combines an open source database with Hadoop MapReduce. A differential snapshot based on data summaries generated by the MD5 algorithm is very effective, but its I/O cost is heavy. The paper therefore proposes an SQL statement that generates the tuple summaries while querying the database, so that only a single round of I/O is needed. We implement the SQL statement on the open source database MySQL. In addition, MapReduce parallel programming is used to find the differences between database files, which improves efficiency and avoids the errors. Experiments compare the performance of the different differential snapshot algorithms. |
|---|---|
| Author | Xianxia Zou; Wei Du |
| Author affiliations | Wei Du: Dept. of Comput. Sci., GongDong Police Coll., Guangzhou, China; Xianxia Zou: Dept. of Comput. Sci., Jinan Univ., Guangzhou, China |
| ContentType | Conference Proceeding |
| DOI | 10.1109/FSKD.2015.7382113 |
| EISBN | 9781467376815 1467376817 9781467376822 1467376825 |
| SubjectTerms | Algorithm design and analysis; Change Data Capture (CDC); Data mining; Data warehouses; differential snapshot algorithm; Hadoop MapReduce; MD5 algorithm; Particle separators; Partitioning algorithms; Syntactics |
| URI | https://ieeexplore.ieee.org/document/7382113 |
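
The approach summarized in the abstract (one MD5 summary per tuple, then a MapReduce-style comparison of two snapshots) can be illustrated with a minimal sketch. This is not the paper's implementation: the record does not show the SQL statement or the Hadoop job, so the function names, the column encoding, and the sample rows below are all hypothetical, and the map and reduce phases are simulated in memory in plain Python rather than run on Hadoop. In MySQL, a comparable per-row summary could presumably be produced inside the query itself with the built-in `MD5()` and `CONCAT_WS()` functions, which is one way to read the abstract's "only a single round of I/O".

```python
import hashlib
from collections import defaultdict


def row_digest(values):
    """MD5 summary of a tuple's non-key columns (hypothetical encoding)."""
    joined = "\x1f".join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def map_phase(snapshot, tag):
    """Emit (primary_key, (tag, digest)) for every tuple of one snapshot."""
    for key, *values in snapshot:
        yield key, (tag, row_digest(values))


def reduce_phase(key, tagged_digests):
    """Classify one key from the digests seen in the old and new snapshots."""
    seen = dict(tagged_digests)
    old, new = seen.get("old"), seen.get("new")
    if old is None:
        return ("insert", key)
    if new is None:
        return ("delete", key)
    if old != new:
        return ("update", key)
    return None  # digests match: tuple is unchanged


def differential_snapshot(old_snapshot, new_snapshot):
    """Simulate shuffle (group by key) and reduce over both snapshots."""
    groups = defaultdict(list)
    for key, value in map_phase(old_snapshot, "old"):
        groups[key].append(value)
    for key, value in map_phase(new_snapshot, "new"):
        groups[key].append(value)
    changes = []
    for key in sorted(groups):
        result = reduce_phase(key, groups[key])
        if result is not None:
            changes.append(result)
    return changes


# Hypothetical sample data: the first column is the primary key.
old_rows = [(1, "alice", 30), (2, "bob", 25), (3, "carol", 41)]
new_rows = [(1, "alice", 31), (3, "carol", 41), (4, "dave", 19)]
changes = differential_snapshot(old_rows, new_rows)
print(changes)
```

Comparing fixed-length digests instead of full tuples keeps the values shuffled between the map and reduce phases small, and grouping by primary key lets each key be classified independently, which is what makes the comparison parallelizable.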