Differential snapshot algorithms based on Hadoop MapReduce

Bibliographic details
Published in: 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp. 1203-1208
Main authors: Wei Du, Xianxia Zou
Format: Conference paper
Language: English
Published: IEEE, 1 August 2015
Description
Summary: Change Data Capture (CDC) from source systems is the first step in the incremental maintenance of data warehouses and business intelligence, and is a key component of the ETL (Extract, Transform and Load) technique. The CDC methods currently available are time stamps, differential snapshots, triggers, and archive logs. Differential snapshots do not rely on the implementation mechanism of the information sources and therefore demonstrate better universality and adaptability. When computing resources are scarce, differential snapshots based on sort merge and hash partition are sometimes erroneous and ineffective. This paper proposes a low-cost, high-efficiency differential snapshot that combines an open source database with Hadoop MapReduce. A differential snapshot based on data summaries generated by the MD5 algorithm is very effective, but its I/O cost is very heavy. The paper therefore proposes an SQL statement that queries the database and generates the tuple summaries in a single I/O pass. The SQL statement is implemented on the open source database MySQL. In addition, MapReduce parallel programming is used to find the differences between database files, which improves efficiency and avoids the errors. Experiments verify the performance differences among the differential snapshot algorithms.
DOI:10.1109/FSKD.2015.7382113
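
Illustrative sketch: the summary describes comparing two snapshots through MD5 tuple summaries and finding the differences with a map/reduce style computation. The Python below is a minimal, single-process sketch of that general idea and is not the authors' implementation. It assumes each row is a tuple whose first field is a unique primary key, and the names tuple_digest, map_phase, reduce_phase and differential_snapshot are hypothetical. A real deployment would run the two phases as Hadoop MapReduce tasks over exported database files.

```python
import hashlib
from collections import defaultdict


def tuple_digest(values):
    """Return an MD5 summary of a row's non-key columns.

    NULLs are written as the marker \\N so they differ from empty strings
    (assumption: no column legitimately holds that literal text).
    """
    joined = "\x1f".join("\\N" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def map_phase(snapshot, tag):
    """Emit (key, (tag, digest)) pairs, as a mapper would for one input file."""
    for key, *values in snapshot:
        yield key, (tag, tuple_digest(values))


def reduce_phase(key, tagged_digests):
    """Classify one key by comparing its digests from the two snapshots."""
    digests = dict(tagged_digests)
    old, new = digests.get("old"), digests.get("new")
    if old is None:
        return ("INSERT", key)
    if new is None:
        return ("DELETE", key)
    if old != new:
        return ("UPDATE", key)
    return None  # unchanged row, nothing to capture


def differential_snapshot(old_snapshot, new_snapshot):
    """Group mapper output by key (the shuffle step), then reduce each group."""
    groups = defaultdict(list)
    for snapshot, tag in ((old_snapshot, "old"), (new_snapshot, "new")):
        for key, value in map_phase(snapshot, tag):
            groups[key].append(value)
    changes = (reduce_phase(key, values) for key, values in sorted(groups.items()))
    return [change for change in changes if change is not None]


if __name__ == "__main__":
    old_rows = [(1, "alice", 10), (2, "bob", 20), (3, "carol", 30)]
    new_rows = [(1, "alice", 10), (2, "bob", 25), (4, "dave", 40)]
    for change in differential_snapshot(old_rows, new_rows):
        print(change)
```

Only the key and a fixed-size digest travel through the grouping step, so the comparison cost does not depend on row width. MySQL provides an MD5() function, so the digest could in principle be computed inside the query itself; the SQL statement actually used in the paper is not reproduced in this record.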