A Compression-Based Data Structure on MapReduce System for Data Warehouse Management

Bibliographic Details
Published in: 2025 2nd International Conference on Next-Generation Computing, IoT and Machine Learning (NCIM), pp. 1-6
Main Authors: Rouf, Mohammad Abdur; Manirzzaman, Md; Hannan, Osama Abdul
Format: Conference Proceeding
Language: English
Published: IEEE, 27.06.2025
Description
Summary: As business, financial, and scientific data volumes grow, these data must be stored in a data warehouse to support decision making, and they must be compressed so that they occupy less space without sacrificing efficiency. This paper presents a data storage technique: a MapReduce-based, optimized, and space-efficient data warehouse system. The proposed solution adopts the Optimized Row Columnar File (ORC File) format to improve compression and speed up data retrieval. Storing the data in columnar format, combined with more sophisticated compression schemes, reduces storage space and accelerates query processing. The resulting higher data retrieval rate makes the design well suited to analytical workloads in large-scale data warehouses. Experimental results show that this structure significantly reduces disk space and I/O operations and improves query execution time, demonstrating the efficiency of the ORC data structure for both storage and query performance. These improvements make it a practical solution for large-scale data applications where performance and cost-effectiveness are essential, and for the growing data volumes of modern data warehouses. The paper analyzes query time and storage space for a raw text file (CSV file), MySQL, and the ORC File format: ORC File achieves a 33% to 52% improvement in query efficiency and up to a 78% reduction in storage space.
DOI: 10.1109/NCIM65934.2025.11159932
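Why storing data column by column helps compression can be illustrated with a small sketch. The Python below applies two lightweight encodings of the kind the ORC specification uses for column data: dictionary encoding for strings and run-length encoding. It assumes a hypothetical `region` column whose rows arrive grouped by region. The helper names `dictionary_encode`, `run_length_encode`, and `decode` are hypothetical. This is a simplified illustration only. It is neither ORC's actual on-disk encoding (which uses bit-packed RLE variants, separate streams, and stripes) nor the authors' implementation.

```python
from itertools import groupby


def dictionary_encode(column):
    """Map each distinct string to a small integer id (sorted dictionary)."""
    dictionary = sorted(set(column))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[v] for v in column]


def run_length_encode(ids):
    """Collapse consecutive repeated ids into (id, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(ids)]


def decode(dictionary, runs):
    """Invert both encodings to recover the original column values."""
    return [dictionary[value] for value, length in runs for _ in range(length)]


if __name__ == "__main__":
    # Hypothetical 'region' column, assumed to be loaded in region order.
    region = ["east"] * 3 + ["north"] * 4 + ["west"] * 2 + ["east"]
    dictionary, ids = dictionary_encode(region)
    runs = run_length_encode(ids)
    print(dictionary)  # ['east', 'north', 'west']
    print(runs)        # [(0, 3), (1, 4), (2, 2), (0, 1)]
    # Ten strings are now stored as a 3-entry dictionary plus 4 runs.
    assert decode(dictionary, runs) == region
```

Because a columnar file stores each column contiguously, repeated values sit next to each other and collapse into runs. In a row-oriented CSV file the same values are interleaved with the other fields of each record, so per-column encodings like these cannot be applied as directly.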