Data lakes versus data warehouses: choosing the right approach for big data analytics

Bibliographic details
Title: Data lakes versus data warehouses: choosing the right approach for big data analytics
Authors: Saliha Mezzoudj, Meriem Khelifa, Yassmina Saadna
Source: Journal of Electrical Systems and Information Technology, Vol 12, Iss 1, Pp 1-21 (2025)
Publisher information: SpringerOpen, 2025.
Publication year: 2025
Collection: LCC:Electrical engineering. Electronics. Nuclear engineering
LCC:Information technology
Subjects: Big data analytics, Data lake, Data warehouse, Lakehouse architecture, Data management, Hadoop, Electrical engineering. Electronics. Nuclear engineering, TK1-9971, Information technology, T58.5-58.64
Description (abstract): In the era of big data, organizations face critical decisions when selecting between data lakes and data warehouses to meet their analytics requirements. This article presents a comprehensive comparative analysis of these two predominant data management architectures, emphasizing their structural differences, functional capabilities, and suitability for diverse analytics workloads. Data lakes offer scalable, cost-effective storage for raw, unstructured, and semi-structured data, supporting advanced analytics and machine learning applications. In contrast, data warehouses provide optimized, schema-on-write frameworks for fast querying and reliable reporting on structured data. Through detailed examination of architectural designs, integration with big data tools including Hadoop, Spark, and Kafka, and evaluations based on performance, scalability, cost, and governance, this paper provides organizations with evidence-based guidance to align their data strategies with business objectives. Case studies from the healthcare and retail sectors illustrate practical implications of each approach, while emerging trends such as lakehouse architectures, AI integration, blockchain security, edge computing, and quantum computing highlight future directions. The findings support a hybrid data management solution that leverages the strengths of both data lakes and warehouses to enable robust, scalable, and innovative big data analytics.
Document type: article
File description: electronic resource
Language: English
ISSN: 2314-7172
Relation: https://doaj.org/toc/2314-7172
DOI: 10.1186/s43067-025-00275-0
Access URL: https://doaj.org/article/0a9e7c17f86f4ea686133e271b7dee9c
Accession number: edsdoj.0a9e7c17f86f4ea686133e271b7dee9c
Database: Directory of Open Access Journals