A project-based learning framework for teaching distributed data processing

Bibliographic Details
Title: A project-based learning framework for teaching distributed data processing
Authors: Rashid Turgunbaev
Source: Technical Science Integrated Research; Vol. 1 No. 5 (2025): Technical Science Integrated Research; 3-7; 3051-3855
Publisher Information: Technical Science Integrated Research
Publication Year: 2025
Subject Terms: distributed data processing, project-based learning, big data education, apache spark, computational pedagogy, data engineering
Description: The rapid ascent of big data technologies has fundamentally reshaped the computational landscape, creating a significant demand for a workforce proficient in distributed data processing. Traditional pedagogical methods in computer science, which often emphasize discrete algorithmic problems and localized execution environments, are increasingly misaligned with the practical, systems-oriented challenges inherent in this domain. This article proposes a comprehensive project-based learning framework designed specifically for teaching distributed data processing. The framework moves beyond theoretical exposition and simple syntax tutorials, instead situating learning within the context of a sustained, complex, and authentic project that mirrors the realities of data engineering in industry and research. We argue that this approach is not merely beneficial but essential for cultivating a deep, integrated understanding of concepts such as parallelization, fault tolerance, and cluster resource management. The article details the core principles of the framework, outlines a phased implementation strategy, discusses the challenges of managing a distributed systems classroom, and presents a qualitative analysis of the competencies developed. The primary thesis is that by grappling with the entire data lifecycle - from ingestion and storage to processing and analysis - within a project-based paradigm, students develop the robust technical skills and, more critically, the systemic problem-solving mindset required to navigate the complexities of modern data infrastructure.
Document Type: article in journal/newspaper
File Description: application/pdf
Language: English
Relation: https://altumnova.com/index.php/tsir/article/view/25/22; https://altumnova.com/index.php/tsir/article/view/25
Availability: https://altumnova.com/index.php/tsir/article/view/25
Rights: https://creativecommons.org/licenses/by/4.0
Accession Number: edsbas.4F52FB06
Database: BASE