The Cutting-Edge Hadoop Distributed File System: Unleashing Optimal Performance

Bibliographic details
Published in:	EAI Endorsed Transactions on Scalable Information Systems, Vol. 12, No. 5
Main authors:	Gupta, Anish; Santhiya, P.; Thiyagarajan, C.; Gupta, Anurag; Gupta, Manish; Dwivedi, Rajendra Kr
Format:	Journal Article
Language:	English
Published:	Ghent: European Alliance for Innovation (EAI), 13 October 2025
Subjects:	Big Data; Data replication; Data transfer (computers); Fault tolerance; Optimization; Performance enhancement; Pipelines; Reliability analysis
ISSN:	2032-9407

Description
Abstract:	Despite the widespread adoption of 1000-node Hadoop clusters by the end of 2022, Hadoop implementation still encounters various challenges. As a vital software paradigm for managing big data, Hadoop relies on the Hadoop Distributed File System (HDFS), a distributed file system that uses data replication for fault tolerance. Replication duplicates data across multiple DataNodes (DNs) to ensure data reliability and availability. While data replication is effective, it is inefficient because it relies on a single-pipeline paradigm, which wastes time. To address this limitation and optimize HDFS performance, a novel approach is proposed that uses multiple pipelines for data block transfers instead of a single pipeline. The proposed approach also incorporates dynamic reliability evaluation: each DN updates its reliability value after each round and sends this information to the NameNode (NN), which then sorts the DNs by their reliability values. When a client requests to upload a data block, the NN responds with a list of high-reliability DNs, ensuring high-performance data transfer. The proposed approach has been fully implemented and tested through rigorous experiments. The results reveal significant improvements in HDFS write operations, providing a promising solution to the challenges associated with traditional HDFS implementations. By leveraging multiple pipelines and dynamic reliability assessment, this approach enhances the overall performance and responsiveness of Hadoop's distributed file system.
DOI:	10.4108/eetsis.9027
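
The abstract describes two mechanisms: DataNodes report a reliability value that the NameNode uses to rank them, and a block is sent over multiple pipelines rather than one. The following is a minimal Python sketch of that idea, not the authors' implementation. The class and function names (`DataNode`, `NameNode`, `write_block`), the moving-average update rule, the `alpha` parameter, and the `send` callback are all assumptions introduced for illustration; the abstract does not specify how reliability is computed or how pipelines are built.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataNode:
    name: str
    reliability: float = 1.0  # hypothetical score in [0, 1]

    def update_reliability(self, succeeded: bool, alpha: float = 0.2) -> None:
        # Assumed rule: exponential moving average over round outcomes.
        # The abstract only states that each DN updates its value per round.
        outcome = 1.0 if succeeded else 0.0
        self.reliability = (1 - alpha) * self.reliability + alpha * outcome


class NameNode:
    def __init__(self) -> None:
        self.reports: dict[str, float] = {}

    def receive_report(self, dn: DataNode) -> None:
        # Store the latest reliability value reported by a DataNode.
        self.reports[dn.name] = dn.reliability

    def choose_targets(self, replicas: int = 3) -> list[str]:
        # Sort DataNodes by reliability, highest first, and return the top ones.
        ranked = sorted(self.reports, key=self.reports.get, reverse=True)
        return ranked[:replicas]


def write_block(
    block: bytes,
    targets: list[str],
    send: Callable[[str, bytes], bool],
) -> dict[str, bool]:
    # One pipeline per target, run concurrently, instead of chaining the
    # targets in a single pipeline. `send` is a hypothetical transport hook.
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        results = pool.map(lambda target: send(target, block), targets)
        return dict(zip(targets, results))
```

In this sketch a client would call `choose_targets` to obtain the highest-ranked DataNodes and then pass that list to `write_block`; a real HDFS client would also handle acknowledgements, retries, and rack awareness, which are omitted here.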