Hadoop Distributed File System Write Operations

Saved in:
Detailed Bibliography
Title: Hadoop Distributed File System Write Operations
Authors: Renukadevi Chuppala
Source: International Journal of Intelligent Systems and Applications in Engineering; Vol. 12 No. 23s (2024); 720–731
Publisher Information: International Journal of Intelligent Systems and Applications in Engineering, 2024.
Year of Publication: 2024
Subjects: Hadoop Distributed File System (HDFS), NameNode, DataNode, Replica, Rack Awareness, Data Packet, Data Packet Transfer Time, Pipeline, Fully Connected Digraph Network Topology, A* algorithm
Description: Hadoop is an open-source implementation of the MapReduce framework for distributed processing. A Hadoop cluster can manage substantial volumes of data, which it stores using the Hadoop Distributed File System (HDFS). To write a file, the client retrieves block placement information from the NameNode and then transfers the data to the DataNodes, which are connected in a pipeline for each block. If a DataNode or a network link fails during the write, the pipeline removes the failed DataNode and adds a replacement chosen from the remaining DataNodes in the cluster. When spare nodes are scarce, clients may encounter an abnormally high rate of pipeline failures because no replacement DataNode can be found. In addition, a network failure can prevent a data packet from reaching the target DataNode, because the pipeline connects the DataNodes in a single chain. Interconnecting every DataNode with every other DataNode provides multiple alternative paths through other DataNodes, so a single network failure no longer blocks the write. Pipeline connectivity also slows the copy operation, whereas a direct connection between a DataNode and all other DataNodes significantly reduces the time required, because the data packet does not have to traverse all intermediate DataNodes to reach the final one. This paper applies the A* algorithm to improve the performance of write operations in the Hadoop Distributed File System.
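Note: to make the routing idea concrete, the following is a minimal illustrative sketch, not code from the paper or from HDFS itself. It assumes a fully connected digraph of DataNodes with hypothetical node names (DN1, DN2, DN3), estimated packet-transfer times as edge weights, and a simple admissible heuristic; it shows how A* can pick a cheap alternative route for a data packet when one direct link is effectively unavailable.

import heapq

# Illustrative sketch only (not HDFS code): route a data packet from a source
# DataNode to a target DataNode over a fully connected digraph whose edge
# weights are estimated packet-transfer times (ms). The heuristic is a lower
# bound on the remaining cost (the cheapest link in the graph), so A* stays
# admissible and returns a minimum-time route.

def a_star_route(transfer_time, source, target):
    """transfer_time: dict mapping (u, v) -> estimated packet transfer time."""
    nodes = {u for u, _ in transfer_time} | {v for _, v in transfer_time}
    cheapest_link = min(transfer_time.values())

    def h(node):
        # At least one more hop is needed unless we have reached the target.
        return 0 if node == target else cheapest_link

    frontier = [(h(source), 0, source, [source])]   # (f, g, node, path)
    best_g = {source: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == target:
            return path, g
        for nxt in nodes:
            if nxt == node:
                continue
            ng = g + transfer_time[(node, nxt)]
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None, float("inf")

# Hypothetical example: the direct DN1 -> DN3 link is modelled as failed by
# giving it a very large transfer time, so A* routes the packet via DN2.
times = {
    ("DN1", "DN2"): 4, ("DN2", "DN1"): 4,
    ("DN1", "DN3"): 10_000, ("DN3", "DN1"): 10_000,  # simulated link failure
    ("DN2", "DN3"): 5, ("DN3", "DN2"): 5,
}
print(a_star_route(times, "DN1", "DN3"))  # (['DN1', 'DN2', 'DN3'], 9)

In this toy setup the search avoids the failed direct link and reaches the target through an intermediate DataNode in 9 ms instead of 10,000 ms; the paper's actual cost model and heuristic may differ.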
Document Type: Article
File Description: application/pdf
Language: English
ISSN: 2147-6799
Access URL: https://www.ijisae.org/index.php/IJISAE/article/view/6987
Rights: CC BY-SA
Accession Number: edsair.issn21476799..846b6d970bcb19a1a3c97fa010dae56b
Database: OpenAIRE