To Ameliorate Classification Accuracy using Ensemble Distributed Decision Tree (DDT) Vote Approach: An Empirical discourse of Geographical Data Mining

Bibliographic details
Published in: Procedia Computer Science, Volume 184, pp. 935-940
Main authors: Fayaz, Sheikh Amir; Zaman, Majid; Butt, Muheet Ahmed
Format: Journal Article
Language: English
Published: Elsevier B.V., 2021
ISSN: 1877-0509
Description
Summary: Weather data for Kashmir province comprises 6 attributes recorded at three different substations. This paper proposes a distributed decision tree algorithm and implements it on historical geographical data from Kashmir province. A machine-learning decision tree applied to the full Kashmir province dataset achieves an accuracy of 81.54%. The distributed decision tree instead builds multiple trees from partitions of the original dataset, with the data segregated by substation (42026, 42027 and 42044). The resulting partitions hold 32.38%, 34.19% and 33.42% of the data respectively, a near-even split that is appropriate for parallelism. The distributed implementation produces a specified number of sub-trees (depending on the number of partitions of the input dataset) and then combines them by collecting votes or averaging the predictions. In this paper, the hard-voting approach is used to calculate the overall performance of the n trees in the distributed environment. The empirical results show that the distributed decision tree approach does not improve overall accuracy compared with the decision tree built on the original, unpartitioned dataset.
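The partition-then-vote scheme described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the rows are made-up toy data, the two features (temperature, humidity) and the labels are hypothetical, and a depth-1 decision stump stands in for a full decision tree to keep the example dependency-free. Only the substation IDs come from the record above.

```python
from collections import Counter, defaultdict

def partition_by_station(rows):
    """Group (station_id, features, label) rows by station_id."""
    parts = defaultdict(list)
    for station, features, label in rows:
        parts[station].append((features, label))
    return dict(parts)

def train_stump(samples):
    """Fit a depth-1 tree (stump): pick the single split with fewest training errors."""
    best = None
    for f in range(len(samples[0][0])):
        for threshold in sorted({x[f] for x, _ in samples}):
            left = [y for x, y in samples if x[f] <= threshold]
            right = [y for x, y in samples if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            # If nothing falls to the right, reuse the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda x: left_label if x[f] <= threshold else right_label

def hard_vote(models, x):
    """Hard voting: each model casts one vote; the most frequent label wins.
    Ties go to the label that was voted first."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): (station_id, (temperature, humidity), label)
rows = [
    (42026, (5.0, 90.0), "rain"), (42026, (20.0, 30.0), "dry"),
    (42026, (7.0, 85.0), "rain"), (42026, (25.0, 20.0), "dry"),
    (42027, (4.0, 95.0), "rain"), (42027, (22.0, 35.0), "dry"),
    (42027, (8.0, 80.0), "rain"), (42027, (18.0, 40.0), "dry"),
    (42044, (6.0, 88.0), "rain"), (42044, (21.0, 25.0), "dry"),
    (42044, (9.0, 92.0), "rain"), (42044, (19.0, 45.0), "dry"),
]

partitions = partition_by_station(rows)                      # one partition per substation
models = [train_stump(p) for p in partitions.values()]       # one model per partition
prediction = hard_vote(models, (6.0, 88.0))                  # majority label across models
```

Each partition could be trained on a separate worker, since the models are independent until the voting step. Averaging, the alternative mentioned in the summary, would combine class probabilities rather than discrete votes.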
DOI:10.1016/j.procs.2021.03.116