Classification of Imbalanced Data Using Random Forest Algorithm with SMOTE and SMOTE-ENN (Case Study on Stunting Data)
| Title: | Classification of Imbalanced Data Using Random Forest Algorithm with SMOTE and SMOTE-ENN (Case Study on Stunting Data); original Indonesian title: Klasifikasi Data Tak Seimbang menggunakan Algoritma Random Forest dengan SMOTE dan SMOTE-ENN (Studi Kasus pada Data Stunting) |
|---|---|
| Authors: | Anju Fauziah, Julan Hernadi |
| Source: | Jurnal Riset Sistem dan Teknologi Informasi (RESTIA); Vol. 3 No. 2 (2025); pp. 112-121; ISSN 2988-5663; DOI 10.30787/restia.v3i2 |
| Publisher Information: | Universitas Aisyiyah Surakarta |
| Publication Year: | 2025 |
| Subject Terms: | Informatics Engineering, Information Systems, Distributed Computer Systems, Artificial Intelligence, artificial intelligence system |
| Description: | The random forest algorithm is a widely used machine learning classification method because it reduces the risk of overfitting while improving overall predictive performance. On data with imbalanced classes, however, the algorithm falls short of its best performance, particularly when predicting the minority class. To address this, the article proposes two resampling approaches to balance the data: the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE combined with Edited Nearest Neighbors (SMOTE-ENN). The random forest classifier is applied to the original data and then to the data resampled with SMOTE and with SMOTE-ENN. The case study uses stunting data with 421 cases in the majority class and 79 in the minority class. Accuracy was 89% on the original data, 90% on the data resampled with SMOTE-ENN, and 91% on the data resampled with SMOTE. SMOTE resampling gave the best accuracy, although the improvement was not particularly significant. |
| Document Type: | article in journal/newspaper |
| File Description: | application/pdf |
| Language: | English |
| Relation: | https://journal.aiska-university.ac.id/index.php/restia/article/view/1906/853 |
| DOI: | 10.30787/restia.v3i2.1906 |
| Availability: | https://journal.aiska-university.ac.id/index.php/restia/article/view/1906 https://doi.org/10.30787/restia.v3i2.1906 |
| Rights: | Copyright (c) 2025 Anju Fauziah, Julan Hernadi ; https://creativecommons.org/licenses/by-sa/4.0 |
| Accession Number: | edsbas.8D1C4B0E |
| Database: | BASE |
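
The oversampling step named in the description can be illustrated with a minimal, pure-Python sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority-class neighbours. This is not the authors' implementation, which the record does not describe. The function name `smote`, the parameter `k=5`, and the two-feature toy data are assumptions made for illustration; only the class sizes (421 and 79) come from the record. The ENN cleaning step and the random forest classifier are omitted.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy features; only the class sizes are taken from the record.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(421)]
minority = [(data_rng.gauss(2.0, 0.5), data_rng.gauss(2.0, 0.5)) for _ in range(79)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # 421 421
```

In a real experiment, resampling would normally be applied to the training split only, so that the reported accuracy is measured on untouched data; whether the article did so is not stated in this record.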