Enhancing Arabic text classification through mobile virtual keypad-based encoding algorithm
| Published in: | Franklin Open Vol. 12; p. 100373 |
|---|---|
| Main Author: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier, 01.09.2025 |
| Subjects: | |
| ISSN: | 2773-1863 |
| Summary: | This work proposes an encoding algorithm to improve the classification of Arabic text, based on the substitution scheme of the Arabic typewriter in a virtual mobile keypad environment. Each letter in the dataset is coded into its respective numeric representation. The dataset comprises 10,000 records, equally divided into five groups (culture, diversity, sports, politics, and economy), with 2000 records in each group. The analysis compares the performance of standard machine learning algorithms on the original data (without encoding) against their performance on the transformed data (with encoding applied). The results indicate that the encoded data consistently yields better classification performance across all measures: accuracy, precision, recall, and F1-score. Random Forest accuracy rose markedly after encoding, reaching 0.9035. Likewise, Naive Bayes accuracy increased to 0.924 after encoding. The Support Vector Machine (SVM) improved in accuracy from 0.6707 to 0.9315, matched by an F1-score increase from 0.6751 to 0.9315. Logistic regression was among the biggest beneficiaries, with accuracy rising to 0.932 and the F1-score to 0.9320. These findings suggest that the proposed encoding algorithm enhances feature representation and preprocessing, improving classification results across all tested models. |
|---|---|
| DOI: | 10.1016/j.fraope.2025.100373 |
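
The Summary describes coding each letter into a numeric representation derived from a mobile keypad layout. The record does not give the article's actual mapping, so the following is only a minimal sketch under stated assumptions: the `KEYPAD` layout, the multi-tap convention (key digit repeated once per tap), the hyphen separator, and the function names `encode_word` and `encode_text` are all hypothetical and chosen for illustration.

```python
# Hypothetical, partial multi-tap layout. This is NOT the article's mapping,
# which the catalog record does not reproduce. Hamza-carrying letter forms
# are left out to keep the illustration short.
KEYPAD = {
    "2": "بتةث",
    "3": "اى",
    "4": "سشصض",
    "5": "دذرز",
    "6": "جحخ",
    "7": "نهوي",
    "8": "فقكلم",
    "9": "طظعغ",
}

# Each letter maps to its key digit repeated once per tap needed to reach it,
# e.g. the third letter on key "5" becomes "555".
LETTER_TO_CODE = {
    letter: key * (position + 1)
    for key, letters in KEYPAD.items()
    for position, letter in enumerate(letters)
}


def encode_word(word):
    """Encode one word as hyphen-joined key codes, skipping unmapped characters."""
    return "-".join(LETTER_TO_CODE[ch] for ch in word if ch in LETTER_TO_CODE)


def encode_text(text):
    """Encode whitespace-separated words, dropping words with no mapped letters."""
    encoded = (encode_word(word) for word in text.split())
    return " ".join(code for code in encoded if code)
```

In a comparison like the one the Summary reports, the encoded strings would presumably stand in for the raw text as classifier input, while the unencoded text serves as the baseline.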