
(1) Agnes Nola Sekar Kinasih

(2) * Anik Nur Handayani

(3) Jevri Tri Ardiansah

(4) Nor Salwa Damanhuri

*corresponding author
Abstract

This study explores the application of machine learning techniques, specifically classification, to improve data analysis outcomes. The primary objective is to evaluate and compare the performance of Decision Tree and Random Forest classifiers on a structured dataset. The Elbow Method is used to determine the optimal number of clusters, K-Means clustering segments the data, and Decision Trees and Random Forests perform the classification tasks. The dataset, obtained from Kaggle, consists of 1,048,575 rows and 13 attributes, all of which are numeric. The key results show that Random Forest outperforms Decision Trees in classification accuracy, precision, recall, and F1 score, providing a more robust model for data classification. The performance improvement observed with Random Forest, particularly on complex datasets, demonstrates its superior ability to generalize across varied classes. The findings suggest that for applications requiring high accuracy and reliability, Random Forest is preferable to Decision Trees, especially when the dataset exhibits high variability. This research contributes to a deeper understanding of how different machine learning models can be applied to real-world classification problems, offering insights into selecting the most appropriate model for specific data characteristics.
Keywords: Machine Learning; Random Forest; Decision Tree; Clustering
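The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: synthetic data stands in for the Kaggle electric-motor dataset, the choice of three clusters and all hyperparameters are illustrative assumptions, and cluster labels produced by K-Means are used as classification targets, which is one possible reading of the clustering-then-classification setup.

```python
# Sketch of the abstract's pipeline: Elbow Method -> K-Means -> DT vs. RF.
# Assumptions: synthetic data replaces the Kaggle dataset; k=3 and all
# hyperparameters are illustrative, not taken from the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 13))  # 13 numeric attributes, as in the paper

# 1) Elbow Method: plot/inspect inertia for k = 1..8; the "elbow" where the
#    curve flattens suggests the cluster count.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 9)]

# 2) K-Means segments the data; its labels serve as classification targets.
y = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 3) Compare Decision Tree vs. Random Forest on the same split.
for name, model in [("Decision Tree", DecisionTreeClassifier(random_state=0)),
                    ("Random Forest", RandomForestClassifier(n_estimators=100,
                                                             random_state=0))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f}, "
          f"macro-F1={f1_score(y_te, pred, average='macro'):.3f}")
```

On a real dataset of a million rows, the same comparison would typically add normalization and k-fold cross-validation, both of which the paper's reference list covers.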
DOI: https://doi.org/10.31763/sitech.v5i2.1746
Copyright (c) 2025 Agnes Nola Sekar Kinasih, Anik Nur Handayani, Jevri Tri Ardiansah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
___________________________________________________________
Science in Information Technology Letters
ISSN 2722-4139
Published by Association for Scientific Computing Electrical and Engineering (ASCEE)
W : http://pubs2.ascee.org/index.php/sitech
E : sitech@ascee.org, andri@ascee.org, andri.pranolo.id@ieee.org