
(1) Agnes Nola Sekar Kinasih

(2) * Anik Nur Handayani

(3) Jevri Tri Ardiansah

(4) Nor Salwa Damanhuri

*corresponding author
Abstract

This study explores the application of machine learning techniques, specifically classification, to improve data analysis outcomes. The primary objective is to evaluate and compare the performance of Decision Tree and Random Forest classifiers on a structured dataset. The Elbow Method is used to determine the optimal number of clusters, K-Means clustering segments the data, and Decision Trees and Random Forests perform the classification tasks. The dataset, obtained from Kaggle, consists of 1,048,575 rows and 13 attributes, all of which are numeric. The key results show that Random Forest outperforms Decision Trees in classification accuracy, precision, recall, and F1 score, providing a more robust model for data classification. The performance improvement observed with Random Forest, particularly on complex datasets, demonstrates its superior ability to generalize across varied classes. The findings suggest that for applications requiring high accuracy and reliability, Random Forest is preferable to Decision Trees, especially when the dataset exhibits high variability. This research contributes to a deeper understanding of how different machine learning models can be applied to real-world classification problems, offering insights into selecting the most appropriate model for specific data characteristics.
Keywords: Machine Learning; Random Forest; Decision Tree; Clustering
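The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: synthetic data stands in for the Kaggle electric-motor dataset, the choice of three clusters and all hyperparameters are illustrative assumptions, and cluster labels produced by K-Means are used as classification targets, which is one possible reading of the clustering-then-classification setup.

```python
# Sketch of the abstract's pipeline: Elbow Method -> K-Means -> DT vs. RF.
# Assumptions: synthetic data replaces the Kaggle dataset; k=3 and all
# hyperparameters are illustrative, not taken from the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 13))  # 13 numeric attributes, as in the paper

# 1) Elbow Method: plot/inspect inertia for k = 1..8; the "elbow" where the
#    curve flattens suggests the cluster count.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 9)]

# 2) K-Means segments the data; its labels serve as classification targets.
y = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 3) Compare Decision Tree vs. Random Forest on the same split.
for name, model in [("Decision Tree", DecisionTreeClassifier(random_state=0)),
                    ("Random Forest", RandomForestClassifier(n_estimators=100,
                                                             random_state=0))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f}, "
          f"macro-F1={f1_score(y_te, pred, average='macro'):.3f}")
```

On a real dataset of a million rows, the same comparison would typically add normalization and k-fold cross-validation, both of which the paper's reference list covers.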
DOI: https://doi.org/10.31763/sitech.v5i2.1746
Copyright (c) 2025 Agnes Nola Sekar Kinasih, Anik Nur Handayani, Jevri Tri Ardiansah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
___________________________________________________________
Science in Information Technology Letters
ISSN 2722-4139
Published by Association for Scientific Computing Electrical and Engineering (ASCEE)
W : http://pubs2.ascee.org/index.php/sitech
E : sitech@ascee.org, andri@ascee.org, andri.pranolo.id@ieee.org