Person and Activity Recognition Based on Joint Motion Features Using Deep Learning with Drone Camera

(1) * Riky Tri Yunardi (Universitas Airlangga, Indonesia)
(2) Tri Arief Sardjono (Institut Teknologi Sepuluh Nopember, Indonesia)
(3) Ronny Mardiyanto (Institut Teknologi Sepuluh Nopember, Indonesia)
*corresponding author

Abstract


The increasing demand for drone-based surveillance systems has driven interest in person and activity recognition based on joint motion features within visual monitoring frameworks. This study contributes deep learning models that improve such systems using RGB video recorded by drone cameras. A framework for person and activity recognition is proposed and evaluated on a dataset of 120 drone-recorded videos of 10 subjects, each performing six movements: walking, running, jogging, boxing, waving, and clapping. Joint motion features, comprising joint positions and joint angles, were extracted and processed as one-dimensional time-series data. The 1D-CNN, LeNet, AlexNet, and AlexNet-LSTM architectures were developed and evaluated for the classification tasks. Evaluation results show that AlexNet-LSTM outperformed the other models in person recognition, achieving a classification accuracy of 0.8544, a precision of 0.9161, a recall of 0.8575, and an F1-score of 0.8332, while AlexNet delivered the best activity recognition performance with an accuracy of 0.8571, a precision of 0.8442, a recall of 0.8599, and an F1-score of 0.8463. The relatively small dataset likely favors simpler architectures such as AlexNet. These findings highlight the effectiveness of joint motion features for person identification and the suitability of simpler classifier architectures for activity classification on small datasets.
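As a hedged illustration of the feature-extraction step summarized above (not the authors' published code), the following Python sketch turns per-frame 2D joint positions into a joint-angle time series. The 33-joint layout and the hip-knee-ankle indices assume a MediaPipe-Pose-style estimator, and the helper names are hypothetical.

# Illustrative sketch: joint-angle series from 2D pose keypoints.
# Assumed helper names and landmark layout; not the paper's published code.
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b, in degrees, between segments b->a and b->c."""
    ba, bc = a - b, c - b
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def angle_series(keypoints, triplet):
    """keypoints: (frames, joints, 2) array of (x, y) positions per frame.
    triplet: joint indices such as (hip, knee, ankle) for a knee angle."""
    i, j, k = triplet
    return np.array([joint_angle(f[i], f[j], f[k]) for f in keypoints])

# Example: right-knee angle over a 90-frame clip with 33 joints
# (indices 24, 26, 28 are right hip, knee, ankle in MediaPipe's layout).
clip = np.random.rand(90, 33, 2)                # stand-in for real detections
knee = angle_series(clip, triplet=(24, 26, 28))
print(knee.shape)                               # (90,) -> one 1D feature channel

Stacking several such angle and position channels per clip yields the one-dimensional series data fed to the classifiers.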

Keywords


Person Recognition; Activity Recognition; Joint Motion Features; Deep Learning Architectures; Drone
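For readers interested in the architectures named above, the following is a minimal PyTorch sketch of an AlexNet-style stack of 1D convolutions followed by an LSTM head, in the spirit of the AlexNet-LSTM classifier described in the abstract. All layer sizes are illustrative assumptions, not the published configuration.

# Illustrative sketch of a 1D-CNN + LSTM classifier for joint-motion series.
# Layer sizes are assumptions for demonstration, not the published model.
import torch
import torch.nn as nn

class Conv1DLSTM(nn.Module):
    def __init__(self, in_channels, n_classes):
        super().__init__()
        # Stacked 1D convolutions extract local motion patterns per channel.
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        # The LSTM models temporal order across the pooled feature sequence.
        self.lstm = nn.LSTM(input_size=128, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):
        # x: (batch, channels, time) series of joint positions and angles.
        f = self.features(x).transpose(1, 2)    # (batch, time/4, 128)
        _, (h, _) = self.lstm(f)
        return self.head(h[-1])                 # class logits

# Example: 8 clips, 8 joint-feature channels, 90 frames, 6 activity classes.
logits = Conv1DLSTM(in_channels=8, n_classes=6)(torch.randn(8, 8, 90))
print(logits.shape)                             # torch.Size([8, 6])

Dropping the LSTM head and attaching the linear layer to the flattened convolutional features would give the simpler AlexNet-style variant that the abstract reports as the stronger activity classifier on this small dataset.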

   

DOI

https://doi.org/10.31763/ijrcs.v5i3.1949
      




Copyright (c) 2025 Riky Tri Yunardi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

 



International Journal of Robotics and Control Systems
e-ISSN: 2775-2658
Website: https://pubs2.ascee.org/index.php/IJRCS
Email: ijrcs@ascee.org
Organized by: Association for Scientific Computing Electronics and Engineering (ASCEE), Peneliti Teknologi Teknik Indonesia, Department of Electrical Engineering, Universitas Ahmad Dahlan, and Kuliah Teknik Elektro
Published by: Association for Scientific Computing Electronics and Engineering (ASCEE)
Office: Jalan Janti, Karangjambe 130B, Banguntapan, Bantul, Daerah Istimewa Yogyakarta, Indonesia