Design and Implementation of a Surveillance Security System Based on YOLO Algorithm and IoT Technology on Mobile Data Network
Subject Areas: Electronic Engineering
Mohamadreza Masaeli 1, Sayed Mohammadali Zanjani 2
1 - Department of Electrical Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran
2 - Department of Electrical Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran
Keywords: Human activity recognition (HAR) system, Machine vision, Violence prevention, Exporting and quantization, YOLO algorithm
Abstract:
A surveillance security system based on the YOLO algorithm and Internet of Things (IoT) technology offers significant advantages over traditional methods in security, efficiency, scalability, response time, and reliability. This paper describes the design of a novel security system that raises an alert when it detects any of five categories: human, human head, gun, knife, and fall. The system is monitored online and can connect to the internet over a cellular data network at any location, uploading images to the management panel and sending a report to the user when a threat is detected. The YOLOv8 algorithm is used for model training because of its user-friendly command-line interface and its support for object detection, instance segmentation, and image classification. To increase processing speed while maintaining accuracy, the optimized model is deployed on a Raspberry Pi 4 board. Optimizing processing speed and applying quantization also reduce the system's energy consumption (a greener system) and its operating costs. To speed up object detection, three techniques are used: exporting the model, quantizing the trained weights, and increasing the processor frequency (overclocking). A comparison of the exported weights with the original trained weights in terms of accuracy and speed shows that exporting and quantization increase processing speed at the cost of lower detection accuracy. With the proposed improvements, the trained model achieves a mean average precision of mAP ≅ 0.67 at a frame rate of approximately 4.3 FPS.
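The speed-versus-accuracy trade-off of weight quantization can be illustrated with a small sketch. The abstract does not state which quantization scheme is used, so the example below assumes symmetric per-tensor int8 quantization, one common choice. The weight values and the function names (`quantize_int8`, `dequantize`, `max_abs_error`) are hypothetical and are not taken from the paper.

```python
def quantize_int8(weights):
    """Map float weights to integer codes in [-127, 127] plus one shared scale.

    Assumption: symmetric per-tensor quantization; the paper's actual
    scheme is not specified in the abstract.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


def max_abs_error(original, restored):
    """Largest absolute difference between original and restored weights."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.41, 0.05, -1.27, 0.33]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

print("int8 codes:", codes)
print("scale:", scale)
print("max rounding error:", max_abs_error(weights, restored))
```

Each weight is stored in 8 bits instead of 32, which shrinks the model and allows faster integer arithmetic. The rounding error, at most about half the scale per weight, is one source of the accuracy loss that the comparison reports.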