Multi-Scale Convolutional Fusion Network for Image Retrieval
Subject Areas: Computer Engineering
1 - Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran.
Keywords: image retrieval, feature extraction, deep learning, parallel filters, convolutional network
Abstract:
Image retrieval from large-scale databases is a major challenge in computer vision due to the limitations of traditional text-based and content-based methods in fully capturing visual features, a shortfall that often leads to a "semantic gap." The Multi-Scale Convolutional Fusion Network (MSCFNet) is introduced as a novel method to improve both the accuracy and efficiency of image retrieval by employing multi-scale convolutional layers. MSCFNet uses filters of different sizes to simultaneously extract fine-, medium-, and large-scale features, yielding a more comprehensive representation of images. This enables better detection of diverse patterns and visual details, improving image matching and retrieval performance. In addition, MSCFNet keeps model complexity low by fusing features with an element-wise "addition" operation, which maintains computational efficiency without increasing feature-map dimensionality. MSCFNet was implemented in two versions, with 2 and 4 multi-scale layers, and evaluated on the CIFAR-10, CIFAR-100, and Fashion-MNIST datasets. The results show that MSCFNet consistently outperforms more complex models such as ResNet18 and ResNet50, achieving accuracies of 74.43% on CIFAR-10, 38.87% on CIFAR-100, and 92.47% on Fashion-MNIST. MSCFNet also greatly reduces parameter count and training time: the 2-layer version requires just 113.1 seconds of training on CIFAR-10 while maintaining high accuracy, and the 4-layer version further improves accuracy and F-score across all datasets. This balance of accuracy, efficiency, and reduced complexity makes MSCFNet well suited to resource-limited environments.
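To make the architecture concrete, the sketch below illustrates the multi-scale fusion idea described in the abstract: parallel convolutional branches with different kernel sizes whose outputs are combined by element-wise addition rather than concatenation, so channel dimensionality does not grow. This is a minimal PyTorch sketch, not the authors' implementation; the kernel sizes (3, 5, 7), channel counts, and overall layer arrangement are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn


class MultiScaleFusionBlock(nn.Module):
    """One multi-scale block: parallel convolutions with different kernel
    sizes, fused by element-wise addition so the channel count stays fixed
    (concatenation would instead multiply it by the number of branches)."""

    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7)):  # assumed sizes
        super().__init__()
        # padding=k//2 keeps spatial dimensions equal across branches,
        # which element-wise addition requires.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Fuse fine-, medium-, and large-scale features by addition.
        fused = sum(branch(x) for branch in self.branches)
        return self.act(fused)


class MSCFNetSketch(nn.Module):
    """Rough analogue of the 2-layer variant: two multi-scale blocks,
    pooling, and a linear classifier (all widths are assumptions)."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            MultiScaleFusionBlock(3, 32),
            nn.MaxPool2d(2),
            MultiScaleFusionBlock(32, 64),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


# Example: a batch of CIFAR-10-sized inputs (3 x 32 x 32).
logits = MSCFNetSketch(num_classes=10)(torch.randn(8, 3, 32, 32))
print(logits.shape)  # torch.Size([8, 10])
```

The key design point the abstract emphasizes is visible in `forward`: because the branches are summed rather than concatenated, each block's output has the same number of channels as a single branch, which is what keeps the parameter count of the downstream layers low.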