Multi-Scale Convolutional Fusion Network for Image Retrieval
Subject Areas: Computer Engineering
1 - Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
Keywords: Image Retrieval, Feature Extraction, Deep Learning, Parallel Filters, Convolutional Network
Abstract:
Retrieving images from large-scale databases is a major challenge in computer vision because traditional text-based and content-based methods cannot fully capture visual features, which often leads to a "semantic gap." This paper introduces a novel method, the Multi-Scale Convolutional Fusion Network (MSCFNet), which employs multi-scale convolutional layers to improve both the accuracy and the efficiency of image retrieval. MSCFNet uses filters of different sizes to extract fine, medium, and large-scale features simultaneously, yielding a more detailed representation of each image. This enables better detection of diverse patterns and visual details, which improves image matching and retrieval performance. Additionally, MSCFNet limits model complexity by fusing features with element-wise addition, which preserves computational efficiency because, unlike concatenation, it does not increase the dimensionality of the feature maps. MSCFNet was implemented in two versions, one with 2 layers and one with 4 layers, and evaluated on the CIFAR-10, CIFAR-100, and Fashion-MNIST datasets. The results show that MSCFNet consistently outperforms more complex models such as ResNet18 and ResNet50, reaching accuracies of 74.43% on CIFAR-10, 38.87% on CIFAR-100, and 92.47% on Fashion-MNIST. Furthermore, MSCFNet substantially reduces the number of parameters and the training time: the 2-layer version requires only 113.1 seconds to train on CIFAR-10 while maintaining high accuracy. The 4-layer version further improves accuracy and F-Score on all datasets. This balance of accuracy, efficiency, and low complexity makes MSCFNet well suited to resource-limited environments.
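The fusion idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes kernel sizes of 3, 5 and 7 (the abstract does not state the actual sizes), uses fixed averaging filters as hypothetical stand-ins for learned weights, and works on a single channel for brevity. The helper `next_layer_weights` is likewise hypothetical; it counts the weights of a convolution placed after the fusion step to show why addition, unlike concatenation, keeps the following layer small.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D cross-correlation with zero 'same' padding (odd kernel sizes)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd for 'same' padding")
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < h and 0 <= c < w:  # positions outside the image count as zero
                        acc += image[r][c] * kernel[di][dj]
            out[i][j] = acc
    return out


def mean_kernel(k: int) -> Matrix:
    """Hypothetical stand-in for a learned k x k filter: a box (averaging) filter."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def multi_scale_fusion(image: Matrix, sizes: Sequence[int] = (3, 5, 7)) -> Matrix:
    """Apply parallel filters of different sizes, then fuse them by element-wise addition.

    Every branch keeps the input's height and width, so the sum has the same
    shape as a single branch: fusion adds no extra feature-map dimensions.
    """
    branches = [conv2d_same(image, mean_kernel(k)) for k in sizes]
    h, w = len(image), len(image[0])
    return [[sum(b[i][j] for b in branches) for j in range(w)] for i in range(h)]


def next_layer_weights(branch_channels: int, num_branches: int,
                       out_channels: int, k: int, fusion: str) -> int:
    """Weights (bias excluded) of a k x k convolution that follows the fusion step."""
    if fusion == "add":
        in_channels = branch_channels  # the sum keeps one branch's channel count
    elif fusion == "concat":
        in_channels = branch_channels * num_branches  # channels stack up
    else:
        raise ValueError("fusion must be 'add' or 'concat'")
    return in_channels * k * k * out_channels


if __name__ == "__main__":
    img = [[float((i + j) % 4) for j in range(8)] for i in range(8)]
    fused = multi_scale_fusion(img)
    print(len(fused), len(fused[0]))                   # 8 8
    print(next_layer_weights(32, 3, 64, 3, "add"))     # 18432
    print(next_layer_weights(32, 3, 64, 3, "concat"))  # 55296
```

With three 32-channel branches feeding a 3×3 convolution with 64 output channels, addition leaves 32 input channels (18,432 weights), whereas concatenation yields 96 input channels (55,296 weights), a threefold difference. Note that addition requires every branch to produce the same number of channels and the same spatial size.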