Multi-Criteria Attention-Based Graph Neural Network: A Heterogeneous Representation Learning Framework for Logistics System Optimization

Subject Areas: Data Engineering

Mohammad Shahbazi 1, Hamid Tohidi 2, Majid Nojavan 3

Keywords: Machine learning, Deep learning, Representation learning, Heterogeneous systems, Logistics optimization
[1] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, Aug. 2013. doi: 10.1109/TPAMI.2013.50
[2] J. Han, "Mining heterogeneous information networks by exploring the power of links," in Discovery Science. Berlin, Germany: Springer, 2009, pp. 13–30. doi: 10.1007/978-3-642-04955-6_3
[3] L. Getoor and C. P. Diehl, "Link mining: A survey," SIGKDD Explorations Newsletter, vol. 7, no. 2, pp. 3–12, Dec. 2005. doi: 10.1145/1089815.1089817
[4] E. Otte and R. Rousseau, "Social network analysis: a powerful strategy, also for the information sciences," Journal of Information Science, vol. 28, no. 6, pp. 441–453, Dec. 2002. doi: 10.1177/016555150202800601
[5] S. Chakrabarti, Mining the Web: Analysis of Hypertext and Semi-Structured Data, Aug. 2002. doi: 10.1007/3-540-45778-8_13
[6] T. Lewis, Network Science: Theory and Applications, Wiley, 2013. doi: 10.1002/9781118670008
[7] D. J. Cook and L. B. Holder, "Graph-based data mining," IEEE Intelligent Systems, vol. 15, no. 2, pp. 32–41, Mar./Apr. 2000. doi: 10.1109/5254.846291
[8] P. Pham, L. T. T. Nguyen, B. Vo, and U. Yun, "Bot2Vec: A general approach of intra-community oriented representation learning for bot detection in different types of social networks," IEEE Access, vol. 10, pp. 12162–12174, 2022. doi: 10.1109/ACCESS.2021.3067189
[9] Y. Sun and J. Han, "Mining heterogeneous information networks: A structural analysis approach," SIGKDD Explorations Newsletter, vol. 14, no. 2, pp. 20–28, Dec. 2013. doi: 10.1145/2541141.2541145
[10] M. Nazari, A. Oroojlooy, L. V. Snyder, and M. Takáč, "Reinforcement Learning for Solving the Vehicle Routing Problem," in NeurIPS, pp. 9839–9849, 2018.
[11] J. Chen, H. Zhou, S. Xie, and S. Sun, "Learning to Optimize Capacitated Vehicle Routing Problems with Heterogeneous Demands via Graph Neural Networks," European Journal of Operational Research, vol. 292, no. 3, pp. 887–901, 2021. doi: 10.1016/j.ejor.2020.11.033
[12] Z. Chen, Y. Liu, and Q. Song, "A graph neural network approach for solving capacitated vehicle routing problems," Computers & Operations Research, vol. 140, 2022. doi: 10.1016/j.cor.2022.105643
[13] Y. Sun, J. Han, X. Yan, P. Yu, and T. Wu, "PathSim: Meta path-based top-k similarity search in heterogeneous information networks," Proceedings of the VLDB Endowment, vol. 4, no. 11, pp. 992–1003, Aug. 2011. doi: 10.14778/2078331.2078334
[14] W. Wang, T. Tang, F. Xia, Z. Gong, Z. Chen, and H. Liu, "Collaborative filtering with network representation learning for citation recommendation," IEEE Transactions on Big Data, vol. 6, no. 2, pp. 283–294, Apr.–Jun. 2020. doi: 10.1109/TBDATA.2018.2839571
[15] L. Guo, H. Yin, T. Chen, X. Zhang, and K. Zheng, "Hierarchical hyperedge embedding-based representation learning for group recommendation," CoRR, abs/2103.13506, Mar. 2021.
[16] Y. Xu, E. Wang, Y. Yang, and Y. Chang, "A unified collaborative representation learning for neural network based recommender systems," IEEE Transactions on Knowledge and Data Engineering, 2021 (Early Access). doi: 10.1109/TKDE.2021.3061575
[17] D. Zhou, Z. Xu, W. Li, X. Xie, and S. Peng, "MultiDTI: Drug-target interaction prediction based on multi-modal representation learning to bridge the gap between new chemical entities and known heterogeneous network," IEEE Access, vol. 9, pp. 60596–60606, 2021. doi: 10.1109/ACCESS.2021.3072153
[18] M. M. Li, K. Huang, and M. Zitnik, "Representation learning for networks in biology and medicine: Advancements, challenges, and opportunities," IEEE Access, vol. 9, pp. 40539–40555, 2021. doi: 10.1109/ACCESS.2021.3065465
[19] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proceedings of the National Academy of Sciences, vol. 99, pp. 7821–7826, 2002. doi: 10.1073/pnas.122653799
[20] M. Valueva, N. Nagornov, P. Lyakhov, G. Valuev, and N. Chervyakov, "Application of the residue number system to reduce hardware costs of the convolutional neural network imple- mentation," Mathematics and Computers in Simulation, vol. 177, pp. 232–243, 2020. doi: 10.1016/j.matcom.2020.07.006
[21] W. Zhang, K. Itoh, J. Tanida, and Y. Ichioka, "Parallel distributed processing model with local space-invariant interconnections and its optical architecture," Applied Optics, vol. 29, no. 32, pp. 4790–4799, Nov. 1990. doi: 10.1364/AO.29.004790
[22] R. Salakhutdinov and G. Hinton, "Semantic hashing," International Journal of Approximate Rea- soning, vol. 50, no. 7, pp. 969–978, 2009. doi: 10.1016/j.ijar.2008.11.006
[23] S. Cao, W. Lu, and Q. Xu, "GraRep: Learning graph representations with global structural information," in Proceedings of the 24th ACM International Conference on Information and Knowledge Management, 2015, pp. 891–900. doi: 10.1145/2806416.2806512
[24] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E, vol. 74, no. 3, Sep. 2006. doi: 10.1103/PhysRevE.74.036104
[25] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.
[26] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "LIBLINEAR: A library for large linear classification," The Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008. doi: 10.1145/2001269.2001293
[27] Q. V. Le and T. Mikolov, "Distributed representations of sentences and documents," in CoRR, vol. abs/1405.4053, 2014. doi: 10.3115/1963501.1963531
[28] N. Djuric, H. Wu, V. Radosavljevic, M. Grbovic, and N. Bhamidipati, "Hierarchical neural language models for joint representation of streaming documents and their content," in Proceedings of the 24th International Conference on World Wide Web, 2015. doi: 10.1145/2736277.2741093
[29] A. Vaswani, N. Shazeer, N. Parmar, et al., "Attention is all you need," CoRR, abs/1706.03762, 2017.
[30] C. Yang, Z. Liu, D. Zhao, M. Sun, and E. Y. Chang, "Network representation learning with rich text information," in Proceedings of the 24th International Conference on Artificial Intelligence, 2015, pp. 2111–2117. doi: 10.1145/2806416.2806458
[31] L. Tang and H. Liu, "Relational learning via latent social dimensions," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009, pp. 817–826. doi: 10.1145/1557019.1557107
[32] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," in Proceedings of 1st International Conference on Learning Representations (ICLR), 2013.
[33] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems, 2013, pp. 3111–3119.
[34] B. Perozzi, R. Al-Rfou, and S. Skiena, "DeepWalk: Online learning of social representations," CoRR, abs/1403.6652, 2014. doi: 10.1145/2623330.2623732
[35] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in CoRR, abs/1607.00653, 2016. doi: 10.1145/3097983.3098135
[36] J. Li, J. Zhu, and B. Zhang, "Discriminative deep random walk for network classification," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016. doi: 10.18653/v1/P16-1119
[37] J. Chen, Q. Zhang, and X. Huang, "Incorporate group information to enhance network embedding," in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2016, pp. 1901–1904. doi: 10.1145/2983323.2983888
[38] M. Gong, C. Yao, Y. Xie, and M. Xu, "Semi-supervised network embedding with text information," Pattern Recognition, vol. 104, pp. 337–347, 2020. doi: 10.1016/j.patcog.2020.107311
[39] C. Li, S. Wang, D. Yang, Z. Li, Y. Yang, X. Zhang, and J. Zhou, "PPNE: Property preserving network embedding," in Database Systems for Advanced Applications, pp. 163–179, 2017. doi: 10.1007/978-3-319-56608-5_12
[40] S. Pan, J. Wu, X. Zhu, C. Zhang, and Y. Wang, "Tri-party deep network representation," in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016, pp. 1895–1901. doi: 10.24963/ijcai.2016/278
[41] C. Zhou, Y. Liu, X. Liu, Z. Liu, and J. Gao, "Scalable graph embedding for asymmetric proximity," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 2942–2948. doi: 10.1609/aaai.v31i1.31115019
[42] D. Zhang, J. Yin, X. Zhu, and C. Zhang, "User Profile Preserving Social Network Embedding," in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017, pp. 3378–3384. doi: 10.24963/ijcai.2017/469
[43] S. Wang, J. Tang, C. Aggarwal, and H. Liu, "Linked document embedding for classification," in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 115–124, 2016. doi: 10.1145/2983323.2983742
[44] S. Wang, J. Tang, F. Morstatter, and H. Liu, "Paired restricted Boltzmann machine for linked data," in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1753–1762, 2016. doi: 10.1145/2983323.2983769
[45] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, Jul. 2006. doi: 10.1126/science.1127647
[46] H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie, and M. Guo, "GraphGAN: Graph representation learning with generative adversarial nets," CoRR, abs/1711.08267, 2017.
[47] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., "Generative adversarial nets," in Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014, pp. 2672–2680.
[48] R. B. Joshi and S. Mishra, "Learning graph representations," CoRR, abs/2102.02026, 2021.
[49] D. Wang, P. Cui, and W. Zhu, "Structural deep network embedding," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1225– 1234, 2016. doi: 10.1145/2939672.2939754
[50] H. Chen, B. Perozzi, Y. Hu, and S. Skiena, "HARP: hierarchical representation learning for net- works," CoRR, abs/1706.07845, 2017.
[51] L. Tang and H. Liu, "Leveraging social media networks for classification," Data Mining and Knowledge Discovery, vol. 23, no. 3, pp. 447–478, Jan. 2011. doi: 10.1007/s10618-010-0197-8
[52] M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu, "Asymmetric transitivity preserving graph embedding," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1105–1114, 2016. doi: 10.1145/2939672.2939811
[53] C. Donnat, M. Zitnik, D. Hallac, and J. Leskovec, "Spectral graph wavelets for structural role similarity in networks," in Proceedings of the International Conference on Learning Representations (ICLR), 2018, abs/1710.10321.
[54] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang, "Community Preserving Network Embedding," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 203–209. doi: 10.1609/aaai.v31i1.3012
[55] D. Zhang, J. Yin, X. Zhu, and C. Zhang, "Homophily, Structure, and Content Augmented Network Representation Learning," in IEEE 16th International Conference on Data Mining (ICDM), 2016. doi: 10.1109/ICDM.2016.0082
[56] C. Tu, W. Zhang, Z. Liu, and M. Sun, "Max-Margin DeepWalk: Discriminative Learning of Network Representation," in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016, pp. 3889–3895. doi: 10.24963/ijcai.2016/539
[57] D. Zhang, J. Yin, X. Zhu, and C. Zhang, "Collective Classification via Discriminative Matrix Factorization on Sparsely Labeled Networks," in Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 2016, pp. 1563–1572. doi: 10.1145/2983323.2983729
[58] X. Huang, J. Li, and X. Hu, "Label Informed Attributed Network Embedding," in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017, pp. 731–739. doi: 10.1145/3018661.3018733
[59] C. Zhou, Y. Liu, X. Liu, Z. Liu, and J. Gao, "Scalable Graph Embedding for Asymmetric Proximity," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 2942–2948. doi: 10.1609/aaai.v31i1.31115019
[60] Y. Dong, N. V. Chawla, and A. Swami, "Metapath2vec: Scalable representation learning for heterogeneous networks," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 135–144, 2017. doi: 10.1145/3097983.3098036
[61] C. Shi, B. Hu, W. X. Zhao, and P. S. Yu, "Heterogeneous information network embedding for recommendation," CoRR, abs/1711.10730, 2017. doi: 10.1109/TKDE.2019.2919125
[62] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling, "Modeling relational data with graph convolutional networks," IEEE Access, vol. 6, pp. 46069–46080, 2018. doi: 10.1109/ACCESS.2018.2842263
[63] J. Chen, S. Yang, and Z. Wang, "Multi-view representation learning for data stream clustering," IEEE Access, vol. 10, pp. 2945–2960, 2022. doi: 10.1109/ACCESS.2021.3069892
[64] Q. Luo, D. Yu, A. M. Vera Venkata Sai, Z. Cai, and X. Cheng, "A survey of structural representation learning for social networks," IEEE Access, vol. 10, pp. 24669–24683, 2022. doi: 10.1109/ACCESS.2022.3088288
[65] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang, "Community Preserving Network Em- bedding," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 203–209. doi: 10.1609/aaai.v31i1.3012
[66] S. Cao, W. Lu, and Q. Xu, "Deep Neural Networks for Learning Graph Representations," in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016, pp. 1145–1152. doi: 10.1007/978-3-319-71249-91
[67] R. Feng, Y. Yang, W. Hu, F. Wu, and Y. Zhuang, "Representation learning for scale-free networks," CoRR, abs/1711.10755, 2017. doi: 10.1145/3308558.3313377
[68] Z. Yang, W. W. Cohen, and R. Salakhutdinov, "Revisiting semi-supervised learning with graph embeddings," in CoRR, abs/1603.08861, 2016. doi: 10.1145/3045390.3045594
[69] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu, "Heterogeneous graph attention net- work," in The World Wide Web Conference, 2019, pp. 2022–2032. doi: 10.1145/3308558.3313417
[70] X. Wang, H. Ji, C. Shi, B. Wang, P. Cui, P. S. Yu, and Y. Ye, "Heterogeneous graph attention network," in CoRR, vol. abs/1903.07293, 2019. doi: 10.1145/3292500.3330851
[71] H. Hong, H. Guo, Y. Lin, X. Yang, Z. Li, and J. Ye, "An attention-based graph neural network for heterogeneous structural learning," CoRR, abs/1912.10832, 2019.
[72] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, no. 110, pp. 3371–3408, 2010. doi: 10.5555/1756006.1953039
[73] H. H. Song, T. W. Cho, V. Dave, Y. Zhang, and L. Qiu, "Scalable proximity estimation and link prediction in online social networks," in Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, 2009, pp. 322–335. doi: 10.1145/1644893.1644928
[74] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clus- tering," in Proceedings of the 14th International Conference on Neural Information Processing Systems, 2001, pp. 585–591. doi: 10.1016/j.patcog.2007.03.014
[75] J. Mairal, J. Ponce, G. Sapiro, A. Zisserman, and F. Bach, "Supervised dictionary learning," in Advances in Neural Information Processing Systems, vol. 21, 2009. doi: 10.5555/2984093.2984214
[76] J. Zhu, A. Ahmed, and E. P. Xing, "Medlda: Maximum margin supervised topic models," Journal of Machine Learning Research, vol. 13, no. 74, pp. 2237–2278, 2012. doi: 10.5555/2984093.2984214
[77] Q. V. Le and T. Mikolov, "Distributed representations of sentences and documents," in CoRR, vol. abs/1405.4053, 2014. doi: 10.3115/1963501.1963531
[78] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in 3rd International Conference on Learning Representations, 2015. doi: 10.1145/3045118.3045167
[79] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929–1958, 2014. doi: 10.1007/s10994-015-5521-0
[80] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in CoRR, vol. abs/1512.03385, 2015. doi: 10.1109/CVPR.2016.90
[81] Y. Dong, N. V. Chawla, and A. Swami, "Metapath2vec: Scalable representation learning for heterogeneous networks," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 135–144, 2017. doi: 10.1145/3097983.3098036
[82] C. Shi, B. Hu, W. X. Zhao, and P. S. Yu, "Heterogeneous information network embedding for recommendation," in CoRR, abs/1711.10730, 2017. doi: 10.1109/TKDE.2019.2919125
[83] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling, "Modeling relational data with graph convolutional networks," IEEE Access, vol. 6, pp. 46069–46080, 2018. doi: 10.1109/ACCESS.2018.2842263
[84] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in 3rd International Conference on Learning Representations, 2015. doi: 10.1145/3045118.3045167
[85] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929–1958, 2014. doi: 10.1007/s10994-015-5521-0
[86] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in CoRR, vol. abs/1512.03385, 2015. doi: 10.1109/CVPR.2016.90
[87] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, "ArnetMiner: Extraction and mining of academic social networks," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 990–998, 2008. doi: 10.1145/1401890.1402024
[88] Y. Zhang, R. Jin, and Z.-H. Zhou, "Understanding bag-of-words model: a statistical framework," International Journal of Machine Learning and Cybernetics, vol. 1, no. 1–4, pp. 43–52, Aug. 2010. doi: 10.1007/s13042-010-0001-0
[89] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling, "Modeling relational data with graph convolutional networks," IEEE Access, vol. 6, pp. 46069–46080, 2018. doi: 10.1109/ACCESS.2018.2842263
[90] J. Chen, S. Yang, and Z. Wang, "Multi-view representation learning for data stream clustering," IEEE Access, vol. 10, pp. 2945–2960, 2022. doi: 10.1109/ACCESS.2021.3069892
[91] Q. Luo, D. Yu, A. M. Vera Venkata Sai, Z. Cai, and X. Cheng, "A survey of structural representation learning for social networks," IEEE Access, vol. 10, pp. 24669–24683, 2022. doi: 10.1109/ACCESS.2022.3088288
[92] E. Wang, Q. Yu, Y. Chen, W. Slamu, and X. Luo, "Multi-modal knowledge graphs representation learning via multi-headed self-attention," IEEE Access, vol. 10, pp. 3467–3476, 2022. doi: 10.1109/ACCESS.2021.3054909
[93] Q. Feng, Z. Liu, and C. L. P. Chen, "Broad and deep neural network for high-dimensional data representation learning," IEEE Access, vol. 10, pp. 17736–17750, 2022. doi: 10.1109/ACCESS.2022.3074505
Original Research
Multi-Criteria Attention-Based Graph Neural Network: A Heterogeneous Representation Learning Framework for Logistics System Optimization
Mohammad Shahbazi 1 · Hamid Tohidi 1,* · Majid Nojavan 1
Received: 4 February 2025 / Accepted: 13 March 2025 / Published online: 17 March 2025
*Corresponding Author, h_tohidi@azad.ac.ir
1-Department of Industrial Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran
Abstract
Modeling the intricate relationships within complex logistics systems is essential for optimizing various operations—such as routing, scheduling, and distribution—in modern supply chains. These systems often exhibit significant diversity in their facilities, transportation modes, and capacity constraints, introducing a phenomenon known as “heterogeneity,” which complicates the modeling process. To simplify calculations, some researchers assume homogeneous systems, overlooking critical variability in nodes (e.g., warehouses, distribution centers) and edges (e.g., transportation routes, capacities). However, ignoring this heterogeneity can lead to a marked decrease in model accuracy. In this paper, a representation learning method specifically tailored for heterogeneous logistics systems is proposed, in which the multifaceted relationships among components are preserved and model performance in real-world scenarios is enhanced. Two novel extensions refine the underlying graph-based deep learning architecture by incorporating techniques from deep learning, graph probability models, and machine learning. The approach is evaluated on two popular Vehicle Routing Problem with Time Windows (VRPTW) datasets, using precision, F1 score, and recall as performance metrics. Experimental results indicate that this method outperforms existing approaches by providing higher precision and F1 scores, enabling more accurate classification of system components and better extraction of relationships within complex logistics networks.
Keywords- Machine learning; Deep learning; Representation learning; Heterogeneous systems; Logistics optimization
Introduction

Modeling is a critical step in analyzing the big data generated by logistics and supply chain operations. Incomplete or imprecise modeling can omit essential relationships within raw data, ultimately reducing the accuracy of subsequent analytical models. Data is frequently categorized as structured (e.g., tabular demand records) or unstructured (e.g., route networks, vehicle tracking logs). Graph modeling is particularly valuable for representing unstructured data in logistics, as many distribution and transportation networks include diverse nodes (facilities, vehicles, customers) and edges (shipment routes, capacity constraints) [1]. In such heterogeneous logistics systems, disregarding the diversity of components introduces a fundamental modeling challenge: different supply chain entities and transport modes often require flexible forms of representation. Consequently, capturing the complexity of real-world relationships remains a priority, especially in large-scale networks [2, 3]. In parallel, researchers have noted that analyzing these multifaceted connections can be aided by data mining techniques applied to link structures, social interactions, and hyperlinked documents [4, 5, 6].

Representation learning provides a framework in which feature representations are learned automatically from data [7]. This approach is particularly powerful in logistics scenarios where the variety of nodes (e.g., hubs, warehouses, trucks) and edges (e.g., dynamic route constraints) must be captured without oversimplification. However, many modeling efforts still assume a homogeneous structure [8], despite recent work suggesting that heterogeneity can reveal important patterns and improve predictive accuracy [9].
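As a concrete illustration, the typed-graph view described above can be held in a small container that tags every node and edge with its type. The node and edge type names below (warehouse, truck_route, and so on) are hypothetical placeholders, not taken from any specific dataset:

```python
from collections import defaultdict

class HeterogeneousGraph:
    """Typed-graph container: every node and every edge carries a type label."""
    def __init__(self):
        self.node_type = {}              # node id -> node type, e.g. "warehouse"
        self.adj = defaultdict(list)     # node id -> list of (neighbor, edge type)

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, u, v, etype):
        # undirected route: record both directions under the same edge type
        self.adj[u].append((v, etype))
        self.adj[v].append((u, etype))

    def neighbors(self, node, etype=None):
        # optionally restrict the neighborhood to a single edge type
        return [v for v, t in self.adj[node] if etype is None or t == etype]

g = HeterogeneousGraph()
g.add_node("W1", "warehouse")
g.add_node("D1", "distribution_center")
g.add_node("C1", "customer")
g.add_edge("W1", "D1", "truck_route")
g.add_edge("D1", "C1", "van_route")
```

Keeping the types explicit, rather than flattening everything into one node set, is precisely what homogeneous formulations discard.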
Contributions of this study
· Novel Multi-Criteria Attention-Based GNN: A GNN framework is proposed that captures heterogeneity in logistics systems by considering multiple types of nodes and edges simultaneously.
· Two Architectural Extensions: Two extensions are developed: (i) a feature refinement strategy and (ii) multiple aggregation functions, which enhance embedding quality for more accurate classification in complex VRPTW datasets.
· Comprehensive Evaluation: Evaluation is performed on two well-known VRPTW benchmarks (Solomon, Gehring & Homberger), demonstrating faster convergence and consistently higher F1 scores than previous methods.
I. Unsupervised Representation Learning
One line of research focuses on extracting meaningful features from unlabeled graph data. [10] introduced DeepWalk, which uses random walks to capture local neighborhood information. Similarly, node2vec [11] generalizes this idea by applying biased walks to better sample diverse neighborhoods. Both methods excel at micro-level tasks such as node classification. Building on these, approaches like SDNE [12] and DNGR [13] adopt deep architectures (autoencoders) to preserve non-linear proximity. Although these unsupervised methods can uncover important structural patterns, they frequently do not incorporate additional node attributes (e.g., capacity or scheduling constraints) that are critical for logistics operations.
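A minimal sketch of the random-walk stage these methods share is given below; DeepWalk takes uniform steps, whereas node2vec would bias the choice of the next node. The resulting walks are treated as sentences for a skip-gram model, which is omitted here:

```python
import random

def random_walks(adj, walk_len=5, walks_per_node=3, seed=0):
    """DeepWalk-style uniform random walks over an adjacency dict."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_len - 1):
                nbrs = adj[walk[-1]]
                if not nbrs:                 # dead end: stop this walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# toy 3-node path graph; each walk is a node "sentence" for skip-gram training
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
walks = random_walks(adj)
```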
II. Semi-Supervised Representation Learning
When partial labels are available, semi-supervised approaches can blend labeled and unlabeled data to learn more robust embeddings. Early frameworks such as graph convolutional networks introduced refined message-passing schemes that propagate feature and label information across the graph; attention-based successors such as GAT [15], HAN [16], and HetSANN [17] weight neighbors adaptively and extend these ideas to heterogeneous graphs.
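A one-layer message-passing step of the kind such frameworks rely on can be sketched as follows; this is a simplified mean aggregator with self-loops rather than the symmetrically normalized propagation of a full GCN:

```python
def gcn_layer(adj, features, weight):
    """One mean-aggregation message-passing step with self-loops.
    adj: node index -> list of neighbor indices; features: list of vectors;
    weight: dense matrix applied after aggregation."""
    dim = len(features[0])
    k_dim = len(weight[0])
    out = []
    for i in range(len(features)):
        nbrs = adj[i] + [i]                       # include the node itself
        agg = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        # linear transform of the aggregated features
        out.append([sum(agg[d] * weight[d][k] for d in range(dim))
                    for k in range(k_dim)])
    return out

# 3-node path graph, 1-dimensional features, identity weight
adj = {0: [1], 1: [0, 2], 2: [1]}
H = gcn_layer(adj, [[1.0], [2.0], [3.0]], [[1.0]])
```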
III. Summary and Motivation
Despite these advances, the majority of existing works primarily address micro-level predictions (e.g., node classification, link prediction) without explicitly focusing on complex operational constraints present in logistics [8]. Many ignore the richness of node attributes by assuming uniform embeddings. In contrast, our proposed method leverages multi-criteria attention and meta-path expansions to capture subtle relationships essential for optimizing routes or schedules in large-scale supply chains. Table 1 illustrates some representative methods. We group them by methodology (random walk, deep learning, hybrid) and list their main advantages and disadvantages for potential logistics applications.
TABLE 1
Related studies in network representation learning.
Category | Algorithm [Reference] | Advantages / Disadvantages |
Random walk | DeepWalk [10], Node2Vec [11] | + Preserves local neighborhood structure – Neglects node attributes, typically micro-level |
Deep learning | SDNE [12], DNGR [13] | + Learns complex, non-linear embeddings – Higher risk of overfitting, ignores heterogeneity |
Hybrid | GAT [15], R-GCN [14], HAN [16], HetSANN [17] | + Uses partial labels, meta-paths for diverse types – Often focused on node-level tasks, needs heavy tuning |
Proposed Model
A novel graph-based deep learning framework is presented that captures heterogeneity in logistics systems. The method leverages a graph neural network (GNN) architecture, enriched by meta-paths and multi-criteria attention mechanisms, allowing multiple node and edge types to be modeled effectively.
I. Meta-Path Integration
Meta-paths are used to reveal hidden indirect relationships, especially when logistics networks have specialized transport modes or multi-hop connections between hubs. By selecting meta-paths that encode domain-specific constraints (e.g., capacity limits, preferred transport modes), relevant relationships are captured in the embedding.
II. Neighborhood Expansion via Graph Edges
To reinforce each node’s representation, its neighborhood is expanded through both direct edges and meta-paths. A ring-like structure is employed so that nodes can retain their own features (such as capacity, cost, or time-window constraints) in the final embedding. An illustration of neighbor contributions to the embedding of a central node is provided in Figure 1.
FIGURE 1
ILLUSTRATION OF NEIGHBOR CONTRIBUTIONS TO THE EMBEDDING OF A CENTRAL NODE I.
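The meta-path traversal and ring-like neighborhood expansion described above can be sketched as follows; the adjacency format and the edge-type names in the example are illustrative assumptions:

```python
def metapath_neighbors(adj, start, metapath):
    """Follow a sequence of edge types (a meta-path) from `start`.
    adj maps node -> list of (neighbor, edge_type) pairs."""
    frontier = {start}
    for etype in metapath:
        frontier = {v for u in frontier for v, t in adj[u] if t == etype}
    return frontier

def expanded_neighborhood(adj, node, metapaths):
    """Direct neighbors plus meta-path endpoints; the node also keeps itself
    (the ring-like structure) so its own features survive aggregation."""
    hood = {node}
    hood.update(v for v, _ in adj[node])
    for mp in metapaths:
        hood.update(metapath_neighbors(adj, node, mp))
    return hood

# hypothetical network: warehouse -> hub by truck, hub -> customer by van
adj = {
    "W1": [("H1", "truck")],
    "H1": [("W1", "truck"), ("C1", "van")],
    "C1": [("H1", "van")],
}
```

Following the meta-path ["truck", "van"] from "W1" reaches the customer "C1", an indirect relationship that direct edges alone would miss.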
III. Unified Embedding and Attention
All features are projected into a shared embedding dimension. An attention layer is then applied to weight each neighboring node, with distinct coefficients for different edge or meta-path types. This ensures that the model is focused on the most relevant neighbors in highly heterogeneous logistics systems. The overall data flow of the proposed model is depicted in Figure 2.
FIGURE 2
DATA FLOW IN THE PROPOSED MODEL
IV. Representation Vector Generation
After attention-based weighting, node features are aggregated to yield a robust embedding that captures both local and indirect relationships, which is crucial for tasks such as routing, scheduling, and facility placement.
V. Representation Learning Block: Mathematical Formulation
Within each block (or GNN layer), computations are performed either in parallel or sequentially:
• Let $h_i^{(l)}$ denote the features of node $i$ at layer $l$. A linear transformation is performed as:

$z_i^{(l)} = W_{\tau(i)}^{(l)} \, h_i^{(l)}$, (1)

where $W_{\tau(i)}^{(l)}$ is a type-specific weight matrix and $\tau(i)$ is used to indicate the type of node $i$.

• For an edge $e = (i, j)$ of type $r$, the attention coefficient is computed as:

$o_e^{(l)} = \mathrm{LeakyReLU}\big( a_r^{\top} \big[ z_i^{(l)} \,\|\, z_j^{(l)} \big] \big)$, (2)

where $a_r$ is a learnable attention vector for edge type $r$ and $\|$ denotes concatenation; softmax normalization is then applied:

$\alpha_e^{(l)} = \exp\big(o_e^{(l)}\big) \big/ \sum_{e' \in \mathcal{E}_i} \exp\big(o_{e'}^{(l)}\big)$, (3)

where $\mathcal{E}_i$ denotes the set of edges incident on node $i$.

• The new embedding for node $i$ is then computed as:

$h_i^{(l+1)} = \sigma\big( \sum_{e=(i,j) \in \mathcal{E}_i} \alpha_e^{(l)} \, z_j^{(l)} \big)$, (4)

where $\sigma$ is a nonlinear activation.
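A scalar-feature sketch of Eqs. (1)–(4) is given below; real implementations use weight matrices and vector features, and the final nonlinearity of Eq. (4) is omitted for readability. The weight and attention values in the example are hypothetical:

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def hetero_attention_layer(h, node_type, edges, W, a):
    # Eq. (1): type-specific linear transformation (scalar weights for brevity)
    z = [W[node_type[i]] * h[i] for i in range(len(h))]
    # Eq. (2): raw attention score for each typed edge (i, j, r)
    scores = {}
    for (i, j, r) in edges:
        a_src, a_dst = a[r]
        scores[(i, j, r)] = leaky_relu(a_src * z[i] + a_dst * z[j])
    out = [0.0] * len(h)
    for i in range(len(h)):
        inc = [e for e in scores if e[0] == i]   # edges incident on node i
        if not inc:
            out[i] = z[i]                        # node with no edges keeps its projection
            continue
        # Eq. (3): numerically stable softmax over the incident edges
        mx = max(scores[e] for e in inc)
        denom = sum(math.exp(scores[e] - mx) for e in inc)
        # Eq. (4): attention-weighted aggregation (final nonlinearity omitted)
        out[i] = sum(math.exp(scores[e] - mx) / denom * z[e[1]] for e in inc)
    return out

# toy graph: nodes 0 and 1 of type 0, node 2 of type 1
h = [1.0, 2.0, 3.0]
node_type = [0, 0, 1]
W = {0: 1.0, 1: 2.0}             # hypothetical type-specific weights
a = {"r": (0.1, 0.1)}            # hypothetical attention parameters for edge type "r"
edges = [(0, 1, "r"), (0, 2, "r")]
emb = hetero_attention_layer(h, node_type, edges, W, a)
```

Node 0 aggregates its two neighbors' projected features (2.0 and 6.0) with softmax weights, so its new embedding lies between those values.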
- First Extension (Feature Refinement)

Before the linear transformation in (1), a feature-level attention is applied:

$\beta_i^{(l)} = \mathrm{softmax}\big( W_f^{(l)} h_i^{(l)} \big)$, (5)

and then

$\tilde{h}_i^{(l)} = \beta_i^{(l)} \odot h_i^{(l)}$, (6)

where $W_f^{(l)}$ is a feature-attention weight matrix, $\odot$ denotes element-wise multiplication, and $\tilde{h}_i^{(l)}$ replaces $h_i^{(l)}$ in (1).
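The feature refinement of Eqs. (5)–(6) can be illustrated as follows; for readability the learned projection is dropped, so the softmax acts directly on the raw features, which is a simplifying assumption:

```python
import math

def feature_refinement(h_i):
    """Softmax over a node's own feature dimensions yields per-dimension
    weights beta (Eq. (5)); the raw features are then gated element-wise
    by beta before the usual linear transformation (Eq. (6))."""
    mx = max(h_i)                                  # stabilize the softmax
    exps = [math.exp(x - mx) for x in h_i]
    s = sum(exps)
    beta = [e / s for e in exps]
    return [b * x for b, x in zip(beta, h_i)]

refined = feature_refinement([1.0, 3.0, 2.0])
```

Large feature values receive large gates, so dominant attributes (say, a tight time window) are emphasized before aggregation rather than diluted.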
- Second Extension (Multiple Aggregation Functions)

Instead of a single aggregator, multiple functions (e.g., mean, max, variance) are combined. For the $k$-th embedding dimension:

$h_{i,k}^{(l+1)} = \sigma\big( W_c \big[ \mathrm{mean}_{e \in \mathcal{E}_i}\big(\alpha_e^{(l)} z_{j,k}^{(l)}\big) \,\big\|\, \mathrm{max}_{e \in \mathcal{E}_i}\big(\alpha_e^{(l)} z_{j,k}^{(l)}\big) \,\big\|\, \mathrm{var}_{e \in \mathcal{E}_i}\big(\alpha_e^{(l)} z_{j,k}^{(l)}\big) \big] \big)$, (7)

where $\alpha_e^{(l)}$ are the attention weights from (3), $z_{j,k}^{(l)}$ is the $k$-th component of the projected features of neighbor $j$, and $W_c$ mixes the concatenated aggregator outputs.
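The multi-aggregator extension can be sketched per embedding dimension as follows; the learned linear layer that mixes the concatenated statistics is omitted here, which is a simplification:

```python
def multi_aggregate(neighbor_vectors):
    """Per embedding dimension, combine the mean, max, and variance of the
    neighbors' (already attention-weighted) features by concatenation."""
    dim = len(neighbor_vectors[0])
    n = len(neighbor_vectors)
    out = []
    for k in range(dim):
        col = [v[k] for v in neighbor_vectors]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        out.extend([mean, max(col), var])          # three statistics per dimension
    return out
```

The mean preserves the typical neighbor, the max highlights extreme neighbors (e.g., a capacity bottleneck), and the variance signals how heterogeneous the neighborhood is.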
· Comparison on Solomon
Table 2 compares the proposed model with baseline methods on the Solomon dataset.
TABLE 2
COMPARISON ON THE SOLOMON VRPTW DATASET.
Method | Macro F1 | Micro F1 |
DeepWalk [23] | 78.32 | 79.12 |
Metapath2Vec [41] | 74.22 | 74.69 |
HERec [37] | 82.93 | 83.32 |
GCN [42] | 84.66 | 84.99 |
HAN [36] | 85.20 | 85.66 |
R-GCN [42] | 85.31 | 85.54 |
GAT [35] | 86.59 | 86.78 |
HetSANN [37] | 87.12 | 87.46 |
HetSANN.M [37] | 88.33 | 88.59 |
HetSANN.M.R [37] | 88.78 | 89.18 |
HetSANN.M.R.V [37] | 89.12 | 89.46 |
Proposed Model | 91.48 | 91.95 |
· Comparison on Gehring & Homberger
Table 3 shows that the proposed model also achieves superior performance on the Gehring & Homberger dataset.
TABLE 3
COMPARISON ON THE GEHRING & HOMBERGER VRPTW DATASET.
Method | Macro F1 | Micro F1 |
DeepWalk [23] | 77.95 | 78.48 |
Metapath2Vec [41] | 74.67 | 75.22 |
HERec [37] | 83.57 | 84.16 |
GCN [42] | 85.63 | 86.02 |
HAN [36] | 86.15 | 86.71 |
R-GCN [42] | 86.41 | 86.85 |
GAT [35] | 87.78 | 88.19 |
HetSANN [37] | 88.46 | 88.67 |
HetSANN.M [37] | 89.11 | 89.45 |
HetSANN.M.R [37] | 89.67 | 90.03 |
HetSANN.M.R.V [37] | 90.08 | 90.46 |
Proposed Model | 92.72 | 93.01 |
· Discussion and Convergence Analysis
- A 2–4% improvement in F1 is observed, highlighting the importance of modeling heterogeneous edges.
- Meta-path-based expansions capture indirect relationships that purely local methods tend to miss.
- The approach converges quickly despite additional architectural complexity.
Figure 3 plots the F1 score (%) versus training epochs for the proposed model and a strong baseline, demonstrating faster convergence and higher final accuracy.
FIGURE 3
CONVERGENCE ANALYSIS ON VRPTW DATA.
Conclusion
In this study, a novel graph-based deep learning framework was proposed that employs multi-criteria attention mechanisms along with two key architectural extensions—feature refinement and multiple aggregation—to improve the representation of heterogeneous components within logistics systems. The model was specifically designed to address the complexity of real-world transportation and distribution networks, in which diverse nodes (e.g., facilities, hubs, vehicles) and edges (e.g., routes, capacity constraints) can significantly affect routing, scheduling, and other optimization tasks.
Initially, meta-paths were utilized to uncover hidden, indirect relationships among nodes, ensuring that relevant domain-specific interactions—such as capacity limits or multi-modal connections—were captured in the representation space. Subsequently, a multi-criteria attention module was implemented to selectively weight nodes and edges based on their importance, thereby facilitating a more focused and accurate modeling of complex logistics data. This attention-based weighting allowed the network to concentrate on the most critical elements, preventing dilution of valuable information in node and edge features.
The evaluation was conducted using two well-known Vehicle Routing Problem with Time Windows (VRPTW) datasets: Solomon and Gehring & Homberger. These datasets incorporate various operational constraints, such as multiple depots, time windows, and vehicle capacity limitations, making them highly representative testbeds for heterogeneous logistics networks. The experimental results demonstrated that the proposed framework achieved a 2–4% improvement in F1 score compared to existing baseline methods, indicating its enhanced capability to classify and predict system components accurately under challenging, real-world conditions.
The feature-refinement extension played a key role by filtering and strengthening the input features before embedding, thereby preserving high-level information essential for downstream tasks. Furthermore, the multi-aggregation mechanism effectively captured a wide range of statistical characteristics within local neighborhoods—from means to variances and maxima—alleviating potential inaccuracies caused by the diversity and imbalance commonly observed in large-scale logistics networks. This multi-aggregation approach enabled the model to handle fluctuations and structural changes in the network more robustly, contributing to the observed performance gains. In addition to consistently outperforming baselines in precision and recall, the proposed method exhibited faster convergence, suggesting its suitability for scenarios requiring rapid or online inference. This higher convergence speed underscores the architecture’s potential for practical deployment in real-time decision-making processes related to routing and scheduling. By leveraging both local and indirect relationships through meta-paths, the model succeeded in uncovering nuanced interactions in logistics data—interactions that traditional homogeneous approaches often overlook.
In summary, the findings confirmed that a graph neural network enhanced by multi-criteria attention, feature refinement, and multiple aggregations can effectively address the challenges posed by heterogeneous nodes and edges in logistics optimization. Through this architecture, a robust and flexible tool was introduced for analyzing and predicting hidden patterns in complex transportation networks. By incorporating these advanced representation-learning methods, the model not only achieved superior accuracy and comprehensiveness but also demonstrated heightened resilience and efficiency compared to standard methods. Consequently, it is anticipated that future work focusing on real-time learning, uncertainty management, and large-scale distributed training would further broaden the model’s applicability, paving the way for advanced, data-driven decision-making in large, dynamic logistics systems.
· Limitations and Future Work:
- Sparse Adjacency Challenge: Highly sparse networks may degrade performance; advanced sampling or hierarchical modeling techniques could be explored.
- Hyperparameter Tuning: More extensive cross-validation or Bayesian optimization may further improve performance.
- Online Learning: Real-time adaptation is needed when dynamic factors (e.g., traffic, weather) change edge weights.
- Scalability: Distributed or GPU-accelerated strategies could tackle ultra-large networks more efficiently.
References
[1] Cook, D. J., & Holder, L. B. (2000). Graph-based data mining. IEEE Intelligent Systems, 15(2), 32–41. https://doi.org/10.1109/5254.848965
[2] Otte, E., & Rousseau, R. (2002). Social network analysis: A powerful strategy, also for the information sciences. Journal of Information Science, 28(6), 441–453. https://doi.org/10.1177/016555150202800601
[3] Chakrabarti, S. (2002). Mining the web: Analysis of hypertext and semi-structured data. Morgan Kaufmann. https://doi.org/10.1016/B978-012457530-7/50007-9
[4] Getoor, L., & Diehl, C. P. (2005). Link mining: A survey. SIGKDD Explorations, 7(2), 3–12. https://doi.org/10.1145/1081870.1081872
[5] Han, J. (2009). Mining heterogeneous information networks by exploring the power of links. In Discovery Science (pp. 13–30). Springer. https://doi.org/10.1007/978-3-642-00582-52
[6] Lewis, T. (2013). Network Science: Theory and Applications. Wiley.
[7] Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828. https://doi.org/10.1109/TPAMI.2013.50
[8] Sun, Y., & Han, J. (2013). Mining heterogeneous information networks: A structural analysis approach. SIGKDD Explorations, 14(2), 20–28. https://doi.org/10.1145/2491436.2491441
[9] Pham, P., Nguyen, L. T. T., Vo, B., & Yun, U. (2022). Bot2Vec: A general approach of intra-community oriented representation learning for bot detection in different types of social networks. IEEE Access, 10, 12162–12174. https://doi.org/10.1109/ACCESS.2022.3148327
[10] Perozzi, B., Al-Rfou, R., & Skiena, S. (2014). DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD (pp. 701–710). https://doi.org/10.1145/2623330.2623732
[11] Grover, A., & Leskovec, J. (2016). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD (pp. 855–864). https://doi.org/10.1145/2939672.2939754
[12] Wang, D., Cui, P., & Zhu, W. (2016). Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD (pp. 1225–1234). https://doi.org/10.1145/2939672.2939755
[13] Cao, S., Lu, W., & Xu, Q. (2016). Deep neural networks for learning graph representations. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (pp. 1145–1151). https://doi.org/10.1609/aaai.v30i1.1145
[14] Schlichtkrull, M., Kipf, T. N., Bloem, P., van den Berg, R., Titov, I., & Welling, M. (2018). Modeling relational data with graph convolutional networks. In Proceedings of the 15th European Semantic Web Conference (ESWC) (pp. 593–607). https://doi.org/10.1007/978-3-319-93846-2_38
[15] Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2018). Graph attention networks. In International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1710.10903
[16] Wang, X., He, H., Cao, Y., Liu, M., & Chua, T.-S. (2019). Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD (pp. 793–803). https://doi.org/10.1145/3292500.3330924
[17] Shi, C., et al. (2017). Heterogeneous information network embedding for recommendation. IEEE Transactions on Knowledge and Data Engineering, 29(2), 242–255. https://doi.org/10.1109/TKDE.2016.2569684
[18] Dong, Y., Chawla, N. V., & Swami, A. (2017). Metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD (pp. 135–144). https://doi.org/10.1145/3097983.3098036