Real-Time Scalable Task Offloading in Edge Computing Using Semi-Markov Decision Processes and Attention-Based Deep Reinforcement Learning

Mirzaei, Abbas; Mikaeilvand, Nasser; Nouri-Moghaddam, Babak; Jahanbakhsh Gudakahriz, Sajjad; Khosravani, Ailin; Tahmasebizade, Fatemeh; Seifi, Ali; Hatami, Hosein

doi:https://doi.org/10.82553/josc.2025.140309061191686

Manuscript ID : 140309061191686 Page: 1 - 15

Article Type: Original Research

Subject Areas : Computer Networks

Abbas Mirzaei ¹ , Nasser Mikaeilvand ² , Babak Nouri-Moghaddam ³ , Sajjad Jahanbakhsh Gudakahriz ⁴ , Ailin Khosravani ⁵ , Fatemeh Tahmasebizade ⁶ , Ali Seifi ⁷ , Hosein Hatami ⁸

1 -
2 - Department of Computer Engineering, Central Tehran Branch, Islamic Azad University, Tehran, Iran
3 -
4 -
5 -
6 -
7 -
8 -

Received: 2024-11-28 Accepted : 2025-02-02 Published : 2025-04-05

Keywords: Edge Computing, Task Scheduling, Reinforcement Learning, System Scalability

Abstract :

Edge computing has emerged as a dynamic framework in which computational tasks are offloaded to distributed edge servers (ESs) to provide low-latency, efficient services. As edge systems grow in scale and complexity, Deep Reinforcement Learning (DRL) has become a prominent approach to optimizing task offloading and resource management. However, traditional DRL-based methods face two challenges: (1) discrete-time decision frameworks, such as Markov Decision Processes (MDPs), often force offloading decisions into fixed timeslots, which causes scheduling delays and inefficient resource utilization; (2) static computational structures struggle to adapt to varying numbers of edge servers or user devices, which limits scalability and reduces system efficiency. To overcome these limitations, we introduce a novel DRL-driven real-time offloading mechanism tailored to dynamic and scalable edge environments. Our approach reformulates the offloading problem within a Semi-Markov Decision Process (SMDP) framework and introduces an adaptive optimization mechanism that uses attention-based graph operations for heterogeneous resource environments. The attention mechanism determines how much weight each task should receive and which server should handle it, covering both task prioritization and resource division. To improve performance in practical deployments, we also apply a reward adjustment method that helps the agent learn and improve its performance over time.
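
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the discount rate `beta`, the lump-sum reward, and the feature vectors are all assumptions made for illustration. The first function shows how an SMDP target discounts by the variable time between decisions instead of a fixed per-slot factor; the second shows how scaled dot-product attention scores one task against any number of servers without changing the computation.

```python
import math


def smdp_target(reward, sojourn_time, next_q_values, beta=0.1):
    """One-step Q-learning target for an SMDP (illustrative assumption).

    An MDP discounts every transition by the same factor. In an SMDP the
    time between two decisions (sojourn_time) varies, so the discount is
    exp(-beta * sojourn_time). The reward is assumed to be a lump sum
    received at decision time.
    """
    discount = math.exp(-beta * sojourn_time)
    return reward + discount * max(next_q_values)


def attention_weights(task, servers):
    """Scaled dot-product attention of one task over a list of servers.

    task    : feature vector of the task (hypothetical features)
    servers : list of server feature vectors, any length
    Returns one weight per server; the weights sum to 1.
    """
    d = len(task)
    scores = [
        sum(t * s for t, s in zip(task, srv)) / math.sqrt(d)
        for srv in servers
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # A longer wait before the next decision shrinks the bootstrapped value.
    print(smdp_target(1.0, 1.0, [2.0, 3.0]))
    print(smdp_target(1.0, 5.0, [2.0, 3.0]))
    # The same function handles two or three servers unchanged.
    print(attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
    print(attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]))
```

Because the attention weights are computed per server and then normalized, adding or removing a server changes only the length of the output, which is the property that lets such a model scale with the number of edge servers.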
