Hybrid Reinforcement Learning-Enabled Scheduler for 5G Burst Traffic

Authors and Affiliations

Mohamed Mohsen Farouk, Chung Gwo Chin, Mardeni Roslee, Lee It Ee, Pang Wai Leong

Corresponding email: gcchung@mmu.edu.my

Published: 29 May 2026
Volume: IJtech Vol 17, No 3 (2026)
DOI: https://doi.org/10.14716/ijtech.v17i3.8208

Cite this article as:

Farouk, M. M., Chin, C. G., Roslee, M., Ee, L. I., & Leong, P. W. (2026). Hybrid reinforcement learning-enabled scheduler for 5G burst traffic. International Journal of Technology, 17(3), 866–882.

Mohamed Mohsen Farouk	Engineering Science, Faculty of Artificial Intelligence and Engineering (FAIE), Multimedia University, 63000 Cyberjaya, Selangor, Malaysia
Chung Gwo Chin	Faculty of Artificial Intelligence and Engineering (FAIE), Multimedia University, 63000 Cyberjaya, Selangor, Malaysia
Mardeni Roslee	Faculty of Artificial Intelligence and Engineering (FAIE), Multimedia University, 63000 Cyberjaya, Selangor, Malaysia
Lee It Ee	Faculty of Artificial Intelligence and Engineering (FAIE), Multimedia University, 63000 Cyberjaya, Selangor, Malaysia
Pang Wai Leong	School of Engineering, Faculty of Innovation & Technology, Taylor’s University, 47500 Subang Jaya, Selangor, Malaysia

Abstract

Resource allocation in wireless networks is inherently complex, a problem intensified in 5G by heterogeneous traffic classes and stringent quality of service (QoS) requirements. These conditions pose significant difficulties for traditional scheduling methods. In this study, we address these limitations using novel hybrid reinforcement learning (RL) architectures evaluated in a dynamic and realistic network environment. We designed and implemented three hybrid RL algorithms: Asynchronous Advantage Actor-Critic integrated with Proximal Policy Optimization (A3C-PPO), A3C with Proximal Policy Optimization and Session Persistence (A3C-PPO-Persistent), and A3C with Twin Delayed Deep Deterministic Policy Gradients (A3C-TD3). These were compared against baseline A3C and Advantage Actor-Critic (A2C) approaches, as well as the traditional proportional fair (PF), maximum rate (MR), and round robin (RR) schedulers. Simulations were performed in a challenging multicell environment with mobile user equipment and bursty traffic flows across four network traffic types: ultra-low latency (ULL), voice over IP (VoIP), vehicle-to-everything (V2X), and video streaming. Our hybrid RL schedulers showed promising performance in this highly dynamic setting, with A3C-PPO achieving the most balanced overall results, exhibiting 25%–40% lower average jitter and over four times higher packet delivery ratio (PDR) than traditional schedulers under heavy loads. Our results indicate that hybrid RL methods, particularly A3C-PPO, can provide resilient, adaptive scheduling that can outperform both conventional techniques and standard RL algorithms in realistic 5G networks.
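
For readers unfamiliar with the traditional baselines named above, the following is a minimal sketch of the textbook PF, MR, and RR selection rules for a single scheduling slot. It is not the authors' simulator code: the function names, the `window` averaging parameter, and the single-resource, one-user-per-slot simplification are all assumptions made for illustration.

```python
from typing import List


def round_robin(last_served: int, num_users: int) -> int:
    """RR: serve users in a fixed cyclic order, ignoring channel state."""
    return (last_served + 1) % num_users


def maximum_rate(inst_rates: List[float]) -> int:
    """MR: serve the user with the highest instantaneous achievable rate."""
    return max(range(len(inst_rates)), key=lambda i: inst_rates[i])


def proportional_fair(inst_rates: List[float], avg_rates: List[float],
                      eps: float = 1e-9) -> int:
    """PF: serve the user with the largest ratio of instantaneous rate
    to long-term average throughput (eps avoids division by zero)."""
    return max(range(len(inst_rates)),
               key=lambda i: inst_rates[i] / (avg_rates[i] + eps))


def update_averages(avg_rates: List[float], inst_rates: List[float],
                    served: int, window: float = 100.0) -> List[float]:
    """Exponentially weighted average throughput; only the served user
    receives its instantaneous rate in this slot (window is an assumed value)."""
    alpha = 1.0 / window
    return [(1.0 - alpha) * avg + alpha * (inst_rates[i] if i == served else 0.0)
            for i, avg in enumerate(avg_rates)]


if __name__ == "__main__":
    # Hypothetical per-user rates (arbitrary units) for one slot.
    inst = [1.0, 5.0, 3.0]
    avg = [0.5, 10.0, 1.0]
    print("RR picks user", round_robin(last_served=2, num_users=3))
    print("MR picks user", maximum_rate(inst))
    print("PF picks user", proportional_fair(inst, avg))
```

In this toy slot, MR favours the user with the best channel, while PF favours a user whose current rate is high relative to what it has recently received. None of the three rules considers per-flow delay or jitter targets, which is the kind of QoS awareness the abstract attributes to the learned schedulers.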

Keywords

5G; Advantage Actor-Critic; Burst traffic; Reinforcement learning
