Published at : 29 May 2026
Volume : IJtech
Vol 17, No 3 (2026)
DOI : https://doi.org/10.14716/ijtech.v17i3.8208
| Mohamed Mohsen Farouk | Engineering Science, Faculty of Artificial Intelligence and Engineering (FAIE), Multimedia University, 63000 Cyberjaya, Selangor, Malaysia |
| Chung Gwo Chin | Faculty of Artificial Intelligence and Engineering (FAIE), Multimedia University, 63000 Cyberjaya, Selangor, Malaysia |
| Mardeni Roslee | Faculty of Artificial Intelligence and Engineering (FAIE), Multimedia University, 63000 Cyberjaya, Selangor, Malaysia |
| Lee It Ee | Faculty of Artificial Intelligence and Engineering (FAIE), Multimedia University, 63000 Cyberjaya, Selangor, Malaysia |
| Pang Wai Leong | School of Engineering, Faculty of Innovation & Technology, Taylor’s University, 47500 Subang Jaya, Selangor, Malaysia |
Resource allocation in wireless networks is inherently complex, and 5G intensifies the problem with heterogeneous traffic classes and stringent quality of service (QoS) requirements that traditional scheduling methods struggle to meet. In this study, we address these limitations with novel hybrid reinforcement learning (RL) architectures evaluated in a dynamic and realistic network environment. We designed and implemented three hybrid RL algorithms: Asynchronous Advantage Actor-Critic combined with Proximal Policy Optimization (A3C-PPO), A3C-PPO with session persistence (A3C-PPO-Persistent), and A3C combined with Twin Delayed Deep Deterministic Policy Gradient (A3C-TD3). These were compared against baseline A3C and Advantage Actor-Critic (A2C) agents, as well as the traditional proportional fair (PF), maximum rate (MR), and round robin (RR) schedulers. Simulations were performed in a challenging multicell environment with mobile user equipment and bursty traffic flows across four traffic types: ultra-low latency (ULL), voice over IP (VoIP), vehicle-to-everything (V2X), and video streaming. The hybrid RL schedulers performed well in this highly dynamic setting. A3C-PPO achieved the most balanced overall results, with 25%–40% lower average jitter and a packet delivery ratio (PDR) more than four times higher than that of the traditional schedulers under heavy load. These results indicate that hybrid RL methods, particularly A3C-PPO, can provide resilient, adaptive scheduling that outperforms both conventional techniques and standard RL algorithms in realistic 5G networks.
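For readers unfamiliar with the three traditional baselines named in the abstract, the sketch below shows the textbook selection rule behind each one. It is a minimal illustration, not the authors' implementation: the function names, the single-resource-per-slot simplification, and the averaging window are all assumptions made here for clarity.

```python
from typing import List


def round_robin(last_served: int, num_users: int) -> int:
    """RR: serve users in a fixed cyclic order, ignoring channel state."""
    return (last_served + 1) % num_users


def maximum_rate(inst_rates: List[float]) -> int:
    """MR: serve the user with the highest instantaneous achievable rate."""
    return max(range(len(inst_rates)), key=lambda u: inst_rates[u])


def proportional_fair(inst_rates: List[float], avg_thr: List[float]) -> int:
    """PF: serve the user with the largest instantaneous-rate to
    average-throughput ratio, trading peak rate against fairness."""
    eps = 1e-9  # guards against division by zero for unserved users
    return max(
        range(len(inst_rates)),
        key=lambda u: inst_rates[u] / (avg_thr[u] + eps),
    )


def update_average(
    avg_thr: List[float],
    served: int,
    inst_rates: List[float],
    window: float = 100.0,  # hypothetical averaging window, in slots
) -> List[float]:
    """Exponential moving average of each user's served throughput."""
    alpha = 1.0 / window
    return [
        (1.0 - alpha) * avg + alpha * (inst_rates[u] if u == served else 0.0)
        for u, avg in enumerate(avg_thr)
    ]


if __name__ == "__main__":
    rates = [2.0, 8.0, 5.0]    # illustrative instantaneous rates
    average = [1.0, 8.0, 2.0]  # illustrative past average throughputs
    print("RR picks user", round_robin(last_served=2, num_users=3))
    print("MR picks user", maximum_rate(rates))
    print("PF picks user", proportional_fair(rates, average))
```

With these illustrative inputs the three rules disagree: RR wraps around to user 0, MR favors the strongest channel (user 1), and PF favors user 2, whose current rate is highest relative to what it has recently received. None of the rules looks at per-flow delay or burstiness, which is the gap the RL schedulers in the study are meant to address.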
5G; Advantage Actor-Critic; Burst traffic; Reinforcement learning