TY - JOUR
TI - Deep reinforcement learning with discrete normalized advantage functions for resource management in network slicing
AU - Qi, Chen
AU - Hua, Yuxiu
AU - Li, Rongpeng
AU - Zhao, Zhifeng
AU - Zhang, Honggang
T2 - IEEE Communications Letters
AB - Network slicing promises to provision diversified services with distinct requirements in one infrastructure. Deep reinforcement learning (e.g., deep Q-learning, DQL) is assumed to be an appropriate algorithm to solve the demand-aware inter-slice resource management issue in network slicing by regarding the varying demands and the allocated bandwidth as the environment state and the action, respectively. However, allocating bandwidth at a finer resolution usually implies a larger action space, and unfortunately DQL fails to converge quickly in this case. In this paper, we introduce discrete normalized advantage functions (DNAF) into DQL, by separating the Q-value function into a state-value function term and an advantage term and exploiting a deterministic policy gradient descent (DPGD) algorithm to avoid the unnecessary calculation of the Q-value for every state-action pair. Furthermore, as DPGD only works in a continuous action space, we embed a k-nearest neighbor algorithm into DQL to quickly find a valid action in the discrete space nearest to the DPGD output. Finally, we verify the faster convergence of the DNAF-based DQL through extensive simulations.
DA - 2019/08//
PY - 2019
DP - IEEE Xplore
VL - 23
IS - 8
SP - 1337
EP - 1341
J2 - IEEE Commun. Lett.
SN - 1089-7798
UR - https://www.rongpeng.info/files/Paper_CommLett2019DNAF.pdf
KW - Artificial neural networks
KW - Bandwidth
KW - Convergence
KW - Network slicing
KW - Quality of experience
KW - Reinforcement learning
KW - Resource management
ER -