A Reinforcement Learning Approach to Optimize Energy Usage in RF-Charging Sensor Networks

We consider a Radio Frequency (RF)-charging network where sensor devices harvest energy from a solar-powered Hybrid Access Point (HAP) and transmit their data to the HAP. We aim to optimize the power allocation of both the HAP and devices to maximize their Energy Efficiency (EE), defined as the total data received (in bits) per Joule of consumed energy. Unlike prior works, we consider the case where both the HAP and devices have causal knowledge of channel state information and their energy arrival process. We model the power allocation problem as a Two-Layer Markov Decision Process (TMDP), where the first layer corresponds to the HAP and the second layer consists of the devices. We then outline a novel, decentralized Q-Learning (QL) solution that employs linear function approximation to represent the large state space. The simulation results show that when the HAP and devices employ our solution, their EE is orders of magnitude higher than that achieved by competing policies.
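
To make the learning component concrete, the following is a minimal, generic sketch of Q-learning with linear function approximation for a single agent. It is not the paper's two-layer, decentralized algorithm. The class name LinearQAgent, the helper energy_efficiency, the feature vector, the discrete power levels implied by n_actions, and all hyperparameter values are assumptions made for illustration only.

```python
import numpy as np


def energy_efficiency(bits_received, energy_joules):
    """EE as bits per Joule; returning 0 when no energy is spent is an assumption."""
    if energy_joules <= 0.0:
        return 0.0
    return bits_received / energy_joules


class LinearQAgent:
    """Generic Q-learning with a linear approximation Q(s, a) = w[a] . phi(s).

    Hypothetical illustration: phi(s) could hold features such as a
    normalized battery level and channel gain, and each action could be
    one discrete transmit power level.
    """

    def __init__(self, n_features, n_actions, alpha=0.05, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.w = np.zeros((n_actions, n_features))  # one weight vector per action
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = np.random.default_rng(seed)

    def q_values(self, phi):
        """Return the approximate Q-value of every action for features phi."""
        return self.w @ phi

    def act(self, phi):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.w.shape[0]))
        return int(np.argmax(self.q_values(phi)))

    def update(self, phi, action, reward, phi_next):
        """One temporal-difference step on the weights of the chosen action."""
        target = reward + self.gamma * np.max(self.q_values(phi_next))
        td_error = target - self.q_values(phi)[action]
        self.w[action] += self.alpha * td_error * phi
        return td_error
```

In such a sketch, each agent would call act on its current feature vector, observe a reward (for example, the per-slot value returned by energy_efficiency), and then call update. Because only a weight matrix of size n_actions by n_features is stored, memory does not grow with the number of distinct states, which is the motivation for function approximation when the state space is large.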