Abstract:
To address the overestimation bias and poor decision accuracy of conventional energy management methods for microgrid clusters, an energy management strategy based on an improved double deep Q-network is proposed. First, a dual-objective value network framework based on clipped double Q-learning is constructed. It computes the temporal difference (TD) targets of the two value networks in parallel and clips the higher TD target, which suppresses value overestimation and improves decision accuracy. Second, a dynamic greedy strategy is adopted: the value function of every possible action is computed from the current state, and the agent is kept from persistently exploiting the greedy action, which ensures sufficient exploration and prevents premature convergence. Finally, the strategy is verified in a case study of a microgrid cluster comprising three sub-microgrids. The simulation results show that, compared with energy management strategies based on model predictive control and the conventional double deep Q-network, the proposed method achieves better optimization performance and convergence characteristics, reducing system operating costs by 44.62% and 26.39%, respectively.
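As a rough illustration of the two mechanisms summarized above, the following minimal Python sketch shows a clipped TD target taken over two value estimates and an exploration rate that changes over training. It is not the authors' implementation: the function names, the discount factor `gamma`, and the linear decay schedule for the exploration rate are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def clipped_td_target(reward, next_q1, next_q2, gamma=0.99, done=False):
    """Return the smaller of two TD targets (clipped double Q-learning).

    next_q1 / next_q2 are the action-value lists that the two value
    networks predict for the next state (hypothetical inputs).
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    y1 = reward + gamma * max(next_q1)  # TD target from network 1
    y2 = reward + gamma * max(next_q2)  # TD target from network 2
    # Keeping the lower target clips the higher, possibly overestimated one.
    return min(y1, y2)


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """Assumed linear decay of the exploration rate over training steps."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With a reward of 1.0, `gamma=0.5`, and next-state values `[2.0, 3.0]` and `[1.0, 2.5]`, the two targets are 2.5 and 2.25, so the clipped target is 2.25. A high early exploration rate keeps the agent from locking onto the greedy action before the value estimates are reliable.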