Abstract: In autonomous air combat, tactics are inherently complex, and control inputs are continuous. Traditional reinforcement learning (RL) algorithms often rely on discretization or independent ...