dueling network原理和实现

news2024/11/24 14:30:54

算法原理：
$\begin{gathered}Q(s,a;\theta,\alpha,\beta)=V(s;\theta,\beta)+\left(A(s,a;\theta,\alpha)-\max_{a'\in|\mathcal{A}|}A(s,a';\theta,\alpha)\right).\end{gathered}$
注：DuelingNetwork只是改变最优动作价值网络的架构，原本用来训练DQN的策略依然可以使用：

1、优先级经验回放；

2、Double DQN;

3、Multi-step TD;

在这里插入图片描述

代码实现，只需要将原来的DQN的最优动作价值网络修改成Dueling Network的形式：

class DuelingNetwork(nn.Module):
    """QNet.
    Input: feature
    Output: num_act of values
    """

    def __init__(self, dim_state, num_action):
        super().__init__()
        # A分支
        self.a_fc1 = nn.Linear(dim_state, 64)
        self.a_fc2 = nn.Linear(64, 32)
        self.a_fc3 = nn.Linear(32, num_action)
        # V分支
        self.v_fc1 = nn.Linear(dim_state, 64)
        self.v_fc2 = nn.Linear(64, 32)
        self.v_fc3 = nn.Linear(32, 1)

    def forward(self, state):
        # 计算A
        a_x = F.relu(self.a_fc1(state))
        a_x = F.relu(self.a_fc2(a_x))
        a_x = self.a_fc3(a_x)
        # 计算V
        v_x = F.relu(self.v_fc1(state))
        v_x = F.relu(self.v_fc2(v_x))
        v_x = self.v_fc3(v_x)
        # 计算输出
        x = a_x - v_x - a_x.max()
        return x