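In the previous article we took a close look at Soft Actor-Critic (SAC) and how it balances exploration and exploitation. This article covers a milestone in reinforcement learning, Asynchronous Advantage Actor-Critic (A3C), and shows how to train it in parallel with PyTorch to speed up learning.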
1. How A3C Works
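A3C was proposed by DeepMind in 2016. It runs several agents (workers) asynchronously and in parallel, each interacting with its own copy of the environment, which greatly improves training efficiency. Its core ideas are: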
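- Parallel architecture: multiple workers (threads in the original paper, processes in the code below) interact with their own environment copies at the same time, collecting experience and computing gradients.
- Parameter sharing: all workers share the parameters of a global network. Each worker applies its local gradients to the global network and periodically copies the updated global parameters back into its local network.
- Advantage function: $A(s,a) = Q(s,a) - V(s)$ measures how much better an action is than the average behavior in that state; weighting the policy gradient by the advantage instead of the raw return reduces variance.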
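In A3C the critic only learns $V(s)$, so $Q(s,a)$ has to be estimated from sampled rewards. The sketch below assumes the simplest one-step estimate $Q(s,a) \approx r + \gamma V(s')$; the function name and the value numbers are hypothetical and only illustrate the arithmetic.

```python
def one_step_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """Estimate A(s,a) = Q(s,a) - V(s), approximating Q(s,a) by r + gamma * V(s')."""
    # At a terminal state there is no future value to bootstrap from
    q_estimate = reward if done else reward + gamma * value_next
    return q_estimate - value_s


# Hypothetical critic outputs: V(s) = 10.0, V(s') = 10.5, observed reward r = 1.0
adv = one_step_advantage(1.0, 10.0, 10.5)
print(round(adv, 3))  # prints 1.395, i.e. 1.0 + 0.99 * 10.5 - 10.0
```

A positive advantage means the sampled action did better than the critic expected, so the policy gradient makes that action more likely.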
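Advantages of A3C
- Efficient data collection: workers explore different parts of the environment at the same time, so the combined data is far less correlated and no experience replay buffer is needed.
- More stable training: because updates from different workers are decorrelated, learning is more stable than with a single online agent.
- High resource utilization: it makes full use of multi-core CPUs and can be trained on CPU alone.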
2. Implementation Steps
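We will implement A3C in PyTorch to solve the CartPole balancing task:
- Define the global network, with an Actor head and a Critic head
- Implement asynchronous workers; each worker interacts with its own environment and computes gradients independently
- Design the parallel update mechanism that applies gradients to the global network asynchronously
- Train and evaluate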
3. Code
```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.multiprocessing as mp
import gym  # uses the gym >= 0.26 API: reset() -> (obs, info), step() -> 5 values


# Global network: a shared base with an Actor head and a Critic head
class GlobalNetwork(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.base = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU()
        )
        self.actor = nn.Linear(128, action_dim)  # action logits
        self.critic = nn.Linear(128, 1)          # state value V(s)

    def forward(self, x):
        x = self.base(x)
        return torch.softmax(self.actor(x), dim=-1), self.critic(x)


# Worker process
class Worker(mp.Process):
    def __init__(self, global_net, optimizer, global_ep, name):
        super().__init__()
        self.local_net = GlobalNetwork(4, 2)  # local copy of the network
        self.global_net = global_net
        self.optimizer = optimizer
        self.global_ep = global_ep
        self.name = name

    def run(self):
        # Create the environment inside the child process so the Worker stays picklable under "spawn"
        self.env = gym.make('CartPole-v1')
        while self.global_ep.value < 300:  # train for 300 global episodes
            # Sync the global parameters into the local network
            self.local_net.load_state_dict(self.global_net.state_dict())
            # Collect one episode of trajectory data
            states, actions, rewards = [], [], []
            state, _ = self.env.reset()  # reset() returns (observation, info)
            ep_reward = 0
            while True:
                state_tensor = torch.FloatTensor(state).unsqueeze(0)  # add batch dimension: [1, 4]
                with torch.no_grad():
                    prob, _ = self.local_net(state_tensor)
                action = torch.multinomial(prob, 1).item()
                next_state, reward, terminated, truncated, _ = self.env.step(action)
                done = terminated or truncated  # CartPole-v1 is truncated at 500 steps
                ep_reward += reward
                states.append(state)
                actions.append(action)
                rewards.append(reward)
                if done:
                    # Compute returns and advantages
                    returns, advantages = self.cal_advantages(states, rewards)
                    # Asynchronously update the global network
                    self.update_global(states, actions, advantages, returns)
                    with self.global_ep.get_lock():
                        self.global_ep.value += 1
                        ep = self.global_ep.value  # read inside the lock so the number printed is unique
                    print(f"Episode {ep} finished by {self.name} with reward {ep_reward}")
                    break
                state = next_state

    def cal_advantages(self, states, rewards, gamma=0.99):
        # Discounted returns, accumulated backwards from the end of the episode
        returns = []
        discounted_reward = 0
        for r in reversed(rewards):
            discounted_reward = r + gamma * discounted_reward
            returns.insert(0, discounted_reward)
        # Convert to a tensor and standardize within the episode
        returns = torch.FloatTensor(returns)
        if len(returns) > 1:  # std() of a single element is NaN
            returns = (returns - returns.mean()) / (returns.std() + 1e-8)
        # Value estimates from the critic (no gradient needed here)
        states_tensor = torch.FloatTensor(np.array(states))  # shape [n_steps, 4]
        with torch.no_grad():
            _, values = self.local_net(states_tensor)
        values = values.squeeze(-1)
        # Advantage = return - baseline
        advantages = returns - values
        return returns, advantages

    def update_global(self, states, actions, advantages, returns):
        states_tensor = torch.FloatTensor(np.array(states))  # shape [n_steps, 4]
        actions_tensor = torch.LongTensor(actions)
        # Compute the losses on the local network
        probs, values = self.local_net(states_tensor)
        dist = torch.distributions.Categorical(probs=probs)
        policy_loss = -(dist.log_prob(actions_tensor) * advantages.detach()).mean()
        value_loss = nn.functional.mse_loss(values.squeeze(-1), returns.detach())
        entropy = dist.entropy().mean()  # entropy bonus encourages exploration
        total_loss = policy_loss + 0.5 * value_loss - 0.01 * entropy
        # Backpropagate locally; clear gradients left over from the previous episode first
        self.local_net.zero_grad()
        total_loss.backward()
        # Hand the local gradients to the global parameters (always overwrite them),
        # then let the optimizer update the shared global parameters in place
        for local_param, global_param in zip(self.local_net.parameters(),
                                             self.global_net.parameters()):
            global_param.grad = local_param.grad.clone()
        self.optimizer.step()


if __name__ == '__main__':
    # Initialize the global network and move its parameters into shared memory
    global_net = GlobalNetwork(4, 2)
    global_net.share_memory()
    # Adam's moment estimates are created lazily inside each process, so they are not shared
    optimizer = optim.Adam(global_net.parameters(), lr=0.0002)
    global_ep = mp.Value('i', 0)  # shared episode counter
    # Start 4 worker processes
    workers = [Worker(global_net, optimizer, global_ep, f"Worker-{i}") for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```
4. Key Code Walkthrough
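- Network architecture
  - Shared feature extraction layer (`base`)
  - Actor head: outputs the action probability distribution
  - Critic head: estimates the state value
- Parallel training
  - `torch.multiprocessing` runs each worker in its own process
  - `share_memory()` moves the global network's parameters into shared memory, so every worker reads and updates the same tensors
  - The atomic counter `global_ep` tracks overall training progress
- Advantage estimation: when an episode of length $T$ ends, the worker computes the discounted return $G_t = \sum_{k=0}^{T-t-1} \gamma^k r_{t+k}$ with $\gamma = 0.99$, standardizes these returns within the episode to get $\hat{G}_t$, and uses $A_t = \hat{G}_t - V(s_t)$ as the advantage. This is an n-step return in which n extends to the end of the episode; the original A3C instead bootstraps from $V(s_{t+n})$ after a fixed number of steps.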
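The A3C paper collects rollouts of at most $t_{max}$ steps and bootstraps the return from the critic's estimate of the last state, $R_t = \sum_{k=0}^{n-1} \gamma^k r_{t+k} + \gamma^n V(s_{t+n})$. A minimal plain-Python sketch of that computation follows; the function names and sample numbers are illustrative assumptions, not part of the code above. Passing `bootstrap_value=0.0` at the end of an episode gives the same returns that `cal_advantages` computes before standardization.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for a rollout of up to t_max steps.

    bootstrap_value is V(s_{t+n}) from the critic, or 0.0 if the rollout hit a terminal state.
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running  # R_t = r_t + gamma * R_{t+1}
        returns.insert(0, running)
    return returns


def compute_advantages(returns, values):
    """A_t = R_t - V(s_t) for each step of the rollout."""
    return [ret - v for ret, v in zip(returns, values)]


# Three steps of reward 1, rollout cut off (not terminal) with V(s_{t+3}) = 4.0
print(n_step_returns([1.0, 1.0, 1.0], bootstrap_value=4.0, gamma=0.5))  # [2.25, 2.5, 3.0]
```

Bootstrapping lets a worker update after a few steps instead of waiting for the episode to end, at the cost of some bias from the critic's estimate.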
5. Training Results
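When you run the code, you will see:
- Several workers training in parallel, with each process printing the reward of the episode it just finished
- Training stopping once the global episode counter reaches 300. Each worker checks the counter only before starting a new episode, so episodes already in progress still finish and the final count can slightly exceed 300 (303 in the log below)
In this log the per-episode reward stays between 9 and 97 with no clear upward trend (a CartPole-v1 episode can last up to 500 steps).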
```
Episode 1 finished by Worker-1 with reward 34.0
Episode 2 finished by Worker-2 with reward 60.0
Episode 3 finished by Worker-0 with reward 38.0
Episode 4 finished by Worker-3 with reward 46.0
Episode 5 finished by Worker-2 with reward 17.0
Episode 6 finished by Worker-1 with reward 40.0
Episode 7 finished by Worker-3 with reward 56.0
Episode 8 finished by Worker-0 with reward 54.0
Episode 9 finished by Worker-1 with reward 18.0
Episode 10 finished by Worker-2 with reward 59.0
Episode 11 finished by Worker-3 with reward 19.0
Episode 12 finished by Worker-2 with reward 17.0
Episode 13 finished by Worker-0 with reward 32.0
Episode 14 finished by Worker-3 with reward 18.0
Episode 15 finished by Worker-1 with reward 59.0
Episode 16 finished by Worker-1 with reward 19.0
Episode 17 finished by Worker-3 with reward 15.0
Episode 18 finished by Worker-2 with reward 39.0
Episode 19 finished by Worker-1 with reward 12.0
Episode 20 finished by Worker-0 with reward 29.0
Episode 21 finished by Worker-3 with reward 25.0
Episode 22 finished by Worker-2 with reward 14.0
Episode 23 finished by Worker-1 with reward 17.0
Episode 24 finished by Worker-0 with reward 31.0
Episode 25 finished by Worker-3 with reward 17.0
Episode 26 finished by Worker-2 with reward 22.0
Episode 27 finished by Worker-1 with reward 10.0
Episode 28 finished by Worker-0 with reward 12.0
Episode 29 finished by Worker-2 with reward 38.0
Episode 30 finished by Worker-3 with reward 17.0
Episode 31 finished by Worker-0 with reward 16.0
Episode 32 finished by Worker-1 with reward 13.0
Episode 33 finished by Worker-3 with reward 11.0
Episode 34 finished by Worker-2 with reward 36.0
Episode 35 finished by Worker-0 with reward 14.0
Episode 36 finished by Worker-3 with reward 16.0
Episode 37 finished by Worker-1 with reward 12.0
Episode 38 finished by Worker-0 with reward 36.0
Episode 39 finished by Worker-3 with reward 26.0
Episode 40 finished by Worker-1 with reward 35.0
Episode 41 finished by Worker-0 with reward 15.0
Episode 42 finished by Worker-2 with reward 77.0
Episode 43 finished by Worker-2 with reward 15.0
Episode 44 finished by Worker-1 with reward 40.0
Episode 45 finished by Worker-3 with reward 41.0
Episode 46 finished by Worker-0 with reward 61.0
Episode 47 finished by Worker-2 with reward 26.0
Episode 48 finished by Worker-1 with reward 44.0
Episode 49 finished by Worker-3 with reward 43.0
Episode 50 finished by Worker-0 with reward 26.0
Episode 51 finished by Worker-2 with reward 13.0
Episode 52 finished by Worker-1 with reward 42.0
Episode 53 finished by Worker-3 with reward 16.0
Episode 54 finished by Worker-2 with reward 11.0
Episode 55 finished by Worker-3 with reward 38.0
Episode 56 finished by Worker-1 with reward 18.0
Episode 57 finished by Worker-0 with reward 62.0
Episode 58 finished by Worker-1 with reward 12.0
Episode 59 finished by Worker-2 with reward 13.0
Episode 60 finished by Worker-3 with reward 9.0
Episode 61 finished by Worker-0 with reward 23.0
Episode 62 finished by Worker-2 with reward 12.0
Episode 63 finished by Worker-0 with reward 12.0
Episode 64 finished by Worker-3 with reward 11.0
Episode 65 finished by Worker-1 with reward 33.0
Episode 66 finished by Worker-0 with reward 10.0
Episode 67 finished by Worker-2 with reward 14.0
Episode 68 finished by Worker-1 with reward 13.0
Episode 69 finished by Worker-3 with reward 25.0
Episode 70 finished by Worker-0 with reward 21.0
Episode 71 finished by Worker-1 with reward 19.0
Episode 72 finished by Worker-3 with reward 11.0
Episode 73 finished by Worker-2 with reward 49.0
Episode 74 finished by Worker-0 with reward 23.0
Episode 75 finished by Worker-3 with reward 11.0
Episode 76 finished by Worker-2 with reward 18.0
Episode 77 finished by Worker-1 with reward 44.0
Episode 78 finished by Worker-0 with reward 12.0
Episode 79 finished by Worker-3 with reward 30.0
Episode 80 finished by Worker-1 with reward 20.0
Episode 81 finished by Worker-0 with reward 23.0
Episode 82 finished by Worker-2 with reward 14.0
Episode 83 finished by Worker-1 with reward 11.0
Episode 84 finished by Worker-0 with reward 12.0
Episode 85 finished by Worker-3 with reward 30.0
Episode 86 finished by Worker-2 with reward 83.0
Episode 87 finished by Worker-1 with reward 26.0
Episode 88 finished by Worker-0 with reward 17.0
Episode 89 finished by Worker-3 with reward 17.0
Episode 90 finished by Worker-2 with reward 31.0
Episode 91 finished by Worker-1 with reward 28.0
Episode 92 finished by Worker-0 with reward 12.0
Episode 93 finished by Worker-3 with reward 18.0
Episode 94 finished by Worker-2 with reward 12.0
Episode 95 finished by Worker-3 with reward 17.0
Episode 96 finished by Worker-0 with reward 21.0
Episode 97 finished by Worker-1 with reward 39.0
Episode 98 finished by Worker-2 with reward 44.0
Episode 99 finished by Worker-0 with reward 23.0
Episode 100 finished by Worker-3 with reward 12.0
Episode 101 finished by Worker-1 with reward 14.0
Episode 102 finished by Worker-2 with reward 21.0
Episode 103 finished by Worker-0 with reward 12.0
Episode 104 finished by Worker-3 with reward 18.0
Episode 105 finished by Worker-1 with reward 12.0
Episode 106 finished by Worker-0 with reward 30.0
Episode 107 finished by Worker-2 with reward 21.0
Episode 108 finished by Worker-3 with reward 23.0
Episode 109 finished by Worker-1 with reward 34.0
Episode 110 finished by Worker-2 with reward 13.0
Episode 111 finished by Worker-0 with reward 12.0
Episode 112 finished by Worker-3 with reward 14.0
Episode 113 finished by Worker-1 with reward 21.0
Episode 114 finished by Worker-0 with reward 25.0
Episode 115 finished by Worker-2 with reward 18.0
Episode 116 finished by Worker-3 with reward 23.0
Episode 117 finished by Worker-0 with reward 17.0
Episode 118 finished by Worker-2 with reward 16.0
Episode 119 finished by Worker-1 with reward 24.0
Episode 120 finished by Worker-3 with reward 13.0
Episode 121 finished by Worker-0 with reward 25.0
Episode 122 finished by Worker-1 with reward 14.0
Episode 124 finished by Worker-3 with reward 26.0
Episode 123 finished by Worker-2 with reward 13.0
Episode 125 finished by Worker-0 with reward 20.0
Episode 126 finished by Worker-2 with reward 12.0
Episode 127 finished by Worker-1 with reward 15.0
Episode 128 finished by Worker-3 with reward 9.0
Episode 129 finished by Worker-1 with reward 14.0
Episode 130 finished by Worker-0 with reward 15.0
Episode 131 finished by Worker-2 with reward 14.0
Episode 132 finished by Worker-3 with reward 18.0
Episode 133 finished by Worker-2 with reward 31.0
Episode 134 finished by Worker-3 with reward 30.0
Episode 135 finished by Worker-0 with reward 14.0
Episode 136 finished by Worker-1 with reward 16.0
Episode 137 finished by Worker-3 with reward 14.0
Episode 138 finished by Worker-0 with reward 15.0
Episode 140 finished by Worker-2 with reward 25.0
Episode 139 finished by Worker-1 with reward 20.0
Episode 141 finished by Worker-3 with reward 15.0
Episode 142 finished by Worker-0 with reward 15.0
Episode 143 finished by Worker-2 with reward 11.0
Episode 144 finished by Worker-1 with reward 14.0
Episode 145 finished by Worker-3 with reward 17.0
Episode 146 finished by Worker-2 with reward 18.0
Episode 147 finished by Worker-3 with reward 11.0
Episode 148 finished by Worker-1 with reward 26.0
Episode 149 finished by Worker-0 with reward 20.0
Episode 150 finished by Worker-3 with reward 14.0
Episode 151 finished by Worker-2 with reward 20.0
Episode 152 finished by Worker-1 with reward 19.0
Episode 153 finished by Worker-0 with reward 22.0
Episode 154 finished by Worker-2 with reward 15.0
Episode 155 finished by Worker-3 with reward 15.0
Episode 156 finished by Worker-1 with reward 31.0
Episode 157 finished by Worker-0 with reward 18.0
Episode 158 finished by Worker-2 with reward 66.0
Episode 159 finished by Worker-3 with reward 24.0
Episode 160 finished by Worker-1 with reward 25.0
Episode 161 finished by Worker-0 with reward 17.0
Episode 162 finished by Worker-2 with reward 30.0
Episode 163 finished by Worker-3 with reward 20.0
Episode 164 finished by Worker-1 with reward 10.0
Episode 165 finished by Worker-0 with reward 16.0
Episode 166 finished by Worker-2 with reward 13.0
Episode 167 finished by Worker-3 with reward 27.0
Episode 168 finished by Worker-0 with reward 15.0
Episode 169 finished by Worker-1 with reward 11.0
Episode 170 finished by Worker-2 with reward 17.0
Episode 171 finished by Worker-0 with reward 21.0
Episode 172 finished by Worker-3 with reward 45.0
Episode 173 finished by Worker-1 with reward 34.0
Episode 174 finished by Worker-3 with reward 33.0
Episode 175 finished by Worker-2 with reward 22.0
Episode 176 finished by Worker-0 with reward 15.0
Episode 177 finished by Worker-1 with reward 14.0
Episode 178 finished by Worker-2 with reward 15.0
Episode 179 finished by Worker-3 with reward 30.0
Episode 180 finished by Worker-1 with reward 12.0
Episode 181 finished by Worker-2 with reward 17.0
Episode 182 finished by Worker-3 with reward 9.0
Episode 183 finished by Worker-0 with reward 57.0
Episode 184 finished by Worker-0 with reward 11.0
Episode 185 finished by Worker-1 with reward 15.0
Episode 186 finished by Worker-3 with reward 25.0
Episode 187 finished by Worker-0 with reward 23.0
Episode 188 finished by Worker-2 with reward 22.0
Episode 189 finished by Worker-1 with reward 35.0
Episode 190 finished by Worker-3 with reward 20.0
Episode 191 finished by Worker-0 with reward 15.0
Episode 192 finished by Worker-2 with reward 32.0
Episode 193 finished by Worker-1 with reward 11.0
Episode 194 finished by Worker-3 with reward 11.0
Episode 195 finished by Worker-0 with reward 19.0
Episode 196 finished by Worker-2 with reward 23.0
Episode 197 finished by Worker-1 with reward 14.0
Episode 198 finished by Worker-3 with reward 13.0
Episode 199 finished by Worker-0 with reward 23.0
Episode 200 finished by Worker-2 with reward 38.0
Episode 201 finished by Worker-1 with reward 23.0
Episode 202 finished by Worker-3 with reward 20.0
Episode 203 finished by Worker-0 with reward 17.0
Episode 204 finished by Worker-1 with reward 15.0
Episode 205 finished by Worker-2 with reward 47.0
Episode 206 finished by Worker-0 with reward 17.0
Episode 207 finished by Worker-3 with reward 16.0
Episode 208 finished by Worker-1 with reward 28.0
Episode 209 finished by Worker-2 with reward 18.0
Episode 210 finished by Worker-0 with reward 17.0
Episode 211 finished by Worker-3 with reward 16.0
Episode 212 finished by Worker-2 with reward 29.0
Episode 213 finished by Worker-1 with reward 20.0
Episode 214 finished by Worker-0 with reward 47.0
Episode 215 finished by Worker-3 with reward 18.0
Episode 216 finished by Worker-1 with reward 28.0
Episode 217 finished by Worker-2 with reward 17.0
Episode 218 finished by Worker-0 with reward 12.0
Episode 219 finished by Worker-3 with reward 47.0
Episode 220 finished by Worker-2 with reward 18.0
Episode 221 finished by Worker-1 with reward 16.0
Episode 222 finished by Worker-0 with reward 10.0
Episode 223 finished by Worker-3 with reward 11.0
Episode 224 finished by Worker-2 with reward 15.0
Episode 225 finished by Worker-1 with reward 21.0
Episode 226 finished by Worker-3 with reward 30.0
Episode 227 finished by Worker-0 with reward 22.0
Episode 228 finished by Worker-2 with reward 39.0
Episode 229 finished by Worker-1 with reward 65.0
Episode 230 finished by Worker-3 with reward 29.0
Episode 231 finished by Worker-0 with reward 38.0
Episode 232 finished by Worker-2 with reward 46.0
Episode 233 finished by Worker-3 with reward 14.0
Episode 234 finished by Worker-1 with reward 16.0
Episode 235 finished by Worker-0 with reward 10.0
Episode 236 finished by Worker-2 with reward 13.0
Episode 237 finished by Worker-3 with reward 9.0
Episode 238 finished by Worker-1 with reward 10.0
Episode 239 finished by Worker-0 with reward 25.0
Episode 240 finished by Worker-3 with reward 16.0
Episode 241 finished by Worker-2 with reward 39.0
Episode 242 finished by Worker-1 with reward 22.0
Episode 243 finished by Worker-2 with reward 11.0
Episode 244 finished by Worker-0 with reward 28.0
Episode 245 finished by Worker-3 with reward 10.0
Episode 246 finished by Worker-1 with reward 26.0
Episode 247 finished by Worker-0 with reward 13.0
Episode 248 finished by Worker-2 with reward 20.0
Episode 249 finished by Worker-3 with reward 31.0
Episode 250 finished by Worker-1 with reward 11.0
Episode 251 finished by Worker-3 with reward 18.0
Episode 252 finished by Worker-2 with reward 28.0
Episode 253 finished by Worker-0 with reward 75.0
Episode 254 finished by Worker-1 with reward 79.0
Episode 255 finished by Worker-3 with reward 12.0
Episode 256 finished by Worker-2 with reward 57.0
Episode 257 finished by Worker-0 with reward 80.0
Episode 258 finished by Worker-1 with reward 61.0
Episode 259 finished by Worker-3 with reward 17.0
Episode 260 finished by Worker-0 with reward 9.0
Episode 261 finished by Worker-2 with reward 15.0
Episode 262 finished by Worker-1 with reward 15.0
Episode 263 finished by Worker-3 with reward 26.0
Episode 264 finished by Worker-0 with reward 17.0
Episode 265 finished by Worker-1 with reward 20.0
Episode 266 finished by Worker-2 with reward 11.0
Episode 267 finished by Worker-3 with reward 28.0
Episode 268 finished by Worker-0 with reward 12.0
Episode 269 finished by Worker-2 with reward 13.0
Episode 270 finished by Worker-3 with reward 37.0
Episode 271 finished by Worker-1 with reward 40.0
Episode 272 finished by Worker-0 with reward 11.0
Episode 273 finished by Worker-2 with reward 21.0
Episode 274 finished by Worker-3 with reward 19.0
Episode 275 finished by Worker-1 with reward 42.0
Episode 276 finished by Worker-2 with reward 18.0
Episode 277 finished by Worker-3 with reward 11.0
Episode 278 finished by Worker-0 with reward 97.0
Episode 279 finished by Worker-1 with reward 10.0
Episode 280 finished by Worker-2 with reward 11.0
Episode 281 finished by Worker-0 with reward 13.0
Episode 282 finished by Worker-3 with reward 15.0
Episode 283 finished by Worker-1 with reward 15.0
Episode 284 finished by Worker-2 with reward 36.0
Episode 285 finished by Worker-3 with reward 14.0
Episode 286 finished by Worker-0 with reward 14.0
Episode 287 finished by Worker-1 with reward 14.0
Episode 288 finished by Worker-2 with reward 11.0
Episode 289 finished by Worker-1 with reward 21.0
Episode 290 finished by Worker-0 with reward 41.0
Episode 291 finished by Worker-3 with reward 13.0
Episode 292 finished by Worker-2 with reward 12.0
Episode 293 finished by Worker-1 with reward 9.0
Episode 294 finished by Worker-2 with reward 13.0
Episode 295 finished by Worker-0 with reward 12.0
Episode 296 finished by Worker-3 with reward 50.0
Episode 297 finished by Worker-1 with reward 65.0
Episode 298 finished by Worker-2 with reward 56.0
Episode 299 finished by Worker-0 with reward 25.0
Episode 300 finished by Worker-3 with reward 18.0
Episode 301 finished by Worker-1 with reward 17.0
Episode 302 finished by Worker-2 with reward 15.0
Episode 303 finished by Worker-0 with reward 30.0
```
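Because the workers print asynchronously, the raw log is hard to read as a learning curve. A small helper, which is not part of the original code, can parse the lines and average the rewards over windows of episodes. The regular expression assumes the exact message printed in `run()`, and the window size is an arbitrary choice.

```python
import re
from statistics import mean

# Matches the message printed by Worker.run(); \s+ also tolerates line breaks inside an entry
LOG_PATTERN = re.compile(
    r"Episode\s+(\d+)\s+finished\s+by\s+(\S+)\s+with\s+reward\s+([\d.]+)")


def parse_log(text):
    """Return (episode, worker, reward) tuples sorted by episode number."""
    rows = [(int(ep), worker, float(r)) for ep, worker, r in LOG_PATTERN.findall(text)]
    return sorted(rows)


def window_means(rewards, window=50):
    """Mean reward over consecutive, non-overlapping windows of episodes."""
    return [mean(rewards[i:i + window]) for i in range(0, len(rewards), window)]


sample = ("Episode 2 finished by Worker-1 with reward 20.0 "
          "Episode 1 finished by Worker-0 with reward 10.0")
rows = parse_log(sample)
print(rows)  # [(1, 'Worker-0', 10.0), (2, 'Worker-1', 20.0)]
print(window_means([r for _, _, r in rows], window=2))  # [15.0]
```

Sorting by episode number undoes the occasional out-of-order lines, and a rising sequence of window means is a simpler signal of learning than individual episode rewards.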
6. Summary and Extensions
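This article implemented the core logic of A3C and showed how parallel training is used in reinforcement learning. Possible extensions:
- Test the algorithm in more complex environments such as Atari games
- Vary the number of workers and observe how training efficiency changes
- Add stabilization techniques such as gradient clipping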
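Gradient clipping by global norm rescales all gradients with a single factor so that their combined L2 norm does not exceed a threshold. The NumPy sketch below only illustrates the idea behind `torch.nn.utils.clip_grad_norm_`; the function name and the example arrays are hypothetical.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale gradient arrays so their combined (global) L2 norm is at most max_norm.

    Returns the clipped arrays and the norm measured before clipping.
    """
    total_norm = float(np.sqrt(sum(float(np.sum(g ** 2)) for g in grads)))
    # One shared scale factor; gradients already inside the limit are left unchanged
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]  # global norm = 5.0
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm_before, round(float(norm_after), 4))  # 5.0 1.0
```

In the worker, the PyTorch call would go between `total_loss.backward()` and the loop that copies gradients to the global network, for example `torch.nn.utils.clip_grad_norm_(self.local_net.parameters(), max_norm=40.0)`; the threshold 40.0 is only an example value.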
In the next article we will look at a more advanced distributed reinforcement learning technique, the IMPALA algorithm. Stay tuned!
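Notes:
- The code relies on multiprocessing and was written with Linux in mind. On Windows and macOS, where the default start method is spawn, keep the `if __name__ == '__main__':` guard so that child processes do not re-run the training setup.
- Adding `torch.nn.utils.clip_grad_norm_` helps prevent exploding gradients.
- `gym.wrappers.RecordVideo` can record the agent's behavior (create the environment with `render_mode="rgb_array"`). The older `gym.wrappers.Monitor` is not available in the gym versions (0.26 and later) whose `reset()`/`step()` API this code uses.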
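I hope this article helps you understand the core ideas of A3C and how to put them into practice. Feel free to share your thoughts in the comments.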