【论文】https://arxiv.org/abs/1705.03098v2
【pytorch】(本文代码参考)weigq/3d_pose_baseline_pytorch: A simple baseline for 3d human pose estimation in PyTorch. (github.com)
【tensorflow】https://github.com/una-dinosauria/3d-pose-baseline
基本上算作是2d人体姿态提升到3d这个pineline的开山之作
一.核心思想
将三维位姿估计解耦为已深入研究的二维姿态估计问题[30,50]和基于二维关节检测的三维姿态估计问题中
1.1 网络设计
网络的构建块是一个线性层,然后是批量归一化、退出和RELU激活。重复两次,两个块被包裹在一个残差连接中。外层块重复两次。系统的输入是一个二维关节位置数组,输出是一系列三维关节位置。
1.1.1 为什么使用线性层?
大多数用于3d人体姿势估计的深度学习方法都是基于卷积神经网络,该网络学习可应用于整个图像[13,24,32,33,45]或二维关节位置热图[33,56]的平移不变滤波器。然而,由于我们处理的是低维点作为输入和输出,我们可以使用更简单、计算成本更低的线性层。RELUs[29]是在深度神经网络中添加非线性的标准选择。
二.层结构
一个带残差的线性层,该层包含两个子层,两个子层完全相同,都是 Linear-BN-Relu-dropout的结构
class Linear(nn.Module):
def __init__(self, linear_size, p_dropout=0.5):
super(Linear, self).__init__()
self.l_size = linear_size
self.relu = nn.ReLU(inplace=True)
self.dropout = nn.Dropout(p_dropout)
self.w1 = nn.Linear(self.l_size, self.l_size)
self.batch_norm1 = nn.BatchNorm1d(self.l_size)
self.w2 = nn.Linear(self.l_size, self.l_size)
self.batch_norm2 = nn.BatchNorm1d(self.l_size)
def forward(self, x):
y = self.w1(x)
y = self.batch_norm1(y)
y = self.relu(y)
y = self.dropout(y)
y = self.w2(y)
y = self.batch_norm2(y)
y = self.relu(y)
y = self.dropout(y)
out = x + y
return out
三.模型整体结构
在基本层结构之前,添加了一个预处理和输出,同样也是Linear-BN-Relu-dropout的结构,预处理只是从输入维度input_size(16 * 2),即16个关键点的2维位置,映射到线性层的维度linear_size,输出只是从线性层的维度linear_size映射到输出维度output_size(16 * 3),即16个关键点的3维位置
class LinearModel(nn.Module):
def __init__(self,
linear_size=1024,
num_stage=2,
p_dropout=0.5):
super(LinearModel, self).__init__()
self.linear_size = linear_size
self.p_dropout = p_dropout
self.num_stage = num_stage
# 2d joints
self.input_size = 16 * 2
# 3d joints
self.output_size = 16 * 3
# process input to linear size
self.w1 = nn.Linear(self.input_size, self.linear_size)
self.batch_norm1 = nn.BatchNorm1d(self.linear_size)
self.linear_stages = []
for l in range(num_stage):
self.linear_stages.append(Linear(self.linear_size, self.p_dropout))
self.linear_stages = nn.ModuleList(self.linear_stages)
# post processing
self.w2 = nn.Linear(self.linear_size, self.output_size)
self.relu = nn.ReLU(inplace=True)
self.dropout = nn.Dropout(self.p_dropout)
def forward(self, x):
# pre-processing
y = self.w1(x)
y = self.batch_norm1(y)
y = self.relu(y)
y = self.dropout(y)
# linear layers
for i in range(self.num_stage):
y = self.linear_stages[i](y)
y = self.w2(y)
return y