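I previously wrote a walkthrough of the FAMI-Pose paper and recently tried running the official code (link: FAMI-Pose). It has so many problems that I wonder whether the authors uploaded the wrong version. This post covers how to get FAMI-Pose training to run.
Running
First, set up the environment by following the official requirement.txt. Dataset configuration is covered in my earlier post on DCPose training. I mainly ran PoseTrack2017. The command is similar to the one for DCPose: enter the tools folder and run python run.py --cfg …/configs/Alignment/posetrack17/Alignment_V15.yaml --train --val.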
Errors and fixes
The first run fails with the following error:
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/__init__.py", line 9, in <module>
from .PoseTrack_Alignment import PoseTrack_Alignment
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/PoseTrack_Alignment.py", line 15, in <module>
from datasets.process import get_affine_transform, fliplr_joints, exec_affine_transform, generate_heatmaps, \
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/__init__.py", line 16, in <module>
from .structure import *
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/structure/__init__.py", line 9, in <module>
from .keypoints_ord import coco2posetrack_ord, coco2posetrack_ord_infer,coco2jhmdb_ord_infer
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/structure/keypoints_ord.py", line 10, in <module>
from datasets.zoo.coco import COCO_joint, COCO_joint_paris
ModuleNotFoundError: No module named 'datasets.zoo.coco'
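Checking the repository shows that the zoo folder under datasets contains no coco module, and no jhmdb either; only posetrack is there. Perhaps the authors forgot to upload them?
However, the functions in keypoints_ord.py do use COCO_joint, so I modified the file following its DCPose counterpart: import the two names from posetrack and comment out the other imports: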
# from datasets.zoo.coco import COCO_joint, COCO_joint_paris
from datasets.zoo.posetrack import PoseTrack_Official_Keypoint_Ordering, PoseTrack_COCO_Keypoint_Ordering
# from datasets.zoo.posetrack.pose_topology import POSETRACK_joint
# from datasets.zoo.jhmdb.pose_topology import JHMDB_Keypoint_Ordering
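Change src_kps and dst_kps in the coco2posetrack_ord and coco2posetrack_ord_infer functions as shown below, and comment out the coco2jhmdb_ord_infer function, since it is not needed for PoseTrack. Then copy zoo/posetrack/pose_skeleton.py from DCPose into the corresponding location in FAMI-Pose.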
src_kps = PoseTrack_COCO_Keypoint_Ordering
dst_kps = PoseTrack_Official_Keypoint_Ordering
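To show what swapping src_kps and dst_kps does, here is a minimal sketch of name-based keypoint reordering. It assumes the two orderings are plain lists of joint names and that the conversion looks each destination joint up in the source list. The short lists and the reorder_keypoints helper are hypothetical illustrations, not the repository's code.

```python
from typing import List, Sequence

# Hypothetical, shortened joint-name lists used only for illustration.
# The real PoseTrack_COCO_Keypoint_Ordering and
# PoseTrack_Official_Keypoint_Ordering lists live in pose_skeleton.py.
SRC_ORDER = ["nose", "left_eye", "right_eye", "left_shoulder"]
DST_ORDER = ["nose", "left_shoulder", "left_eye", "right_eye"]


def reorder_keypoints(keypoints: Sequence, src_names: List[str], dst_names: List[str]) -> list:
    """Return keypoints rearranged from src_names order into dst_names order."""
    index_of = {name: i for i, name in enumerate(src_names)}
    return [keypoints[index_of[name]] for name in dst_names]


# One (x, y) pair per joint, given in SRC_ORDER.
coords = [(10, 20), (11, 18), (9, 18), (14, 30)]
reordered = reorder_keypoints(coords, SRC_ORDER, DST_ORDER)
print(reordered)
```

If the two name lists do not describe the same set of joints, the lookup raises a KeyError. That makes a mismatched pose_skeleton.py easier to spot than silently misplaced joints would be.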
Running again produces the following import error:
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/__init__.py", line 9, in <module>
from .PoseTrack_Alignment import PoseTrack_Alignment
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/PoseTrack_Alignment.py", line 15, in <module>
from datasets.process import get_affine_transform, fliplr_joints, exec_affine_transform, generate_heatmaps, \
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/__init__.py", line 16, in <module>
from .structure import *
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/structure/__init__.py", line 9, in <module>
from .keypoints_ord import coco2posetrack_ord, coco2posetrack_ord_infer,coco2jhmdb_ord_infer
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/process/structure/keypoints_ord.py", line 11, in <module>
from datasets.zoo.posetrack import PoseTrack_Official_Keypoint_Ordering, PoseTrack_COCO_Keypoint_Ordering
ImportError: cannot import name 'PoseTrack_Official_Keypoint_Ordering'
The fix is to edit zoo/posetrack/__init__.py as follows:
#from .PoseTrack_Alignment import PoseTrack_Alignment
from .pose_skeleton import *
This is needed because importing PoseTrack_Alignment here creates a circular import. Running run.py again then gives the following error:
File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/__init__.py", line 12, in <module>
from .functions import *
File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/functions/__init__.py", line 7, in <module>
from .alignment_mi_function_term6_1 import AlignmentMIFunction_Term6_V1
File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/functions/alignment_mi_function_term6_1.py", line 28, in <module>
from posetimation.loss.mse_loss import JointMSELoss
File "/home/dsp/ljh/lab/FAMI-Pose/posetimation/loss/__init__.py", line 9, in <module>
from .base import build_loss
File "/home/dsp/ljh/lab/FAMI-Pose/posetimation/loss/base.py", line 11, in <module>
from .integral_loss import IntegralMSELoss, IntegralL1Loss
ModuleNotFoundError: No module named 'posetimation.loss.integral_loss'
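This means the loss module is missing. The repository has no integral_loss at all, only mse_loss, so the authors may have forgotten to upload it as well. This loss is not actually used here, so the relevant lines can simply be commented out.
After commenting them out, base.py looks like this: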
import logging

# from .integral_loss import IntegralMSELoss, IntegralL1Loss
from .mse_loss import JointMSELoss

logger = logging.getLogger(__name__)


def build_loss(cfg, **kwargs):
    if "NAME" in cfg.LOSS:
        # The warning text says: "NAME will be removed later; please use NAMES"
        logger.warning("NAME 将会在之后被删除,请使用NAMES")
        if cfg.LOSS.NAME == "MSELOSS":
            return JointMSELoss(cfg.LOSS.USE_TARGET_WEIGHT)
        # elif cfg.LOSS.NAME == "IntegralMSELoss":
        #     return IntegralMSELoss(True)
        # elif cfg.LOSS.NAME == "IntegralL1Loss":
        #     return IntegralL1Loss(True)
Running run.py again gives the following error:
Traceback (most recent call last):
File "/home/dsp/ljh/lab/FAMI-Pose/tools/run.py", line 46, in <module>
main()
File "/home/dsp/ljh/lab/FAMI-Pose/tools/run.py", line 42, in main
runner.launch()
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 56, in launch
trainer = DefaultTrainer(self.cfg, self.output_path_dict, PE_Name=self.args.PE_Name)
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 33, in __init__
self.dataloader = build_train_loader(cfg)
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/build.py", line 38, in build_train_loader
dataset = DATASET_REGISTRY.get(dataset_name)(cfg=cfg, phase='train')
File "/home/dsp/ljh/lab/FAMI-Pose/utils/utils_registry.py", line 71, in get
name, self._name
KeyError: "No object named 'PoseTrack_Alignment' found in 'DATASET' registry!"
This means PoseTrack_Alignment was never registered. We need to edit __init__.py under datasets so that it imports PoseTrack_Alignment. Adding the following line is enough:
from .zoo.posetrack.PoseTrack_Alignment import PoseTrack_Alignment
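The reason a single import fixes a KeyError can be sketched with a toy registry. It assumes the repository's registry behaves like the usual decorator-based pattern, where a class is added to the registry only when the module that defines it is executed. The Registry class and ToyDataset below are hypothetical stand-ins, not the code in utils_registry.py.

```python
class Registry:
    """Minimal name -> class registry (hypothetical stand-in for utils_registry.py)."""

    def __init__(self, name: str):
        self._name = name
        self._obj_map = {}

    def register(self):
        def decorator(cls):
            self._obj_map[cls.__name__] = cls
            return cls
        return decorator

    def get(self, name: str):
        if name not in self._obj_map:
            raise KeyError(f"No object named '{name}' found in '{self._name}' registry!")
        return self._obj_map[name]


DATASET_REGISTRY = Registry("DATASET")

# Before the defining module has been executed, the lookup fails.
try:
    DATASET_REGISTRY.get("ToyDataset")
    found_before = True
except KeyError:
    found_before = False


# Importing a module runs its top-level code; this class definition plays
# the role of that import and triggers the registration.
@DATASET_REGISTRY.register()
class ToyDataset:
    def __init__(self, phase: str = "train"):
        self.phase = phase


dataset = DATASET_REGISTRY.get("ToyDataset")(phase="train")
print(found_before, dataset.phase)
```

Under this assumption, commenting the import out of zoo/posetrack/__init__.py earlier is what left the dataset unregistered. The import has to run somewhere else before build_train_loader is called.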
Running again gives yet another error:
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/__init__.py", line 16, in <module>
from .runner import DefaultRunner
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 11, in <module>
from .trainer import DefaultTrainer
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 15, in <module>
from datasets import build_train_loader
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/__init__.py", line 12, in <module>
from .zoo.posetrack.PoseTrack_Alignment import PoseTrack_Alignment
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/PoseTrack_Alignment.py", line 27, in <module>
from thirdparty.clustering import k_means
ModuleNotFoundError: No module named 'thirdparty.clustering'
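The thirdparty folder contains only nms; there is no clustering.
So the only option is to comment out the k_means import line in PoseTrack_Alignment.py. That import is not actually used anyway.
Running again gives the following error: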
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 56, in launch
trainer = DefaultTrainer(self.cfg, self.output_path_dict, PE_Name=self.args.PE_Name)
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 33, in __init__
self.dataloader = build_train_loader(cfg)
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/build.py", line 38, in build_train_loader
dataset = DATASET_REGISTRY.get(dataset_name)(cfg=cfg, phase='train')
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/PoseTrack_Alignment.py", line 78, in __init__
osp.join(self.json_dir, 'posetrack_train.json' if self.is_train else 'posetrack_val.json'))
File "/home/dsp/.local/lib/python3.6/site-packages/pycocotools/coco.py", line 81, in __init__
with open(annotation_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/dsp/ljh/lab/FAMI-Pose/DcPose_supp_files/posetrack17_json_files/posetrack_train.json'
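This one is simple: the path to the json files is wrong. Some paths in Base_PoseTrack17.yaml need to be changed, as described in my previous post on DCPose training; they are essentially the paths to the json files, the images, and the pretrained model. Point them at your own PoseTrack data.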
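A quick pre-flight check can catch wrong paths before the dataloader does. This is a minimal sketch: the dictionary keys and placeholder paths are hypothetical, not the actual key names in Base_PoseTrack17.yaml, so substitute the values from your own config.

```python
import os
from typing import Dict, List


def find_missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the names of entries whose path does not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


# Hypothetical entries; replace them with the values from your own config.
configured = {
    "json_dir": "/path/to/posetrack17_json_files",
    "img_dir": "/path/to/posetrack17_images",
    "pretrained": "/path/to/pretrained_model.pth",
}

for name in find_missing_paths(configured):
    print(f"missing: {name} -> {configured[name]}")
```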
Then run run.py again, which gives the following error:
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 56, in launch
trainer = DefaultTrainer(self.cfg, self.output_path_dict, PE_Name=self.args.PE_Name)
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 33, in __init__
self.dataloader = build_train_loader(cfg)
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/build.py", line 38, in build_train_loader
dataset = DATASET_REGISTRY.get(dataset_name)(cfg=cfg, phase='train')
File "/home/dsp/ljh/lab/FAMI-Pose/datasets/zoo/posetrack/PoseTrack_Alignment.py", line 96, in __init__
'17val.json'))
File "/home/dsp/ljh/lab/FAMI-Pose/utils/utils_json.py", line 14, in write_json_to_file
with open(output_path, "w") as write_file:
FileNotFoundError: [Errno 2] No such file or directory: '/media/Z/frunyang/FAMI-Pose/thirdparty/clustering/pose_analysis/17val.json'
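PoseTrack_Alignment.py contains a block of code that writes to a hard-coded absolute path (the /media/Z/frunyang/... path in the traceback above).
It is unclear what this code is for. It is probably related to clustering, but it does not seem to be used, so setting self.clustering to False is enough. This again makes me wonder whether the authors uploaded the wrong code...
Running again gives the following error: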
File "/home/dsp/ljh/lab/FAMI-Pose/tools/run.py", line 42, in main
runner.launch()
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 56, in launch
trainer = DefaultTrainer(self.cfg, self.output_path_dict, PE_Name=self.args.PE_Name)
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 49, in __init__
self.core_function = build_core_function(cfg, criterion=self.loss_criterion, **kwargs)
File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/base.py", line 65, in build_core_function
core_function = CORE_FUNCTION_REGISTRY.get(cfg.CORE_FUNCTION)(cfg, **kwargs)
File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/functions/alignment_mi_function_term6_1.py", line 61, in __init__
self.IntegralL1Loss_criterion = IntegralL1Loss()
NameError: name 'IntegralL1Loss' is not defined
This happens because the Integral loss was removed earlier. Commenting out the IntegralL1Loss and StructureCosineSimilarity lines in alignment_mi_function_term6_1.py fixes it. They are only definitions and are never used.
# self.IntegralL1Loss_criterion = IntegralL1Loss()
# self.StructureCosineSimilarityLoss_criterion = StructureCosineSimilarity()
Running again gives the following error (I am tired of saying this, but it is the last one):
File "/home/dsp/ljh/lab/FAMI-Pose/tools/run.py", line 42, in main
runner.launch()
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/runner.py", line 57, in launch
trainer.exec()
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 27, in exec
self.train()
File "/home/dsp/ljh/lab/FAMI-Pose/engine/defaults/trainer.py", line 71, in train
tb_writer_dict=self.tb_writer_dict)
File "/home/dsp/ljh/lab/FAMI-Pose/engine/core/functions/alignment_mi_function_term6_1.py", line 104, in train
pred_heatmaps, local_warped_sup_hm_list, kf_bb_heatmaps, mi_loss_list = model(input_x.cuda(), sup_x.cuda())
ValueError: not enough values to unpack (expected 4, got 3)
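This means only 3 values could be unpacked where 4 were expected. Looking at the relevant code, the forward function returns only 3 values during training.
So remove local_warped_sup_hm_list from the unpacking and add the line local_warped_sup_hm_list=[] right below it.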
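The change can be sketched as follows. toy_forward is a hypothetical stand-in for the model's training-time forward pass, and the order of its three return values is assumed from the variable names that remain in the traceback line.

```python
def toy_forward(x, sup_x):
    """Hypothetical stand-in for the model's training-time forward (3 return values)."""
    pred_heatmaps = [x]
    kf_bb_heatmaps = [sup_x]
    mi_loss_list = [0.0]
    return pred_heatmaps, kf_bb_heatmaps, mi_loss_list


# Unpack exactly the three values that forward returns ...
pred_heatmaps, kf_bb_heatmaps, mi_loss_list = toy_forward("frame", "support")
# ... and define the dropped name so later code that references it still runs.
local_warped_sup_hm_list = []
```

Keeping local_warped_sup_hm_list as an empty list avoids touching the rest of the training function. Any loop over it simply does nothing.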
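Run it once more; this time training starts without errors.
The mutual information loss problem
During training you will notice that loss_MI is always negative. This is the mutual information loss. The official code computes the KL divergence without taking the logarithm of the input, as shown below.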
def feat_feat_mi_estimation(self, F1, F2):
    """
        F1: [B,48,96,72]
        F2: [B,48,96,72]
        F1 -> F2
    """
    batch_size = F1.shape[0]
    temperature = 0.05
    F1 = F1.reshape(batch_size, 48, -1).reshape(batch_size * 48, -1)
    F2 = F2.reshape(batch_size, 48, -1).reshape(batch_size * 48, -1)
    mi = kl_div(input=self.softmax(F1.detach() / temperature), target=self.softmax(F2 / temperature))
    return mi
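The input argument of kl_div is expected to hold log-probabilities, i.e. the output of log_softmax. Here only softmax is applied, which is why the loss comes out negative.
Oddly, though, after the several mutual information terms are combined, the final loss is still positive with the original code, whereas switching to log_softmax makes it negative. So I ran the code as is and treat the result as a reference.
The result is 83.3, noticeably below the 84.8 reported in the paper. I suspect the loss is where the problem lies, but it is hard to say.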
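To make the sign issue of a single kl_div call concrete, here is a small pure-Python sketch. It reimplements the pointwise formula documented for torch.nn.functional.kl_div, target * (log(target) - input), rather than calling PyTorch. The toy logits and helper names are assumptions for illustration only.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div_sum(inp: List[float], target: List[float]) -> float:
    """Sum of target * (log(target) - inp); inp is expected to be log-probabilities."""
    return sum(t * (math.log(t) - i) for i, t in zip(inp, target))


p = softmax([1.0, 2.0, 3.0])  # plays the role of softmax(F1 / temperature)
q = softmax([3.0, 2.0, 1.0])  # plays the role of softmax(F2 / temperature)

# Passing probabilities as input, as the official code does.
kl_with_probs = kl_div_sum(p, q)
# Passing log-probabilities as input, as kl_div expects.
kl_with_log_probs = kl_div_sum([math.log(v) for v in p], q)

print(kl_with_probs, kl_with_log_probs)
```

With probabilities as input, every term is target * (log(target) - p). Since log(target) <= 0 and p > 0, the sum is always negative. With log-probabilities it is a true KL divergence and cannot go below zero. PyTorch's default reduction averages instead of summing, which does not change the sign. This sketch covers only one kl_div call; the final loss in the repository combines several such terms, so its overall sign can differ.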
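Summary
The idea in this paper is quite good, but the open-source code has a lot of problems, and I cannot tell whether the authors uploaded the wrong version. Even after my fixes there is still a sizable gap to the official result, and I do not know why. I hope the original authors or someone more experienced can explain.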