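Notes from my first time running YOLOv8 on my own data.
I started with the official repository and spent about two hours going through it. Address: GitHub - ultralytics/ultralytics: NEW - YOLOv8 🚀 in PyTorch > ONNX > CoreML > TFLite
That gave me a basic picture of the required environment and the general usage.
I then read the official documentation to learn how training and inference work for the detection task, which mostly comes down to setting the various options.
Official documentation, address:
CLI - YOLOv8 Docs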
Next I trained on my own preprocessed data, using the common YOLOv5 dataset format.
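For reference, the YOLOv5-style label format stores one text file per image, with one line per object: the class index followed by the box center, width, and height, all normalized to the image size. The sketch below shows that conversion from a pixel-coordinate box. The function name `to_yolo_line` and the sample numbers are made up for illustration and are not part of any library.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to one YOLO label line.

    The output is "class cx cy w h", with all four box values divided by the
    image width or height so that they fall in the range [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / img_w   # normalized box center, x
    cy = (y_min + y_max) / 2 / img_h   # normalized box center, y
    w = (x_max - x_min) / img_w        # normalized box width
    h = (y_max - y_min) / img_h        # normalized box height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Illustrative values only: class 3, a 200x200 pixel box in a 640x480 image.
print(to_yolo_line(3, (100, 200, 300, 400), 640, 480))
```

Each image's lines go into a `.txt` file with the same base name as the image, and the dataset YAML passed to training points at the image folders and lists the class names.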
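Training on a single 1070 GPU worked without problems.
Single-machine multi-GPU training on two cards, however, failed with TypeError: barrier() got an unexpected keyword argument 'device_ids'. Full traceback: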
Traceback (most recent call last):
  File "/home/jovyan/fuxueping/obj_det/ultralytics-main/train_detect.py", line 9, in <module>
    model.train(data='coco_38classes.yaml', epochs=100, imgsz=640, save=True)
  File "/home/jovyan/fuxueping/obj_det/ultralytics-main/ultralytics/yolo/engine/model.py", line 365, in train
    self.trainer.train()
  File "/home/jovyan/fuxueping/obj_det/ultralytics-main/ultralytics/yolo/engine/trainer.py", line 190, in train
    self._do_train(world_size)
  File "/home/jovyan/fuxueping/obj_det/ultralytics-main/ultralytics/yolo/engine/trainer.py", line 265, in _do_train
    self._setup_train(world_size)
  File "/home/jovyan/fuxueping/obj_det/ultralytics-main/ultralytics/yolo/engine/trainer.py", line 248, in _setup_train
    self.train_loader = self.get_dataloader(self.trainset, batch_size=batch_size, rank=RANK, mode='train')
  File "/home/jovyan/fuxueping/obj_det/ultralytics-main/ultralytics/yolo/v8/detect/train.py", line 43, in get_dataloader
    build_dataloader(self.args, batch_size, img_path=dataset_path, stride=gs, rank=rank, mode=mode,
  File "/home/jovyan/fuxueping/obj_det/ultralytics-main/ultralytics/yolo/data/build.py", line 70, in build_dataloader
    with torch_distributed_zero_first(rank): # init dataset *.cache only once if DDP
  File "/opt/conda/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/jovyan/fuxueping/obj_det/ultralytics-main/ultralytics/yolo/utils/torch_utils.py", line 38, in torch_distributed_zero_first
    dist.barrier(device_ids=[local_rank])
TypeError: barrier() got an unexpected keyword argument 'device_ids'
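Searching online suggested that the installed PyTorch version was too old. I requested an environment with version 1.9, but the problem persisted.
So I patched the failing spot in the code myself.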
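The exact edit is not reproduced in these notes. As a minimal sketch of one possible workaround, and not necessarily the change that was made here, the call in `torch_distributed_zero_first` could pass `device_ids` only when the installed `barrier()` declares that parameter. The helper name `barrier_kwargs` is hypothetical, and the sketch assumes `inspect.signature` can read the signature of `dist.barrier` in the installed PyTorch.

```python
import inspect


def barrier_kwargs(barrier_fn, local_rank):
    """Return the keyword arguments that barrier_fn is able to accept.

    device_ids is included only when barrier_fn declares that parameter,
    so older barrier() implementations are called with no extra arguments.
    """
    params = inspect.signature(barrier_fn).parameters
    if "device_ids" in params:
        return {"device_ids": [local_rank]}
    return {}


# Hypothetical use at the failing line in ultralytics/yolo/utils/torch_utils.py
# (an assumption, not the original author's patch):
#     dist.barrier(**barrier_kwargs(dist.barrier, local_rank))
```

Checking the signature avoids hard-coding a PyTorch version number, and it is narrower than wrapping the call in `try/except TypeError`, which could also hide unrelated errors raised inside the barrier.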
After that, training ran. Later I also found a fix that other people had used. Reference link:
目标检测 YOLOv5 - YOLOv5:v6版本多机多卡训练出现的错误及解决方案_typeerror: barrier() got an unexpected keyword arg_所向披靡的张大刀的博客-CSDN博客