Contents
- Abstract
- Model Details
- Model in Practice
- Training on the COCO Dataset
- Downloading the Dataset
- Converting COCO to a YOLO-Format Dataset (works for V4, V5, V6, V7, V8)
- Setting Up the YOLOv10 Environment
- Training
- Resuming from a Checkpoint
- Testing
- Training a Custom Dataset
- The Labelme Dataset
- Format Conversion
- Training
- Testing
- Summary
Abstract
Model Details
Model in Practice
Training on the COCO Dataset
This walkthrough uses the 2017 release of the COCO dataset as an example to demonstrate how to train and predict with YOLOv10.
Downloading the Dataset
Images:
- 2017 Train images [118K/18GB]: http://images.cocodataset.org/zips/train2017.zip
- 2017 Val images [5K/1GB]: http://images.cocodataset.org/zips/val2017.zip
- 2017 Test images [41K/6GB]: http://images.cocodataset.org/zips/test2017.zip
Annotations:
- 2017 Train/Val annotations [241MB]: http://images.cocodataset.org/annotations/annotations_trainval2017.zip
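If you prefer to script the download, a minimal standard-library sketch looks like this (the URLs are the ones listed above; for archives this large, wget or a download manager with resume support may serve you better):

import urllib.request

urls = [
    "http://images.cocodataset.org/zips/train2017.zip",
    "http://images.cocodataset.org/zips/val2017.zip",
    "http://images.cocodataset.org/annotations/annotations_trainval2017.zip",
]
for url in urls:
    filename = url.split("/")[-1]
    print(f"Downloading {filename} ...")
    urllib.request.urlretrieve(url, filename)  # blocks until the file is fully saved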
Converting COCO to a YOLO-Format Dataset (works for V4, V5, V6, V7, V8)
In the original research paper, COCO had 91 object categories. However, the first release in 2014 published labeled and segmented images for only 80 of them, and the follow-up 2017 release kept the same 80. The detailed categories are:
ID | OBJECT (PAPER) | OBJECT (2014 REL.) | OBJECT (2017 REL.) | SUPER CATEGORY |
---|---|---|---|---|
1 | person | person | person | person |
2 | bicycle | bicycle | bicycle | vehicle |
3 | car | car | car | vehicle |
4 | motorcycle | motorcycle | motorcycle | vehicle |
5 | airplane | airplane | airplane | vehicle |
6 | bus | bus | bus | vehicle |
7 | train | train | train | vehicle |
8 | truck | truck | truck | vehicle |
9 | boat | boat | boat | vehicle |
10 | traffic light | traffic light | traffic light | outdoor |
11 | fire hydrant | fire hydrant | fire hydrant | outdoor |
12 | street sign | - | - | outdoor |
13 | stop sign | stop sign | stop sign | outdoor |
14 | parking meter | parking meter | parking meter | outdoor |
15 | bench | bench | bench | outdoor |
16 | bird | bird | bird | animal |
17 | cat | cat | cat | animal |
18 | dog | dog | dog | animal |
19 | horse | horse | horse | animal |
20 | sheep | sheep | sheep | animal |
21 | cow | cow | cow | animal |
22 | elephant | elephant | elephant | animal |
23 | bear | bear | bear | animal |
24 | zebra | zebra | zebra | animal |
25 | giraffe | giraffe | giraffe | animal |
26 | hat | - | - | accessory |
27 | backpack | backpack | backpack | accessory |
28 | umbrella | umbrella | umbrella | accessory |
29 | shoe | - | - | accessory |
30 | eye glasses | - | - | accessory |
31 | handbag | handbag | handbag | accessory |
32 | tie | tie | tie | accessory |
33 | suitcase | suitcase | suitcase | accessory |
34 | frisbee | frisbee | frisbee | sports |
35 | skis | skis | skis | sports |
36 | snowboard | snowboard | snowboard | sports |
37 | sports ball | sports ball | sports ball | sports |
38 | kite | kite | kite | sports |
39 | baseball bat | baseball bat | baseball bat | sports |
40 | baseball glove | baseball glove | baseball glove | sports |
41 | skateboard | skateboard | skateboard | sports |
42 | surfboard | surfboard | surfboard | sports |
43 | tennis racket | tennis racket | tennis racket | sports |
44 | bottle | bottle | bottle | kitchen |
45 | plate | - | - | kitchen |
46 | wine glass | wine glass | wine glass | kitchen |
47 | cup | cup | cup | kitchen |
48 | fork | fork | fork | kitchen |
49 | knife | knife | knife | kitchen |
50 | spoon | spoon | spoon | kitchen |
51 | bowl | bowl | bowl | kitchen |
52 | banana | banana | banana | food |
53 | apple | apple | apple | food |
54 | sandwich | sandwich | sandwich | food |
55 | orange | orange | orange | food |
56 | broccoli | broccoli | broccoli | food |
57 | carrot | carrot | carrot | food |
58 | hot dog | hot dog | hot dog | food |
59 | pizza | pizza | pizza | food |
60 | donut | donut | donut | food |
61 | cake | cake | cake | food |
62 | chair | chair | chair | furniture |
63 | couch | couch | couch | furniture |
64 | potted plant | potted plant | potted plant | furniture |
65 | bed | bed | bed | furniture |
66 | mirror | - | - | furniture |
67 | dining table | dining table | dining table | furniture |
68 | window | - | - | furniture |
69 | desk | - | - | furniture |
70 | toilet | toilet | toilet | furniture |
71 | door | - | - | furniture |
72 | tv | tv | tv | electronic |
73 | laptop | laptop | laptop | electronic |
74 | mouse | mouse | mouse | electronic |
75 | remote | remote | remote | electronic |
76 | keyboard | keyboard | keyboard | electronic |
77 | cell phone | cell phone | cell phone | electronic |
78 | microwave | microwave | microwave | appliance |
79 | oven | oven | oven | appliance |
80 | toaster | toaster | toaster | appliance |
81 | sink | sink | sink | appliance |
82 | refrigerator | refrigerator | refrigerator | appliance |
83 | blender | - | - | appliance |
84 | book | book | book | indoor |
85 | clock | clock | clock | indoor |
86 | vase | vase | vase | indoor |
87 | scissors | scissors | scissors | indoor |
88 | teddy bear | teddy bear | teddy bear | indoor |
89 | hair drier | hair drier | hair drier | indoor |
90 | toothbrush | toothbrush | toothbrush | indoor |
91 | hair brush | - | - | indoor |
As you can see, the 2014 and 2017 releases share the same object list: 80 of the original 91 categories from the paper. So when converting, the category ids have to be remapped. The mapping function is:
def coco91_to_coco80_class():  # maps a 91-index (paper) category id to an 80-index (val2014) class id
    # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
    # a = np.loadtxt('data/coco.names', dtype='str', delimiter='\n')
    # b = np.loadtxt('data/coco_paper.names', dtype='str', delimiter='\n')
    # x1 = [list(a[i] == b).index(True) + 1 for i in range(80)]  # darknet to coco
    # x2 = [list(b[i] == a).index(True) if any(b[i] == a) else None for i in range(91)]  # coco to darknet
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, None, 24, 25, None,
         None, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, None, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
         51, 52, 53, 54, 55, 56, 57, 58, 59, None, 60, None, None, 61, None, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
         None, 73, 74, 75, 76, 77, 78, 79, None]
    return x
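A quick check of the mapping: annotation category_id 13 is "stop sign" in the paper numbering, which lands on index 11 in the contiguous 80-class list:

coco80 = coco91_to_coco80_class()
print(coco80[13 - 1])  # category_id 13 (stop sign) -> YOLO class index 11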
Next comes the conversion itself. The project layout is:
- coco: the unzipped dataset.
- out: the output directory.
- coco2yolo.py: the conversion script.
The conversion code is as follows:
import json
import glob
import os
import shutil
from pathlib import Path

import numpy as np
from tqdm import tqdm


def make_folders(path='../out/'):
    # Create a clean output folder with labels/ and images/ subfolders
    if os.path.exists(path):
        shutil.rmtree(path)  # delete existing output folder
    os.makedirs(path)  # make new output folder
    os.makedirs(path + os.sep + 'labels')  # make new labels folder
    os.makedirs(path + os.sep + 'images')  # make new images folder
    return path


def convert_coco_json(json_dir='./coco/annotations_trainval2017/annotations/'):
    jsons = glob.glob(json_dir + '*.json')
    coco80 = coco91_to_coco80_class()

    # Import json
    for json_file in sorted(jsons):
        fn = 'out/labels/%s/' % Path(json_file).stem.replace('instances_', '')  # labels folder name
        fn_images = 'out/images/%s/' % Path(json_file).stem.replace('instances_', '')  # images folder name
        os.makedirs(fn, exist_ok=True)
        os.makedirs(fn_images, exist_ok=True)
        with open(json_file) as f:
            data = json.load(f)
        print(fn)

        # Create image dict keyed by image id
        images = {'%g' % x['id']: x for x in data['images']}

        # Write one label file per image
        for x in tqdm(data['annotations'], desc='Annotations %s' % json_file):
            if x['iscrowd']:
                continue

            img = images['%g' % x['image_id']]
            h, w, f = img['height'], img['width'], img['file_name']
            file_path = 'coco/' + fn.split('/')[-2] + '/' + f

            # The COCO bounding box format is [top-left x, top-left y, width, height]
            box = np.array(x['bbox'], dtype=np.float64)
            box[:2] += box[2:] / 2  # xy top-left corner to center
            box[[0, 2]] /= w  # normalize x
            box[[1, 3]] /= h  # normalize y

            if (box[2] > 0.) and (box[3] > 0.):  # if w > 0 and h > 0
                with open(fn + Path(f).stem + '.txt', 'a') as file:
                    file.write('%g %.6f %.6f %.6f %.6f\n' % (coco80[x['category_id'] - 1], *box))
            file_path_t = fn_images + f
            print(file_path, file_path_t)
            shutil.copy(file_path, file_path_t)  # copy the image next to its labels (repeat copies overwrite harmlessly)


def coco91_to_coco80_class():  # maps a 91-index (paper) category id to an 80-index (val2014) class id
    # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, None, 24, 25, None,
         None, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, None, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
         51, 52, 53, 54, 55, 56, 57, 58, 59, None, 60, None, None, 61, None, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
         None, 73, 74, 75, 76, 77, 78, 79, None]
    return x


convert_coco_json()
Run the script:
After the conversion finishes, verify the result:
import cv2
import os


def draw_box_in_single_image(image_path, txt_path):
    # Read the image
    image = cv2.imread(image_path)

    # Read the label file, one box per line
    def read_list(txt_path):
        pos = []
        with open(txt_path, 'r') as file_to_read:
            while True:
                lines = file_to_read.readline()  # read a whole line
                if not lines:
                    break
                # Split on spaces; every field in a YOLO label line is numeric
                p_tmp = [float(i) for i in lines.split(' ')]
                pos.append(p_tmp)  # append the newly read box
        return pos

    # Convert a normalized (class, cx, cy, w, h) row back to pixel corners
    def convert(size, box):
        xmin = (box[1] - box[3] / 2.) * size[1]
        xmax = (box[1] + box[3] / 2.) * size[1]
        ymin = (box[2] - box[4] / 2.) * size[0]
        ymax = (box[2] + box[4] / 2.) * size[0]
        box = (int(xmin), int(ymin), int(xmax), int(ymax))
        return box

    pos = read_list(txt_path)
    print(pos)
    for i in range(len(pos)):
        label = str(int(pos[i][0]))
        print('label is ' + label)
        box = convert(image.shape, pos[i])
        image = cv2.rectangle(image, (box[0], box[1]), (box[2], box[3]), (0, 0, 255), 2)
        cv2.putText(image, label, (box[0], box[1] - 2), 0, 1, [0, 0, 255], thickness=2, lineType=cv2.LINE_AA)
    if pos:
        cv2.imwrite('./Data/see_images/{}.png'.format(os.path.basename(image_path)[:-4]), image)
    else:
        print('None')


img_folder = "./out/images/val2017"
img_list = sorted(os.listdir(img_folder))
label_folder = "./out/labels/val2017"

if not os.path.exists('./Data/see_images'):
    os.makedirs('./Data/see_images')
for img_name in img_list:
    image_path = os.path.join(img_folder, img_name)
    # Derive the label path from the image name so the image/label pairing cannot drift
    txt_path = os.path.join(label_folder, os.path.splitext(img_name)[0] + '.txt')
    if os.path.exists(txt_path):
        draw_box_in_single_image(image_path, txt_path)
Results:
Setting Up the YOLOv10 Environment
You can simply install every library listed in requirements.txt by running:
pip install -r requirements.txt
If you would rather not install that many packages, just run the code and install whichever library it reports as missing.
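As a quick sanity check that the environment works (a minimal sketch; it assumes you run it from the repository root so the config path resolves):

from ultralytics import YOLOv10

model = YOLOv10("ultralytics/cfg/models/v10/yolov10n.yaml")  # build from a config, no weights required
model.info()  # prints the layer/parameter summary if everything is installed correctly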
Training
Download the code from https://github.com/THU-MIG/yolov10 to get the source.
Next, create a training script. You can build the model from a yaml config, for example:
from ultralytics import YOLOv10

if __name__ == '__main__':
    model = YOLOv10(model="ultralytics/cfg/models/v10/yolov10l.yaml")  # build a new model from scratch
    # If you want to finetune the model with pretrained weights, you could load the
    # pretrained weights like below
    # model = YOLOv10.from_pretrained('jameslahm/yolov10{n/s/m/b/l/x}')
    # or
    # wget https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10{n/s/m/b/l/x}.pt
    # model = YOLOv10('yolov10{n/s/m/b/l/x}.pt')

    # Use the model
    results = model.train(data="VOC.yaml", patience=0, epochs=150, device='0', batch=8, seed=42)  # train the model
The model configuration files live under ultralytics/cfg/models/v10, as shown in the figure:
You can also build from a pretrained model, for example:
model = YOLOv10('yolov10n.pt')
Then start training:
# Use the model
model.train(data="coco128.yaml", epochs=3)  # train the model
The dataset configuration files live under ultralytics/cfg/datasets/, as shown in the figure:
Pretty simple, isn't it?
Next, let's set up our own configuration.
Step 1: find the ultralytics/cfg/datasets/coco.yaml file and copy it to the project root.
Then change the paths inside it to:
# Ultralytics YOLO 🚀, GPL-3.0 license
# COCO 2017 dataset http://cocodataset.org by Microsoft
# Example usage: yolo train data=coco.yaml
# parent
# ├── ultralytics
# └── datasets
# └── coco ← downloads here (20.1 GB)
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
train: ./coco/images/train2017 # train images (relative to 'path') 118287 images
val: ./coco/images/val2017 # val images (relative to 'path') 5000 images
test: test-dev2017.txt # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794
As for the dataset paths, you can experiment yourself. After several tries I found that the framework (same behavior as YOLOv8) automatically prepends a datasets directory, so setting ./coco/images/train2017 means the actual path is datasets/coco/images/train2017.
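You can confirm where that datasets root points with the settings object (a hedged aside; this API comes from the upstream ultralytics package that the YOLOv10 repo is built on):

from ultralytics import settings

print(settings["datasets_dir"])  # the directory relative dataset paths are resolved against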
Step 2: create a train.py script.
from ultralytics import YOLOv10

if __name__ == '__main__':
    model = YOLOv10(model="ultralytics/cfg/models/v10/yolov10l.yaml")  # build a new model from scratch
    # If you want to finetune the model with pretrained weights, you could load the
    # pretrained weights like below
    # model = YOLOv10.from_pretrained('jameslahm/yolov10{n/s/m/b/l/x}')
    # or
    # wget https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10{n/s/m/b/l/x}.pt
    # model = YOLOv10('yolov10{n/s/m/b/l/x}.pt')

    # Use the model
    results = model.train(data="coco.yaml", epochs=3, device='3')  # train the model
Then run train.py to start training.
For multi-GPU training, list the devices in the device argument. For example, with four cards:
results = model.train(data="coco.yaml", epochs=3, device='0,1,2,3')  # train the model
Step 3: adjust the parameters; they can all be viewed in the ultralytics/cfg/default.yaml file. For example:
# Train settings -------------------------------------------------------------------------------------------------------
model: # path to model file, i.e. yolov8n.pt, yolov8n.yaml
data: # path to data file, i.e. coco128.yaml
epochs: 100 # number of epochs to train for
patience: 50 # epochs to wait for no observable improvement for early stopping of training
batch: 16 # number of images per batch (-1 for AutoBatch)
imgsz: 640 # size of input images as integer or w,h
save: True # save train checkpoints and predict results
save_period: -1 # Save checkpoint every x epochs (disabled if < 1)
cache: False # True/ram, disk or False. Use cache for data loading
device: # device to run on, i.e. cuda device=0 or device=0,1,2,3 or device=cpu
workers: 8 # number of worker threads for data loading (per RANK if DDP)
project: # project name
name: # experiment name, results saved to 'project/name' directory
exist_ok: False # whether to overwrite existing experiment
pretrained: False # whether to use a pretrained model
optimizer: SGD # optimizer to use, choices=['SGD', 'Adam', 'AdamW', 'RMSProp']
verbose: True # whether to print verbose output
seed: 0 # random seed for reproducibility
deterministic: True # whether to enable deterministic mode
single_cls: False # train multi-class data as single-class
image_weights: False # use weighted image selection for training
rect: False # support rectangular training if mode='train', support rectangular evaluation if mode='val'
cos_lr: False # use cosine learning rate scheduler
close_mosaic: 10 # disable mosaic augmentation for final 10 epochs
resume: False # resume training from last checkpoint
These are the most commonly used training parameters; you can override any of them in the call to train.
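For example (a small sketch; every keyword name here matches an entry in the default.yaml listing above):

results = model.train(
    data="coco.yaml",
    epochs=100,
    batch=16,
    imgsz=640,
    optimizer="SGD",  # choices include SGD, Adam, AdamW, RMSProp
    cos_lr=True,      # cosine learning-rate schedule
    close_mosaic=10,  # disable mosaic augmentation for the final 10 epochs
    device='0',
)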
Once the run finishes, you can see the results, as shown below:
Resuming from a Checkpoint
Training sometimes gets interrupted unexpectedly. To pick up where it left off, set resume to True. The code:
from ultralytics import YOLOv10

if __name__ == '__main__':
    # Load the last checkpoint
    model = YOLOv10("runs/detect/train8/weights/last.pt")
    print(model.model)

    # Use the model
    results = model.train(data="VOC.yaml", epochs=100, device='0', batch=16, workers=0, resume=True)  # resume training
Then click run and training continues from the checkpoint.
Testing
Create a test script, test.py:
from ultralytics import YOLOv10

# Load a model
model = YOLOv10("runs/detect/train11/weights/best.pt")  # load the trained checkpoint
results = model.predict(source="ultralytics/assets", device='3')  # predict on a folder of images
print(results)
The results object holds everything about the predictions. As shown below:
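Each element of results is a Results object whose boxes attribute exposes the detections; for example (this relies on the standard ultralytics Results API):

for r in results:
    for box in r.boxes:
        cls_id = int(box.cls)  # predicted class index
        conf = float(box.conf)  # confidence score
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # box corners in pixels
        print(cls_id, conf, x1, y1, x2, y2)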
The predict parameters can also be viewed in the ultralytics/cfg/default.yaml file. For example:
# Prediction settings --------------------------------------------------------------------------------------------------
source: # source directory for images or videos
show: False # show results if possible
save_txt: False # save results as .txt file
save_conf: False # save results with confidence scores
save_crop: False # save cropped images with results
hide_labels: False # hide labels
hide_conf: False # hide confidence scores
vid_stride: 1 # video frame-rate stride
line_thickness: 3 # bounding box thickness (pixels)
visualize: False # visualize model features
augment: False # apply image augmentation to prediction sources
agnostic_nms: False # class-agnostic NMS
classes: # filter results by class, i.e. class=0, or class=[0,2,3]
retina_masks: False # use high-resolution segmentation masks
boxes: True # Show boxes in segmentation predictions
Training a Custom Dataset
The Labelme Dataset
The dataset is one I annotated myself some time ago. Download link:
https://download.csdn.net/download/hhhhhhhhhhwwwwwwwwww/63242994.
The classes are as follows: ['c17', 'c5', 'helicopter', 'c130', 'f16', 'b2',
'other', 'b52', 'kc10', 'command', 'f15', 'kc135', 'a10',
'b1', 'aew', 'f22', 'p3', 'p8', 'f35', 'f18', 'v22', 'f4',
'globalhawk', 'u2', 'su-27', 'il-38', 'tu-134', 'su-33',
'an-70', 'su-24', 'tu-22', 'il-76']
Format Conversion
Convert the Labelme dataset to a YOLOv10-format dataset with the conversion code below:
import os
import shutil
import numpy as np
import json
from glob import glob
import cv2
from sklearn.model_selection import train_test_split


def convert(size, box):
    # Convert (xmin, xmax, ymin, ymax) pixel coordinates to normalized YOLO (cx, cy, w, h)
    dw = 1. / (size[0])
    dh = 1. / (size[1])
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


def change_2_yolo5(files, txt_Name):
    imag_name = []
    for json_file_ in files:
        json_filename = labelme_path + json_file_ + ".json"
        out_file = open('%s/%s.txt' % (labelme_path, json_file_), 'w')
        json_file = json.load(open(json_filename, "r", encoding="utf-8"))
        # image_path = labelme_path + json_file['imagePath']
        imag_name.append(json_file_ + '.jpg')
        height, width, channels = cv2.imread(labelme_path + json_file_ + ".jpg").shape
        for multi in json_file["shapes"]:
            points = np.array(multi["points"])
            xmin = min(points[:, 0]) if min(points[:, 0]) > 0 else 0
            xmax = max(points[:, 0]) if max(points[:, 0]) > 0 else 0
            ymin = min(points[:, 1]) if min(points[:, 1]) > 0 else 0
            ymax = max(points[:, 1]) if max(points[:, 1]) > 0 else 0
            label = multi["label"].lower()
            if xmax <= xmin or ymax <= ymin:
                pass  # skip degenerate boxes
            else:
                cls_id = classes.index(label)
                b = (float(xmin), float(xmax), float(ymin), float(ymax))
                bb = convert((width, height), b)
                out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
                # print(json_filename, xmin, ymin, xmax, ymax, cls_id)
    return imag_name


def image_txt_copy(files, scr_path, dst_img_path, dst_txt_path):
    """
    :param files: list of image file names
    :param scr_path: source directory of the images
    :param dst_img_path: destination directory for the images
    :param dst_txt_path: destination directory for the matching txt labels
    :return:
    """
    for file in files:
        img_path = scr_path + file
        print(file)
        shutil.copy(img_path, dst_img_path + file)
        scr_txt_path = scr_path + file.split('.')[0] + '.txt'
        shutil.copy(scr_txt_path, dst_txt_path + file.split('.')[0] + '.txt')


if __name__ == '__main__':
    classes = ['c17', 'c5', 'helicopter', 'c130', 'f16', 'b2',
               'other', 'b52', 'kc10', 'command', 'f15', 'kc135', 'a10',
               'b1', 'aew', 'f22', 'p3', 'p8', 'f35', 'f18', 'v22', 'f4',
               'globalhawk', 'u2', 'su-27', 'il-38', 'tu-134', 'su-33',
               'an-70', 'su-24', 'tu-22', 'il-76']
    # 1. Path to the Labelme annotations
    labelme_path = "USA-Labelme/"
    isUseTest = True  # whether to carve out a test split
    # 2. Collect the json files to process
    files = glob(labelme_path + "*.json")
    files = [i.replace("\\", "/").split("/")[-1].split(".json")[0] for i in files]
    for i in files:
        print(i)
    trainval_files, test_files = train_test_split(files, test_size=0.1, random_state=55)
    # split
    train_files, val_files = train_test_split(trainval_files, test_size=0.1, random_state=55)
    train_name_list = change_2_yolo5(train_files, "train")
    print(train_name_list)
    val_name_list = change_2_yolo5(val_files, "val")
    test_name_list = change_2_yolo5(test_files, "test")
    # 3. Create the dataset folders
    file_List = ["train", "val", "test"]
    for file in file_List:
        if not os.path.exists('./VOC/images/%s' % file):
            os.makedirs('./VOC/images/%s' % file)
        if not os.path.exists('./VOC/labels/%s' % file):
            os.makedirs('./VOC/labels/%s' % file)
    image_txt_copy(train_name_list, labelme_path, './VOC/images/train/', './VOC/labels/train/')
    image_txt_copy(val_name_list, labelme_path, './VOC/images/val/', './VOC/labels/val/')
    image_txt_copy(test_name_list, labelme_path, './VOC/images/test/', './VOC/labels/test/')
Once it finishes running, you have a YOLO-format dataset.
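Each generated .txt label file contains one object per line in normalized class cx cy w h form. A quick peek (the file name here is only an illustration):

with open('./VOC/labels/train/some_image.txt') as f:  # hypothetical file name
    print(f.read())  # each line: class_id cx cy w h, all normalized to [0, 1]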
Training
Put the generated YOLO dataset under the datasets folder, as shown below:
Then create a VOC.yaml file with the following content:
train: ./VOC/images/train # train images
val: ./VOC/images/val # val images
test: ./VOC/images/test # test images (optional)
names: ['c17', 'c5', 'helicopter', 'c130', 'f16', 'b2',
'other', 'b52', 'kc10', 'command', 'f15', 'kc135', 'a10',
'b1', 'aew', 'f22', 'p3', 'p8', 'f35', 'f18', 'v22', 'f4',
'globalhawk', 'u2', 'su-27', 'il-38', 'tu-134', 'su-33',
'an-70', 'su-24', 'tu-22', 'il-76']
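A quick way to catch typos in the file (a small sketch; it uses PyYAML, which ultralytics already depends on):

import yaml

with open("VOC.yaml") as f:
    cfg = yaml.safe_load(f)
print(len(cfg["names"]))  # expect 32, matching the class list above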
Then create train.py and add this code:
from ultralytics import YOLOv10

if __name__ == '__main__':
    # Load the model
    model = YOLOv10("ultralytics/cfg/models/v10/yolov10n.yaml")  # build a new model from scratch
    print(model.model)

    # Use the model
    results = model.train(data="VOC.yaml", epochs=100, device='0', batch=16, workers=0)  # train the model
Then training can begin: click run to execute train.py.
The results after training for 100 epochs:
Testing
Create a test.py script and insert the following code:
from ultralytics import YOLOv10

# Load a model
model = YOLOv10("runs/detect/train/weights/best.pt")  # load the trained checkpoint
results = model.predict(source="datasets/VOC/images/test", device='0', save=True)  # predict on the test images
The prediction parameters are as follows:
# Prediction settings --------------------------------------------------------------------------------------------------
source: # source directory for images or videos
show: False # show results if possible
save_txt: False # save results as .txt file
save_conf: False # save results with confidence scores
save_crop: False # save cropped images with results
hide_labels: False # hide labels
hide_conf: False # hide confidence scores
vid_stride: 1 # video frame-rate stride
line_thickness: 3 # bounding box thickness (pixels)
visualize: False # visualize model features
augment: False # apply image augmentation to prediction sources
agnostic_nms: False # class-agnostic NMS
classes: # filter results by class, i.e. class=0, or class=[0,2,3]
retina_masks: False # use high-resolution segmentation masks
boxes: True # Show boxes in segmentation predictions
Unlike YOLOv5, there is no parameter in the list above for saving the rendered test images. Looking through the source code turns up the save parameter, so setting save=True saves the annotated test images, as shown below:
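For example (the keyword names come from the settings listed above plus the save flag):

results = model.predict(
    source="ultralytics/assets",
    save=True,       # write annotated images to runs/detect/predict*
    save_txt=True,   # also write YOLO-format label files
    save_conf=True,  # include confidence scores in those label files
)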
If the official wrapper feels too heavy-handed and inflexible, you can use the standalone inference code below instead:
import cv2
import time
import random
import numpy as np
import torch, torchvision


def load_model(model_path):
    # Load the checkpoint on CPU, recover the class names, then move the FP32 model to the GPU
    model = torch.load(model_path, map_location='cpu')
    category_list = model.get('CLASSES', model.get('model').names)
    model = (model.get('ema') or model['model']).to("cuda:0").float()  # FP32 model
    model.__setattr__('CLASSES', category_list)
    model.fuse().eval()
    return model
def data_preprocess(model, img, img_scale):
    # Default stride; auto=True keeps the minimal-rectangle padding
    stride, auto = 32, True
    # Ensure the stride is at least the model's maximum stride (or 32)
    stride = max(int(model.stride.max()), 32)
    # Pad and resize the image to fit the model input
    img = letterbox(img, new_shape=img_scale, stride=stride, auto=auto)[0]  # padded resize
    # (height, width, channels) to (channels, height, width), BGR to RGB, contiguous memory
    img = np.ascontiguousarray(img.transpose((2, 0, 1))[::-1])  # HWC to CHW, BGR to RGB, contiguous
    # numpy ndarray to a PyTorch tensor on the GPU
    img = torch.from_numpy(img).to("cuda:0")  # ndarray to tensor
    # uint8 0-255 to fp32 0.0-1.0
    img = img.float()
    img /= 255
    # Add a batch dimension if the tensor holds a single image
    if len(img.shape) == 3:
        img = img[None]  # expand for batch dim
    return img
def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Current image shape [height, width]
    shape = im.shape[:2]
    # If new_shape is an int, expand it to a square tuple
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    # If upscaling is not allowed, only scale down (for better val mAP)
    if not scaleup:
        r = min(r, 1.0)

    # Compute the resized size and the padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # width, height padding
    # If auto, pad only up to the nearest stride multiple (minimum rectangle)
    if auto:
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)
    # If scaleFill, stretch to fill with no padding
    elif scaleFill:
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    # Split the padding evenly between the two sides
    dw /= 2
    dh /= 2

    # Resize if the unpadded size differs from the original
    if shape[::-1] != new_unpad:
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    # Add the border using the dw and dh computed above
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)  # return the padded image, the wh ratios, and the padding
def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=(), max_det=300, nc=0, max_time_img=0.05, max_nms=30000, max_wh=7680):
    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
    if isinstance(prediction, (list, tuple)):  # YOLOv8 model in validation model, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output

    device = prediction.device
    mps = 'mps' in device.type  # Apple MPS
    if mps:  # MPS not fully supported yet, convert tensors to CPU before NMS
        prediction = prediction.cpu()
    bs = prediction.shape[0]  # batch size
    nc = nc or (prediction.shape[1] - 4)  # number of classes
    nm = prediction.shape[1] - nc - 4
    mi = 4 + nc  # mask start index
    xc = prediction[:, 4:mi].amax(1) > conf_thres  # candidates

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    time_limit = 0.5 + max_time_img * bs  # seconds to quit after
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)

    prediction = prediction.transpose(-1, -2)  # shape(1,84,6300) to shape(1,6300,84)
    prediction[..., :4] = xywh2xyxy(prediction[..., :4])  # xywh to xyxy

    t = time.time()
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[:, 2:4] < min_wh) | (x[:, 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + nm + 4), device=x.device)
            v[:, :4] = xywh2xyxy(lb[:, 1:5])  # box
            v[range(len(lb)), lb[:, 0].long() + 4] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Detections matrix nx6 (xyxy, conf, cls)
        box, cls, mask = x.split((4, nc, nm), 1)

        if multi_label:
            i, j = torch.where(cls > conf_thres)
            x = torch.cat((box[i], x[i, 4 + j, None], j[:, None].float(), mask[i]), 1)
        else:  # best class only
            conf, j = cls.max(1, keepdim=True)
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        if n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence and remove excess boxes

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        i = i[:max_det]  # limit detections

        output[xi] = x[i]
        if mps:
            output[xi] = output[xi].to(device)
        if (time.time() - t) > time_limit:
            print(f'WARNING ⚠️ NMS time limit {time_limit:.3f}s exceeded')
            break  # time limit exceeded

    return output
def xywh2xyxy(x):
    """
    Convert bounding box coordinates from (x, y, width, height) format to (x1, y1, x2, y2) format where (x1, y1) is the
    top-left corner and (x2, y2) is the bottom-right corner.

    Args:
        x (np.ndarray | torch.Tensor): The input bounding box coordinates in (x, y, width, height) format.
    Returns:
        y (np.ndarray | torch.Tensor): The bounding box coordinates in (x1, y1, x2, y2) format.
    """
    assert x.shape[-1] == 4, f'input shape last dimension expected 4 but input shape is {x.shape}'
    y = torch.empty_like(x) if isinstance(x, torch.Tensor) else np.empty_like(x)  # faster than clone/copy
    dw = x[..., 2] / 2  # half-width
    dh = x[..., 3] / 2  # half-height
    y[..., 0] = x[..., 0] - dw  # top left x
    y[..., 1] = x[..., 1] - dh  # top left y
    y[..., 2] = x[..., 0] + dw  # bottom right x
    y[..., 3] = x[..., 1] + dh  # bottom right y
    return y
def scale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None, padding=True):
    """
    Rescales bounding boxes (in the format of xyxy) from the shape of the image they were originally specified in
    (img1_shape) to the shape of a different image (img0_shape).

    Args:
        img1_shape (tuple): The shape of the image that the bounding boxes are for, in the format of (height, width).
        boxes (torch.Tensor): the bounding boxes of the objects in the image, in the format of (x1, y1, x2, y2)
        img0_shape (tuple): the shape of the target image, in the format of (height, width).
        ratio_pad (tuple): a tuple of (ratio, pad) for scaling the boxes. If not provided, the ratio and pad will be
            calculated based on the size difference between the two images.
        padding (bool): If True, assuming the boxes is based on image augmented by yolo style. If False then do regular
            rescaling.
    Returns:
        boxes (torch.Tensor): The scaled bounding boxes, in the format of (x1, y1, x2, y2)
    """
    if ratio_pad is None:  # calculate from img0_shape
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain = old / new
        pad = round((img1_shape[1] - img0_shape[1] * gain) / 2 - 0.1), round(
            (img1_shape[0] - img0_shape[0] * gain) / 2 - 0.1)  # wh padding
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    if padding:
        boxes[..., [0, 2]] -= pad[0]  # x padding
        boxes[..., [1, 3]] -= pad[1]  # y padding
    boxes[..., :4] /= gain
    clip_boxes(boxes, img0_shape)
    return boxes
def clip_boxes(boxes, shape):
    """
    Takes a list of bounding boxes and a shape (height, width) and clips the bounding boxes to the shape.

    Args:
        boxes (torch.Tensor): the bounding boxes to clip
        shape (tuple): the shape of the image
    """
    if isinstance(boxes, torch.Tensor):  # faster individually
        boxes[..., 0].clamp_(0, shape[1])  # x1
        boxes[..., 1].clamp_(0, shape[0])  # y1
        boxes[..., 2].clamp_(0, shape[1])  # x2
        boxes[..., 3].clamp_(0, shape[0])  # y2
    else:  # np.array (faster grouped)
        boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1])  # x1, x2
        boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0])  # y1, y2
def plot_result(det_cpu, dst_img, category_names, image_name):
    for i, item in enumerate(det_cpu):
        rand_color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
        # Draw the box
        box_x1, box_y1, box_x2, box_y2 = item[0:4].astype(np.int32)
        cv2.rectangle(dst_img, (box_x1, box_y1), (box_x2, box_y2), color=rand_color, thickness=2)
        # Draw the label
        label = category_names[int(item[5])]
        score = item[4]
        org = (min(box_x1, box_x2), min(box_y1, box_y2) - 8)
        text = '{}|{:.2f}'.format(label, score)
        cv2.putText(dst_img, text, org=org, fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=rand_color,
                    thickness=2)

    cv2.imshow('result', dst_img)
    cv2.waitKey()
    cv2.imwrite(image_name, dst_img)
if __name__ == '__main__':
    img_path = "./ultralytics/assets/bus.jpg"
    image_name = img_path.split('/')[-1]
    ori_img = cv2.imread(img_path)
    # Load the model
    model = load_model("runs/detect/train2/weights/best.pt")
    # Preprocess the input
    img = data_preprocess(model, ori_img, [640, 640])
    # Inference
    result = model(img, augment=False)
    preds = result[0]
    # NMS
    det = non_max_suppression(preds, conf_thres=0.35, iou_thres=0.45, nc=len(model.CLASSES))[0]
    # Rescale the boxes back to the original image size
    det[:, :4] = scale_boxes(img.shape[2:], det[:, :4], ori_img.shape)
    category_names = model.CLASSES
    # Show the result
    plot_result(det.cpu().numpy(), ori_img, category_names, image_name)
Summary
This article explained the YOLOv10 model and then walked through using it in practice, hands-on!