Contents
- Abstract
- Model Details
- Model in Practice
- Training on the COCO Dataset
- Downloading the Dataset
- Converting COCO to a YOLO-Format Dataset (works for V4, V5, V6, V7, V8)
- Setting Up the YOLOv10 Environment
- Training
- Resuming from a Checkpoint
- Testing
- Training a Custom Dataset
- The Labelme Dataset
- Format Conversion
- Training
- Testing
- Summary
Abstract
Model Details
Model in Practice
Training on the COCO Dataset
This walkthrough uses the 2017 release of the COCO dataset as an example to demonstrate how to train and predict with YOLOv10.
Downloading the Dataset
Images:
- 2017 Train images [118K/18GB]: http://images.cocodataset.org/zips/train2017.zip
- 2017 Val images [5K/1GB]: http://images.cocodataset.org/zips/val2017.zip
- 2017 Test images [41K/6GB]: http://images.cocodataset.org/zips/test2017.zip
Annotations:
- 2017 Train/Val annotations [241MB]: http://images.cocodataset.org/annotations/annotations_trainval2017.zip
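If you prefer to script the download, a minimal standard-library sketch looks like this (the URLs are the ones listed above; for archives this large, wget or a download manager with resume support may serve you better):

import urllib.request

urls = [
    "http://images.cocodataset.org/zips/train2017.zip",
    "http://images.cocodataset.org/zips/val2017.zip",
    "http://images.cocodataset.org/annotations/annotations_trainval2017.zip",
]
for url in urls:
    filename = url.split("/")[-1]
    print(f"Downloading {filename} ...")
    urllib.request.urlretrieve(url, filename)  # blocks until the file is fully saved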
Converting COCO to a YOLO-Format Dataset (works for V4, V5, V6, V7, V8)
In the original research paper, COCO had 91 object categories. However, the first release in 2014 published labeled and segmented images for only 80 of them, and the follow-up 2017 release kept the same 80. The detailed categories are:
ID | OBJECT (PAPER) | OBJECT (2014 REL.) | OBJECT (2017 REL.) | SUPER CATEGORY |
---|---|---|---|---|
1 | person | person | person | person |
2 | bicycle | bicycle | bicycle | vehicle |
3 | car | car | car | vehicle |
4 | motorcycle | motorcycle | motorcycle | vehicle |
5 | airplane | airplane | airplane | vehicle |
6 | bus | bus | bus | vehicle |
7 | train | train | train | vehicle |
8 | truck | truck | truck | vehicle |
9 | boat | boat | boat | vehicle |
10 | traffic light | traffic light | traffic light | outdoor |
11 | fire hydrant | fire hydrant | fire hydrant | outdoor |
12 | street sign | - | - | outdoor |
13 | stop sign | stop sign | stop sign | outdoor |
14 | parking meter | parking meter | parking meter | outdoor |
15 | bench | bench | bench | outdoor |
16 | bird | bird | bird | animal |
17 | cat | cat | cat | animal |
18 | dog | dog | dog | animal |
19 | horse | horse | horse | animal |
20 | sheep | sheep | sheep | animal |
21 | cow | cow | cow | animal |
22 | elephant | elephant | elephant | animal |
23 | bear | bear | bear | animal |
24 | zebra | zebra | zebra | animal |
25 | giraffe | giraffe | giraffe | animal |
26 | hat | - | - | accessory |
27 | backpack | backpack | backpack | accessory |
28 | umbrella | umbrella | umbrella | accessory |
29 | shoe | - | - | accessory |
30 | eye glasses | - | - | accessory |
31 | handbag | handbag | handbag | accessory |
32 | tie | tie | tie | accessory |
33 | suitcase | suitcase | suitcase | accessory |
34 | frisbee | frisbee | frisbee | sports |
35 | skis | skis | skis | sports |
36 | snowboard | snowboard | snowboard | sports |
37 | sports ball | sports ball | sports ball | sports |
38 | kite | kite | kite | sports |
39 | baseball bat | baseball bat | baseball bat | sports |
40 | baseball glove | baseball glove | baseball glove | sports |
41 | skateboard | skateboard | skateboard | sports |
42 | surfboard | surfboard | surfboard | sports |
43 | tennis racket | tennis racket | tennis racket | sports |
44 | bottle | bottle | bottle | kitchen |
45 | plate | - | - | kitchen |
46 | wine glass | wine glass | wine glass | kitchen |
47 | cup | cup | cup | kitchen |
48 | fork | fork | fork | kitchen |
49 | knife | knife | knife | kitchen |
50 | spoon | spoon | spoon | kitchen |
51 | bowl | bowl | bowl | kitchen |
52 | banana | banana | banana | food |
53 | apple | apple | apple | food |
54 | sandwich | sandwich | sandwich | food |
55 | orange | orange | orange | food |
56 | broccoli | broccoli | broccoli | food |
57 | carrot | carrot | carrot | food |
58 | hot dog | hot dog | hot dog | food |
59 | pizza | pizza | pizza | food |
60 | donut | donut | donut | food |
61 | cake | cake | cake | food |
62 | chair | chair | chair | furniture |
63 | couch | couch | couch | furniture |
64 | potted plant | potted plant | potted plant | furniture |
65 | bed | bed | bed | furniture |
66 | mirror | - | - | furniture |
67 | dining table | dining table | dining table | furniture |
68 | window | - | - | furniture |
69 | desk | - | - | furniture |
70 | toilet | toilet | toilet | furniture |
71 | door | - | - | furniture |
72 | tv | tv | tv | electronic |
73 | laptop | laptop | laptop | electronic |
74 | mouse | mouse | mouse | electronic |
75 | remote | remote | remote | electronic |
76 | keyboard | keyboard | keyboard | electronic |
77 | cell phone | cell phone | cell phone | electronic |
78 | microwave | microwave | microwave | appliance |
79 | oven | oven | oven | appliance |
80 | toaster | toaster | toaster | appliance |
81 | sink | sink | sink | appliance |
82 | refrigerator | refrigerator | refrigerator | appliance |
83 | blender | - | - | appliance |
84 | book | book | book | indoor |
85 | clock | clock | clock | indoor |
86 | vase | vase | vase | indoor |
87 | scissors | scissors | scissors | indoor |
88 | teddy bear | teddy bear | teddy bear | indoor |
89 | hair drier | hair drier | hair drier | indoor |
90 | toothbrush | toothbrush | toothbrush | indoor |
91 | hair brush | - | - | indoor |
As you can see, the 2014 and 2017 releases share the same object list: 80 of the original 91 categories from the paper. So when converting, the category ids have to be remapped. The mapping function is:
def coco91_to_coco80_class():  # maps a 91-index (paper) category id to an 80-index (val2014) class id
    # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
    # a = np.loadtxt('data/coco.names', dtype='str', delimiter='\n')
    # b = np.loadtxt('data/coco_paper.names', dtype='str', delimiter='\n')
    # x1 = [list(a[i] == b).index(True) + 1 for i in range(80)]  # darknet to coco
    # x2 = [list(b[i] == a).index(True) if any(b[i] == a) else None for i in range(91)]  # coco to darknet
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, None, 24, 25, None,
         None, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, None, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
         51, 52, 53, 54, 55, 56, 57, 58, 59, None, 60, None, None, 61, None, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
         None, 73, 74, 75, 76, 77, 78, 79, None]
    return x
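A quick check of the mapping: annotation category_id 13 is "stop sign" in the paper numbering, which lands on index 11 in the contiguous 80-class list:

coco80 = coco91_to_coco80_class()
print(coco80[13 - 1])  # category_id 13 (stop sign) -> YOLO class index 11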
Next comes the conversion itself. The project layout is:
- coco: the unzipped dataset.
- out: the output directory.
- coco2yolo.py: the conversion script.
The conversion code is as follows:
import json
import glob
import os
import shutil
from pathlib import Path

import numpy as np
from tqdm import tqdm


def make_folders(path='../out/'):
    # Create a clean output folder with labels/ and images/ subfolders
    if os.path.exists(path):
        shutil.rmtree(path)  # delete existing output folder
    os.makedirs(path)  # make new output folder
    os.makedirs(path + os.sep + 'labels')  # make new labels folder
    os.makedirs(path + os.sep + 'images')  # make new images folder
    return path


def convert_coco_json(json_dir='./coco/annotations_trainval2017/annotations/'):
    jsons = glob.glob(json_dir + '*.json')
    coco80 = coco91_to_coco80_class()

    # Import json
    for json_file in sorted(jsons):
        fn = 'out/labels/%s/' % Path(json_file).stem.replace('instances_', '')  # labels folder name
        fn_images = 'out/images/%s/' % Path(json_file).stem.replace('instances_', '')  # images folder name
        os.makedirs(fn, exist_ok=True)
        os.makedirs(fn_images, exist_ok=True)
        with open(json_file) as f:
            data = json.load(f)
        print(fn)

        # Create image dict keyed by image id
        images = {'%g' % x['id']: x for x in data['images']}

        # Write one label file per image
        for x in tqdm(data['annotations'], desc='Annotations %s' % json_file):
            if x['iscrowd']:
                continue

            img = images['%g' % x['image_id']]
            h, w, f = img['height'], img['width'], img['file_name']
            file_path = 'coco/' + fn.split('/')[-2] + '/' + f

            # The COCO bounding box format is [top-left x, top-left y, width, height]
            box = np.array(x['bbox'], dtype=np.float64)
            box[:2] += box[2:] / 2  # xy top-left corner to center
            box[[0, 2]] /= w  # normalize x
            box[[1, 3]] /= h  # normalize y

            if (box[2] > 0.) and (box[3] > 0.):  # if w > 0 and h > 0
                with open(fn + Path(f).stem + '.txt', 'a') as file:
                    file.write('%g %.6f %.6f %.6f %.6f\n' % (coco80[x['category_id'] - 1], *box))
            file_path_t = fn_images + f
            print(file_path, file_path_t)
            shutil.copy(file_path, file_path_t)  # copy the image next to its labels (repeat copies overwrite harmlessly)


def coco91_to_coco80_class():  # maps a 91-index (paper) category id to an 80-index (val2014) class id
    # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, None, 24, 25, None,
         None, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, None, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
         51, 52, 53, 54, 55, 56, 57, 58, 59, None, 60, None, None, 61, None, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
         None, 73, 74, 75, 76, 77, 78, 79, None]
    return x


convert_coco_json()
Run the script:
After the conversion finishes, verify the result:
import cv2
import os


def draw_box_in_single_image(image_path, txt_path):
    # Read the image
    image = cv2.imread(image_path)

    # Read the label file, one box per line
    def read_list(txt_path):
        pos = []
        with open(txt_path, 'r') as file_to_read:
            while True:
                lines = file_to_read.readline()  # read a whole line
                if not lines:
                    break
                # Split on spaces; every field in a YOLO label line is numeric
                p_tmp = [float(i) for i in lines.split(' ')]
                pos.append(p_tmp)  # append the newly read box
        return pos

    # Convert a normalized (class, cx, cy, w, h) row back to pixel corners
    def convert(size, box):
        xmin = (box[1] - box[3] / 2.) * size[1]
        xmax = (box[1] + box[3] / 2.) * size[1]
        ymin = (box[2] - box[4] / 2.) * size[0]
        ymax = (box[2] + box[4] / 2.) * size[0]
        box = (int(xmin), int(ymin), int(xmax), int(ymax))
        return box

    pos = read_list(txt_path)
    print(pos)
    for i in range(len(pos)):
        label = str(int(pos[i][0]))
        print('label is ' + label)
        box = convert(image.shape, pos[i])
        image = cv2.rectangle(image, (box[0], box[1]), (box[2], box[3]), (0, 0, 255), 2)
        cv2.putText(image, label, (box[0], box[1] - 2), 0, 1, [0, 0, 255], thickness=2, lineType=cv2.LINE_AA)
    if pos:
        cv2.imwrite('./Data/see_images/{}.png'.format(os.path.basename(image_path)[:-4]), image)
    else:
        print('None')


img_folder = "./out/images/val2017"
img_list = sorted(os.listdir(img_folder))
label_folder = "./out/labels/val2017"

if not os.path.exists('./Data/see_images'):
    os.makedirs('./Data/see_images')
for img_name in img_list:
    image_path = os.path.join(img_folder, img_name)
    # Derive the label path from the image name so the image/label pairing cannot drift
    txt_path = os.path.join(label_folder, os.path.splitext(img_name)[0] + '.txt')
    if os.path.exists(txt_path):
        draw_box_in_single_image(image_path, txt_path)
Results:
Setting Up the YOLOv10 Environment
You can simply install every library listed in requirements.txt by running:
pip install -r requirements.txt
If you would rather not install that many packages, just run the code and install whichever library it reports as missing.
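As a quick sanity check that the environment works (a minimal sketch; it assumes you run it from the repository root so the config path resolves):

from ultralytics import YOLOv10

model = YOLOv10("ultralytics/cfg/models/v10/yolov10n.yaml")  # build from a config, no weights required
model.info()  # prints the layer/parameter summary if everything is installed correctly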
Training
Download the code from https://github.com/THU-MIG/yolov10 to get the source.
Next, create a training script. You can build the model from a yaml config, for example:
from ultralytics import YOLOv10

if __name__ == '__main__':
    model = YOLOv10(model="ultralytics/cfg/models/v10/yolov10l.yaml")  # build a new model from scratch
    # If you want to finetune the model with pretrained weights, you could load the
    # pretrained weights like below
    # model = YOLOv10.from_pretrained('jameslahm/yolov10{n/s/m/b/l/x}')
    # or
    # wget https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10{n/s/m/b/l/x}.pt
    # model = YOLOv10('yolov10{n/s/m/b/l/x}.pt')

    # Use the model
    results = model.train(data="VOC.yaml", patience=0, epochs=150, device='0', batch=8, seed=42)  # train the model
The model configuration files live under ultralytics/cfg/models/v10, as shown in the figure:
You can also build from a pretrained model, for example:
model = YOLOv10('yolov10n.pt')
Then start training:
# Use the model
model.train(data="coco128.yaml", epochs=3)  # train the model
The dataset configuration files live under ultralytics/cfg/datasets/, as shown in the figure:
Pretty simple, isn't it?
Next, let's set up our own configuration.
Step 1: find the ultralytics/cfg/datasets/coco.yaml file and copy it to the project root.
Then change the paths inside it to:
# Ultralytics YOLO 🚀, GPL-3.0 license
# COCO 2017 dataset http://cocodataset.org by Microsoft
# Example usage: yolo train data=coco.yaml
# parent
# ├── ultralytics
# └── datasets
# └── coco ← downloads here (20.1 GB)
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
train: ./coco/images/train2017 # train images (relative to 'path') 118287 images
val: ./coco/images/val2017 # val images (relative to 'path') 5000 images
test: test-dev2017.txt # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794
As for the dataset paths, you can experiment yourself. After several tries I found that the framework (same behavior as YOLOv8) automatically prepends a datasets directory, so setting ./coco/images/train2017 means the actual path is datasets/coco/images/train2017.
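You can confirm where that datasets root points with the settings object (a hedged aside; this API comes from the upstream ultralytics package that the YOLOv10 repo is built on):

from ultralytics import settings

print(settings["datasets_dir"])  # the directory relative dataset paths are resolved against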
Step 2: create a train.py script.
from ultralytics import YOLOv10

if __name__ == '__main__':
    model = YOLOv10(model="ultralytics/cfg/models/v10/yolov10l.yaml")  # build a new model from scratch
    # If you want to finetune the model with pretrained weights, you could load the
    # pretrained weights like below
    # model = YOLOv10.from_pretrained('jameslahm/yolov10{n/s/m/b/l/x}')
    # or
    # wget https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10{n/s/m/b/l/x}.pt
    # model = YOLOv10('yolov10{n/s/m/b/l/x}.pt')

    # Use the model
    results = model.train(data="coco.yaml", epochs=3, device='3')  # train the model
Then run train.py to start training.
For multi-GPU training, list the devices in the device argument. For example, with four cards:
results = model.train(data="coco.yaml", epochs=3, device='0,1,2,3')  # train the model
Step 3: adjust the parameters; they can all be viewed in the ultralytics/cfg/default.yaml file. For example:
# Train settings -------------------------------------------------------------------------------------------------------
model: # path to model file, i.e. yolov8n.pt, yolov8n.yaml
data: # path to data file, i.e. coco128.yaml
epochs: 100 # number of epochs to train for
patience: 50 # epochs to wait for no observable improvement for early stopping of training
batch: 16 # number of images per batch (-1 for AutoBatch)
imgsz: 640 # size of input images as integer or w,h
save: True # save train checkpoints and predict results
save_period: -1 # Save checkpoint every x epochs (disabled if < 1)
cache: False # True/ram, disk or False. Use cache for data loading
device: # device to run on, i.e. cuda device=0 or device=0,1,2,3 or device=cpu
workers: 8 # number of worker threads for data loading (per RANK if DDP)
project: # project name
name: # experiment name, results saved to 'project/name' directory
exist_ok: False # whether to overwrite existing experiment
pretrained: False # whether to use a pretrained model
optimizer: SGD # optimizer to use, choices=['SGD', 'Adam', 'AdamW', 'RMSProp']
verbose: True # whether to print verbose output
seed: 0 # random seed for reproducibility
deterministic: True # whether to enable deterministic mode
single_cls: False # train multi-class data as single-class
image_weights: False # use weighted image selection for training
rect: False # support rectangular training if mode='train', support rectangular evaluation if mode='val'
cos_lr: False # use cosine learning rate scheduler
close_mosaic: 10 # disable mosaic augmentation for final 10 epochs
resume: False # resume training from last checkpoint
These are the most commonly used training parameters; you can override any of them in the call to train.
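For example (a small sketch; every keyword name here matches an entry in the default.yaml listing above):

results = model.train(
    data="coco.yaml",
    epochs=100,
    batch=16,
    imgsz=640,
    optimizer="SGD",  # choices include SGD, Adam, AdamW, RMSProp
    cos_lr=True,      # cosine learning-rate schedule
    close_mosaic=10,  # disable mosaic augmentation for the final 10 epochs
    device='0',
)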
Once the run finishes, you can see the results, as shown below:
Resuming from a Checkpoint
Training sometimes gets interrupted unexpectedly. To pick up where it left off, set resume to True. The code:
from ultralytics import YOLOv10

if __name__ == '__main__':
    # Load the last checkpoint
    model = YOLOv10("runs/detect/train8/weights/last.pt")
    print(model.model)

    # Use the model
    results = model.train(data="VOC.yaml", epochs=100, device='0', batch=16, workers=0, resume=True)  # resume training
Then click run and training continues from the checkpoint.
Testing
Create a test script, test.py:
from ultralytics import YOLOv10

# Load a model
model = YOLOv10("runs/detect/train11/weights/best.pt")  # load the trained checkpoint
results = model.predict(source="ultralytics/assets", device='3')  # predict on a folder of images
print(results)
The results object holds everything about the predictions. As shown below:
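Each element of results is a Results object whose boxes attribute exposes the detections; for example (this relies on the standard ultralytics Results API):

for r in results:
    for box in r.boxes:
        cls_id = int(box.cls)  # predicted class index
        conf = float(box.conf)  # confidence score
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # box corners in pixels
        print(cls_id, conf, x1, y1, x2, y2)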
The predict parameters can also be viewed in the ultralytics/cfg/default.yaml file. For example:
# Prediction settings --------------------------------------------------------------------------------------------------
source: # source directory for images or videos
show: False # show results if possible
save_txt: False # save results as .txt file
save_conf: False # save results with confidence scores
save_crop: False # save cropped images with results
hide_labels: False # hide labels
hide_conf: False # hide confidence scores
vid_stride: 1 # video frame-rate stride
line_thickness: 3 # bounding box thickness (pixels)
visualize: False # visualize model features
augment: False # apply image augmentation to prediction sources
agnostic_nms: False # class-agnostic NMS
classes: # filter results by class, i.e. class=0, or class=[0,2,3]
retina_masks: False # use high-resolution segmentation masks
boxes: True # Show boxes in segmentation predictions
Training a Custom Dataset
The Labelme Dataset
The dataset is one I annotated myself some time ago. Download link:
https://download.csdn.net/download/hhhhhhhhhhwwwwwwwwww/63242994.
The classes are as follows: ['c17', 'c5', 'helicopter', 'c130', 'f16', 'b2',
'other', 'b52', 'kc10', 'command', 'f15', 'kc135', 'a10',
'b1', 'aew', 'f22', 'p3', 'p8', 'f35', 'f18', 'v22', 'f4',
'globalhawk', 'u2', 'su-27', 'il-38', 'tu-134', 'su-33',
'an-70', 'su-24', 'tu-22', 'il-76']
Format Conversion
Convert the Labelme dataset to a YOLOv10-format dataset with the conversion code below:
import os
import shutil
import numpy as np
import json
from glob import glob
import cv2
from sklearn.model_selection import train_test_split


def convert(size, box):
    # Convert (xmin, xmax, ymin, ymax) pixel coordinates to normalized YOLO (cx, cy, w, h)
    dw = 1. / (size[0])
    dh = 1. / (size[1])
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


def change_2_yolo5(files, txt_Name):
    imag_name = []
    for json_file_ in files:
        json_filename = labelme_path + json_file_ + ".json"
        out_file = open('%s/%s.txt' % (labelme_path, json_file_), 'w')
        json_file = json.load(open(json_filename, "r", encoding="utf-8"))
        # image_path = labelme_path + json_file['imagePath']
        imag_name.append(json_file_ + '.jpg')
        height, width, channels = cv2.imread(labelme_path + json_file_ + ".jpg").shape
        for multi in json_file["shapes"]:
            points = np.array(multi["points"])
            xmin = min(points[:, 0]) if min(points[:, 0]) > 0 else 0
            xmax = max(points[:, 0]) if max(points[:, 0]) > 0 else 0
            ymin = min(points[:, 1]) if min(points[:, 1]) > 0 else 0
            ymax = max(points[:, 1]) if max(points[:, 1]) > 0 else 0
            label = multi["label"].lower()
            if xmax <= xmin or ymax <= ymin:
                pass  # skip degenerate boxes
            else:
                cls_id = classes.index(label)
                b = (float(xmin), float(xmax), float(ymin), float(ymax))
                bb = convert((width, height), b)
                out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
                # print(json_filename, xmin, ymin, xmax, ymax, cls_id)
    return imag_name


def image_txt_copy(files, scr_path, dst_img_path, dst_txt_path):
    """
    :param files: list of image file names
    :param scr_path: source directory of the images
    :param dst_img_path: destination directory for the images
    :param dst_txt_path: destination directory for the matching txt labels
    :return:
    """
    for file in files:
        img_path = scr_path + file
        print(file)
        shutil.copy(img_path, dst_img_path + file)
        scr_txt_path = scr_path + file.split('.')[0] + '.txt'
        shutil.copy(scr_txt_path, dst_txt_path + file.split('.')[0] + '.txt')


if __name__ == '__main__':
    classes = ['c17', 'c5', 'helicopter', 'c130', 'f16', 'b2',
               'other', 'b52', 'kc10', 'command', 'f15', 'kc135', 'a10',
               'b1', 'aew', 'f22', 'p3', 'p8', 'f35', 'f18', 'v22', 'f4',
               'globalhawk', 'u2', 'su-27', 'il-38', 'tu-134', 'su-33',
               'an-70', 'su-24', 'tu-22', 'il-76']
    # 1. Path to the Labelme annotations
    labelme_path = "USA-Labelme/"
    isUseTest = True  # whether to carve out a test split
    # 2. Collect the json files to process
    files = glob(labelme_path + "*.json")
    files = [i.replace("\\", "/").split("/")[-1].split(".json")[0] for i in files]
    for i in files:
        print(i)
    trainval_files, test_files = train_test_split(files, test_size=0.1, random_state=55)
    # split
    train_files, val_files = train_test_split(trainval_files, test_size=0.1, random_state=55)
    train_name_list = change_2_yolo5(train_files, "train")
    print(train_name_list)
    val_name_list = change_2_yolo5(val_files, "val")
    test_name_list = change_2_yolo5(test_files, "test")
    # 3. Create the dataset folders
    file_List = ["train", "val", "test"]
    for file in file_List:
        if not os.path.exists('./VOC/images/%s' % file):
            os.makedirs('./VOC/images/%s' % file)
        if not os.path.exists('./VOC/labels/%s' % file):
            os.makedirs('./VOC/labels/%s' % file)
    image_txt_copy(train_name_list, labelme_path, './VOC/images/train/', './VOC/labels/train/')
    image_txt_copy(val_name_list, labelme_path, './VOC/images/val/', './VOC/labels/val/')
    image_txt_copy(test_name_list, labelme_path, './VOC/images/test/', './VOC/labels/test/')
Once it finishes running, you have a YOLO-format dataset.
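Each generated .txt label file contains one object per line in normalized class cx cy w h form. A quick peek (the file name here is only an illustration):

with open('./VOC/labels/train/some_image.txt') as f:  # hypothetical file name
    print(f.read())  # each line: class_id cx cy w h, all normalized to [0, 1]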
Training
Put the generated YOLO dataset under the datasets folder, as shown below:
Then create a VOC.yaml file with the following content:
train: ./VOC/images/train # train images
val: ./VOC/images/val # val images
test: ./VOC/images/test # test images (optional)
names: ['c17', 'c5', 'helicopter', 'c130', 'f16', 'b2',
'other', 'b52', 'kc10', 'command', 'f15', 'kc135', 'a10',
'b1', 'aew', 'f22', 'p3', 'p8', 'f35', 'f18', 'v22', 'f4',
'globalhawk', 'u2', 'su-27', 'il-38', 'tu-134', 'su-33',
'an-70', 'su-24', 'tu-22', 'il-76']
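A quick way to catch typos in the file (a small sketch; it uses PyYAML, which ultralytics already depends on):

import yaml

with open("VOC.yaml") as f:
    cfg = yaml.safe_load(f)
print(len(cfg["names"]))  # expect 32, matching the class list above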
Then create train.py and add this code:
from ultralytics import YOLOv10

if __name__ == '__main__':
    # Load the model
    model = YOLOv10("ultralytics/cfg/models/v10/yolov10n.yaml")  # build a new model from scratch
    print(model.model)

    # Use the model
    results = model.train(data="VOC.yaml", epochs=100, device='0', batch=16, workers=0)  # train the model
Then training can begin: click run to execute train.py.
The results after training for 100 epochs:
Testing
Create a test.py script and insert the following code:
from ultralytics import YOLOv10

# Load a model
model = YOLOv10("runs/detect/train/weights/best.pt")  # load the trained checkpoint
results = model.predict(source="datasets/VOC/images/test", device='0', save=True)  # predict on the test images
The prediction parameters are as follows:
# Prediction settings --------------------------------------------------------------------------------------------------
source: # source directory for images or videos
show: False # show results if possible
save_txt: False # save results as .txt file
save_conf: False # save results with confidence scores
save_crop: False # save cropped images with results
hide_labels: False # hide labels
hide_conf: False # hide confidence scores
vid_stride: 1 # video frame-rate stride
line_thickness: 3 # bounding box thickness (pixels)
visualize: False # visualize model features
augment: False # apply image augmentation to prediction sources
agnostic_nms: False # class-agnostic NMS
classes: # filter results by class, i.e. class=0, or class=[0,2,3]
retina_masks: False # use high-resolution segmentation masks
boxes: True # Show boxes in segmentation predictions
Unlike YOLOv5, there is no parameter in the list above for saving the rendered test images. Looking through the source code turns up the save parameter, so setting save=True saves the annotated test images, as shown below:
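For example (the keyword names come from the settings listed above plus the save flag):

results = model.predict(
    source="ultralytics/assets",
    save=True,       # write annotated images to runs/detect/predict*
    save_txt=True,   # also write YOLO-format label files
    save_conf=True,  # include confidence scores in those label files
)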
If the official wrapper feels too heavy-handed and inflexible, you can use the standalone inference code below instead:
import cv2
import time
import random
import numpy as np
import torch, torchvision


def load_model(model_path):
    # Load the checkpoint on CPU, recover the class names, then move the FP32 model to the GPU
    model = torch.load(model_path, map_location='cpu')
    category_list = model.get('CLASSES', model.get('model').names)
    model = (model.get('ema') or model['model']).to("cuda:0").float()  # FP32 model
    model.__setattr__('CLASSES', category_list)
    model.fuse().eval()
    return model
def data_preprocess(model, img, img_scale):
    # Default stride; auto=True keeps the minimal-rectangle padding
    stride, auto = 32, True
    # Ensure the stride is at least the model's maximum stride (or 32)
    stride = max(int(model.stride.max()), 32)
    # Pad and resize the image to fit the model input
    img = letterbox(img, new_shape=img_scale, stride=stride, auto=auto)[0]  # padded resize
    # (height, width, channels) to (channels, height, width), BGR to RGB, contiguous memory
    img = np.ascontiguousarray(img.transpose((2, 0, 1))[::-1])  # HWC to CHW, BGR to RGB, contiguous
    # numpy ndarray to a PyTorch tensor on the GPU
    img = torch.from_numpy(img).to("cuda:0")  # ndarray to tensor
    # uint8 0-255 to fp32 0.0-1.0
    img = img.float()
    img /= 255
    # Add a batch dimension if the tensor holds a single image
    if len(img.shape) == 3:
        img = img[None]  # expand for batch dim
    return img
def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Current image shape [height, width]
    shape = im.shape[:2]
    # If new_shape is an int, expand it to a square tuple
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    # If upscaling is not allowed, only scale down (for better val mAP)
    if not scaleup:
        r = min(r, 1.0)

    # Compute the resized size and the padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # width, height padding
    # If auto, pad only up to the nearest stride multiple (minimum rectangle)
    if auto:
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)
    # If scaleFill, stretch to fill with no padding
    elif scaleFill:
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    # Split the padding evenly between the two sides
    dw /= 2
    dh /= 2

    # Resize if the unpadded size differs from the original
    if shape[::-1] != new_unpad:
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    # Add the border using the dw and dh computed above
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)  # return the padded image, the wh ratios, and the padding
def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=(), max_det=300, nc=0, max_time_img=0.05, max_nms=30000, max_wh=7680):
    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
    if isinstance(prediction, (list, tuple)):  # YOLOv8 model in validation model, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output

    device = prediction.device
    mps = 'mps' in device.type  # Apple MPS
    if mps:  # MPS not fully supported yet, convert tensors to CPU before NMS
        prediction = prediction.cpu()
    bs = prediction.shape[0]  # batch size
    nc = nc or (prediction.shape[1] - 4)  # number of classes
    nm = prediction.shape[1] - nc - 4
    mi = 4 + nc  # mask start index
    xc = prediction[:, 4:mi].amax(1) > conf_thres  # candidates

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    time_limit = 0.5 + max_time_img * bs  # seconds to quit after
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)

    prediction = prediction.transpose(-1, -2)  # shape(1,84,6300) to shape(1,6300,84)
    prediction[..., :4] = xywh2xyxy(prediction[..., :4])  # xywh to xyxy

    t = time.time()
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[:, 2:4] < min_wh) | (x[:, 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + nm + 4), device=x.device)
            v[:, :4] = xywh2xyxy(lb[:, 1:5])  # box
            v[range(len(lb)), lb[:, 0].long() + 4] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Detections matrix nx6 (xyxy, conf, cls)
        box, cls, mask = x.split((4, nc, nm), 1)

        if multi_label:
            i, j = torch.where(cls > conf_thres)
            x = torch.cat((box[i], x[i, 4 + j, None], j[:, None].float(), mask[i]), 1)
        else:  # best class only
            conf, j = cls.max(1, keepdim=True)
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        if n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence and remove excess boxes

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        i = i[:max_det]  # limit detections

        output[xi] = x[i]
        if mps:
            output[xi] = output[xi].to(device)
        if (time.time() - t) > time_limit:
            print(f'WARNING ⚠️ NMS time limit {time_limit:.3f}s exceeded')
            break  # time limit exceeded

    return output
def xywh2xyxy(x):
    """
    Convert bounding box coordinates from (x, y, width, height) format to (x1, y1, x2, y2) format where (x1, y1) is the
    top-left corner and (x2, y2) is the bottom-right corner.

    Args:
        x (np.ndarray | torch.Tensor): The input bounding box coordinates in (x, y, width, height) format.
    Returns:
        y (np.ndarray | torch.Tensor): The bounding box coordinates in (x1, y1, x2, y2) format.
    """
    assert x.shape[-1] == 4, f'input shape last dimension expected 4 but input shape is {x.shape}'
    y = torch.empty_like(x) if isinstance(x, torch.Tensor) else np.empty_like(x)  # faster than clone/copy
    dw = x[..., 2] / 2  # half-width
    dh = x[..., 3] / 2  # half-height
    y[..., 0] = x[..., 0] - dw  # top left x
    y[..., 1] = x[..., 1] - dh  # top left y
    y[..., 2] = x[..., 0] + dw  # bottom right x
    y[..., 3] = x[..., 1] + dh  # bottom right y
    return y
def scale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None, padding=True):
    """
    Rescales bounding boxes (in the format of xyxy) from the shape of the image they were originally specified in
    (img1_shape) to the shape of a different image (img0_shape).

    Args:
        img1_shape (tuple): The shape of the image that the bounding boxes are for, in the format of (height, width).
        boxes (torch.Tensor): the bounding boxes of the objects in the image, in the format of (x1, y1, x2, y2)
        img0_shape (tuple): the shape of the target image, in the format of (height, width).
        ratio_pad (tuple): a tuple of (ratio, pad) for scaling the boxes. If not provided, the ratio and pad will be
            calculated based on the size difference between the two images.
        padding (bool): If True, assuming the boxes is based on image augmented by yolo style. If False then do regular
            rescaling.
    Returns:
        boxes (torch.Tensor): The scaled bounding boxes, in the format of (x1, y1, x2, y2)
    """
    if ratio_pad is None:  # calculate from img0_shape
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain = old / new
        pad = round((img1_shape[1] - img0_shape[1] * gain) / 2 - 0.1), round(
            (img1_shape[0] - img0_shape[0] * gain) / 2 - 0.1)  # wh padding
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    if padding:
        boxes[..., [0, 2]] -= pad[0]  # x padding
        boxes[..., [1, 3]] -= pad[1]  # y padding
    boxes[..., :4] /= gain
    clip_boxes(boxes, img0_shape)
    return boxes
def clip_boxes(boxes, shape):
    """
    Takes a list of bounding boxes and a shape (height, width) and clips the bounding boxes to the shape.

    Args:
        boxes (torch.Tensor): the bounding boxes to clip
        shape (tuple): the shape of the image
    """
    if isinstance(boxes, torch.Tensor):  # faster individually
        boxes[..., 0].clamp_(0, shape[1])  # x1
        boxes[..., 1].clamp_(0, shape[0])  # y1
        boxes[..., 2].clamp_(0, shape[1])  # x2
        boxes[..., 3].clamp_(0, shape[0])  # y2
    else:  # np.array (faster grouped)
        boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1])  # x1, x2
        boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0])  # y1, y2
def plot_result(det_cpu, dst_img, category_names, image_name):
    for i, item in enumerate(det_cpu):
        rand_color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
        # Draw the box
        box_x1, box_y1, box_x2, box_y2 = item[0:4].astype(np.int32)
        cv2.rectangle(dst_img, (box_x1, box_y1), (box_x2, box_y2), color=rand_color, thickness=2)
        # Draw the label
        label = category_names[int(item[5])]
        score = item[4]
        org = (min(box_x1, box_x2), min(box_y1, box_y2) - 8)
        text = '{}|{:.2f}'.format(label, score)
        cv2.putText(dst_img, text, org=org, fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=rand_color,
                    thickness=2)

    cv2.imshow('result', dst_img)
    cv2.waitKey()
    cv2.imwrite(image_name, dst_img)
if __name__ == '__main__':
    img_path = "./ultralytics/assets/bus.jpg"
    image_name = img_path.split('/')[-1]
    ori_img = cv2.imread(img_path)
    # Load the model
    model = load_model("runs/detect/train2/weights/best.pt")
    # Preprocess the input
    img = data_preprocess(model, ori_img, [640, 640])
    # Inference
    result = model(img, augment=False)
    preds = result[0]
    # NMS
    det = non_max_suppression(preds, conf_thres=0.35, iou_thres=0.45, nc=len(model.CLASSES))[0]
    # Rescale the boxes back to the original image size
    det[:, :4] = scale_boxes(img.shape[2:], det[:, :4], ori_img.shape)
    category_names = model.CLASSES
    # Show the result
    plot_result(det.cpu().numpy(), ori_img, category_names, image_name)
Summary
This article explained the YOLOv10 model and then walked through using it in practice, hands-on!