YoloV10 Network Explained and Hands-On (Dataset Included)


Table of Contents

  • Abstract
  • Model Details
  • Hands-On with the Model
    • Training on the COCO Dataset
      • Downloading the Dataset
    • Converting COCO to YOLO Format (works for V4, V5, V6, V7, V8)
      • Setting Up the YOLOv10 Environment
      • Training
      • Resuming Training from a Checkpoint
      • Testing
    • Training on a Custom Dataset
      • The Labelme Dataset
      • Format Conversion
      • Training
      • Testing
  • Summary

Abstract

Model Details

Hands-On with the Model

Training on the COCO Dataset

This article uses the 2017 release of the COCO dataset as the example to demonstrate training and prediction with YoloV10.

Downloading the Dataset

Images:

  • 2017 Train images [118K/18GB]: http://images.cocodataset.org/zips/train2017.zip
  • 2017 Val images [5K/1GB]: http://images.cocodataset.org/zips/val2017.zip
  • 2017 Test images [41K/6GB]: http://images.cocodataset.org/zips/test2017.zip

Annotations:

  • 2017 annotations_trainval2017 [241MB]:http://images.cocodataset.org/annotations/annotations_trainval2017.zip
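
If you prefer to script the download, here is a minimal sketch using only the Python standard library (the URLs are the ones listed above; downloading into a coco directory is an assumption that matches the project layout used later):

import urllib.request
import zipfile
from pathlib import Path

# Download and unpack the COCO 2017 archives linked above into ./coco.
urls = [
    "http://images.cocodataset.org/zips/train2017.zip",
    "http://images.cocodataset.org/zips/val2017.zip",
    "http://images.cocodataset.org/annotations/annotations_trainval2017.zip",
]
out = Path("coco")
out.mkdir(exist_ok=True)
for url in urls:
    zip_path = out / url.split("/")[-1]
    if not zip_path.exists():  # skip archives that are already downloaded
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:  # unpack next to the archive
        zf.extractall(out)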

Converting COCO to YOLO Format (works for V4, V5, V6, V7, V8)

In the original research paper, COCO had 91 object categories. However, the first release in 2014 published labeled and segmented images for only 80 of them, and the 2017 release followed with the same set. The detailed categories are listed below:

| ID | OBJECT (PAPER) | OBJECT (2014 REL.) | OBJECT (2017 REL.) | SUPER CATEGORY |
|----|----------------|--------------------|--------------------|----------------|
| 1 | person | person | person | person |
| 2 | bicycle | bicycle | bicycle | vehicle |
| 3 | car | car | car | vehicle |
| 4 | motorcycle | motorcycle | motorcycle | vehicle |
| 5 | airplane | airplane | airplane | vehicle |
| 6 | bus | bus | bus | vehicle |
| 7 | train | train | train | vehicle |
| 8 | truck | truck | truck | vehicle |
| 9 | boat | boat | boat | vehicle |
| 10 | traffic light | traffic light | traffic light | outdoor |
| 11 | fire hydrant | fire hydrant | fire hydrant | outdoor |
| 12 | street sign | - | - | outdoor |
| 13 | stop sign | stop sign | stop sign | outdoor |
| 14 | parking meter | parking meter | parking meter | outdoor |
| 15 | bench | bench | bench | outdoor |
| 16 | bird | bird | bird | animal |
| 17 | cat | cat | cat | animal |
| 18 | dog | dog | dog | animal |
| 19 | horse | horse | horse | animal |
| 20 | sheep | sheep | sheep | animal |
| 21 | cow | cow | cow | animal |
| 22 | elephant | elephant | elephant | animal |
| 23 | bear | bear | bear | animal |
| 24 | zebra | zebra | zebra | animal |
| 25 | giraffe | giraffe | giraffe | animal |
| 26 | hat | - | - | accessory |
| 27 | backpack | backpack | backpack | accessory |
| 28 | umbrella | umbrella | umbrella | accessory |
| 29 | shoe | - | - | accessory |
| 30 | eye glasses | - | - | accessory |
| 31 | handbag | handbag | handbag | accessory |
| 32 | tie | tie | tie | accessory |
| 33 | suitcase | suitcase | suitcase | accessory |
| 34 | frisbee | frisbee | frisbee | sports |
| 35 | skis | skis | skis | sports |
| 36 | snowboard | snowboard | snowboard | sports |
| 37 | sports ball | sports ball | sports ball | sports |
| 38 | kite | kite | kite | sports |
| 39 | baseball bat | baseball bat | baseball bat | sports |
| 40 | baseball glove | baseball glove | baseball glove | sports |
| 41 | skateboard | skateboard | skateboard | sports |
| 42 | surfboard | surfboard | surfboard | sports |
| 43 | tennis racket | tennis racket | tennis racket | sports |
| 44 | bottle | bottle | bottle | kitchen |
| 45 | plate | - | - | kitchen |
| 46 | wine glass | wine glass | wine glass | kitchen |
| 47 | cup | cup | cup | kitchen |
| 48 | fork | fork | fork | kitchen |
| 49 | knife | knife | knife | kitchen |
| 50 | spoon | spoon | spoon | kitchen |
| 51 | bowl | bowl | bowl | kitchen |
| 52 | banana | banana | banana | food |
| 53 | apple | apple | apple | food |
| 54 | sandwich | sandwich | sandwich | food |
| 55 | orange | orange | orange | food |
| 56 | broccoli | broccoli | broccoli | food |
| 57 | carrot | carrot | carrot | food |
| 58 | hot dog | hot dog | hot dog | food |
| 59 | pizza | pizza | pizza | food |
| 60 | donut | donut | donut | food |
| 61 | cake | cake | cake | food |
| 62 | chair | chair | chair | furniture |
| 63 | couch | couch | couch | furniture |
| 64 | potted plant | potted plant | potted plant | furniture |
| 65 | bed | bed | bed | furniture |
| 66 | mirror | - | - | furniture |
| 67 | dining table | dining table | dining table | furniture |
| 68 | window | - | - | furniture |
| 69 | desk | - | - | furniture |
| 70 | toilet | toilet | toilet | furniture |
| 71 | door | - | - | furniture |
| 72 | tv | tv | tv | electronic |
| 73 | laptop | laptop | laptop | electronic |
| 74 | mouse | mouse | mouse | electronic |
| 75 | remote | remote | remote | electronic |
| 76 | keyboard | keyboard | keyboard | electronic |
| 77 | cell phone | cell phone | cell phone | electronic |
| 78 | microwave | microwave | microwave | appliance |
| 79 | oven | oven | oven | appliance |
| 80 | toaster | toaster | toaster | appliance |
| 81 | sink | sink | sink | appliance |
| 82 | refrigerator | refrigerator | refrigerator | appliance |
| 83 | blender | - | - | appliance |
| 84 | book | book | book | indoor |
| 85 | clock | clock | clock | indoor |
| 86 | vase | vase | vase | indoor |
| 87 | scissors | scissors | scissors | indoor |
| 88 | teddy bear | teddy bear | teddy bear | indoor |
| 89 | hair drier | hair drier | hair drier | indoor |
| 90 | toothbrush | toothbrush | toothbrush | indoor |
| 91 | hair brush | - | - | indoor |

As the table shows, the 2014 and 2017 object lists are identical: they are 80 of the paper's original 91 categories. Therefore, when converting, the category IDs must be remapped. The mapping function is as follows:

def coco91_to_coco80_class():  # maps a 91-index (paper) category id to its 80-index (2014/2017 release) id
    # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
    # a = np.loadtxt('data/coco.names', dtype='str', delimiter='\n')
    # b = np.loadtxt('data/coco_paper.names', dtype='str', delimiter='\n')
    # x1 = [list(a[i] == b).index(True) + 1 for i in range(80)]  # darknet to coco
    # x2 = [list(b[i] == a).index(True) if any(b[i] == a) else None for i in range(91)]  # coco to darknet
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, None, 24, 25, None,
         None, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, None, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
         51, 52, 53, 54, 55, 56, 57, 58, 59, None, 60, None, None, 61, None, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
         None, 73, 74, 75, 76, 77, 78, 79, None]
    return x
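
A quick sanity check of the mapping (the indices follow the table above; the snippet is purely illustrative):

coco80 = coco91_to_coco80_class()
print(coco80[1 - 1])    # paper ID 1 (person)       -> YOLO class 0
print(coco80[90 - 1])   # paper ID 90 (toothbrush)  -> YOLO class 79
print(coco80[12 - 1])   # paper ID 12 (street sign) was dropped -> None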

Next comes the format conversion. The project layout is as follows:

[screenshot: project directory layout]

  • coco: holds the unpacked dataset.
  • out: holds the conversion output.
  • coco2yolo.py: the conversion script.

The conversion code:

import json
import glob
import os
import shutil
from pathlib import Path
import numpy as np
from tqdm import tqdm


def make_folders(path='../out/'):
    # Create folders

    if os.path.exists(path):
        shutil.rmtree(path)  # delete output folder
    os.makedirs(path)  # make new output folder
    os.makedirs(path + os.sep + 'labels')  # make new labels folder
    os.makedirs(path + os.sep + 'images')  # make new images folder
    return path


def convert_coco_json(json_dir='./coco/annotations_trainval2017/annotations/'):
    jsons = glob.glob(json_dir + '*.json')
    coco80 = coco91_to_coco80_class()

    # Import json
    for json_file in sorted(jsons):
        fn = 'out/labels/%s/' % Path(json_file).stem.replace('instances_', '')  # folder name
        fn_images = 'out/images/%s/' % Path(json_file).stem.replace('instances_', '')  # folder name
        os.makedirs(fn,exist_ok=True)
        os.makedirs(fn_images,exist_ok=True)
        with open(json_file) as f:
            data = json.load(f)
        print(fn)
        # Create image dict
        images = {'%g' % x['id']: x for x in data['images']}

        # Write labels file
        for x in tqdm(data['annotations'], desc='Annotations %s' % json_file):
            if x['iscrowd']:
                continue

            img = images['%g' % x['image_id']]
            h, w, f = img['height'], img['width'], img['file_name']
            file_path='coco/'+fn.split('/')[-2]+"/"+f
            # The Labelbox bounding box format is [top left x, top left y, width, height]
            box = np.array(x['bbox'], dtype=np.float64)
            box[:2] += box[2:] / 2  # xy top-left corner to center
            box[[0, 2]] /= w  # normalize x
            box[[1, 3]] /= h  # normalize y

            if (box[2] > 0.) and (box[3] > 0.):  # if w > 0 and h > 0
                with open(fn + Path(f).stem + '.txt', 'a') as file:
                    file.write('%g %.6f %.6f %.6f %.6f\n' % (coco80[x['category_id'] - 1], *box))
            file_path_t=fn_images+f
            print(file_path,file_path_t)
            shutil.copy(file_path,file_path_t)


def coco91_to_coco80_class():  # maps a 91-index (paper) category id to its 80-index (2014/2017 release) id
    # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
    # a = np.loadtxt('data/coco.names', dtype='str', delimiter='\n')
    # b = np.loadtxt('data/coco_paper.names', dtype='str', delimiter='\n')
    # x1 = [list(a[i] == b).index(True) + 1 for i in range(80)]  # darknet to coco
    # x2 = [list(b[i] == a).index(True) if any(b[i] == a) else None for i in range(91)]  # coco to darknet
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, None, 24, 25, None,
         None, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, None, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
         51, 52, 53, 54, 55, 56, 57, 58, 59, None, 60, None, None, 61, None, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
         None, 73, 74, 75, 76, 77, 78, 79, None]
    return x

convert_coco_json()

Run the script:

[screenshot: the conversion script running]

After the conversion finishes, verify the results:

import cv2
import os

def draw_box_in_single_image(image_path, txt_path):
    # Read the image
    image = cv2.imread(image_path)

    # Read the label txt file
    def read_list(txt_path):
        pos = []
        with open(txt_path, 'r') as file_to_read:
            while True:
                lines = file_to_read.readline()  # read one full line
                if not lines:
                    break
                # Split the line on spaces and parse every field as a float
                p_tmp = [float(i) for i in lines.split(' ')]
                pos.append(p_tmp)  # append the parsed row
        return pos


    # Convert a normalized YOLO box (cls, cx, cy, w, h) to pixel corners
    def convert(size, box):
        xmin = (box[1]-box[3]/2.)*size[1]
        xmax = (box[1]+box[3]/2.)*size[1]
        ymin = (box[2]-box[4]/2.)*size[0]
        ymax = (box[2]+box[4]/2.)*size[0]
        box = (int(xmin), int(ymin), int(xmax), int(ymax))
        return box

    pos = read_list(txt_path)
    print(pos)
    for i in range(len(pos)):
        label = str(int(pos[i][0]))
        print('label is '+label)
        box = convert(image.shape, pos[i])
        image = cv2.rectangle(image,(box[0], box[1]),(box[2],box[3]),(0,0,255),2)
        cv2.putText(image,label,(box[0],box[1]-2), 0, 1, [0,0,255], thickness=2, lineType=cv2.LINE_AA)

    if pos:
        cv2.imwrite('./Data/see_images/{}.png'.format(image_path.split('\\')[-1][:-4]), image)
    else:
        print('None')



img_folder = "./out/images/val2017"
img_list = os.listdir(img_folder)
img_list.sort()

label_folder = "./out/labels/val2017"
label_list = os.listdir(label_folder)
label_list.sort()
if not os.path.exists('./Data/see_images'):
    os.makedirs('./Data/see_images')
for i in range(len(img_list)):
    image_path = img_folder + "\\" + img_list[i]
    txt_path = label_folder + "\\" + label_list[i]
    draw_box_in_single_image(image_path, txt_path)

The result:

[image: a validation image with the converted boxes and class ids drawn on it]

Setting Up the YOLOv10 Environment

You can simply install every package listed in requirements.txt:

pip install -r requirements.txt

If you would rather not install that many packages, run the code and install whichever library it reports as missing.

Training

Download the code from https://github.com/THU-MIG/yolov10 to get the source.
Next, create a training script. The model can be built from a yaml file, for example:

from ultralytics import YOLOv10
if __name__ == '__main__':
    model = YOLOv10(model="ultralytics/cfg/models/v10/yolov10l.yaml")  # build a new model from scratch
    # If you want to finetune the model with pretrained weights, you could load the
    # pretrained weights like below
    # model = YOLOv10.from_pretrained('jameslahm/yolov10{n/s/m/b/l/x}')
    # or
    # wget https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10{n/s/m/b/l/x}.pt
    # model = YOLOv10('yolov10{n/s/m/b/l/x}.pt')

    # Use the model
    results = model.train(data="VOC.yaml",  patience=0, epochs=150, device='0', batch=8, seed=42)  # 训练模

The model yaml files live under ultralytics/cfg/models/v10:

[screenshot: the v10 model yaml files]

You can also create the model from pretrained weights. For example:

model = YOLOv10('yolov10n.pt')

Then start the training:

# Use the model
model.train(data="coco128.yaml", epochs=3)  # train the model

The dataset configuration files live under ultralytics/cfg/datasets/:

[screenshot: the dataset yaml files]

Pretty simple, right?

Next, let's set up our own configuration.
Step 1: locate the ultralytics/cfg/datasets/coco.yaml file.

[screenshot: coco.yaml in the repository]

Copy it to the project root.

[screenshot: coco.yaml copied to the root directory]

Modify the paths inside it to:

# Ultralytics YOLO 🚀, GPL-3.0 license
# COCO 2017 dataset http://cocodataset.org by Microsoft
# Example usage: yolo train data=coco.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── coco  ← downloads here (20.1 GB)


# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]

train: ./coco/images/train2017  # train images (relative to 'path') 118287 images
val: ./coco/images/val2017  # val images (relative to 'path') 5000 images
test: test-dev2017.txt  # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794

A note on dataset paths: after several attempts I found that the Ultralytics framework (YoloV10 behaves the same as YoloV8 here) automatically prepends a datasets directory, so setting ./coco/images/train2017 actually resolves to datasets/coco/images/train2017.
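
Before launching training, a quick check like the one below can confirm that the paths resolve where you expect (the datasets root is an assumption based on the behavior just described):

from pathlib import Path

# Verify that the relative paths in coco.yaml resolve under ./datasets
root = Path("datasets")
for split in ("coco/images/train2017", "coco/images/val2017"):
    p = root / split
    print(p, "OK" if p.is_dir() else "MISSING")
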
Step 2: create a train.py script.

from ultralytics import YOLOv10
if __name__ == '__main__':
    model = YOLOv10(model="ultralytics/cfg/models/v10/yolov10l.yaml")  # build a new model from scratch
    # If you want to finetune the model with pretrained weights, you could load the
    # pretrained weights like below
    # model = YOLOv10.from_pretrained('jameslahm/yolov10{n/s/m/b/l/x}')
    # or
    # wget https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10{n/s/m/b/l/x}.pt
    # model = YOLOv10('yolov10{n/s/m/b/l/x}.pt')

    # Use the model
	results = model.train(data="coco.yaml", epochs=3,device='3')  # 训练模型

Then run train.py.
To train on multiple GPUs, list them in the device argument; for example, with four GPUs:

results = model.train(data="coco.yaml", epochs=3,device='0,1,2,3')  # 训练模型

[screenshot: multi-GPU training log]

Step 3: adjust the training parameters, which can all be viewed in ultralytics/cfg/default.yaml. For example:

# Train settings -------------------------------------------------------------------------------------------------------
model:  # path to model file, i.e. yolov8n.pt, yolov8n.yaml
data:  # path to data file, i.e. coco128.yaml
epochs: 100  # number of epochs to train for
patience: 50  # epochs to wait for no observable improvement for early stopping of training
batch: 16  # number of images per batch (-1 for AutoBatch)
imgsz: 640  # size of input images as integer or w,h
save: True  # save train checkpoints and predict results
save_period: -1 # Save checkpoint every x epochs (disabled if < 1)
cache: False  # True/ram, disk or False. Use cache for data loading
device:  # device to run on, i.e. cuda device=0 or device=0,1,2,3 or device=cpu
workers: 8  # number of worker threads for data loading (per RANK if DDP)
project:  # project name
name:  # experiment name, results saved to 'project/name' directory
exist_ok: False  # whether to overwrite existing experiment
pretrained: False  # whether to use a pretrained model
optimizer: SGD  # optimizer to use, choices=['SGD', 'Adam', 'AdamW', 'RMSProp']
verbose: True  # whether to print verbose output
seed: 0  # random seed for reproducibility
deterministic: True  # whether to enable deterministic mode
single_cls: False  # train multi-class data as single-class
image_weights: False  # use weighted image selection for training
rect: False  # support rectangular training if mode='train', support rectangular evaluation if mode='val'
cos_lr: False  # use cosine learning rate scheduler
close_mosaic: 10  # disable mosaic augmentation for final 10 epochs
resume: False  # resume training from last checkpoint

The settings above are the ones you will touch most often during training; any of them can be overridden when calling the train() function.
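
For example, every key from default.yaml can be passed as a keyword argument to train(); the values below are illustrative, not recommendations:

results = model.train(
    data="coco.yaml",
    epochs=100,     # overrides epochs in default.yaml
    batch=16,       # images per batch
    imgsz=640,      # input image size
    cos_lr=True,    # cosine learning-rate schedule
    seed=42,        # reproducibility
)
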
Once training finishes, you can inspect the results:

[screenshot: training metrics and results]

Resuming Training from a Checkpoint

Training is sometimes interrupted unexpectedly. To pick up where it left off, load the last checkpoint and set resume to True:

from ultralytics import YOLOv10
if __name__ == '__main__':
    # Load the last checkpoint
    model = YOLOv10("runs/detect/train8/weights/last.pt")  # resume from last.pt
    print(model.model)

    # Use the model
    results = model.train(data="VOC.yaml", epochs=100, device='0', batch=16,workers=0,resume=True)  # 训练模型

Click run and training continues from the checkpoint.

Testing

Create a test script, test.py:

from ultralytics import YOLOv10

# Load a model
model = YOLOv10("runs/detect/train11/weights/best.pt")  # load a pretrained model (recommended for training)

results = model.predict(source="ultralytics/assets",device='3')  # predict on an image
print(results)

The results object holds all of the predictions:

[screenshot: the contents of results]
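
A minimal sketch of reading the returned objects (the attribute names are from the standard Ultralytics Results API):

for r in results:
    for box in r.boxes:
        print(box.xyxy, box.conf, box.cls)  # coordinates, confidence, class id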

The predict parameters can also be found in ultralytics/cfg/default.yaml. For example:

# Prediction settings --------------------------------------------------------------------------------------------------
source:  # source directory for images or videos
show: False  # show results if possible
save_txt: False  # save results as .txt file
save_conf: False  # save results with confidence scores
save_crop: False  # save cropped images with results
hide_labels: False  # hide labels
hide_conf: False  # hide confidence scores
vid_stride: 1  # video frame-rate stride
line_thickness: 3  # bounding box thickness (pixels)
visualize: False  # visualize model features
augment: False  # apply image augmentation to prediction sources
agnostic_nms: False  # class-agnostic NMS
classes:  # filter results by class, i.e. class=0, or class=[0,2,3]
retina_masks: False  # use high-resolution segmentation masks
boxes: True  # Show boxes in segmentation predictions
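
These keys can be passed directly as keyword arguments to predict(); a quick sketch (the values are illustrative):

results = model.predict(
    source="ultralytics/assets",
    conf=0.25,        # confidence threshold
    save_txt=True,    # also dump YOLO-format txt results
    classes=[0, 2],   # keep only the selected classes
)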

Training on a Custom Dataset

The Labelme Dataset

The dataset is one I annotated myself some time ago. Download link:
https://download.csdn.net/download/hhhhhhhhhhwwwwwwwwww/63242994
The classes are: ['c17', 'c5', 'helicopter', 'c130', 'f16', 'b2',
'other', 'b52', 'kc10', 'command', 'f15', 'kc135', 'a10',
'b1', 'aew', 'f22', 'p3', 'p8', 'f35', 'f18', 'v22', 'f4',
'globalhawk', 'u2', 'su-27', 'il-38', 'tu-134', 'su-33',
'an-70', 'su-24', 'tu-22', 'il-76']

Format Conversion

Convert the Labelme dataset to a YOLO-format dataset with the following script:

import os
import shutil

import numpy as np
import json
from glob import glob
import cv2
from sklearn.model_selection import train_test_split
from os import getcwd


def convert(size, box):
    dw = 1. / (size[0])
    dh = 1. / (size[1])
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


def change_2_yolo5(files, txt_Name):
    imag_name=[]
    for json_file_ in files:
        json_filename = labelme_path + json_file_ + ".json"
        out_file = open('%s/%s.txt' % (labelme_path, json_file_), 'w')
        json_file = json.load(open(json_filename, "r", encoding="utf-8"))
        # image_path = labelme_path + json_file['imagePath']
        imag_name.append(json_file_+'.jpg')
        height, width, channels = cv2.imread(labelme_path + json_file_ + ".jpg").shape
        for multi in json_file["shapes"]:
            points = np.array(multi["points"])
            xmin = min(points[:, 0]) if min(points[:, 0]) > 0 else 0
            xmax = max(points[:, 0]) if max(points[:, 0]) > 0 else 0
            ymin = min(points[:, 1]) if min(points[:, 1]) > 0 else 0
            ymax = max(points[:, 1]) if max(points[:, 1]) > 0 else 0
            label = multi["label"].lower()
            if xmax <= xmin:
                pass
            elif ymax <= ymin:
                pass
            else:
                cls_id = classes.index(label)
                b = (float(xmin), float(xmax), float(ymin), float(ymax))
                bb = convert((width, height), b)
                out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
                # print(json_filename, xmin, ymin, xmax, ymax, cls_id)
    return imag_name

def image_txt_copy(files, scr_path, dst_img_path, dst_txt_path):
    """
    :param files: list of image file names
    :param scr_path: source directory of the images
    :param dst_img_path: destination directory for the images
    :param dst_txt_path: destination directory for the matching txt labels
    :return:
    """
    for file in files:
        img_path=scr_path+file
        print(file)
        shutil.copy(img_path, dst_img_path+file)
        scr_txt_path=scr_path+file.split('.')[0]+'.txt'
        shutil.copy(scr_txt_path, dst_txt_path + file.split('.')[0]+'.txt')


if __name__ == '__main__':
    classes = ['c17', 'c5', 'helicopter', 'c130', 'f16', 'b2',
               'other', 'b52', 'kc10', 'command', 'f15', 'kc135', 'a10',
               'b1', 'aew', 'f22', 'p3', 'p8', 'f35', 'f18', 'v22', 'f4',
               'globalhawk', 'u2', 'su-27', 'il-38', 'tu-134', 'su-33',
               'an-70', 'su-24', 'tu-22', 'il-76']

    # 1. Path to the Labelme annotations
    labelme_path = "USA-Labelme/"
    isUseTest = True  # whether to create a test split
    # 2. Collect the json files to process
    files = glob(labelme_path + "*.json")

    files = [i.replace("\\", "/").split("/")[-1].split(".json")[0] for i in files]
    for i in files:
        print(i)
    trainval_files, test_files = train_test_split(files, test_size=0.1, random_state=55)
    # split
    train_files, val_files = train_test_split(trainval_files, test_size=0.1, random_state=55)
    train_name_list=change_2_yolo5(train_files, "train")
    print(train_name_list)
    val_name_list=change_2_yolo5(val_files, "val")
    test_name_list=change_2_yolo5(test_files, "test")
    # Create the dataset folders.
    file_List = ["train", "val", "test"]
    for file in file_List:
        if not os.path.exists('./VOC/images/%s' % file):
            os.makedirs('./VOC/images/%s' % file)
        if not os.path.exists('./VOC/labels/%s' % file):
            os.makedirs('./VOC/labels/%s' % file)
    image_txt_copy(train_name_list,labelme_path,'./VOC/images/train/','./VOC/labels/train/')
    image_txt_copy(val_name_list, labelme_path, './VOC/images/val/', './VOC/labels/val/')
    image_txt_copy(test_name_list, labelme_path, './VOC/images/test/', './VOC/labels/test/')

After the script runs, you have a YOLO-format dataset (v8 and v10 use the same format).

[screenshot: the converted dataset layout]

Training

Place the generated YOLO dataset under the datasets folder:

[screenshot: the VOC dataset inside the datasets folder]

Then create a VOC.yaml file with the following content:


train: ./VOC/images/train # train images
val: ./VOC/images/val # val images
test: ./VOC/images/test # test images (optional)

names: ['c17', 'c5', 'helicopter', 'c130', 'f16', 'b2',
    'other', 'b52', 'kc10', 'command', 'f15', 'kc135', 'a10',
    'b1', 'aew', 'f22', 'p3', 'p8', 'f35', 'f18', 'v22', 'f4',
    'globalhawk', 'u2', 'su-27', 'il-38', 'tu-134', 'su-33',
    'an-70', 'su-24', 'tu-22', 'il-76']

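Before training, it is worth confirming that every class id in the generated labels indexes into the names list; a small check sketch (the paths assume the datasets/VOC layout above):

from pathlib import Path

# Every class id in the YOLO label files must lie in [0, len(names)).
num_classes = 32  # length of the names list in VOC.yaml
bad = []
for txt in Path("datasets/VOC/labels").rglob("*.txt"):
    for line in txt.read_text().splitlines():
        parts = line.split()
        if not parts:
            continue
        cls_id = int(parts[0])
        if not 0 <= cls_id < num_classes:
            bad.append((txt.name, cls_id))
print("out-of-range labels:", bad if bad else "none")
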
Then create train.py and add:

from ultralytics import YOLOv10
if __name__ == '__main__':
    # Load the model
    model = YOLOv10("ultralytics/cfg/models/v10/yolov10n.yaml")  # build a new model from scratch
    print(model.model)

    # Use the model
    results = model.train(data="VOC.yaml", epochs=100, device='0', batch=16, workers=0)  # train the model

Now training can begin: click run to execute train.py.

[screenshot: the training run]

The results after 100 epochs:

[screenshot: metrics after 100 epochs]

Testing

Create a test.py script and add:

from ultralytics import YOLOv10

# Load a model
model = YOLOv10("runs/detect/train/weights/best.pt")  # load a pretrained model (recommended for training)
results = model.predict(source="datasets/VOC/images/test",device='0',save=True)  # predict on an image

The prediction settings are the same default.yaml block shown earlier in the COCO section.

Notice that, unlike yolov5, there is no obvious option here for saving the annotated test images. Looking through the source:

[screenshot: the predict source code]

reveals the save parameter, so setting save=True saves the annotated images:

[screenshot: saved prediction images]

If the official wrapper feels too heavy and inflexible, you can use the standalone inference code below:

import cv2
import time
import random
import numpy as np
import torch, torchvision


def load_model(model_path):
    model = torch.load(model_path, map_location='cpu')
    category_list = model.get('CLASSES', model.get('model').names)
    model = (model.get('ema') or model['model']).to("cuda:0").float()  # FP32 model
    model.__setattr__('CLASSES', category_list)
    model.fuse().eval()
    return model



def data_preprocess(model, img, img_scale):
    # Default stride and auto flag
    stride, auto = 32, True
    # Use at least the model's maximum stride, and never less than 32
    stride = max(int(model.stride.max()), 32)

    # Pad and resize the image to fit the model input
    img = letterbox(img, new_shape=img_scale, stride=stride, auto=auto)[0]  # padded resize

    # HWC to CHW, BGR to RGB, and make the array contiguous
    img = np.ascontiguousarray(img.transpose((2, 0, 1))[::-1])
    # ndarray to tensor, moved onto the GPU
    img = torch.from_numpy(img).to("cuda:0")
    # uint8 to fp32, scaled from 0-255 to 0.0-1.0
    img = img.float()
    img /= 255
    # Add a batch dimension if one is missing
    if len(img.shape) == 3:
        img = img[None]  # expand for batch dim
    return img


def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Current image shape [height, width]
    shape = im.shape[:2]
    # If new_shape is an int, expand it to a (h, w) tuple
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    # If scaling up is not allowed, only scale down (for better val mAP)
    if not scaleup:
        r = min(r, 1.0)

    # Compute the resized size and the padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # width, height padding
    # If auto, pad only up to a multiple of stride (minimum rectangle)
    if auto:
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)
    # If scaleFill, stretch to fill with no padding
    elif scaleFill:
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios
    # Split the padding evenly between both sides
    dw /= 2
    dh /= 2
    # Resize if the unpadded size differs from the original
    if shape[::-1] != new_unpad:
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    # Add the border: dh on top/bottom, dw on left/right
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)  # return the padded image, the scale ratio, and the padding


def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=(), max_det=300, nc=0, max_time_img=0.05, max_nms=30000, max_wh=7680, ):
    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
    if isinstance(prediction, (list, tuple)):  # YOLOv8 model in validation mode, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output

    device = prediction.device
    mps = 'mps' in device.type  # Apple MPS
    if mps:  # MPS not fully supported yet, convert tensors to CPU before NMS
        prediction = prediction.cpu()
    bs = prediction.shape[0]  # batch size
    nc = nc or (prediction.shape[1] - 4)  # number of classes
    nm = prediction.shape[1] - nc - 4
    mi = 4 + nc  # mask start index
    xc = prediction[:, 4:mi].amax(1) > conf_thres  # candidates

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    time_limit = 0.5 + max_time_img * bs  # seconds to quit after
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)

    prediction = prediction.transpose(-1, -2)  # shape(1,84,6300) to shape(1,6300,84)
    prediction[..., :4] = xywh2xyxy(prediction[..., :4])  # xywh to xyxy

    t = time.time()
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[:, 2:4] < min_wh) | (x[:, 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + nm + 4), device=x.device)
            v[:, :4] = xywh2xyxy(lb[:, 1:5])  # box
            v[range(len(lb)), lb[:, 0].long() + 4] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Detections matrix nx6 (xyxy, conf, cls)
        box, cls, mask = x.split((4, nc, nm), 1)

        if multi_label:
            i, j = torch.where(cls > conf_thres)
            x = torch.cat((box[i], x[i, 4 + j, None], j[:, None].float(), mask[i]), 1)
        else:  # best class only
            conf, j = cls.max(1, keepdim=True)
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        if n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence and remove excess boxes

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        i = i[:max_det]  # limit detections

        output[xi] = x[i]
        if mps:
            output[xi] = output[xi].to(device)
        if (time.time() - t) > time_limit:
            print(f'WARNING ⚠️ NMS time limit {time_limit:.3f}s exceeded')
            break  # time limit exceeded
    return output


def xywh2xyxy(x):
    """
    Convert bounding box coordinates from (x, y, width, height) format to (x1, y1, x2, y2) format where (x1, y1) is the
    top-left corner and (x2, y2) is the bottom-right corner.
    Args:
        x (np.ndarray | torch.Tensor): The input bounding box coordinates in (x, y, width, height) format.
    Returns:
        y (np.ndarray | torch.Tensor): The bounding box coordinates in (x1, y1, x2, y2) format.
    """
    assert x.shape[-1] == 4, f'input shape last dimension expected 4 but input shape is {x.shape}'
    y = torch.empty_like(x) if isinstance(x, torch.Tensor) else np.empty_like(x)  # faster than clone/copy
    dw = x[..., 2] / 2  # half-width
    dh = x[..., 3] / 2  # half-height
    y[..., 0] = x[..., 0] - dw  # top left x
    y[..., 1] = x[..., 1] - dh  # top left y
    y[..., 2] = x[..., 0] + dw  # bottom right x
    y[..., 3] = x[..., 1] + dh  # bottom right y
    return y


def scale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None, padding=True):
    """
    Rescales bounding boxes (in the format of xyxy) from the shape of the image they were originally specified in
    (img1_shape) to the shape of a different image (img0_shape).
    Args:
        img1_shape (tuple): The shape of the image that the bounding boxes are for, in the format of (height, width).
        boxes (torch.Tensor): the bounding boxes of the objects in the image, in the format of (x1, y1, x2, y2)
        img0_shape (tuple): the shape of the target image, in the format of (height, width).
        ratio_pad (tuple): a tuple of (ratio, pad) for scaling the boxes. If not provided, the ratio and pad will be
            calculated based on the size difference between the two images.
        padding (bool): If True, assuming the boxes is based on image augmented by yolo style. If False then do regular
            rescaling.
    Returns:
        boxes (torch.Tensor): The scaled bounding boxes, in the format of (x1, y1, x2, y2)
    """
    if ratio_pad is None:  # calculate from img0_shape
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain  = old / new
        pad = round((img1_shape[1] - img0_shape[1] * gain) / 2 - 0.1), round(
            (img1_shape[0] - img0_shape[0] * gain) / 2 - 0.1)  # wh padding
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    if padding:
        boxes[..., [0, 2]] -= pad[0]  # x padding
        boxes[..., [1, 3]] -= pad[1]  # y padding
    boxes[..., :4] /= gain
    clip_boxes(boxes, img0_shape)
    return boxes


def clip_boxes(boxes, shape):
    """
    Takes a list of bounding boxes and a shape (height, width) and clips the bounding boxes to the shape.

    Args:
      boxes (torch.Tensor): the bounding boxes to clip
      shape (tuple): the shape of the image
    """
    if isinstance(boxes, torch.Tensor):  # faster individually
        boxes[..., 0].clamp_(0, shape[1])  # x1
        boxes[..., 1].clamp_(0, shape[0])  # y1
        boxes[..., 2].clamp_(0, shape[1])  # x2
        boxes[..., 3].clamp_(0, shape[0])  # y2
    else:  # np.array (faster grouped)
        boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1])  # x1, x2
        boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0])  # y1, y2


def plot_result(det_cpu, dst_img, category_names, image_name):
    for i, item in enumerate(det_cpu):
        rand_color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
        # draw the box
        box_x1, box_y1, box_x2, box_y2 = item[0:4].astype(np.int32)
        cv2.rectangle(dst_img, (box_x1, box_y1), (box_x2, box_y2), color=rand_color, thickness=2)
        # draw the label
        label = category_names[int(item[5])]
        score = item[4]
        org = (min(box_x1, box_x2), min(box_y1, box_y2) - 8)
        text = '{}|{:.2f}'.format(label, score)
        cv2.putText(dst_img, text, org=org, fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=rand_color,
                    thickness=2)

    cv2.imshow('result', dst_img)
    cv2.waitKey()
    cv2.imwrite(image_name, dst_img)


if __name__ == '__main__':
    img_path = "./ultralytics/assets/bus.jpg"
    image_name = img_path.split('/')[-1]
    ori_img = cv2.imread(img_path)
    # load model
    model = load_model("runs/detect/train2/weights/best.pt")
    # Preprocess the input
    img = data_preprocess(model, ori_img, [640, 640])
    # Run inference
    result = model(img, augment=False)
    preds = result[0]
    # NMS
    det = non_max_suppression(preds, conf_thres=0.35, iou_thres=0.45, nc=len(model.CLASSES))[0]
    # Rescale the boxes back to the original image size
    det[:, :4] = scale_boxes(img.shape[2:], det[:, :4], ori_img.shape)
    category_names = model.CLASSES
    # show
    plot_result(det.cpu().numpy(), ori_img, category_names, image_name)

Summary

This article explained the YoloV10 model and walked through hands-on training and testing.
