Step 98: Deep Learning for Image Object Detection: SSD Modeling

news2025/1/23 3:12:50

Demonstrated on a 64-bit Windows 10 system

1. Preface

In this installment we continue the deep learning image object detection series with the SSD (Single Shot MultiBox Detector) model.

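2. Introduction to SSD

SSD (Single Shot MultiBox Detector) is a popular object detection algorithm proposed by Wei Liu, Dragomir Anguelov, Dumitru Erhan, and colleagues in 2016. It is a single-stage detector: compared with the two-stage detectors popular at the time (such as Faster R-CNN), SSD offers faster detection while still maintaining high accuracy.

The main features and components of SSD are:

(1) Multi-scale feature maps:

SSD extracts feature maps from several layers of the network, which lets it detect objects of different sizes effectively. Predictions are made on each of these feature maps, and each map corresponds to a different scale: the shallower, higher-resolution maps handle small objects, while the deeper, coarser maps handle large ones.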
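To make the multi-scale idea concrete, here is a minimal sketch that tallies the default boxes of the SSD300 configuration. The feature-map sizes and boxes-per-location are the values reported in the SSD paper; the names `FEATURE_MAPS`, `count_default_boxes`, and `cell_centers` are made up for this illustration and are not part of any library.

```python
# (feature-map side length, default boxes per location) for SSD300,
# as reported in the SSD paper. The names below are illustrative only.
FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_default_boxes(feature_maps):
    """Return the total number of default boxes over all feature maps."""
    return sum(size * size * per_cell for size, per_cell in feature_maps)


def cell_centers(size):
    """Return the normalized (cx, cy) center of every cell in a size x size map."""
    return [((col + 0.5) / size, (row + 0.5) / size)
            for row in range(size) for col in range(size)]


if __name__ == "__main__":
    for size, per_cell in FEATURE_MAPS:
        print(f"{size:>2}x{size:<2} map: {size * size * per_cell} boxes")
    print("total:", count_default_boxes(FEATURE_MAPS))
```

Most boxes come from the 38x38 map, which is the one responsible for the smallest objects.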

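(2) Default boxes (also called prior boxes or anchor boxes):

At every feature-map location, SSD defines several default boxes with different shapes and sizes. During training these default boxes are matched against the ground-truth boxes, and each match supplies the regression target used to adjust the predicted box's size and position.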
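The regression target mentioned above can be sketched as follows. This is a simplified illustration, assuming boxes in (cx, cy, w, h) form and the center/size offset parameterization described in the SSD paper. Many implementations additionally scale these offsets by fixed constants, which is omitted here. The function names `encode` and `decode` are hypothetical.

```python
import math


def encode(gt, default):
    """Express a ground-truth box as offsets relative to a matched default box.

    Both boxes are (cx, cy, w, h) tuples.
    """
    gcx, gcy, gw, gh = gt
    dcx, dcy, dw, dh = default
    return ((gcx - dcx) / dw,      # center shift, in units of default-box width
            (gcy - dcy) / dh,      # center shift, in units of default-box height
            math.log(gw / dw),     # log width ratio
            math.log(gh / dh))     # log height ratio


def decode(offsets, default):
    """Invert encode(): turn predicted offsets back into a (cx, cy, w, h) box."""
    tx, ty, tw, th = offsets
    dcx, dcy, dw, dh = default
    return (dcx + tx * dw, dcy + ty * dh, dw * math.exp(tw), dh * math.exp(th))


if __name__ == "__main__":
    default_box = (0.5, 0.5, 0.2, 0.2)
    gt_box = (0.55, 0.48, 0.3, 0.1)
    offsets = encode(gt_box, default_box)
    print("offsets:", offsets)
    print("decoded:", decode(offsets, default_box))
```

The network learns to predict the offsets, and decoding them against the same default box recovers the final bounding box.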

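(3) Single-stage detector:

Unlike two-stage detectors, SSD performs bounding-box regression and class classification together in a single forward pass, which is how it balances speed and accuracy.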

(4) Loss function:

SSD uses a combined loss: a smooth L1 loss for bounding-box regression plus a cross-entropy loss for class prediction.
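A rough sketch of how the two terms combine is shown below, written in plain Python so the arithmetic is easy to follow. It assumes the form used in the SSD paper, where the sum of the classification and localization losses is divided by the number of matched (positive) boxes. Hard negative mining is left out, and the function names and the `alpha` weight are chosen for this illustration.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def cross_entropy(logits, target):
    """Softmax cross-entropy for one box, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def ssd_loss(pos_loc_preds, pos_loc_targets, cls_logits, cls_targets, alpha=1.0):
    """Combined loss: (classification + alpha * localization) / number of positives.

    pos_loc_preds / pos_loc_targets: 4-tuples of offsets for matched boxes only.
    cls_logits / cls_targets: per-box class scores and integer class labels.
    """
    n_pos = len(pos_loc_preds)
    if n_pos == 0:
        return 0.0  # no matched boxes: the loss is defined as zero
    loc = sum(smooth_l1(p - t)
              for pred, tgt in zip(pos_loc_preds, pos_loc_targets)
              for p, t in zip(pred, tgt))
    conf = sum(cross_entropy(l, c) for l, c in zip(cls_logits, cls_targets))
    return (conf + alpha * loc) / n_pos


if __name__ == "__main__":
    print(ssd_loss([(0.5, 0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0, 0.0)],
                   [[0.0, 0.0]], [1]))
```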

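(5) Data augmentation:

To improve performance, SSD applies several data augmentation techniques, including random cropping, scaling, and color distortion.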
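Cropping is the augmentation that needs the most care in detection, because the boxes have to follow the image. The sketch below is a simplified version of that bookkeeping: it keeps a box only if its center falls inside the crop, then clips and shifts it into crop coordinates. The sampling rule in `random_crop` (a uniform scale with a hypothetical `min_scale` of 0.3) is an assumption for illustration and is simpler than the strategy in the SSD paper.

```python
import random


def random_crop(width, height, min_scale=0.3, rng=random):
    """Pick a random crop rectangle (x0, y0, x1, y1) inside a width x height image."""
    scale = rng.uniform(min_scale, 1.0)
    cw, ch = int(width * scale), int(height * scale)
    x0 = rng.randint(0, width - cw)
    y0 = rng.randint(0, height - ch)
    return (x0, y0, x0 + cw, y0 + ch)


def crop_boxes(boxes, crop):
    """Adjust (xmin, ymin, xmax, ymax) boxes to a crop.

    A box is kept only if its center lies inside the crop; kept boxes are
    clipped to the crop and shifted so the crop's corner becomes (0, 0).
    """
    x0, y0, x1, y1 = crop
    kept = []
    for xmin, ymin, xmax, ymax in boxes:
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        if not (x0 <= cx <= x1 and y0 <= cy <= y1):
            continue  # center outside the crop: drop the box
        kept.append((max(xmin, x0) - x0, max(ymin, y0) - y0,
                     min(xmax, x1) - x0, min(ymax, y1) - y0))
    return kept


if __name__ == "__main__":
    boxes = [(10, 10, 50, 50), (90, 90, 130, 130), (60, 60, 120, 90)]
    print(crop_boxes(boxes, (20, 20, 100, 100)))
```

The image itself would be cropped with the same rectangle, for example with `PIL.Image.crop`.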

(6) Backbone:

The original SSD uses VGG-16 as its backbone, while later variants such as SSDlite use lighter backbones such as MobileNet.

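3. Data Source

The data come from a public dataset. As the code below shows, each .jpg image and its same-named .xml annotation file sit together in the tuberculosis-phonecamera folder.

The task, roughly: draw a bounding box around each MTB (Mycobacterium tuberculosis) in the image.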

4. SSD in Practice

Straight to the code:

import os
import random
import torch
import torchvision
from torchvision.models.detection import ssd300_vgg16
from torchvision.transforms import functional as F
from PIL import Image
from torch.utils.data import DataLoader
import xml.etree.ElementTree as ET
import matplotlib.pyplot as plt
from torchvision import transforms
import albumentations as A
from albumentations.pytorch import ToTensorV2
import numpy as np

# Function to parse XML annotations
def parse_xml(xml_path):
    tree = ET.parse(xml_path)
    root = tree.getroot()

    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        xmin = int(bndbox.find("xmin").text)
        ymin = int(bndbox.find("ymin").text)
        xmax = int(bndbox.find("xmax").text)
        ymax = int(bndbox.find("ymax").text)

        # Check if the bounding box is valid
        if xmin < xmax and ymin < ymax:
            boxes.append((xmin, ymin, xmax, ymax))
        else:
            print(f"Warning: Ignored invalid box in {xml_path} - ({xmin}, {ymin}, {xmax}, {ymax})")

    return boxes

# Function to split data into training and validation sets
def split_data(image_dir, split_ratio=0.8):
    all_images = [f for f in os.listdir(image_dir) if f.endswith(".jpg")]
    random.shuffle(all_images)
    split_idx = int(len(all_images) * split_ratio)
    train_images = all_images[:split_idx]
    val_images = all_images[split_idx:]
    
    return train_images, val_images


# Dataset class for the Tuberculosis dataset
class TuberculosisDataset(torch.utils.data.Dataset):
    def __init__(self, image_dir, annotation_dir, image_list, transform=None):
        self.image_dir = image_dir
        self.annotation_dir = annotation_dir
        self.image_list = image_list
        self.transform = transform

    def __len__(self):
        return len(self.image_list)

    def __getitem__(self, idx):
        image_path = os.path.join(self.image_dir, self.image_list[idx])
        image = Image.open(image_path).convert("RGB")
        
        xml_path = os.path.join(self.annotation_dir, self.image_list[idx].replace(".jpg", ".xml"))
        boxes = parse_xml(xml_path)
        
        # Check for empty bounding boxes and return None
        if len(boxes) == 0:
            return None
        
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        labels = torch.ones((len(boxes),), dtype=torch.int64)
        iscrowd = torch.zeros((len(boxes),), dtype=torch.int64)
        
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = torch.tensor([idx])
        target["iscrowd"] = iscrowd
        
        # Apply transformations
        if self.transform:
            image = self.transform(image)
    
        return image, target

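# Define the transformations using torchvision
# Note: torchvision detection models already normalize their inputs internally, so the
# Normalize step below is likely redundant; it is kept here to match the original workflow
# (plot_predictions_on_image undoes it for display).
data_transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),  # Convert PIL image to tensor
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalize the images
])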



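def get_ssd_model_for_finetuning(num_classes):
    # Build an SSD300 model with a VGG16 backbone and an untrained detection head.
    # torchvision >= 0.13 takes `weights=None`; older releases use `pretrained=False` instead.
    # Note: by default torchvision still loads ImageNet weights for the VGG16 backbone.
    model = ssd300_vgg16(weights=None, num_classes=num_classes)
    return model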

# Function to save the model
def save_model(model, path="SSD_mtb.pth", save_full_model=False):
    if save_full_model:
        torch.save(model, path)
    else:
        torch.save(model.state_dict(), path)
    print(f"Model saved to {path}")

# Function to compute Intersection over Union
def compute_iou(boxA, boxB):
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    
    interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
    
    iou = interArea / float(boxAArea + boxBArea - interArea)
    return iou

# Adjusting the DataLoader collate function to handle None values and entirely empty batches
def collate_fn(batch):
    batch = list(filter(lambda x: x is not None, batch))
    if len(batch) == 0:
        # Return placeholder batch if entirely empty
        return [torch.zeros(1, 3, 224, 224)], [{}]
    return tuple(zip(*batch))

#Training function with modifications for collecting IoU and loss
def train_model(model, train_loader, optimizer, device, num_epochs=10):
    model.train()
    model.to(device)
    loss_values = []
    iou_values = []
    for epoch in range(num_epochs):
        epoch_loss = 0.0
        total_ious = 0
        num_boxes = 0
        for images, targets in train_loader:
            # Skip batches with placeholder data
            if len(targets) == 1 and not targets[0]:
                continue
            # Skip batches with empty targets
            if any(len(target["boxes"]) == 0 for target in targets):
                continue
            images = [image.to(device) for image in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            
            loss_dict = model(images, targets)
            losses = sum(loss for loss in loss_dict.values())
            
            optimizer.zero_grad()
            losses.backward()
            optimizer.step()
            
            epoch_loss += losses.item()
            
            # Compute IoU for evaluation
            with torch.no_grad():
                model.eval()
                predictions = model(images)
                for i, prediction in enumerate(predictions):
                    pred_boxes = prediction["boxes"].cpu().numpy()
                    true_boxes = targets[i]["boxes"].cpu().numpy()
                    for pred_box in pred_boxes:
                        for true_box in true_boxes:
                            iou = compute_iou(pred_box, true_box)
                            total_ious += iou
                            num_boxes += 1
                model.train()
        
        avg_loss = epoch_loss / len(train_loader)
        avg_iou = total_ious / num_boxes if num_boxes != 0 else 0
        loss_values.append(avg_loss)
        iou_values.append(avg_iou)
        print(f"Epoch {epoch+1}/{num_epochs} Loss: {avg_loss} Avg IoU: {avg_iou}")
    
    # Plotting loss and IoU values
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(loss_values, label="Training Loss")
    plt.title("Training Loss across Epochs")
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    
    plt.subplot(1, 2, 2)
    plt.plot(iou_values, label="IoU")
    plt.title("IoU across Epochs")
    plt.xlabel("Epochs")
    plt.ylabel("IoU")
    plt.show()

    # Save model after training
    save_model(model)

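# Validation function (runs inference only; metrics are computed separately below)
def validate_model(model, val_loader, device):
    model.eval()
    model.to(device)
    
    with torch.no_grad():
        for images, targets in val_loader:
            # Skip the placeholder batch that collate_fn returns when every sample was empty
            if len(targets) == 1 and not targets[0]:
                continue
            images = [image.to(device) for image in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            model(images)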

# Paths to your data
image_dir = "tuberculosis-phonecamera"
annotation_dir = "tuberculosis-phonecamera"

# Split data
train_images, val_images = split_data(image_dir)

# Create datasets and dataloaders
train_dataset = TuberculosisDataset(image_dir, annotation_dir, train_images, transform=data_transform)
val_dataset = TuberculosisDataset(image_dir, annotation_dir, val_images, transform=data_transform)

# Updated DataLoader with new collate function
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, collate_fn=collate_fn)
val_loader = DataLoader(val_dataset, batch_size=4, shuffle=False, collate_fn=collate_fn)

# Model and optimizer
model = get_ssd_model_for_finetuning(2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Train and validate
train_model(model, train_loader, optimizer, device="cuda", num_epochs=10)
validate_model(model, val_loader, device="cuda")


#######################################Print Metrics######################################
def calculate_metrics(predictions, ground_truths, iou_threshold=0.5):
    TP = 0  # True Positives
    FP = 0  # False Positives
    FN = 0  # False Negatives
    total_iou = 0  # to calculate mean IoU

    for pred, gt in zip(predictions, ground_truths):
        pred_boxes = pred["boxes"].cpu().numpy()
        gt_boxes = gt["boxes"].cpu().numpy()
        matched_gt = set()  # indices of ground-truth boxes already claimed in this image

        # Match each predicted box to its best-overlapping ground truth box
        for pred_box in pred_boxes:
            max_iou = 0
            best_gt = -1
            for j, gt_box in enumerate(gt_boxes):
                iou = compute_iou(pred_box, gt_box)
                if iou > max_iou:
                    max_iou = iou
                    best_gt = j

            total_iou += max_iou
            # Count a true positive only once per ground truth box;
            # duplicate detections of the same box are false positives
            if max_iou > iou_threshold and best_gt not in matched_gt:
                TP += 1
                matched_gt.add(best_gt)
            else:
                FP += 1

        # Ground truth boxes that no prediction matched in this image
        FN += len(gt_boxes) - len(matched_gt)

    precision = TP / (TP + FP) if (TP + FP) != 0 else 0
    recall = TP / (TP + FN) if (TP + FN) != 0 else 0
    f1_score = (2 * precision * recall) / (precision + recall) if (precision + recall) != 0 else 0
    mean_iou = total_iou / (TP + FP) if (TP + FP) != 0 else 0

    return precision, recall, f1_score, mean_iou

def evaluate_model(model, dataloader, device):
    model.eval()
    model.to(device)
    all_predictions = []
    all_ground_truths = []

    with torch.no_grad():
        for images, targets in dataloader:
            # Skip the placeholder batch that collate_fn returns when every sample was empty
            if len(targets) == 1 and not targets[0]:
                continue
            images = [image.to(device) for image in images]
            predictions = model(images)

            all_predictions.extend(predictions)
            all_ground_truths.extend(targets)

    precision, recall, f1_score, mean_iou = calculate_metrics(all_predictions, all_ground_truths)
    return precision, recall, f1_score, mean_iou


train_precision, train_recall, train_f1, train_iou = evaluate_model(model, train_loader, "cuda")
val_precision, val_recall, val_f1, val_iou = evaluate_model(model, val_loader, "cuda")

print("Training Set Metrics:")
print(f"Precision: {train_precision:.4f}, Recall: {train_recall:.4f}, F1 Score: {train_f1:.4f}, Mean IoU: {train_iou:.4f}")

print("\nValidation Set Metrics:")
print(f"Precision: {val_precision:.4f}, Recall: {val_recall:.4f}, F1 Score: {val_f1:.4f}, Mean IoU: {val_iou:.4f}")

# Summary table
header = "| Metric    | Training Set | Validation Set |"
divider = "+-----------+--------------+----------------+"

train_metrics = f"| Precision | {train_precision:<12.4f} | {val_precision:<14.4f} |"
recall_metrics = f"| Recall    | {train_recall:<12.4f} | {val_recall:<14.4f} |"
f1_metrics = f"| F1 Score  | {train_f1:<12.4f} | {val_f1:<14.4f} |"
iou_metrics = f"| Mean IoU  | {train_iou:<12.4f} | {val_iou:<14.4f} |"

print(header)
print(divider)
print(train_metrics)
print(recall_metrics)
print(f1_metrics)
print(iou_metrics)
print(divider)

#######################################Train Set######################################

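def plot_predictions_on_image(model, dataset, device, title):
    # Select a random image that has at least one annotated box
    # (the dataset returns None for images without valid boxes)
    sample = None
    while sample is None:
        idx = np.random.randint(0, len(dataset))
        sample = dataset[idx]
    image, target = sample
    img_tensor = image.clone().detach().to(device).unsqueeze(0)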

    # Use the model to make predictions
    model.eval()
    with torch.no_grad():
        prediction = model(img_tensor)

    # Inverse normalization for visualization
    inv_normalize = transforms.Normalize(
        mean=[-0.485/0.229, -0.456/0.224, -0.406/0.225],
        std=[1/0.229, 1/0.224, 1/0.225]
    )
    image = inv_normalize(image)
    image = torch.clamp(image, 0, 1)
    image = F.to_pil_image(image)

    # Plot the image with ground truth boxes
    plt.figure(figsize=(10, 6))
    plt.title(title + " with Ground Truth Boxes")
    plt.imshow(image)
    ax = plt.gca()

    # Draw the ground truth boxes in blue
    for box in target["boxes"]:
        rect = plt.Rectangle(
            (box[0], box[1]), box[2]-box[0], box[3]-box[1],
            fill=False, color='blue', linewidth=2
        )
        ax.add_patch(rect)
    plt.show()

    # Plot the image with predicted boxes
    plt.figure(figsize=(10, 6))
    plt.title(title + " with Predicted Boxes")
    plt.imshow(image)
    ax = plt.gca()

    # Draw the predicted boxes in red
    for box in prediction[0]["boxes"].cpu():
        rect = plt.Rectangle(
            (box[0], box[1]), box[2]-box[0], box[3]-box[1],
            fill=False, color='red', linewidth=2
        )
        ax.add_patch(rect)
    plt.show()

# Call the function for a random image from the train dataset
plot_predictions_on_image(model, train_dataset, "cuda", "Selected from Training Set")


#######################################Val Set######################################

# Call the function for a random image from the validation dataset
plot_predictions_on_image(model, val_dataset, "cuda", "Selected from Validation Set")
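One practical note on the plots and metrics above: the script uses every box the model returns, and torchvision's SSD returns many low-confidence detections by default. Below is a minimal sketch of filtering by confidence first. It assumes the prediction is a dict with parallel "boxes", "scores", and "labels" entries, which is the layout torchvision detection models return. The helper name `filter_by_score` and the 0.5 threshold are assumptions for illustration, and plain lists are used so the example runs without a GPU or PyTorch.

```python
def filter_by_score(prediction, score_threshold=0.5):
    """Keep only detections whose confidence is at least score_threshold.

    prediction: dict with parallel sequences under "boxes", "scores", "labels".
    Returns a new dict with the same keys, holding plain Python lists.
    """
    keep = [i for i, score in enumerate(prediction["scores"])
            if float(score) >= score_threshold]
    return {key: [prediction[key][i] for i in keep]
            for key in ("boxes", "scores", "labels")}


if __name__ == "__main__":
    pred = {
        "boxes": [(10, 10, 50, 50), (12, 11, 52, 49), (200, 200, 240, 260)],
        "scores": [0.92, 0.30, 0.55],
        "labels": [1, 1, 1],
    }
    print(filter_by_score(pred))
```

With real tensors, a boolean mask such as `prediction["boxes"][prediction["scores"] >= 0.5]` does the same thing more efficiently.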

This model has to be trained from scratch, so I won't run it here.

I admit I'm slacking off at the end.
