YOLO11 Object Detection in Practice, Part 1: Object Segmentation and Pedestrian Flow Tracking (with Source Code)

2025/7/7 16:02:52

I. Demo of Results

Multi-object tracking

II. Basic Theory and Core Concepts

2.1 Object Segmentation

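Object segmentation is the process of labeling every pixel in an image as belonging to a specific object or to the background. In YOLO, segmentation is an extension of detection: an additional branch predicts a pixel-level mask for each detected box, so every detected object can be segmented precisely. This gives finer results than bounding boxes alone, helps with scene understanding, and supports applications that need exact contours, such as medical image analysis and autonomous driving.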

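In YOLO11, object segmentation typically relies on techniques such as attention mechanisms, which focus the model on important regions, and lightweight network design, which reduces computational cost while maintaining accuracy. As a result, YOLO11 stays efficient while also offering strong segmentation capability.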
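To make the difference between a bounding box and a pixel-level mask concrete, here is a minimal sketch in plain Python. It assumes a mask outline is given as a list of (x, y) polygon vertices, similar in spirit to the `masks.xy` polygons used in the source code later in this article. The helper names `polygon_area` and `bbox_area` are illustrative and are not part of any library.

```python
def polygon_area(points):
    """Area enclosed by a polygon given as [(x, y), ...] (shoelace formula)."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def bbox_area(points):
    """Area of the axis-aligned bounding box that encloses the polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# A triangular outline: its bounding box is twice as large as the object itself.
triangle = [(0, 0), (10, 0), (0, 10)]
fill_ratio = polygon_area(triangle) / bbox_area(triangle)
print(fill_ratio)  # 0.5
```

A fill ratio well below 1 means the bounding box contains a lot of background, which is the case where a mask adds the most information.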

2.2 Multi-Object Tracking

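Multi-object tracking (MOT) is the task of continuously identifying and following multiple moving objects in a video sequence. It combines object detection with motion estimation to preserve each object's identity across consecutive frames. By integrating a tracking algorithm such as DeepSORT, YOLO can extend detection to real-time multi-object tracking. (The `model.track` call in the source code at the end of this article uses the trackers built into Ultralytics, BoT-SORT by default, rather than DeepSORT.) The process generally consists of the following steps: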

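Initialization: use the YOLO model to detect all objects of interest in the current frame.
Feature extraction: extract a feature vector from each detected object. The features can be appearance information (such as a color histogram) or a representation learned by a deep neural network.
Data association: predict each object's position in the current frame from its state in the previous frame, then use a distance metric (for example, the Mahalanobis distance) to associate new detections with existing tracks.
Track update: once a match is established, update the state of the corresponding track with the new observation. A new detection that cannot be matched may indicate a new object, so a new track is created for it.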
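The association and update steps above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the DeepSORT algorithm: it swaps in box overlap (IoU) and greedy matching in place of Kalman-filter prediction, appearance features, and the Mahalanobis distance, and it simply drops tracks that find no match. The names `iou` and `associate` and the threshold value are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, iou_threshold=0.3):
    """Greedily match tracks {track_id: box} to a list of detected boxes.

    Returns the updated {track_id: box} mapping for the current frame.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matched_tracks, matched_dets, updated = set(), set(), {}
    for score, tid, j in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to be the same object
        if tid in matched_tracks or j in matched_dets:
            continue
        updated[tid] = detections[j]  # track update: adopt the new observation
        matched_tracks.add(tid)
        matched_dets.add(j)

    # Unmatched detections are treated as new objects and get fresh IDs.
    next_id = max(tracks, default=0) + 1
    for j, det in enumerate(detections):
        if j not in matched_dets:
            updated[next_id] = det
            next_id += 1
    return updated


tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [(51, 51, 61, 61), (1, 1, 11, 11), (100, 100, 110, 110)]
print(associate(tracks, detections))
# {2: (51, 51, 61, 61), 1: (1, 1, 11, 11), 3: (100, 100, 110, 110)}
```

In the example, the two existing tracks keep their IDs because each overlaps strongly with one detection, while the third detection overlaps nothing and starts track 3. A production tracker would also keep unmatched tracks alive for a few frames to survive short occlusions.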

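III. Environment Setup

3.1 Requirements

Before you begin, make sure your system meets the following basic requirements:

Operating system: Windows 10/11
Python version: Python 3.10
PyTorch: PyTorch 2.4
CUDA and cuDNN (optional but recommended): CUDA 11.8
Other dependencies: OpenCV, NumPy, Pillow, etc.
GPU: RTX 3090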

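3.2 Installation Steps

The following steps assume that Python and pip are already installed.

1. Create a virtual environment (recommended but optional):
To keep the project isolated, create a Python virtual environment with virtualenv or venv.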

python3 -m venv yolov11_env
source yolov11_env/bin/activate # On Windows, use `yolov11_env\Scripts\activate` instead

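2. Install PyTorch:
Choose the PyTorch build that matches your system configuration; the official PyTorch website provides the current installation command. If you have a CUDA-capable GPU, you can install PyTorch and torchvision with a command like the following, shown here for CUDA 11.8 to match the requirements listed in 3.1:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118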

If you do not have CUDA support, you can install the CPU version:

pip install torch torchvision torchaudio

3. Clone the YOLO11 repository:
YOLO11 is open source and hosted on GitHub in the Ultralytics repository. Clone it locally with the following commands.

git clone https://github.com/ultralytics/ultralytics.git
cd ultralytics

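4. Install the dependencies:
Projects usually provide a requirements.txt file that lists all required dependencies. If the file is present, install them with the following command. If it is not (recent versions of the Ultralytics repository declare their dependencies in pyproject.toml instead), run `pip install -e .` from the repository root.

pip install -r requirements.txt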
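After installation, it can help to confirm that the environment matches the requirements in 3.1. The script below is a minimal sketch that uses only the standard library to report installed package versions and, if PyTorch is present, whether CUDA is usable. The function names `installed_versions` and `cuda_available` are illustrative, and the package list is an assumption based on the dependencies named in this article.

```python
import sys
from importlib import metadata
from importlib.util import find_spec


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_available():
    """Return torch.cuda.is_available(), or None if PyTorch is not installed."""
    if find_spec("torch") is None:
        return None
    import torch  # imported lazily so the script also runs without PyTorch

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    packages = ["torch", "torchvision", "ultralytics", "opencv-python", "numpy", "pillow"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
    print("CUDA available:", cuda_available())
```

If `CUDA available` prints `False` on a machine with an NVIDIA GPU, the usual cause is that the CPU-only PyTorch build was installed.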

IV. Core Source Code

from collections import defaultdict

import cv2

from ultralytics import YOLO
from ultralytics.utils.plotting import Annotator, colors

# Dictionary to store tracking history with default empty lists
track_history = defaultdict(lambda: [])

# Load the YOLO model with segmentation capabilities
model = YOLO("../checkpoints/yolo11n-seg.pt")

# Open the video file and stop early if it cannot be read
cap = cv2.VideoCapture("../inputs/007.mp4")
assert cap.isOpened(), "Error reading video file"

# Retrieve video properties: width, height, and frames per second
w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))

# Initialize video writer to save the output video with the specified properties
out = cv2.VideoWriter("../outputs/seg-tracking.avi", cv2.VideoWriter_fourcc(*"MJPG"), fps, (w, h))

while True:
    # Read a frame from the video
    ret, im0 = cap.read()
    if not ret:
        print("Video frame is empty or video processing has been successfully completed.")
        break

    # Create an annotator object to draw on the frame
    annotator = Annotator(im0, line_width=2)

    # Perform object tracking on the current frame
    results = model.track(im0, persist=True)

    # Keep only people (class_id 0) and cars (class_id 2) and count them per frame
    valid_ids = {0, 2}  # COCO class IDs: 0 = person, 2 = car
    filtered_masks = []
    filtered_track_ids = []
    person_count = 0
    car_count = 0

    if results[0].boxes.id is not None and results[0].masks is not None:
        masks = results[0].masks.xy
        track_ids = results[0].boxes.id.int().cpu().tolist()
        class_ids = results[0].boxes.cls.int().cpu().tolist()

        # Filter by class ID and count objects
        for mask, track_id, class_id in zip(masks, track_ids, class_ids):
            if class_id not in valid_ids:
                continue
            if class_id == 0:  # Person
                person_count += 1
            else:  # Car
                car_count += 1
            filtered_masks.append(mask)
            filtered_track_ids.append(track_id)

        # Annotate each mask with its corresponding tracking ID and color
        for mask, track_id in zip(filtered_masks, filtered_track_ids):
            annotator.seg_bbox(mask=mask, mask_color=colors(track_id, True), label=str(track_id))

    # Draw the counts of people and cars on the top right corner
    text = f"Person: {person_count} | Car: {car_count}"
    font = cv2.FONT_HERSHEY_SIMPLEX
    org = (w - 400, 50)  # Position at the top right corner
    font_scale = 1
    color = (255, 255, 255)  # White color
    thickness = 2
    cv2.putText(im0, text, org, font, font_scale, color, thickness, cv2.LINE_AA)

    # Write the annotated frame to the output video
    out.write(im0)
    # Display the annotated frame
    cv2.imshow("seg-tracking", im0)

    # Exit the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# Release resources
cap.release()
out.release()
cv2.destroyAllWindows()
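The `track_history` dictionary in the script above is initialized but never filled. The following is a minimal sketch of one way it could be used to record a short motion trail per track ID, which is a common basis for visualizing pedestrian flow. The helpers `polygon_center` and `update_history` and the `max_len` parameter are hypothetical names introduced for this example; the polygon is assumed to be a sequence of (x, y) points like the entries of `masks.xy`.

```python
from collections import defaultdict


def polygon_center(polygon):
    """Mean of the polygon's vertices as (x, y); None for an empty polygon."""
    points = [(float(x), float(y)) for x, y in polygon]
    if not points:
        return None
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def update_history(history, track_id, polygon, max_len=30):
    """Append the polygon's center to the track's trail, keeping the last max_len points."""
    center = polygon_center(polygon)
    if center is not None:
        trail = history[track_id]
        trail.append(center)
        del trail[:-max_len]  # drop the oldest points beyond max_len
    return history[track_id]


track_history = defaultdict(list)
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
update_history(track_history, 7, square)
update_history(track_history, 7, [(x + 2, y) for x, y in square])
print(track_history[7])  # [(2.0, 2.0), (4.0, 2.0)]
```

Inside the main loop, one could call `update_history(track_history, track_id, mask)` for each filtered mask and draw the resulting trail on the frame, for example with `cv2.polylines`.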