[Dataset Processing Tool] Splitting Training and Validation Image Files According to COCO JSON Annotation Files

2026/4/2 12:14:16


- I. Use Case
- II. About the COCO Dataset
- III. Scenario Details
- IV. Advantages of the Code
- V. Code


I. Use Case

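This script targets a common scenario in computer vision projects that work with annotated image datasets. Specifically, it uses a COCO-format JSON annotation file to select the images belonging to a particular subset from a large, unsorted image directory, and copies those images into a separate folder.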

II. About the COCO Dataset

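COCO (Common Objects in Context) is a widely used image dataset. Besides a large collection of images, it provides detailed annotations for tasks such as object detection, segmentation, and keypoint detection. A COCO JSON file is a structured format that stores image metadata and annotations: the `images` list records each image's ID, file name, and dimensions, while the `annotations` and `categories` lists describe the labeled objects and their classes.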
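To make the structure concrete, here is a heavily trimmed, hand-written COCO-style annotation (all values are made up for illustration; real files contain many more fields, such as `info`, `licenses`, and segmentation data). It shows the `file_name` field the copy script relies on, and how annotations point back to images through `image_id` rather than through the file name.

```python
import json

# Hand-written, heavily trimmed COCO-style annotation (illustrative values only).
coco_text = """
{
  "images": [
    {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480},
    {"id": 2, "file_name": "000000000002.jpg", "width": 512, "height": 512}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100.0, 50.0, 40.0, 120.0]}
  ],
  "categories": [
    {"id": 1, "name": "person", "supercategory": "person"}
  ]
}
"""

coco_data = json.loads(coco_text)

# Each entry in "images" carries the file name that is looked up on disk.
file_names = [img["file_name"] for img in coco_data["images"]]

# Annotations refer to images by numeric id, so map id -> file name.
id_to_name = {img["id"]: img["file_name"] for img in coco_data["images"]}
first_ann = coco_data["annotations"][0]

print(file_names)                        # ['000000000001.jpg', '000000000002.jpg']
print(id_to_name[first_ann["image_id"]])  # 000000000001.jpg
```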

III. Scenario Details

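When training, testing, or validating a deep learning model, you often need to extract images that meet certain criteria from the original image store in order to build a custom data subset. This is where the script is useful: it parses the `images` section of the COCO JSON file to get each image's file name, looks up the matching file in the original image directory, and copies it to the target directory, so the images are organized and split automatically.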
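Because the whole process depends on the `file_name` values in `images` matching files on disk, it can help to run a dry check before copying anything. The sketch below is an illustrative helper, not part of the original script: the function name `check_coco_images` and the demo file names are hypothetical. It only reports which listed images are present and which are missing, using a throwaway directory so it runs on its own.

```python
import json
import os
import tempfile


def check_coco_images(json_path, images_dir):
    """Dry run (hypothetical helper): split the file names listed in a COCO
    JSON file into those found under images_dir and those missing.
    Nothing is copied."""
    with open(json_path, "r", encoding="utf-8") as f:
        coco_data = json.load(f)

    present, missing = [], []
    for img in coco_data.get("images", []):
        name = img["file_name"]
        if os.path.isfile(os.path.join(images_dir, name)):
            present.append(name)
        else:
            missing.append(name)
    return present, missing


# Self-contained demo: the annotation lists two images, only one exists on disk.
with tempfile.TemporaryDirectory() as root:
    images_dir = os.path.join(root, "images")
    os.makedirs(images_dir)
    open(os.path.join(images_dir, "a.jpg"), "wb").close()  # empty placeholder file

    json_path = os.path.join(root, "instances_demo.json")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({"images": [{"id": 1, "file_name": "a.jpg"},
                              {"id": 2, "file_name": "b.jpg"}]}, f)

    present, missing = check_coco_images(json_path, images_dir)

print(present, missing)  # ['a.jpg'] ['b.jpg']
```

A long `missing` list usually points to a wrong `images_dir` or to `file_name` values that include a subfolder prefix the directory layout does not have.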

IV. Advantages of the Code

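The script greatly simplifies data preprocessing: instead of locating and moving files by hand, researchers and developers can prepare a data subset with a single function call. Automating this step also reduces human error. Because the file list is taken directly from the annotation file, the copied images correspond exactly to the annotated images (apart from any files the script reports as missing), which keeps the dataset consistent and complete for subsequent model training.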

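In short, this script is a practical tool for the data-management stage of data science and machine learning projects, helping to speed up research and development and make data preparation more efficient.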

V. Code

```python
import json
import os
import shutil


def copy_images_from_coco_json(json_path, images_dir, target_dir):
    """Copy every image listed in the `images` section of a COCO JSON file
    from images_dir to target_dir."""
    # Make sure the target directory exists
    os.makedirs(target_dir, exist_ok=True)

    try:
        # Read the COCO JSON annotation file
        with open(json_path, 'r', encoding='utf-8') as f:
            coco_data = json.load(f)
    except (OSError, json.JSONDecodeError) as e:
        print(f"Error reading JSON file: {e}")
        return

    # Iterate over all image entries
    for image_info in coco_data.get('images', []):
        image_file_name = image_info['file_name']
        source_image_path = os.path.join(images_dir, image_file_name)
        target_image_path = os.path.join(target_dir, image_file_name)

        # Report images that are missing from the source directory
        if not os.path.isfile(source_image_path):
            print(f"Source image {image_file_name} not found.")
            continue

        # Skip images that have already been copied
        if os.path.exists(target_image_path):
            print(f"File {image_file_name} already exists in {target_dir}. Skipping.")
            continue

        try:
            # file_name may contain a subfolder; create it if needed
            os.makedirs(os.path.dirname(target_image_path), exist_ok=True)
            # Copy the image to the target directory
            shutil.copy(source_image_path, target_image_path)
            print(f"Image {image_file_name} copied to {target_dir}")
        except OSError as e:
            print(f"Error copying image {image_file_name}: {e}")


if __name__ == "__main__":
    # Example usage: adjust the paths to your own directory layout
    json_path = './coco1/annotations/instances_val2017.json'
    images_dir = './coco1/images'
    target_dir = './coco1/target/val2017'

    copy_images_from_coco_json(json_path, images_dir, target_dir)
```
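Splitting a mixed image folder into training and validation sets amounts to running the copy routine once per annotation file. Below is a minimal sketch assuming the COCO 2017 naming convention (`instances_train2017.json`, `instances_val2017.json`) and a single folder holding the images of both splits, as with `./coco1/images` above. `copy_split` and `split_by_coco_json` are hypothetical names; `copy_split` is a stripped-down variant of `copy_images_from_coco_json` (no logging) so that the block runs on its own.

```python
import json
import os
import shutil
import tempfile


def copy_split(json_path, images_dir, target_dir):
    """Hypothetical compact variant of copy_images_from_coco_json: copy every
    image listed in json_path from images_dir to target_dir; return the count."""
    with open(json_path, "r", encoding="utf-8") as f:
        coco_data = json.load(f)
    copied = 0
    for img in coco_data.get("images", []):
        src = os.path.join(images_dir, img["file_name"])
        dst = os.path.join(target_dir, img["file_name"])
        if os.path.isfile(src) and not os.path.exists(dst):
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy(src, dst)
            copied += 1
    return copied


def split_by_coco_json(annotations_dir, images_dir, target_root,
                       splits=("train2017", "val2017")):
    """Run copy_split once per split, assuming files named instances_<split>.json."""
    counts = {}
    for split in splits:
        json_path = os.path.join(annotations_dir, f"instances_{split}.json")
        counts[split] = copy_split(json_path, images_dir,
                                   os.path.join(target_root, split))
    return counts


# Demo on a throwaway directory: three mixed images, two annotation files.
with tempfile.TemporaryDirectory() as root:
    images_dir = os.path.join(root, "images")
    ann_dir = os.path.join(root, "annotations")
    os.makedirs(images_dir)
    os.makedirs(ann_dir)
    for name in ("1.jpg", "2.jpg", "3.jpg"):
        open(os.path.join(images_dir, name), "wb").close()
    for split, names in (("train2017", ["1.jpg", "2.jpg"]), ("val2017", ["3.jpg"])):
        with open(os.path.join(ann_dir, f"instances_{split}.json"), "w",
                  encoding="utf-8") as f:
            json.dump({"images": [{"file_name": n} for n in names]}, f)

    counts = split_by_coco_json(ann_dir, images_dir, os.path.join(root, "target"))
    val_files = sorted(os.listdir(os.path.join(root, "target", "val2017")))

print(counts)     # {'train2017': 2, 'val2017': 1}
print(val_files)  # ['3.jpg']
```

Copying rather than moving keeps the original folder intact, so the split can be rerun safely if an annotation file changes; existing target files are skipped, as in the original script.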