COCO Dataset Format

news2024/11/19 12:39:30

COCO (Common Objects in Context) dataset数据集是一个广泛应用于目标检测、语义分割的数据集,包含330K 图片数据 与 2.5 million 个目标实体。

1.数据集下载

!wget http://images.cocodataset.org/zips/train2017.zip -O coco_train2017.zip
!wget http://images.cocodataset.org/zips/val2017.zip -O coco_val2017.zip
!wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -O coco_ann2017.zip

 2.数据集解压缩

from zipfile import ZipFile, BadZipFile
import os

def extract_zip_file(extract_path):
    try:
        with ZipFile(extract_path+".zip") as zfile:
            zfile.extractall(extract_path)

        # remove zipfile
        zfileTOremove=f"{extract_path}"+".zip"
        if os.path.isfile(zfileTOremove):
            os.remove(zfileTOremove)
        else:
            print("Error: %s file not found" % zfileTOremove)    

    except BadZipFile as e:
        print("Error:", e)


extract_train_path = "./coco_train2017"
extract_val_path = "./coco_val2017"
extract_ann_path="./coco_ann2017"

extract_zip_file(extract_train_path)
extract_zip_file(extract_val_path)
extract_zip_file(extract_ann_path)

解压缩后,coco_train2017 与 coco_val2017文件夹下包含子文件夹train2017与val2017,各自包含有图片数据.

而coco_ann2017 子文件夹下有6个JSON 格式annotation 文件位于子文件夹annotations.

                ​​​​​​​        ​​​​​​​         

3.数据集格式

随意调整一个format格式文件,说明如下:

{   
    "info": 
    { "description": "COCO 2017 Dataset",
        "url": "http://cocodataset.org",
        "version": "1.0",
        "year": 2017,
        "contributor": "COCO Consortium",
        "date_created": "2017/09/01"
    },
    "licenses": 
    [
        {"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/","id": 1,"name": "Attribution-NonCommercial-ShareAlike License"},
        {"url": "http://creativecommons.org/licenses/by-nc/2.0/","id": 2,"name": "Attribution-NonCommercial License"},
        ...,
        ...
    ],
        
    "images":     
    [
        {"license": 4,"file_name": "000000397133.jpg","coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg","height": 427,"width": 640,"date_captured": "2013-11-14 17:02:52","flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg","id": 397133},
        {"license": 1,"file_name": "000000037777.jpg","coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg","height": 230,"width": 352,"date_captured": "2013-11-14 20:55:31","flickr_url": "http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg","id": 37777},
        ...,
        ...
    ],
    "annotations": 
    [
        {"segmentation": [[510.66,423.01,511.72,...,...]],"area": 702.1057499999998,"iscrowd": 0,"image_id": 289343,"bbox": [473.07,395.93,38.65,28.67],"category_id": 18,"id": 1768},
        {"segmentation": [[289.74,443.39,302.29,...,...]],"area": 27718.476299999995,"iscrowd": 0,"image_id": 61471,"bbox": [272.1,200.23,151.97,279.77],"category_id": 18,"id": 1773}
        ...,
        ...
    ],
    "categories": 
    [
        {"supercategory": "person","id": 1,"name": "person"},
        {"supercategory": "vehicle","id": 2,"name": "bicycle"},
        ...,
        ...
    ]
}

info:

The “info” component provides metadata about the COCO dataset, including the version number, the date it was created, and the contact information for the creators of the dataset.

"info": 
{ "description": "COCO 2017 Dataset",
    "url": "http://cocodataset.org",
    "version": "1.0",
    "year": 2017,
    "contributor": "COCO Consortium",
    "date_created": "2017/09/01"
}

licenses

  • "license": an integer value indicating the license type of the image. This value corresponds to the license "id" in the "licenses" component.
  • "file_name": a string containing the name of the image file.
  • "coco_url": a string containing a URL to the image on the COCO website.
  • "height": an integer value representing the height of the image in pixels.
  • "width": an integer value representing the width of the image in pixels.
  • "date_captured": a string representing the date and time that the image was captured.
  • "flickr_url": a string containing a URL to the image on Flickr.
  • "id": a unique identifier for the image, as an integer value.
"licenses": 
[
    {
        "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
        "id": 1,
        "name": "Attribution-NonCommercial-ShareAlike License"
    },
    ...,
    ...,
    ...
]

images

  • "license": an integer value indicating the license type of the image. This value corresponds to the license "id" in the "licenses" component.
  • "file_name": a string containing the name of the image file.
  • "coco_url": a string containing a URL to the image on the COCO website.
  • "height": an integer value representing the height of the image in pixels.
  • "width": an integer value representing the width of the image in pixels.
  • "date_captured": a string representing the date and time that the image was captured.
  • "flickr_url": a string containing a URL to the image on Flickr.
  • "id": a unique identifier for the image, as an integer value.
"images":     
[
    {
        "license": 4,
        "file_name": "000000397133.jpg",
        "coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
        "height": 427,
        "width": 640,
        "date_captured": "2013-11-14 17:02:52",
        "flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
        "id": 397133
    },
    ...,
    ...,
    ...       
]

annotations

  • "segmentation": The “segmentation” key in an annotation dictionary holds a list of floating point numbers that represent the pixel coordinates of an object’s segmentation mask. They can be used to plot the segmentation mask of the object on an image. To plot the mask, we need to take pairs of numbers (the first and second value, then the third and fourth, etc.) and use them as the x and y coordinates of the pixels.
  • "area": a floating point value indicating the area of the object in segmentation mask in pixels squared.
  • "iscrowd": a binary integer value indicating whether the object is part of a crowd (1) or not (0).
  • "image_id": an integer value that is a unique identifier for the image in which the object appears. This "image_id" corresponds to the "id" in "image" component.
  • "bbox": a list of four floating point values representing the bounding box of the object in the format [x, y, width, height].
  • "category_id": an integer value indicating the category or class of the object.
  • "id": a unique identifier for the annotation across the entire COCO dataset, as an integer value.
"annotations": 
[
    {
        "segmentation": [[510.66,423.01,511.72,...,...,...]],
        "area": 702.1057499999998,
        "iscrowd": 0,
        "image_id": 289343,
        "bbox": [473.07,395.93,38.65,28.67],
        "category_id": 18,
        "id": 1768
    },
    ...,
    ...,
    ...
]

categories

  • “supercategory”: a string indicating the supercategory or super class of an object. For example, in the second dictionary, “vehicle” is the supercategory of the bicycle.
  • “id”: a unique identifier for identifying the category of an object , as an integer value.
  • “name”: a string that represents the name of the category.
"categories": 
[
    {
        "supercategory": "person",
        "id": 1,
        "name": "person"
    },
    {
        "supercategory": "vehicle",
        "id": 2,
        "name": "bicycle"
    },
    ...,
    ...,
    ...
]

COCO dataset 总共定义了91 类目标,但其中只有 80类用到了.

 4.数据集解析

COCO dataset 以JSON 文件格式存储annotations,借助下属class可完成解析。

from collections import defaultdict
import json
import numpy as np
class COCOParser:
    def __init__(self, anns_file, imgs_dir):
        with open(anns_file, 'r') as f:
            coco = json.load(f)
            
        self.annIm_dict = defaultdict(list)        
        self.cat_dict = {} 
        self.annId_dict = {}
        self.im_dict = {}
        self.licenses_dict = {}
        for ann in coco['annotations']:           
            self.annIm_dict[ann['image_id']].append(ann) 
            self.annId_dict[ann['id']]=ann
        for img in coco['images']:
            self.im_dict[img['id']] = img
        for cat in coco['categories']:
            self.cat_dict[cat['id']] = cat
        for license in coco['licenses']:
            self.licenses_dict[license['id']] = license
    def get_imgIds(self):
        return list(self.im_dict.keys())
    def get_annIds(self, im_ids):
        im_ids=im_ids if isinstance(im_ids, list) else [im_ids]
        return [ann['id'] for im_id in im_ids for ann in self.annIm_dict[im_id]]
    def load_anns(self, ann_ids):
        im_ids=ann_ids if isinstance(ann_ids, list) else [ann_ids]
        return [self.annId_dict[ann_id] for ann_id in ann_ids]        
    def load_cats(self, class_ids):
        class_ids=class_ids if isinstance(class_ids, list) else [class_ids]
        return [self.cat_dict[class_id] for class_id in class_ids]
    def get_imgLicenses(self,im_ids):
        im_ids=im_ids if isinstance(im_ids, list) else [im_ids]
        lic_ids = [self.im_dict[im_id]["license"] for im_id in im_ids]
        return [self.licenses_dict[lic_id] for lic_id in lic_ids]

if __name__ == "__main__"
    coco_annotations_file="/content/coco_ann2017/annotations/instances_val2017.json"
    coco_images_dir="/content/coco_val2017/val2017"
    coco= COCOParser(coco_annotations_file, coco_images_dir)

5.数据可视化

import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
# define a list of colors for drawing bounding boxes
color_list = ["pink", "red", "teal", "blue", "orange", "yellow", "black", "magenta","green","aqua"]*10
num_imgs_to_disp = 4
total_images = len(coco.get_imgIds()) # total number of images
sel_im_idxs = np.random.permutation(total_images)[:num_imgs_to_disp]
img_ids = coco.get_imgIds()
selected_img_ids = [img_ids[i] for i in sel_im_idxs]
ann_ids = coco.get_annIds(selected_img_ids)
im_licenses = coco.get_imgLicenses(selected_img_ids)
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(15,10))
ax = ax.ravel()
for i, im in enumerate(selected_img_ids):
    image = Image.open(f"{coco_images_dir}/{str(im).zfill(12)}.jpg")
    ann_ids = coco.get_annIds(im)
    annotations = coco.load_anns(ann_ids)
    for ann in annotations:
        bbox = ann['bbox']
        x, y, w, h = [int(b) for b in bbox]
        class_id = ann["category_id"]
        class_name = coco.load_cats(class_id)[0]["name"]
        license = coco.get_imgLicenses(im)[0]["name"]
        color_ = color_list[class_id]
        rect = plt.Rectangle((x, y), w, h, linewidth=2, edgecolor=color_, facecolor='none')
        t_box=ax[i].text(x, y, class_name,  color='red', fontsize=10)
        t_box.set_bbox(dict(boxstyle='square, pad=0',facecolor='white', alpha=0.6, edgecolor='blue'))
        ax[i].add_patch(rect)
    
    ax[i].axis('off')
    ax[i].imshow(image)
    ax[i].set_xlabel('Longitude')
    ax[i].set_title(f"License: {license}")
plt.tight_layout()
plt.show()

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1358717.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

宝塔面板yum安装指南

1、执行 yum install -y wget && wget -O install.sh https://download.bt.cn/install/install_6.0.sh && sh install.sh ed8484bec2、QA 提示抱歉,连接宝塔官网失败,请切换节点后重试服务器终端 分别执行这2条命令 mv /www/server/pa…

1.4 day4 IO进程线程

使用两个子进程进行文件拷贝&#xff0c;父进程进行资源回收 #include <myhead.h> int main(int argc, const char *argv[]) {//创建一个文件描述符并以只读的方式打开int fd-1;if((fdopen("./test.bmp",O_RDONLY))-1){perror("open error");return…

私有云平台搭建openstack和ceph结合搭建手册

OpenStack与云计算 什么是云&#xff1f; 如何正确理解云&#xff0c;可以从以下几个方面。 云的构成。 用户&#xff1a;对用户而言是透明无感知的&#xff0c;不用关心底层构成&#xff0c;只需要知道利用云完成自己任务即可。 云提供商&#xff1a;对云资产管理和运维。 云…

高并发下的计数器实现方式:AtomicLong、LongAdder、LongAccumulator

一、前言 计数器是并发编程中非常常见的一个需求&#xff0c;例如统计网站的访问量、计算某个操作的执行次数等等。在高并发场景下&#xff0c;如何实现一个线程安全的计数器是一个比较有挑战性的问题。本文将介绍几种常用的计数器实现方式&#xff0c;包括AtomicLong、LongAd…

工业城市的废水监控系统

前言 很多工业城市的废水排放量较大&#xff0c;已造成城市地表水的严重污染。各城市的环境监测中心站肩负着对城市地表环境水质及污染源排放废水的监测工作&#xff0c;很多城市相继形成了以市站为网头&#xff0c;与区站、行业站构成一体的废水监测网。 为提高水质监测能力建…

【Java】RuoYi-Vue-Plus 多数据源整合TDengine时序数据库——服务端自动建库建表

目录 环境准备整合TDengine 数据源1. 添加驱动依赖2. 添加数据源配置3. 添加Mapper4. 添加建表sql脚本5. Controller 测试效果 环境准备 RuoYi-Vue-Plus v5.1.2JDK17Maven 3.6.3Redis 5.XMySQL 5.7TDengine 2.6.0.34 客户端 整合TDengine 数据源 1. 添加驱动依赖 注意&…

redis安装与配置

目录 1. 切换到 root 用户 2. 搜索安装包 3. 安装 redis 4. 查看 redis 是否正常存在 5. 修改ip 6. 重新启动服务器 7. 连接服务器 1. 切换到 root 用户 通过 su 命令切换到 root 用户。 2. 搜索安装包 apt search redis 这里安装的是下面的版本&#xff1a; 3. 安装 …

Python接口自动化 —— 什么是接口测试、为什么要做接口测试(详解)

什么是接口测试 接口测试是测试系统组件间接口的一种测试。接口测试主要用于检测外部系统与系统之间以及内部各个子系统之间的交互点。测试的重点是要检查数据的交换&#xff0c;传递和控制管理过程&#xff0c;以及系统间的相互逻辑依赖关系等。  一般来说&#xff0c;测试接…

pod进阶版(2)

startupProbe启动探针 如果探测失败,pod的状态是notready&#xff0c;启动探针会重启容器 启动探针没有成功之前&#xff0c;后续的探针都不会执行。启动探针成功之后&#xff0c;在pod的后续生命周期不会用启动探针 exec方式 正确示范 apiVersion: v1 kind: Pod metadata…

linux 使用log4cpp记录项目日志

为什么要用log4cpp记录项目日志 在通常情况下&#xff0c;Linux/UNIX 每个程序在开始运行的时刻&#xff0c;都会打开 3 个已经打开的 stream. 分别用来输入&#xff0c;输出&#xff0c;打印错误信息。通常他们会被连接到用户终端。这 3 个句柄的类型为指向 FILE 的指针。可以…

MobaXterm SSH 免密登录配置

文章目录 1.简介2.SSH 免密登录配置第一步&#xff1a;点击 Session第二步&#xff1a;选择 SSH第三步&#xff1a;输入服务器地址与用户名第四步&#xff1a;设置会话名称第五步&#xff1a;点击 OK 并输入密码 3.密码管理4.小结参考文献 1.简介 MobaXterm 是一个功能强大的终…

shell,对输出的结果去掉空格和换行符号,grep忽略特定字符

对原始的执行命令&#xff0c;直接后面加 |tr -d \n | tr -d |tr -d \n 用来去除换行符 |tr -d 用来出去空格 grep去除特定的字符的行&#xff0c;直接 -v&#xff0c;后接字符。比如&#xff1a; 现在设法去掉含有"ini"的行&#xff0c;执行&#xff1a; …

对象克隆学习

假如说你想复制一个简单变量。很简单&#xff1a; int apples 5; int pears apples; 不仅仅是int类型&#xff0c;其它七种原始数据类型(boolean,char,byte,short,float,double.long)同样适用于该类情况。 但是如果你复制的是一个对象&#xff0c;情况就有些复杂了。 …

美洽获评2023年度“最佳数字化服务商”,产品优势赋能企业智能转型

日前&#xff0c;由知名学习交流平台人人都是产品经理举办的“2023年度评选”活动圆满落幕&#xff0c;美洽凭借在企业服务领域的技术实力与优秀实践成果脱颖而出&#xff0c;入围年度产品评选榜单&#xff0c;获评2023年度“最佳数字化服务商”。 人人都是产品经理成立于2010年…

邮件营销发件人名称设置指南:提升邮件信誉度与点击率的技巧

正如品牌名称之于产品&#xff0c;发件人名称之于电子邮件。它有助于提高识别度、熟悉度和清晰度。一封邮件的发件人名称可以是话题的开端&#xff0c;也可以是邮件内容的破坏者。 关于电子邮件的发件人名称&#xff0c;你可能经常不得不思考两件事。 我应该用什么名字&#…

怎么分解一张二维码图片?二维码解码在线处理技巧

相信对于制作二维码很多小伙伴还知道怎么操作&#xff0c;那么分解二维码图片的方法有知道的吗&#xff1f;二维码解码也是在日常生活中经常会需要使用的一个功能&#xff0c;当我们需要将一张图片上的二维码分解使用时&#xff0c;那么最方便快捷的方法就是使用二维码解码器工…

【S32K 进阶之旅】 NXP S32K3 以太网 RMII 接口调试(1)

前言 大联大世平集团推出了一款基于 NXP 车规级 MCU S32K344 的开发板——花名“Cavalry”&#xff0c;它使用 BGA257 封装的 32 位 ArmCortex-M7 S32K344 作为主控芯片&#xff0c;在69.6*130mm 的小体积开发板上搭载了 SBC 电源管理芯片、CAN 收发器、LIN 收发器、FLASH 存储…

Rhinoceros 8.2(犀牛8.2)安装教程

Rhinoceros 8.2下载链接&#xff1a;https://docs.qq.com/doc/DUmhMYmNyUGl6ZVpU 1.选中下载的压缩包&#xff0c;右键解压到“Rhinoceros 8.2”文件夹 2.选中“rhino_en-us_8.2.23346.13001.exe”右键以管理员身份运行 3.点击“小齿轮“ 4.选择安装路径&#xff0c;取消勾选“…

计算机视觉与美颜SDK:详解人脸美型功能的实现过程

众所周知。人脸美型功能作为美颜技术的一项关键特性&#xff0c;对于用户塑造理想形象具有重要意义。本文将深入探讨计算机视觉与美颜SDK中人脸美型功能的实现过程。 一、关键点定位 关键点定位是人脸美型的关键步骤之一。深度学习方法在关键点定位方面取得了巨大的成功&…

已知输入图像大小为n、卷积核大小为f、卷积步长s,填充大小为p,求解输出图像大小。

问题描述&#xff1a;已知输入图像大小为n、卷积核大小为f、卷积步长s&#xff0c;填充大小为p&#xff0c;求解输出图像大小。 问题解答&#xff1a; 输出图像的大小可以使用以下的计算公式确定&#xff1a; 为了举例说明&#xff0c;假设有以下参数&#xff1a; 输入图像大…