How to Use the COCO Dataset, and Things to Watch Out For

news2025/2/25 21:01:13

The COCO dataset can be used to train models for object detection, classification, instance segmentation, and similar tasks.

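Below is a brief walkthrough of how to use the dataset.
It can be downloaded with the following script, using the 2017 release as an example.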

# Download the image data.
mkdir -p ./images
cd ./images
echo "Downloading MSCOCO train images ..."
curl -LO http://images.cocodataset.org/zips/train2017.zip
echo "Downloading MSCOCO val images ..."
curl -LO http://images.cocodataset.org/zips/val2017.zip

cd ../
if [ ! -d annotations ]
  then
    mkdir -p ./annotations
fi

# Download the annotation data.
cd ./annotations
echo "Downloading MSCOCO train/val annotations ..."
curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip
echo "Finished downloading. Now extracting ..."

# Unzip data
echo "Extracting train images ..."
unzip -qqjd ../images ../images/train2017.zip
echo "Extracting val images ..."
unzip -qqjd ../images ../images/val2017.zip
echo "Extracting annotations ..."
unzip -qqd .. ./annotations_trainval2017.zip

This produces two folders: images and annotations.

The annotations folder contains several JSON files (for example instances_train2017.json). Note their paths, since they are needed later.
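
As a quick sanity check after extraction, the sketch below lists which expected folders or files are missing. It assumes the layout the script above produces (an images/ folder plus the JSON files under annotations/). The function name check_coco_layout and its root argument are made up for illustration.

```python
import os

# Annotation files shipped in annotations_trainval2017.zip.
EXPECTED_ANNOTATIONS = [
    "instances_train2017.json",
    "instances_val2017.json",
    "person_keypoints_train2017.json",
    "person_keypoints_val2017.json",
    "captions_train2017.json",
    "captions_val2017.json",
]


def check_coco_layout(root):
    """Return the expected paths (relative to root) that do not exist."""
    missing = []
    if not os.path.isdir(os.path.join(root, "images")):
        missing.append("images")
    for name in EXPECTED_ANNOTATIONS:
        rel = os.path.join("annotations", name)
        if not os.path.isfile(os.path.join(root, rel)):
            missing.append(rel)
    return missing
```

Calling check_coco_layout(".") from the download directory should return an empty list once everything has been extracted.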

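The main processing flow is described next. Data is extracted mainly through the COCO API, which is provided by the pycocotools package. It may already be present in some environments; if the import fails, install it first (for example with pip install pycocotools).
After that, simply import it: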

from pycocotools.coco import COCO

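As mentioned above, the annotations folder contains JSON files. Suppose the path to one of them is ann_path (including the .json file name).
Pass this path to the COCO API: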

self.coco_api = COCO(ann_path)
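
For orientation, an instances-style annotation file is a single JSON object whose main lists are images, annotations, and categories. The sketch below builds a tiny hand-written example (the values are invented, not taken from COCO) and groups annotations by image_id in plain Python. This is roughly the kind of index the COCO class builds when it loads a file; it is an illustration, not the pycocotools implementation.

```python
from collections import defaultdict

# A tiny hand-written annotation structure; all values are made up.
toy = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "000000000002.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 18,
         "bbox": [10.0, 20.0, 100.0, 50.0], "area": 5000.0, "iscrowd": 0},
        {"id": 11, "image_id": 1, "category_id": 1,
         "bbox": [200.0, 80.0, 40.0, 120.0], "area": 4800.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 18, "name": "dog", "supercategory": "animal"},
    ],
}


def index_by_image(dataset):
    """Group annotations by the image they belong to."""
    img_to_anns = defaultdict(list)
    for ann in dataset["annotations"]:
        img_to_anns[ann["image_id"]].append(ann)
    return img_to_anns


img_to_anns = index_by_image(toy)
print(len(img_to_anns[1]))  # two annotations on image 1
print(len(img_to_anns[2]))  # image 2 has none
```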

You can then use this coco_api object to extract all kinds of data.
For example, extract the category IDs and map them to labels:

self.cat_ids = sorted(self.coco_api.getCatIds())
self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)}
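
The remapping matters because COCO category IDs are not contiguous: the 80 detection categories use IDs between 1 and 90, with gaps. Below is a small sketch with a made-up subset of IDs, plus an inverse mapping (label2cat is a name introduced here) of the kind typically needed when writing predictions back in COCO format.

```python
# Hypothetical subset of category IDs with gaps, standing in for getCatIds().
cat_ids = sorted([90, 1, 13, 2, 3])

# Contiguous training labels 0..N-1.
cat2label = {cat_id: i for i, cat_id in enumerate(cat_ids)}
# Inverse mapping, for turning predicted labels back into COCO category IDs.
label2cat = {i: cat_id for cat_id, i in cat2label.items()}

print(cat2label)     # {1: 0, 2: 1, 3: 2, 13: 3, 90: 4}
print(label2cat[4])  # 90
```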

Extract the category names:

self.cats = self.coco_api.loadCats(self.cat_ids)
self.class_names = [cat["name"] for cat in self.cats]

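Extract the image IDs. Each ID corresponds to one image file name, and the same ID can be used to look up the matching annotations.
Put all the information for these image IDs into img_info: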

self.img_ids = sorted(self.coco_api.imgs.keys())  # all image IDs; index into this list with idx when needed
img_info = self.coco_api.loadImgs(self.img_ids)  # list of dicts (id, file_name, height, width, ...)

So what is img_info used for later?
Let's start from PyTorch's __getitem__ method.
__getitem__ receives an idx and returns the image and annotations that correspond to it.

From the full list of image IDs obtained above (self.img_ids), we can get the img_id for a given idx,
and then use that img_id to fetch the annotations for the image:

img_id = self.img_ids[idx]
ann_ids = self.coco_api.getAnnIds([img_id])
anns = self.coco_api.loadAnns(ann_ids)

Get the image file name, so that the image can be read:

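# Requires "import os.path as osp" and "import cv2" at the top of the file.
file_name = self.coco_api.loadImgs(img_id)[0]['file_name']
# Names with a prefix such as COCO_train2014_000000000009.jpg are reduced to the numeric part.
if file_name.startswith('COCO'):
    file_name = file_name.split('_')[-1]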

path = osp.join(self.root, file_name)
assert osp.exists(path), 'Image path does not exist: {}'.format(path)

img = cv2.imread(path)
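
The startswith('COCO') branch appears to target releases whose file names carry a prefix, such as the 2014 images (COCO_train2014_000000000009.jpg), and reduces them to the bare numeric name. Here is a standalone sketch of that step; strip_coco_prefix and resolve_image_path are hypothetical helper names, and returning None stands in for the assert.

```python
import os.path as osp


def strip_coco_prefix(file_name):
    """Drop a 'COCO_<split>_' prefix, keeping only the numeric file name."""
    if file_name.startswith("COCO"):
        file_name = file_name.split("_")[-1]
    return file_name


def resolve_image_path(root, file_name):
    """Return the full image path, or None if the file is not on disk."""
    path = osp.join(root, strip_coco_prefix(file_name))
    return path if osp.exists(path) else None


print(strip_coco_prefix("COCO_train2014_000000000009.jpg"))  # 000000000009.jpg
print(strip_coco_prefix("000000000139.jpg"))                 # 000000000139.jpg
```

Note that cv2.imread returns None instead of raising when it cannot read a file, so checking the path (or the returned image) before use gives clearer errors.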

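Next, how to obtain the bounding boxes, categories, and segmentation masks.
One image may contain several boxes, so the anns retrieved for a single img_id may hold multiple annotations.
A loop can be used to read them out: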

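gt_bboxes, gt_labels, gt_masks, gt_keypoints = [], [], [], []
for ann in anns:
    if ann.get("ignore", False):
        continue
    x1, y1, w, h = ann["bbox"]  # COCO boxes are stored as (x, y, w, h)
    if ann["area"] <= 0 or w < 1 or h < 1:
        continue  # skip degenerate boxes
    if ann["category_id"] not in self.cat_ids:
        continue
    bbox = [x1, y1, x1 + w, y1 + h]  # convert to (x1, y1, x2, y2) if needed

    gt_bboxes.append(bbox)
    gt_labels.append(self.cat2label[ann["category_id"]])

    # Segmentation mask: annToMask returns an (H, W) binary array, flattened here.
    gt_masks.append(self.coco_api.annToMask(ann).reshape(-1))

    # Keypoints exist only in the person_keypoints_*.json annotation files.
    if "keypoints" in ann:
        gt_keypoints.append(ann["keypoints"])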
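
The box filtering and conversion in the loop can be isolated into a small pure function, which makes it easy to test. This is a sketch under the same rules as the loop (skip ignored annotations and boxes with non-positive area or with width or height below 1); the function name and the toy annotations are made up.

```python
def collect_xyxy_boxes(anns):
    """Convert valid COCO (x, y, w, h) boxes to (x1, y1, x2, y2)."""
    boxes = []
    for ann in anns:
        if ann.get("ignore", False):
            continue
        x, y, w, h = ann["bbox"]
        if ann["area"] <= 0 or w < 1 or h < 1:
            continue  # degenerate box
        boxes.append([x, y, x + w, y + h])
    return boxes


toy_anns = [
    {"bbox": [10.0, 20.0, 100.0, 50.0], "area": 5000.0},
    {"bbox": [5.0, 5.0, 0.5, 30.0], "area": 15.0},  # width < 1, skipped
]
print(collect_xyxy_boxes(toy_anns))  # [[10.0, 20.0, 110.0, 70.0]]
```

If the result is later converted to a NumPy array, an empty list yields shape (0,) rather than (0, 4), so a reshape(-1, 4) is commonly applied.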

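Note that an image may contain no boxes at all, meaning its annotation list is [].
Such an image should not be used as a training sample; another one has to be sampled instead.
So a flow like the following is needed: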

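    def __getitem__(self, idx):
        while True:
            data = self.get_train_data(idx)  # should return None when the annotations are empty
            if data is None:
                idx = self.get_another_id()
                continue
            return data

    def get_another_id(self):
        # np.random.random_integers is deprecated; randint's upper bound is exclusive.
        # self.data_info is assumed to be the per-image info list built earlier.
        return np.random.randint(0, len(self.data_info))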
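
An alternative to resampling inside __getitem__ is to drop images without annotations once, at construction time, so that every index is valid. The sketch below shows both ideas on a toy in-memory mapping. ToyDataset, img_to_anns, and the use of Python's random module are assumptions for illustration, not part of the original code.

```python
import random


class ToyDataset:
    """Toy stand-in: maps image IDs to (possibly empty) annotation lists."""

    def __init__(self, img_to_anns, filter_empty=False):
        self.img_to_anns = img_to_anns
        self.img_ids = sorted(img_to_anns)
        if filter_empty:
            # Keep only images that have at least one annotation.
            self.img_ids = [i for i in self.img_ids if img_to_anns[i]]

    def __len__(self):
        return len(self.img_ids)

    def get_train_data(self, idx):
        img_id = self.img_ids[idx]
        anns = self.img_to_anns[img_id]
        if not anns:
            return None  # signal "resample" to the caller
        return {"img_id": img_id, "anns": anns}

    def __getitem__(self, idx):
        while True:
            data = self.get_train_data(idx)
            if data is None:
                idx = random.randrange(len(self))  # upper bound exclusive
                continue
            return data


ds = ToyDataset({1: ["box"], 2: [], 3: ["box", "box"]})
print(ds[1]["img_id"] in (1, 3))  # True: index 1 is empty, so it is resampled

filtered = ToyDataset({1: ["box"], 2: [], 3: ["box", "box"]}, filter_empty=True)
print(len(filtered))  # 2
```

Resampling keeps the dataset length unchanged but makes a given index non-deterministic, while filtering keeps indexing deterministic and shrinks the dataset. The resampling loop never terminates if every image is empty, so a guard may be worth adding in real code.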