Paper: YOLOv4: Optimal Speed and Accuracy of Object Detection
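Mosaic data augmentation was first proposed in the YOLOv4 paper, but it had in fact already been implemented in ultralytics-YOLOv3. It stitches 4 training samples into a single image, which has the following advantages: (1) it enriches the content of each image; (2) because one augmented image carries the information of four images, the reliance on a large batch_size is reduced, since batch normalization then computes its statistics over four images per sample; (3) small objects are usually detected less well than large ones, and putting four images into one effectively increases the number of small-object samples in the dataset.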
Below are some examples of mosaic augmentation shown in the YOLOv4 paper.
The following walks through the details, using the implementation in mmdetection as the example.
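To use Mosaic in mmdet, the dataset must be wrapped in MultiImageMixDataset. Normally the results dict holds the information of a single image, such as img, gt_bboxes and gt_labels. MultiImageMixDataset calls the get_indexes method of the Mosaic class, which randomly picks the indexes of three more images. The information of these 3 images is put into a list, and that list is added to the original results under the key 'mix_results'. As a result, results contains the information of all 4 images.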
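The bookkeeping described above can be sketched in plain Python. `ToyMosaic` and `build_mix_results` are hypothetical names. The function is a simplified illustration of what MultiImageMixDataset does before calling a transform that defines `get_indexes`. It is not mmdet's actual code, and the real class does additional bookkeeping.

```python
import copy
import random


class ToyMosaic:
    """Hypothetical stand-in for Mosaic that only provides get_indexes."""

    def get_indexes(self, dataset):
        # three extra indexes; random.randrange excludes len(dataset),
        # just like numpy's random.randint(0, len(dataset))
        return [random.randrange(len(dataset)) for _ in range(3)]


def build_mix_results(dataset, idx, transform):
    """Simplified sketch: attach 3 extra samples under 'mix_results'."""
    results = copy.deepcopy(dataset[idx])
    indexes = transform.get_indexes(dataset)
    # deep copies, so the transform can modify them without touching the dataset
    results['mix_results'] = [copy.deepcopy(dataset[i]) for i in indexes]
    return results


# toy dataset: each item is a minimal results dict
toy_dataset = [{'img_id': i, 'gt_labels': [i]} for i in range(10)]
results = build_mix_results(toy_dataset, 0, ToyMosaic())
print(results['img_id'], [r['img_id'] for r in results['mix_results']])
```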
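The mosaic itself is implemented in the function _mosaic_transform, with the following steps:
- Create an empty image mosaic_img twice the size of img_scale, filled with pad_val (114 by default). img_scale is the model input size set in the config file, in (height, width) order.
- Pick the point where the four images meet. It splits the canvas into top-left, top-right, bottom-right and bottom-left regions. The point is random: center_x and center_y are drawn uniformly from center_ratio_range (default (0.5, 1.5)) times the width and height of img_scale.
- Resize each original image so that it fits inside img_scale while keeping its aspect ratio, i.e. scale it by min(img_scale_h / h, img_scale_w / w).
- Paste the images into mosaic_img so that all four meet at the center point. Any part that falls outside mosaic_img is cropped off.
- Adjust the gt_bboxes of each sub-image. First scale them by the same resize ratio, then shift them by the offset between the paste position in mosaic_img and the crop position in the sub-image.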
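The five steps can be condensed into a NumPy-only sketch that follows the same paste/crop arithmetic. It rests on several assumptions:
- `simple_mosaic` and `nearest_resize` are hypothetical names.
- Nearest-neighbour resizing replaces `mmcv.imresize`, which uses bilinear interpolation by default.
- Images are H×W×3 arrays and boxes are (N, 4) x1y1x2y2 arrays.
- Boxes that end up outside the canvas are not removed here.

```python
import numpy as np


def nearest_resize(img, new_w, new_h):
    """Nearest-neighbour resize, used here instead of mmcv.imresize."""
    h, w = img.shape[:2]
    ys = (np.arange(new_h) * h / new_h).astype(int)
    xs = (np.arange(new_w) * w / new_w).astype(int)
    return img[ys[:, None], xs[None, :]]


def simple_mosaic(imgs, boxes_list, img_scale=(640, 640), center=None,
                  center_ratio_range=(0.5, 1.5), pad_val=114, rng=None):
    """imgs/boxes_list are ordered top_left, top_right, bottom_left, bottom_right."""
    h_s, w_s = img_scale
    # step 1: canvas of 2 * img_scale filled with pad_val
    canvas = np.full((2 * h_s, 2 * w_s, 3), pad_val, dtype=imgs[0].dtype)
    # step 2: random point (x, y) where the four images meet
    if center is None:
        if rng is None:
            rng = np.random.default_rng()
        center = (int(rng.uniform(*center_ratio_range) * w_s),
                  int(rng.uniform(*center_ratio_range) * h_s))
    cx, cy = center
    out = []
    locs = ('top_left', 'top_right', 'bottom_left', 'bottom_right')
    for loc, img, boxes in zip(locs, imgs, boxes_list):
        is_left, is_top = loc.endswith('left'), loc.startswith('top')
        # step 3: keep-ratio resize so the image fits inside img_scale
        r = min(h_s / img.shape[0], w_s / img.shape[1])
        img = nearest_resize(img, int(img.shape[1] * r), int(img.shape[0] * r))
        h, w = img.shape[:2]
        # step 4: paste window on the canvas, clipped to the canvas border ...
        x1, x2 = (max(cx - w, 0), cx) if is_left else (cx, min(cx + w, 2 * w_s))
        y1, y2 = (max(cy - h, 0), cy) if is_top else (cy, min(cy + h, 2 * h_s))
        # ... and the matching crop: left/top images keep their right/bottom part
        cx1 = w - (x2 - x1) if is_left else 0
        cy1 = h - (y2 - y1) if is_top else 0
        canvas[y1:y2, x1:x2] = img[cy1:cy1 + (y2 - y1), cx1:cx1 + (x2 - x1)]
        # step 5: scale boxes, then shift by (paste corner - crop corner)
        b = np.asarray(boxes, dtype=np.float32) * r
        b[:, 0::2] += x1 - cx1
        b[:, 1::2] += y1 - cy1
        out.append(b)
    return canvas, np.concatenate(out, 0)


# usage: four 320x320 tiles that meet exactly at the canvas centre
tiles = [np.full((320, 320, 3), v, dtype=np.uint8) for v in (1, 2, 3, 4)]
boxes = [np.array([[0, 0, 10, 10]])] * 4
canvas, all_boxes = simple_mosaic(tiles, boxes, img_scale=(320, 320),
                                  center=(320, 320))
print(canvas.shape, all_boxes.tolist())
```

With the centre placed exactly at (320, 320), each tile lands unchanged in its own quadrant. Each box is shifted by 0 or 320 along x and y according to its quadrant. With a random centre, the top/left tiles lose part of their left/top side, and their boxes can move to negative coordinates.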
The complete code is as follows:
```python
import copy

import mmcv
import numpy as np
from numpy import random  # numpy's random.randint excludes the upper bound

# helper functions provided by mmdetection 2.x
from mmdet.core import find_inside_bboxes
from mmdet.utils import log_img_scale


class Mosaic:
    """Mosaic augmentation.

    Given 4 images, mosaic transform combines them into
    one output image. The output image is composed of the parts from each sub-
    image.

    .. code:: text

                            mosaic transform
                                   center_x
                    +------------------------------+
                    |       pad        |  pad      |
                    |      +-----------+           |
                    |      |           |           |
                    |      |  image1   |--------+  |
                    |      |           |        |  |
                    |      |           | image2 |  |
         center_y   |----+-------------+-----------|
                    |    |   cropped   |           |
                    |pad |   image3    |  image4   |
                    |    |             |           |
                    +----|-------------+-----------+
                         |             |
                         +-------------+

    The mosaic transform steps are as follows:

        1. Choose the mosaic center as the intersections of 4 images
        2. Get the left top image according to the index, and randomly
           sample another 3 images from the custom dataset.
        3. Sub image will be cropped if image is larger than mosaic patch

    Args:
        img_scale (Sequence[int]): Image size after mosaic pipeline of single
            image. The shape order should be (height, width).
            Default to (640, 640).
        center_ratio_range (Sequence[float]): Center ratio range of mosaic
            output. Default to (0.5, 1.5).
        min_bbox_size (int | float): The minimum pixel for filtering
            invalid bboxes after the mosaic pipeline. Default to 0.
        bbox_clip_border (bool, optional): Whether to clip the objects outside
            the border of the image. In some dataset like MOT17, the gt bboxes
            are allowed to cross the border of images. Therefore, we don't
            need to clip the gt bboxes in these cases. Defaults to True.
        skip_filter (bool): Whether to skip filtering rules. If it
            is True, the filter rule will not be applied, and the
            `min_bbox_size` is invalid. Default to True.
        pad_val (int): Pad value. Default to 114.
        prob (float): Probability of applying this transformation.
            Default to 1.0.
    """

    def __init__(self,
                 img_scale=(640, 640),
                 center_ratio_range=(0.5, 1.5),
                 min_bbox_size=0,
                 bbox_clip_border=True,
                 skip_filter=True,
                 pad_val=114,
                 prob=1.0):
        assert isinstance(img_scale, tuple)
        assert 0 <= prob <= 1.0, 'The probability should be in range [0,1]. '\
            f'got {prob}.'
        log_img_scale(img_scale, skip_square=True)
        self.img_scale = img_scale
        self.center_ratio_range = center_ratio_range
        self.min_bbox_size = min_bbox_size
        self.bbox_clip_border = bbox_clip_border
        self.skip_filter = skip_filter
        self.pad_val = pad_val
        self.prob = prob

    def __call__(self, results):
        """Call function to make a mosaic of image.

        Args:
            results (dict): Result dict.

        Returns:
            dict: Result dict with mosaic transformed.
        """
        if random.uniform(0, 1) > self.prob:
            return results
        results = self._mosaic_transform(results)
        return results

    def get_indexes(self, dataset):
        """Call function to collect indexes.

        Args:
            dataset (:obj:`MultiImageMixDataset`): The dataset.

        Returns:
            list: indexes.
        """
        # numpy's randint samples from [0, len(dataset)), so every index is valid
        indexes = [random.randint(0, len(dataset)) for _ in range(3)]
        return indexes

    def _mosaic_transform(self, results):
        """Mosaic transform function.

        Args:
            results (dict): Result dict.

        Returns:
            dict: Updated result dict.
        """
        assert 'mix_results' in results
        mosaic_labels = []
        mosaic_bboxes = []
        # empty canvas of 2 * img_scale, filled with pad_val
        if len(results['img'].shape) == 3:
            mosaic_img = np.full(
                (int(self.img_scale[0] * 2), int(self.img_scale[1] * 2), 3),
                self.pad_val,
                dtype=results['img'].dtype)
        else:
            mosaic_img = np.full(
                (int(self.img_scale[0] * 2), int(self.img_scale[1] * 2)),
                self.pad_val,
                dtype=results['img'].dtype)

        # mosaic center x, y
        center_x = int(
            random.uniform(*self.center_ratio_range) * self.img_scale[1])
        center_y = int(
            random.uniform(*self.center_ratio_range) * self.img_scale[0])
        center_position = (center_x, center_y)

        loc_strs = ('top_left', 'top_right', 'bottom_left', 'bottom_right')
        for i, loc in enumerate(loc_strs):
            # the original image goes top-left, the 3 mix_results fill the rest
            if loc == 'top_left':
                results_patch = copy.deepcopy(results)
            else:
                results_patch = copy.deepcopy(results['mix_results'][i - 1])

            img_i = results_patch['img']
            h_i, w_i = img_i.shape[:2]
            # keep-ratio resize so that the image fits inside img_scale
            scale_ratio_i = min(self.img_scale[0] / h_i,
                                self.img_scale[1] / w_i)
            img_i = mmcv.imresize(
                img_i, (int(w_i * scale_ratio_i), int(h_i * scale_ratio_i)))

            # compute the combine parameters; img_i.shape[:2][::-1] is (w, h)
            paste_coord, crop_coord = self._mosaic_combine(
                loc, center_position, img_i.shape[:2][::-1])
            x1_p, y1_p, x2_p, y2_p = paste_coord
            x1_c, y1_c, x2_c, y2_c = crop_coord

            # crop and paste image
            mosaic_img[y1_p:y2_p, x1_p:x2_p] = img_i[y1_c:y2_c, x1_c:x2_c]

            # adjust coordinate: scale, then shift by paste corner - crop corner
            gt_bboxes_i = results_patch['gt_bboxes']
            gt_labels_i = results_patch['gt_labels']

            if gt_bboxes_i.shape[0] > 0:
                padw = x1_p - x1_c
                padh = y1_p - y1_c
                gt_bboxes_i[:, 0::2] = \
                    scale_ratio_i * gt_bboxes_i[:, 0::2] + padw
                gt_bboxes_i[:, 1::2] = \
                    scale_ratio_i * gt_bboxes_i[:, 1::2] + padh

            mosaic_bboxes.append(gt_bboxes_i)
            mosaic_labels.append(gt_labels_i)

        if len(mosaic_labels) > 0:
            mosaic_bboxes = np.concatenate(mosaic_bboxes, 0)
            mosaic_labels = np.concatenate(mosaic_labels, 0)

            if self.bbox_clip_border:  # defaults to True: clip to the canvas
                mosaic_bboxes[:, 0::2] = np.clip(mosaic_bboxes[:, 0::2], 0,
                                                 2 * self.img_scale[1])
                mosaic_bboxes[:, 1::2] = np.clip(mosaic_bboxes[:, 1::2], 0,
                                                 2 * self.img_scale[0])

            # skip_filter defaults to True, so this filter is off by default
            if not self.skip_filter:
                mosaic_bboxes, mosaic_labels = \
                    self._filter_box_candidates(mosaic_bboxes, mosaic_labels)

        # remove outside bboxes
        inside_inds = find_inside_bboxes(mosaic_bboxes, 2 * self.img_scale[0],
                                         2 * self.img_scale[1])
        mosaic_bboxes = mosaic_bboxes[inside_inds]
        mosaic_labels = mosaic_labels[inside_inds]

        results['img'] = mosaic_img
        results['img_shape'] = mosaic_img.shape
        results['gt_bboxes'] = mosaic_bboxes
        results['gt_labels'] = mosaic_labels

        return results

    def _mosaic_combine(self, loc, center_position_xy, img_shape_wh):
        """Calculate global coordinate of mosaic image and local coordinate of
        cropped sub-image.

        Args:
            loc (str): Index for the sub-image, loc in ('top_left',
                'top_right', 'bottom_left', 'bottom_right').
            center_position_xy (Sequence[float]): Mixing center for 4 images,
                (x, y).
            img_shape_wh (Sequence[int]): Width and height of sub-image

        Returns:
            tuple[tuple[float]]: Corresponding coordinate of pasting and
                cropping
                - paste_coord (tuple): paste corner coordinate in mosaic image.
                - crop_coord (tuple): crop corner coordinate in the sub-image.
        """
        assert loc in ('top_left', 'top_right', 'bottom_left', 'bottom_right')
        if loc == 'top_left':
            # index0 to top left part of image
            x1, y1, x2, y2 = max(center_position_xy[0] - img_shape_wh[0], 0), \
                             max(center_position_xy[1] - img_shape_wh[1], 0), \
                             center_position_xy[0], \
                             center_position_xy[1]
            crop_coord = img_shape_wh[0] - (x2 - x1), img_shape_wh[1] - (
                y2 - y1), img_shape_wh[0], img_shape_wh[1]

        elif loc == 'top_right':
            # index1 to top right part of image
            x1, y1, x2, y2 = center_position_xy[0], \
                             max(center_position_xy[1] - img_shape_wh[1], 0), \
                             min(center_position_xy[0] + img_shape_wh[0],
                                 self.img_scale[1] * 2), \
                             center_position_xy[1]
            crop_coord = 0, img_shape_wh[1] - (y2 - y1), min(
                img_shape_wh[0], x2 - x1), img_shape_wh[1]

        elif loc == 'bottom_left':
            # index2 to bottom left part of image
            x1, y1, x2, y2 = max(center_position_xy[0] - img_shape_wh[0], 0), \
                             center_position_xy[1], \
                             center_position_xy[0], \
                             min(self.img_scale[0] * 2, center_position_xy[1] +
                                 img_shape_wh[1])
            crop_coord = img_shape_wh[0] - (x2 - x1), 0, img_shape_wh[0], min(
                y2 - y1, img_shape_wh[1])

        else:
            # index3 to bottom right part of image
            x1, y1, x2, y2 = center_position_xy[0], \
                             center_position_xy[1], \
                             min(center_position_xy[0] + img_shape_wh[0],
                                 self.img_scale[1] * 2), \
                             min(self.img_scale[0] * 2, center_position_xy[1] +
                                 img_shape_wh[1])
            crop_coord = 0, 0, min(img_shape_wh[0],
                                   x2 - x1), min(y2 - y1, img_shape_wh[1])

        paste_coord = x1, y1, x2, y2
        return paste_coord, crop_coord

    def _filter_box_candidates(self, bboxes, labels):
        """Filter out bboxes too small after Mosaic."""
        bbox_w = bboxes[:, 2] - bboxes[:, 0]
        bbox_h = bboxes[:, 3] - bboxes[:, 1]
        valid_inds = (bbox_w > self.min_bbox_size) & \
                     (bbox_h > self.min_bbox_size)
        valid_inds = np.nonzero(valid_inds)[0]
        return bboxes[valid_inds], labels[valid_inds]

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += f'(img_scale={self.img_scale}, '
        repr_str += f'center_ratio_range={self.center_ratio_range}, '
        repr_str += f'pad_val={self.pad_val}, '
        repr_str += f'min_bbox_size={self.min_bbox_size}, '
        repr_str += f'skip_filter={self.skip_filter})'
        return repr_str
```
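The box clean-up at the end of `_mosaic_transform` is easy to misread. With `bbox_clip_border=True`, a box that lies completely outside the canvas is not dropped by the clipping. It is clamped to a zero-width box on the border, and only the final inside check removes it. The sketch below reproduces that order in plain NumPy. `postprocess_boxes` is a hypothetical name, and the inside test is an assumption about what `find_inside_bboxes` checks (the box must overlap the canvas). `canvas_hw` corresponds to 2 * img_scale.

```python
import numpy as np


def postprocess_boxes(bboxes, labels, canvas_hw, clip_border=True,
                      skip_filter=True, min_bbox_size=0):
    """Sketch of the clip -> optional size filter -> inside check sequence."""
    h, w = canvas_hw
    bboxes = bboxes.copy()  # leave the caller's array untouched
    if clip_border:
        # clamp x to [0, w] and y to [0, h]
        bboxes[:, 0::2] = np.clip(bboxes[:, 0::2], 0, w)
        bboxes[:, 1::2] = np.clip(bboxes[:, 1::2], 0, h)
    if not skip_filter:
        # like _filter_box_candidates: keep boxes wider AND taller than min size
        bw = bboxes[:, 2] - bboxes[:, 0]
        bh = bboxes[:, 3] - bboxes[:, 1]
        keep = (bw > min_bbox_size) & (bh > min_bbox_size)
        bboxes, labels = bboxes[keep], labels[keep]
    # assumed equivalent of find_inside_bboxes: box must overlap the canvas
    inside = ((bboxes[:, 0] < w) & (bboxes[:, 2] > 0) &
              (bboxes[:, 1] < h) & (bboxes[:, 3] > 0))
    return bboxes[inside], labels[inside]


boxes = np.array([[-120, -20, -110, -10],   # fully outside the canvas
                  [600, 600, 700, 700],     # crosses the bottom-right border
                  [10, 10, 20, 20]], dtype=np.float32)
labels = np.array([0, 1, 2])
kept_boxes, kept_labels = postprocess_boxes(boxes, labels, (640, 640))
print(kept_labels.tolist(), kept_boxes.tolist())
# [1, 2] [[600.0, 600.0, 640.0, 640.0], [10.0, 10.0, 20.0, 20.0]]
```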
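mosaic_img is twice the size of img_scale because in mmdet, Mosaic is meant to be combined with RandomAffine. RandomAffine applies rotation, scaling, translation and shear, and it is given the parameter border=(-img_scale[0] // 2, -img_scale[1] // 2). This border shrinks the output of the affine transform back to img_scale.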
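The size bookkeeping can be checked with a few lines of arithmetic, written here as an mmdet 2.x style config fragment in Python. To my understanding, the dataset/pipeline layout follows the pattern of mmdet's YOLOX config. `ann_file`, `img_prefix` and the remaining transforms are omitted. `affine_output_hw` is a hypothetical helper. It encodes the assumption that RandomAffine's output height and width are the input height and width plus twice the corresponding border value.

```python
img_scale = (640, 640)  # (height, width)

# mmdet 2.x style config fragment; ann_file, img_prefix and the
# remaining augmentation/formatting transforms are omitted
train_pipeline = [
    dict(type='Mosaic', img_scale=img_scale, pad_val=114.0),
    dict(type='RandomAffine',
         scaling_ratio_range=(0.1, 2),
         border=(-img_scale[0] // 2, -img_scale[1] // 2)),
]

train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='CocoDataset',
        # loading happens here, so all 4 samples reach Mosaic already decoded
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
        ],
        filter_empty_gt=False),
    pipeline=train_pipeline)


def affine_output_hw(input_hw, border):
    """Hypothetical helper: assumed output size = input size + 2 * border."""
    return (input_hw[0] + 2 * border[0], input_hw[1] + 2 * border[1])


mosaic_hw = (2 * img_scale[0], 2 * img_scale[1])  # size of mosaic_img
border = train_pipeline[1]['border']
print(mosaic_hw, border, affine_output_hw(mosaic_hw, border))
# (1280, 1280) (-320, -320) (640, 640)
```

Placing the loading transforms in the inner dataset's pipeline means each of the four samples already has img, gt_bboxes and gt_labels when it is handed to Mosaic.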