Mosaic数据增强

news2025/1/10 21:30:28

paper:YOLOv4: Optimal Speed and Accuracy of Object Detection

mosaic data augmentation最早是在YOLO v4的文章中提出的,但其实在ultralytics-YOLOv3中就已经实现了。具体就是将4张样本拼接成一张图,具有以下优点:(1)丰富一张图上的信息(2)增强后一张图上包含四张图的信息,减少了对大batch_size的依赖(3)通常小目标的检测效果要比大目标差,将四张图放到 一张图中,相当于变相扩充了数据集中小目标的样本数量。

下面是YOLOv4 paper中给出的一些mosaic增强的示例图

下面以mmdetection中的实现为例,介绍一下具体实现 

在mmdet中要使用Mosaic,需要同时使用MultiImageMixDataset。原本results字典中保存的是一张图的相关信息包括img、gt_bboxes、gt_labels等,在MultiImageMixDataset类中调用Mosaic类中的get_indexes方法,随机再挑出其它三张图的索引。然后将这3张图的信息放到列表中作为key 'mix_results'的value加到原始的results中,这样results就包含了4张图的信息。

mosaic的具体实现在函数_mosaic_transform中,具体步骤如下:

  1. 创建一个两倍img_scale大小的空图mosaic_img,值为pad_val,默认为114。img_scale是config文件中预先设定的模型输入大小。
  2. 确定四张图的的交点,将空图分为左上、右上、右下、左下四块。
  3. 将原始图片保持宽高比例resize到模型输入大小img_scale。
  4. 将图片贴到mosaic_img中,四张图相交于中心点,对于超过mosaic_img范围的部分截断。
  5. 调整每张小图的gt_bboxes坐标。

完整代码如下

class Mosaic:
    """Mosaic augmentation.

    Given 4 images, mosaic transform combines them into
    one output image. The output image is composed of the parts from each sub-
    image.

    .. code:: text

                        mosaic transform
                           center_x
                +------------------------------+
                |       pad        |  pad      |
                |      +-----------+           |
                |      |           |           |
                |      |  image1   |--------+  |
                |      |           |        |  |
                |      |           | image2 |  |
     center_y   |----+-------------+-----------|
                |    |   cropped   |           |
                |pad |   image3    |  image4   |
                |    |             |           |
                +----|-------------+-----------+
                     |             |
                     +-------------+

     The mosaic transform steps are as follows:

         1. Choose the mosaic center as the intersections of 4 images
         2. Get the left top image according to the index, and randomly
            sample another 3 images from the custom dataset.
         3. Sub image will be cropped if image is larger than mosaic patch

    Args:
        img_scale (Sequence[int]): Image size after mosaic pipeline of single
            image. The shape order should be (height, width).
            Default to (640, 640).
        center_ratio_range (Sequence[float]): Center ratio range of mosaic
            output. Default to (0.5, 1.5).
        min_bbox_size (int | float): The minimum pixel for filtering
            invalid bboxes after the mosaic pipeline. Default to 0.
        bbox_clip_border (bool, optional): Whether to clip the objects outside
            the border of the image. In some dataset like MOT17, the gt bboxes
            are allowed to cross the border of images. Therefore, we don't
            need to clip the gt bboxes in these cases. Defaults to True.
        skip_filter (bool): Whether to skip filtering rules. If it
            is True, the filter rule will not be applied, and the
            `min_bbox_size` is invalid. Default to True.
        pad_val (int): Pad value. Default to 114.
        prob (float): Probability of applying this transformation.
            Default to 1.0.
    """

    def __init__(self,
                 img_scale=(640, 640),
                 center_ratio_range=(0.5, 1.5),
                 min_bbox_size=0,
                 bbox_clip_border=True,
                 skip_filter=True,
                 pad_val=114,
                 prob=1.0):
        assert isinstance(img_scale, tuple)
        assert 0 <= prob <= 1.0, 'The probability should be in range [0,1]. '\
            f'got {prob}.'

        log_img_scale(img_scale, skip_square=True)
        self.img_scale = img_scale
        self.center_ratio_range = center_ratio_range
        self.min_bbox_size = min_bbox_size
        self.bbox_clip_border = bbox_clip_border
        self.skip_filter = skip_filter
        self.pad_val = pad_val
        self.prob = prob

    def __call__(self, results):
        """Call function to make a mosaic of image.

        Args:
            results (dict): Result dict.

        Returns:
            dict: Result dict with mosaic transformed.
        """

        if random.uniform(0, 1) > self.prob:
            return results

        results = self._mosaic_transform(results)
        return results

    def get_indexes(self, dataset):
        """Call function to collect indexes.

        Args:
            dataset (:obj:`MultiImageMixDataset`): The dataset.

        Returns:
            list: indexes.
        """

        indexes = [random.randint(0, len(dataset)) for _ in range(3)]
        return indexes

    def _mosaic_transform(self, results):
        """Mosaic transform function.

        Args:
            results (dict): Result dict.

        Returns:
            dict: Updated result dict.
        """

        assert 'mix_results' in results
        mosaic_labels = []
        mosaic_bboxes = []
        if len(results['img'].shape) == 3:
            mosaic_img = np.full(
                (int(self.img_scale[0] * 2), int(self.img_scale[1] * 2), 3),
                self.pad_val,
                dtype=results['img'].dtype)
        else:
            mosaic_img = np.full(
                (int(self.img_scale[0] * 2), int(self.img_scale[1] * 2)),
                self.pad_val,
                dtype=results['img'].dtype)

        # mosaic center x, y
        center_x = int(
            random.uniform(*self.center_ratio_range) * self.img_scale[1])
        center_y = int(
            random.uniform(*self.center_ratio_range) * self.img_scale[0])
        center_position = (center_x, center_y)

        loc_strs = ('top_left', 'top_right', 'bottom_left', 'bottom_right')
        for i, loc in enumerate(loc_strs):
            if loc == 'top_left':
                results_patch = copy.deepcopy(results)
            else:
                results_patch = copy.deepcopy(results['mix_results'][i - 1])

            img_i = results_patch['img']
            h_i, w_i = img_i.shape[:2]
            # keep_ratio resize
            scale_ratio_i = min(self.img_scale[0] / h_i,
                                self.img_scale[1] / w_i)
            img_i = mmcv.imresize(
                img_i, (int(w_i * scale_ratio_i), int(h_i * scale_ratio_i)))

            # compute the combine parameters
            paste_coord, crop_coord = self._mosaic_combine(
                loc, center_position, img_i.shape[:2][::-1])
            x1_p, y1_p, x2_p, y2_p = paste_coord
            x1_c, y1_c, x2_c, y2_c = crop_coord

            # crop and paste image
            mosaic_img[y1_p:y2_p, x1_p:x2_p] = img_i[y1_c:y2_c, x1_c:x2_c]

            # adjust coordinate
            gt_bboxes_i = results_patch['gt_bboxes']
            gt_labels_i = results_patch['gt_labels']

            if gt_bboxes_i.shape[0] > 0:
                padw = x1_p - x1_c
                padh = y1_p - y1_c
                gt_bboxes_i[:, 0::2] = \
                    scale_ratio_i * gt_bboxes_i[:, 0::2] + padw
                gt_bboxes_i[:, 1::2] = \
                    scale_ratio_i * gt_bboxes_i[:, 1::2] + padh

            mosaic_bboxes.append(gt_bboxes_i)
            mosaic_labels.append(gt_labels_i)

        if len(mosaic_labels) > 0:
            mosaic_bboxes = np.concatenate(mosaic_bboxes, 0)
            mosaic_labels = np.concatenate(mosaic_labels, 0)

            if self.bbox_clip_border:  # True
                mosaic_bboxes[:, 0::2] = np.clip(mosaic_bboxes[:, 0::2], 0,
                                                 2 * self.img_scale[1])
                mosaic_bboxes[:, 1::2] = np.clip(mosaic_bboxes[:, 1::2], 0,
                                                 2 * self.img_scale[0])

            if not self.skip_filter:  # True
                mosaic_bboxes, mosaic_labels = \
                    self._filter_box_candidates(mosaic_bboxes, mosaic_labels)

        # remove outside bboxes
        inside_inds = find_inside_bboxes(mosaic_bboxes, 2 * self.img_scale[0],
                                         2 * self.img_scale[1])
        mosaic_bboxes = mosaic_bboxes[inside_inds]
        mosaic_labels = mosaic_labels[inside_inds]

        results['img'] = mosaic_img
        results['img_shape'] = mosaic_img.shape
        results['gt_bboxes'] = mosaic_bboxes
        results['gt_labels'] = mosaic_labels

        return results

    def _mosaic_combine(self, loc, center_position_xy, img_shape_wh):
        """Calculate global coordinate of mosaic image and local coordinate of
        cropped sub-image.

        Args:
            loc (str): Index for the sub-image, loc in ('top_left',
              'top_right', 'bottom_left', 'bottom_right').
            center_position_xy (Sequence[float]): Mixing center for 4 images,
                (x, y).
            img_shape_wh (Sequence[int]): Width and height of sub-image

        Returns:
            tuple[tuple[float]]: Corresponding coordinate of pasting and
                cropping
                - paste_coord (tuple): paste corner coordinate in mosaic image.
                - crop_coord (tuple): crop corner coordinate in mosaic image.
        """
        assert loc in ('top_left', 'top_right', 'bottom_left', 'bottom_right')
        if loc == 'top_left':
            # index0 to top left part of image
            x1, y1, x2, y2 = max(center_position_xy[0] - img_shape_wh[0], 0), \
                             max(center_position_xy[1] - img_shape_wh[1], 0), \
                             center_position_xy[0], \
                             center_position_xy[1]
            crop_coord = img_shape_wh[0] - (x2 - x1), img_shape_wh[1] - (
                y2 - y1), img_shape_wh[0], img_shape_wh[1]

        elif loc == 'top_right':
            # index1 to top right part of image
            x1, y1, x2, y2 = center_position_xy[0], \
                             max(center_position_xy[1] - img_shape_wh[1], 0), \
                             min(center_position_xy[0] + img_shape_wh[0],
                                 self.img_scale[1] * 2), \
                             center_position_xy[1]
            crop_coord = 0, img_shape_wh[1] - (y2 - y1), min(
                img_shape_wh[0], x2 - x1), img_shape_wh[1]

        elif loc == 'bottom_left':
            # index2 to bottom left part of image
            x1, y1, x2, y2 = max(center_position_xy[0] - img_shape_wh[0], 0), \
                             center_position_xy[1], \
                             center_position_xy[0], \
                             min(self.img_scale[0] * 2, center_position_xy[1] +
                                 img_shape_wh[1])
            crop_coord = img_shape_wh[0] - (x2 - x1), 0, img_shape_wh[0], min(
                y2 - y1, img_shape_wh[1])

        else:
            # index3 to bottom right part of image
            x1, y1, x2, y2 = center_position_xy[0], \
                             center_position_xy[1], \
                             min(center_position_xy[0] + img_shape_wh[0],
                                 self.img_scale[1] * 2), \
                             min(self.img_scale[0] * 2, center_position_xy[1] +
                                 img_shape_wh[1])
            crop_coord = 0, 0, min(img_shape_wh[0],
                                   x2 - x1), min(y2 - y1, img_shape_wh[1])

        paste_coord = x1, y1, x2, y2
        return paste_coord, crop_coord

    def _filter_box_candidates(self, bboxes, labels):
        """Filter out bboxes too small after Mosaic."""
        bbox_w = bboxes[:, 2] - bboxes[:, 0]
        bbox_h = bboxes[:, 3] - bboxes[:, 1]
        valid_inds = (bbox_w > self.min_bbox_size) & \
                     (bbox_h > self.min_bbox_size)
        valid_inds = np.nonzero(valid_inds)[0]
        return bboxes[valid_inds], labels[valid_inds]

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += f'img_scale={self.img_scale}, '
        repr_str += f'center_ratio_range={self.center_ratio_range}, '
        repr_str += f'pad_val={self.pad_val}, '
        repr_str += f'min_bbox_size={self.min_bbox_size}, '
        repr_str += f'skip_filter={self.skip_filter})'
        return repr_str

因为这里的mosaic_img大小是img_scale的两倍,因为在mmdet中mosaic还需要同时结合RandomAffine使用,randomaffine中包含旋转、缩放、平移、剪切操作,其中包含参数border=(-img_scale[0] // 2, -img_scale[1] // 2)),这里就使仿射变换后的输出大小为img_scale。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/21574.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

C++string—常用接口介绍+模拟实现+习题讲解

如果调试一个程序让你很苦恼&#xff0c;千万不要放弃&#xff0c;成功永远在拐角之后&#xff0c;除非你走到拐角&#xff0c;否则你永远不知道你离他多远&#xff0c;所以&#xff0c;请记住&#xff0c;坚持不懈&#xff0c;直到成功。 目录 前言 1.string类的常用接口 1.1s…

c++提高篇——模板(下)

c提高篇——模板&#xff08;下&#xff09;一、类模板二、类模板与函数模板区别三、类模板中成员函数创建时机四、类模板对象做函数参数一、类模板 类模板可以建立一个通用类&#xff0c;类中的成员数据类型可以不具体制定&#xff0c;用一个虚拟的类型来代表。 类模板的语法…

周赛补题

leetcode &#xff1a; 第一题https://leetcode.cn/problems/number-of-unequal-triplets-in-array/可以直接暴力 class Solution { public:int unequalTriplets(vector<int>& nums) {int sum 0;int n nums.size();for(int i 0; i < n; i ){for(int j i …

kmp算法记录

看了如何更好地理解和掌握 KMP 算法?之后&#xff0c;做的整理 相关知识 尽管普通模式匹配的时间复杂度是O(mn)&#xff0c;KMP 算法的时间复杂度是O(mn)&#xff0c;但在一般情况下&#xff0c;普通模式匹配的实际执行时间近似为O(m n)&#xff0c;因此至今仍被采用。KMP算法…

一文弄懂CNN中的BatchNorm

1. 引言 本文重点介绍BatchNorm的定义和相关特性&#xff0c;并介绍了其详细实现和具体应用。希望可以帮助大家加深对其理解。 嗯嗯&#xff0c;闲话少说&#xff0c;我们直接开始吧&#xff01; 2. 什么是BatchNorm&#xff1f; BatchNorm是2015年提出的网络层&#xff0c…

一文讲懂高并发分布式系统,听不懂你来打我

众所周知&#xff0c;在分布式系统的设计与建立中&#xff0c;其中一个要考虑的问题就是高并发。 那么&#xff0c;到底什么是高并发呢? 简单来说高并发就是指通过设计系统&#xff0c;使之实现可以同时处理多个请求的能力。 现在的高并发系统主要存在有两种实现方式&#…

Utilizing Transformer Representations Efficiently

ContentsIntroductionDifferent Pooling StrategiesPooler OutputLast Hidden State OutputHidden States OutputMore...ReferencesIntroduction 在用预训练模型微调时&#xff0c;我们比较习惯于直接用 Transformer 最后一层的输出经过 FC / Bi-LSTM… 后输出最终结果。但实际…

Perforce P4V 资源汇总

Perforce P4V 入门https://download.csdn.net/download/love_xiaozhao/20533522 P4 Command Referencehttps://download.csdn.net/download/love_xiaozhao/20534062 P4V文件状态命令速查表https://download.csdn.net/download/love_xiaozhao/20533404PerforcexHelix分支策略_…

【android Framework 探究】android 13 aosp编译全记录

写在开始 自从关注Framework这一块儿&#xff0c;就有了自己编译aosp刷机的想法&#xff0c;模拟器当然是可以的&#xff0c;但是体验感还不能和真机想比&#xff0c;于是买一个二手piexl的想法就有了&#xff0c;根据预算选定piexl 5&#xff0c;支持最新的android 13&#xf…

编码命名方式知多少

文章目录1.camel case &#xff08;驼峰式&#xff09;2.snake case &#xff08;蛇形式&#xff09;3.kebab case &#xff08;烤串式&#xff09;4.匈牙利命名法5.小结参考文献编码时&#xff0c;命名无处不在。比如我们需要对文件命令&#xff0c;对目录命名&#xff0c;对变…

m低信噪比下GPS信号的捕获算法研究,使用matlab算法进行仿真

目录 1.算法概述 2.仿真效果预览 3.MATLAB部分代码预览 4.完整MATLAB程序 1.算法概述 GPS系统的星座部分是由21颗工作卫星和3颗在轨备用卫星组成&#xff0c;其高度为20183km&#xff0c;这24颗卫星均匀分布在6个等间隔的、相对轨道面倾角为55的近圆轨道上。 ​ GPS卫星的…

处理csv、bmp等常用数据分析操作--python

请先看思维导图&#xff0c;看是否包含你所需要的东西&#xff0c;如果没有&#xff0c;就可以划走了&#xff0c;免得浪费时间&#xff0c;谢谢 条条大路通罗马&#xff0c;我只是介绍了我掌握的这一条&#xff0c;不喜勿喷&#xff0c;谢谢。 目录 一、创建文件夹&#xff0…

一行日志,让整个文件导出服务导出内容都为空..

输出一行日志&#xff0c;却让整个文件上传服务挂了...问题分析小结问题 直接上代码&#xff0c;看看有无眼尖的小伙伴发现问题&#xff1a; // 设置参数 MultiValueMap<String, Object> param new LinkedMultiValueMap<>(); FileSystemResource resource new …

log4cpp初入门

目录下载与安装log4cpp框架CategoryAppenderLayoutPriortyOutput功能日志级别⽇志格式化⽇志输出日志回滚日志配置文件下载与安装 https://sourceforge.net/projects/log4cpp/ tar xvf log4cpp-1.1.3.tar.gz cd log4cpp ./configure make make check make install ldconfig…

轻松玩转树莓派Pico之三、Windows+Ubuntu虚拟机模式下VSCode C语言开发环境搭建

目录 1、VSCode下载与安装 2、VSCode基础插件安装 3、SSH连接与配置 4、SSH免密登录 5、Pico编译 工欲善其事&#xff0c;必先利其器。之前的介绍的Pico流程都是通过命令行编译&#xff0c;没有进行更深入的介绍&#xff0c;本文将介绍Pico的VSCode-C语言开发环境搭建与配…

Rust WASM 与 JS 计算素数性能对比

前言 刚接触Rust wasm&#xff0c;请各看官高抬贵手。 简介 根据网上资料&#xff0c;使用 wasm-pack 搭配wasm-bindgen将Rust代码编译成 wasm。 搭好环境后&#xff0c;想对比一下rust-wasm与js的性能差距。 环境 OS: Deepin 20.7.1 apricotKernel: Linux 5.15.34CPU: Int…

LeetCode单周赛第320场 AcWing周赛第78场总结

1. LeetCode单周赛第320场 1.1 数组中不等三元组的数目 1.1.1 原题链接&#xff1a;力扣https://leetcode.cn/problems/number-of-unequal-triplets-in-array/ 1.1.2 解题思路&#xff1a; 暴力遍历咯。 1.1.3 代码&#xff1a; class Solution { public:int unequalTripl…

JAVA的学习心路历程之JDK基础入门(上)

任务需要&#xff0c;需要我学java调用linux下的动态库&#xff0c;于是搜寻java知识更新这篇。 从我上大学起我就听别人说JAVA&#xff0c;不不&#xff0c;应该是初中&#xff0c;那时候流行带键盘的智能手机&#xff0c;里面有好些个游戏都是JAVA写的&#xff0c;可见JAVA有…

【JavaWeb】HTML

HTML1 HTML概念1.1 超文本1.2 标记语言2 HTML的入门程序3 HTML语法规则4 使用idea创建StaticWeb工程5 HTML的各个标签的使用5.1标题标签5.2段落标签5.3换行标签5.4无序列表标签5.5超链接标签5.6图像标签5.7块标签6.使用表格标签展示数据6.1未合并单元格6.2合并单元格-合并列6.3…

套接字+网络套接字函数+客户端大小写程序

NAT映射 一般来说&#xff1a;源主机和目的主机都属于局域网&#xff0c;也就是ip地址可能相同 但是&#xff1a;路由器一般是公网ip,即整个网络环境可见 每一个路由器会维护一个NAT映射表 路由器的ip一般是固定的公网ip路由器与把与它相连的主机&#xff1a;私有ip端口号映射…