Bounding boxes augmentation for object detection

news2025/1/9 1:38:45

Different annotations formats¶

Bounding boxes are rectangles that mark objects on an image. There are multiple formats of bounding boxes annotations. Each format uses its specific representation of bouning boxes coordinates 每种格式都使用其特定的边界框坐标表示。. Albumentations supports four formats: pascal_vocalbumentationscoco, and yolo .

Let's take a look at each of those formats and how they represent coordinates 坐标 of bounding boxes.

As an example, we will use an image from the dataset named Common Objects in Context. It contains one bounding box that marks a cat. The image width is 640 pixels, and its height is 480 pixels. The width of the bounding box is 322 pixels, and its height is 117 pixels.

 An example image with a bounding box from the COCO dataset

pascal_voc¶

pascal_voc is a format used by the Pascal VOC dataset. Coordinates of a bounding box are encoded with four values in pixels: [x_min, y_min, x_max, y_max]. x_min and y_min are coordinates of the top-left corner of the bounding box. x_max and y_max are coordinates of bottom-right corner of the bounding box.

Coordinates of the example bounding box in this format are [98, 345, 420, 462].

albumentations¶

albumentations is similar to pascal_voc, because it also uses four values [x_min, y_min, x_max, y_max] to represent a bounding box. But unlike pascal_vocalbumentations uses normalized values. To normalize values, we divide coordinates in pixels for the x- and y-axis by the width and the height of the image.

Coordinates of the example bounding box in this format are [98 / 640, 345 / 480, 420 / 640, 462 / 480] which are [0.153125, 0.71875, 0.65625, 0.9625].

Albumentations uses this format internally 内部 to work with bounding boxes and augment them.

coco¶

coco is a format used by the Common Objects in Context COCOCOCO dataset.

In coco, a bounding box is defined by four values in pixels [x_min, y_min, width, height]. They are coordinates of the top-left corner along with the width and height of the bounding box.

Coordinates of the example bounding box in this format are [98, 345, 322, 117].

yolo¶

In yolo, a bounding box is represented by four values [x_center, y_center, width, height]x_center and y_center are the normalized coordinates of the center of the bounding box. To make coordinates normalized, we take pixel values of x and y, which marks the center of the bounding box on the x- and y-axis. Then we divide the value of x by the width of the image and value of y by the height of the image. width and height represent the width and the height of the bounding box. They are normalized as well.

Coordinates of the example bounding box in this format are [((420 + 98) / 2) / 640, ((462 + 345) / 2) / 480, 322 / 640, 117 / 480] which are [0.4046875, 0.840625, 0.503125, 0.24375].

Bounding boxes augmentation¶

Just like with images and masks augmentation, the process of augmenting bounding boxes consists of 4 steps.

  1. You import the required libraries.
  2. You define an augmentation pipeline.
  3. You read images and bounding boxes from the disk.
  4. You pass an image and bounding boxes to the augmentation pipeline and receive augmented images and boxes.

Note

Some transforms in Albumentation don't support bounding boxes. If you try to use them you will get an exception. Please refer to this article to check whether a transform can augment bounding boxes.

Step 1. Import the required libraries.¶

import albumentations as A
import cv2

Step 2. Define an augmentation pipeline.¶

Here an example of a minimal declaration of an augmentation pipeline that works with bounding boxes.

transform = A.Compose([
    A.RandomCrop(width=450, height=450),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco'))

Note that unlike image and masks augmentation, Compose now has an additional parameter bbox_params. You need to pass an instance of A.BboxParams to that argument. A.BboxParams specifies settings for working with bounding boxes. format sets the format for bounding boxes coordinates.

It can either be pascal_vocalbumentationscoco or yolo. This value is required because Albumentation needs to know the coordinates' source format for bounding boxes to apply augmentations correctly.

Besides formatA.BboxParams supports a few more settings.

Here is an example of Compose that shows all available settings with A.BboxParams:

transform = A.Compose([
    A.RandomCrop(width=450, height=450),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco', min_area=1024, min_visibility=0.1, label_fields=['class_labels']))

min_area and min_visibility

min_area and min_visibility parameters control what Albumentations should do to the augmented bounding boxes if their size has changed after augmentation. The size of bounding boxes could change if you apply spatial augmentations 空间增强 , for example, when you crop 裁剪 a part of an image or when you resize an image.

min_area is a value in pixels 是以像素为单位的值. If the area of a bounding box after augmentation becomes smaller than min_area, Albumentations will drop that box. So the returned list of augmented bounding boxes won't contain that bounding box.

min_visibility is a value between 0 and 1. If the ratio of the bounding box area after augmentation to the area of the bounding box before augmentation becomes smaller than min_visibility, Albumentations will drop that box. So if the augmentation process cuts the most of the bounding box, that box won't be present in the returned list of the augmented bounding boxes.

Here is an example image that contains two bounding boxes. Bounding boxes coordinates are declared using the coco format.

 An example image with two bounding boxes

First, we apply the CenterCrop augmentation without declaring parameters min_area and min_visibility. The augmented image contains two bounding boxes.

 An example image with two bounding boxes after applying augmentation

Next, we apply the same CenterCrop augmentation, but now we also use the min_area parameter. Now, the augmented image contains only one bounding box, because the other bounding box's area after augmentation became smaller than min_area, so Albumentations dropped that bounding box.

 An example image with one bounding box after applying augmentation with 'min_area'

Finally, we apply the CenterCrop augmentation with the min_visibility. After that augmentation, the resulting image doesn't contain any bounding box, because visibility of all bounding boxes after augmentation are below threshold set by min_visibility.

An example image with zero bounding boxes after applying augmentation with 'min_visibility' 

Class labels for bounding boxes¶

Besides coordinates, each bounding box should have an associated class label that tells which object lies inside the bounding box. There are two ways to pass a label for a bounding box.

Let's say you have an example image with three objects: dogcat, and sports ball. Bounding boxes coordinates in the coco format for those objects are [23, 74, 295, 388][377, 294, 252, 161], and [333, 421, 49, 49].

An example image with 3 bounding boxes from the COCO dataset

1. You can pass labels along with bounding boxes coordinates by adding them as additional values to the list of coordinates.¶

For the image above, bounding boxes with class labels will become [23, 74, 295, 388, 'dog'][377, 294, 252, 161, 'cat'], and [333, 421, 49, 49, 'sports ball'].

Class labels could be of any type: integer, string, or any other Python data type. For example, integer values as class labels will look the following: [23, 74, 295, 388, 18][377, 294, 252, 161, 17], and [333, 421, 49, 49, 37].

 Also, you can use multiple class values for each bounding box, for example [23, 74, 295, 388, 'dog', 'animal'][377, 294, 252, 161, 'cat', 'animal'], and [333, 421, 49, 49, 'sports ball', 'item'].

2.You can pass labels for bounding boxes as a separate list (the preferred way).¶

 For example, if you have three bounding boxes like [23, 74, 295, 388][377, 294, 252, 161], and [333, 421, 49, 49] you can create a separate list with values like ['cat', 'dog', 'sports ball'], or [18, 17, 37] that contains class labels for those bounding boxes. Next, you pass that list with class labels as a separate argument to the transform function. Albumentations needs to know the names of all those lists with class labels to join them with augmented bounding boxes correctly. Then, if a bounding box is dropped after augmentation because it is no longer visible, Albumentations will drop the class label for that box as well. Use label_fields parameter to set names for all arguments in transform that will contain label descriptions for bounding boxes (more on that in Step 4).

Step 3. Read images and bounding boxes from the disk.¶

Read an image from the disk.

image = cv2.imread("/path/to/image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

Bounding boxes can be stored on the disk in different serialization formats: JSON, XML, YAML, CSV, etc. So the code to read bounding boxes depends on the actual format of data on the disk.

After you read the data from the disk, you need to prepare bounding boxes for Albumentations.

Albumentations expects that bounding boxes will be represented 表示 as a list of lists. Each list contains information about a single bounding box. A bounding box definition should have at list four elements that represent the coordinates of that bounding box. The actual meaning of those four values depends on the format of bounding boxes (either pascal_vocalbumentationscoco, or yolo). Besides four coordinates, each definition of a bounding box may contain one or more extra values. You can use those extra values to store additional information about the bounding box, such as a class label of the object inside the box. During augmentation, Albumentations will not process those extra values. The library will return them as is along with the updated coordinates of the augmented bounding box 库将按原样返回它们以及增强边界框的更新坐标.

Step 4. Pass an image and bounding boxes to the augmentation pipeline and receive augmented images and boxes.¶

As discussed in Step 2, there are two ways of passing class labels along with bounding boxes coordinates:

1. Pass class labels along with coordinates.¶

So, if you have coordinates of three bounding boxes that look like this:

bboxes = [
    [23, 74, 295, 388],
    [377, 294, 252, 161],
    [333, 421, 49, 49],
]

you can add a class label for each bounding box as an additional element of the list along with four coordinates. So now a list with bounding boxes and their coordinates will look the following:

bboxes = [
    [23, 74, 295, 388, 'dog'],
    [377, 294, 252, 161, 'cat'],
    [333, 421, 49, 49, 'sports ball'],
]

or with multiple labels per each bounding box:

bboxes = [
    [23, 74, 295, 388, 'dog', 'animal'],
    [377, 294, 252, 161, 'cat', 'animal'],
    [333, 421, 49, 49, 'sports ball', 'item'],
]

You can use any data type for declaring class labels. It can be string, integer, or any other Python data type.

Next, you pass an image and bounding boxes for it to the transform function and receive the augmented image and bounding boxes.

transformed = transform(image=image, bboxes=bboxes)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']

 Example input and output data for bounding boxes augmentation

2. Pass class labels in a separate argument to transform (the preferred way).¶

 Let's say you have coordinates of three bounding boxes

bboxes = [
    [23, 74, 295, 388],
    [377, 294, 252, 161],
    [333, 421, 49, 49],
]

You can create a separate list that contains class labels for those bounding boxes:

class_labels = ['cat', 'dog', 'parrot']

Then you pass both bounding boxes and class labels to transform. Note that to pass class labels, you need to use the name of the argument that you declared in label_fields when creating an instance of Compose in step 2. In our case, we set the name of the argument to class_labels.

transformed = transform(image=image, bboxes=bboxes, class_labels=class_labels)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']
transformed_class_labels = transformed['class_labels']

Example input and output data for bounding boxes augmentation with a separate argument for class labels 

Note that label_fields expects a list, so you can set multiple fields that contain labels for your bounding boxes. So if you declare Compose like

transform = A.Compose([
    A.RandomCrop(width=450, height=450),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco', label_fields=['class_labels', 'class_categories'])))

you can use those multiple arguments to pass info about class labels, like

class_labels = ['cat', 'dog', 'parrot']
class_categories = ['animal', 'animal', 'item']

transformed = transform(image=image, bboxes=bboxes, class_labels=class_labels, class_categories=class_categories)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']
transformed_class_labels = transformed['class_labels']
transformed_class_categories = transformed['class_categories']

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1310270.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

51单片机控制1602LCD显示屏输出两行文字一

51单片机控制1602LCD显示屏输出两行文字一 1.概述 这篇文章介绍1602型号显示屏的基础知识,以及使用单片机控制它输出两行内容。 2.1602基础知识 1602 液晶显示模块是一种通用的工业液晶显示模块,专门用来显示字母、数字、符号等的点阵型液晶显示模块…

C++智能指针介绍

引言 为了充分利用RAII思想,C 11开始引入了智能指针,本文介绍RAII以及三种智能指针: std::unique_ptrstd::shared_ptrstd::weak_ptr 除此之外,本文还会介绍智能指针的常用创建方法: std::make_uniquestd::make_sha…

微信小程序:用map()将对象数组中的某一项组合成新数组

使用分析 使用map()方法来遍历 info 数组中的每个元素,并整合每一个对象中的某一项进行新数组的重组 效果展示 这里是查询对象数组中的全部name值 原始数据 提取出name的数组 核心代码 var infos items.map(item > item.name); 完整代码(用微信小程…

iOS按钮控件UIButton使用

1.在故事板中添加按钮控件,步聚如下: 同时按钮Shift+Commad+L在出现在控件库中选择Button并拖入View Controller Scene中 将控件与变量btnSelect关联 关联后空心变实心 如何关联?直接到属性窗口拖按钮变量到控件上,出现一条线,然后松开,这样就关联成功了 关联成功后属性窗口…

Redis HyperLogLog 数据结构模型统计

HyperLogLog HyperLogLog 不是一种新的数据结构 , 本质上是字符串类型。 是一种基数算法。 通过 HyperLogLog 可以节省内存空间,并完成独立总数的统计。 HyperLogLog 数据结构可用于仅使用少量恒定内存来计算集合中的唯一元素,具体而言&…

Web开发:VS2022列表导出CSV中文乱码问题(已解决)

目录 一、问题重现 二、解决方案 1.新建一个EXCEL文档 2.点击数据-点击导入(生成的文件)-设置中文格式 一、问题重现 使用VS2022 DEBUG导出列表时,打开CSV文件发现中文乱码 二、解决方案 1.新建一个EXCEL文档 2.点击数据-点击导入&…

新手选电视盒子什么牌子好?内行分享最新电视盒子排名

新手们在面对众多品牌和机型时难免不知道如何挑选电视盒子,电视盒子的品质良莠不齐,究竟电视盒子什么牌子好?我身为从业人员,身边朋友在挑选电视盒子时都会咨询我的意见,我特意整理了业内最新发布的热门电视盒子排名TO…

选择销售技巧培训机构注意事项

选择销售技巧培训机构注意事项 随着市场竞争的日益激烈,销售技巧对于企业的成功至关重要。为了提升销售团队的技能,许多企业选择投资于销售技巧培训机构。然而,在选择培训机构时,有几个关键因素需要考虑。本文将介绍选择销售技巧…

亚马逊云科技re_Invent 2023产品体验:亚马逊云科技产品应用实践 王炸产品Amazon Q,你的AI助手

本篇文章授权活动官方亚马逊云科技文章转发、改写权,包括不限于在 亚马逊云科技开发者社区, 知乎,自媒体平台,第三方开发者媒体等亚马逊云科技官方渠道 意料之中 2023年9月25日,亚马逊宣布与 Anthropic 正式展开战略合作&#x…

详解接口测试

目录 什么是接口? 接口协议的类型 接口测试是什么 HTTP接口的测试用例设计 HTTP接口的测试方法 什么是接口? 在面向对象编程中,接口是一个抽象的概念,用于定义类应该具有的方法和属性。一个类可以实现一个或多个接口&#xf…

如何解决掉你的u盘装不进去文件大小过大的文件

目录 前言1.解决方案2.原因2.1查看自己u盘格式2.2不同格式2.3分配单元大小作用 👍 点赞,你的认可是我创作的动力! ⭐️ 收藏,你的青睐是我努力的方向! ✏️ 评论,你的意见是我进步的财富! 前言…

Ubuntu环境下使用GDB调试C语言项目

1. 安装gdb //终端输入 sudo apt-get install gdb 2. 启动gdb gdb GDB常用命令大全,参考此篇博客 使用GDB调试C项目中的makefile 1.在内核配置中启用调试信息: 在内核配置中,确保启用了调试信息。可以通过以下步骤来配置内核&#xff1…

uniapp 蓝牙小程序-兼容安卓和iOS

withTimeout方法可以在搜寻设备时等待指定的秒数,如果30秒内未搜索到则取消搜索 /*** 超时控制函数* param {Promise} promise 回调函数* param {number} timeout 超时时间, 默认10s*/ export function withTimeout(promise, timeout 10000) {let timeoutEvent …

FFmpeg的AVFilter框架总成AVFilter-AVFilterContext

毫无疑问,还是和前面的一样一个context和一个包含有回调函数指针的插件结构体,想要实现自己的插件,主要实现里面的回调函数就可以了,当然,AVFilter比其它模块稍微复杂一点还要牵扯到其它一些辅助模块,在其它…

华为OD试题六(数据最节约的备份方法、TLV解码)

1. 数据最节约的备份方法 题目描述: 有若干个文件,使用刻录光盘的方式进行备份,假设每张光盘的容量是500MB,求 使用光盘最少的文件分布方式 所有文件的大小都是整数的MB,且不超过500MB;文件不能分割、分卷…

elementui select中添加新增标签

<el-select v-model"ruleForm.eventType" :placeholder"请选择事件类型&#xff0c;可手动添加" ref"template" clearable visible-change"(v) > visibleChange(v, template)"><el-option v-for"item in eventTypeOp…

复制粘贴——QT实现原理

复制粘贴——QT实现原理 QT 剪贴板相关类 QClipboard 对外通用的剪贴板类&#xff0c;一般通过QGuiApplication::clipboard() 来获取对应的剪贴板实例。 // qtbase/src/gui/kernel/qclipboard.h class Q_GUI_EXPORT QClipboard : public QObject {Q_OBJECT private:explici…

华为OD试题五(数列描述、矩阵最大值、数据分类)

1. 数列描述 示例代码&#xff1a; # 核心 从第一项 推 第N项目 # 第一项 a0 1 # 推到 第N项 N 4 def fun(a0):# 计算每一项的具体值result left 0cursor 0while cursor < len(a0):if a0[cursor] ! a0[left]:count cursor -leftresult "{}{}".format(str(…

2.2 模型基础

建模流程 作业 这次搞了10天左右终于把作业做完了。 先是去学习了下如何建模->然后将模型导入Substance Painter里绘制贴图->最后导入到unity中&#xff08;虽然最后效果很差&#xff09;&#xff0c;但是回过头来看整个过程学习到了次时代美术的工作流&#xff0c;思考…

智慧公交:提高城市出行效率的数字化之路

随着城市化进程的不断加速&#xff0c;公共交通成为人们日常出行的主要方式之一。为了提高公共交通的效率和服务质量&#xff0c;智慧公交应运而生。智慧公交是一种基于物联网、大数据、人工智能等技术&#xff0c;对公共交通进行数字化、智能化改造的新型公共交通系统。 以此为…