[Computer Vision] 24-Object Detection

news2024/9/20 22:41:26

Table of Contents

  • 24-Object Detection
    • 1. Introduction
    • 2. Methods
      • 2.1 Sliding Window
      • 2.2 R-CNN: Region-Based CNN
      • 2.3 Fast R-CNN
      • 2.4 Faster R-CNN: Learnable Region Proposals
      • 2.5 Results of object detection
    • 3. Summary
    • Reference

24-Object Detection

1. Introduction

  1. Task Definition

    Input: Single RGB Image

    Output: A set of detected objects;

    For each object predict:

    • Category label (from fixed, known set of categories)

    • Bounding box (four numbers: x, y, width, height)

  2. Challenges

    • Multiple outputs: Need to output variable numbers of objects per image
    • Multiple types of output: Need to predict "what" (category label) as well as "where" (bounding box)
    • Large images: Classification works at 224x224; need higher resolution for detection, often ~800x600
  3. Detecting a single object

    image-20231120145632741

    The network has two branches: one outputs the category label and the other outputs the bounding box.

    Problem: An image can contain more than one object, and running a single-object detector repeatedly to find each of them is inefficient.

2. Methods

2.1 Sliding Window

Apply a CNN to many different crops of the image; the CNN classifies each crop as either an object category or background:

image-20231120150748738

Problem: Need too many calculations

  • Consider an image of size H*W and a box of size h*w
  • Total possible boxes: $\sum_{h=1}^{H}\sum_{w=1}^{W}(W-w+1)(H-h+1)=\frac{H(H+1)}{2}\cdot\frac{W(W+1)}{2}$
  • An 800 x 600 image has ~58 billion boxes (about 5.8 × 10^10)! No way we can evaluate them all.
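To make the count concrete, here is a small sketch that evaluates the closed form and checks it against the double sum on a tiny image. The function names are illustrative and not from the original notes.

```python
def count_boxes(H: int, W: int) -> int:
    """Closed form: number of h x w boxes at all positions in an H x W image."""
    # The double sum factorizes into two triangular numbers.
    return (H * (H + 1) // 2) * (W * (W + 1) // 2)


def count_boxes_brute_force(H: int, W: int) -> int:
    """Direct evaluation of the double sum (only practical for small images)."""
    return sum((W - w + 1) * (H - h + 1)
               for h in range(1, H + 1)
               for w in range(1, W + 1))


print(count_boxes(600, 800))  # 57768120000, i.e. about 5.8e10
```

The brute-force version exists only to sanity-check the formula; for a 4 x 5 image both give 150.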

2.2 R-CNN: Region-Based CNN

  1. Region Proposals (Selective Search)

    Selective Search is a region proposal algorithm used in object detection. It is based on computing hierarchical grouping of similar regions based on color, texture, size and shape compatibility.

    Selective Search starts by over-segmenting the image based on intensity of the pixels using a graph-based segmentation method by Felzenszwalb and Huttenlocher.

    image-20231120213007261

    The Selective Search algorithm takes these over-segments as its initial input and performs the following steps:

    1. Add all bounding boxes corresponding to the current segments to the list of region proposals
    2. Group adjacent segments based on similarity
    3. Go to step 1, repeating until the whole image has merged into a single region

    At each iteration, larger segments are formed and added to the list of region proposals. Hence we create region proposals from smaller segments to larger segments in a bottom-up approach.

    As for the calculation of similarity measures based on color, texture, size and shape compatibility, please refer to Selective Search for Object Detection (C++ / Python) | LearnOpenCV
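The bottom-up grouping loop can be sketched on toy data. This is not the real Selective Search: each segment is reduced to a bounding box `(x1, y1, x2, y2)`, a single mean "color" value, and a pixel count, and similarity is just the color difference. All names and the similarity measure are assumptions made for illustration.

```python
from itertools import combinations


def adjacent(a, b):
    """True if two boxes (x1, y1, x2, y2) touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge(r, s):
    """Merge two segments: union bounding box, size-weighted mean color."""
    (box_r, col_r, n_r), (box_s, col_s, n_s) = r, s
    box = (min(box_r[0], box_s[0]), min(box_r[1], box_s[1]),
           max(box_r[2], box_s[2]), max(box_r[3], box_s[3]))
    n = n_r + n_s
    return (box, (col_r * n_r + col_s * n_s) / n, n)


def toy_selective_search(segments):
    """segments: list of (box, mean_color, size). Returns proposal boxes."""
    regions = list(segments)
    proposals = [r[0] for r in regions]  # step 1: boxes of the initial segments
    while len(regions) > 1:
        pairs = [(i, j) for i, j in combinations(range(len(regions)), 2)
                 if adjacent(regions[i][0], regions[j][0])]
        if not pairs:
            break
        # step 2: merge the most similar adjacent pair (toy similarity: color)
        i, j = min(pairs, key=lambda p: abs(regions[p[0]][1] - regions[p[1]][1]))
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged[0])  # back to step 1 with the larger segment
    return proposals


segments = [((0, 0, 4, 4), 0.10, 16), ((4, 0, 8, 4), 0.12, 16), ((0, 4, 8, 8), 0.90, 32)]
print(toy_selective_search(segments))
```

In this example the two similarly colored top segments merge first, then the result merges with the bottom segment, so the proposals grow from small boxes to the full image.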

  2. Architecture of the network

    image-20231120214110598

    Each of the roughly 2,000 proposed regions is warped to the fixed input size required by the classification network. After passing through the convolutional network, each region yields a category prediction along with a bounding-box offset.

  3. Steps

    1. Run region proposal method to compute ~2000 region proposals
    2. Resize each region to 224x224 and run independently through CNN to predict class scores and bbox transform
    3. Use scores to select a subset of region proposals to output (Many choices here: threshold on background, or per-category? Or take top K proposals per image?)
    4. Compare with ground-truth boxes
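The notes do not spell out the "bbox transform" in step 2. A minimal sketch, assuming the parameterization from the R-CNN paper, where a proposal is given by its center and size `(px, py, pw, ph)` and the network predicts `(tx, ty, tw, th)`:

```python
import math


def apply_bbox_transform(proposal, transform):
    """Turn a region proposal plus a predicted transform into an output box.

    proposal:  (px, py, pw, ph) = center x, center y, width, height (assumed format)
    transform: (tx, ty, tw, th) predicted by the network
    """
    px, py, pw, ph = proposal
    tx, ty, tw, th = transform
    bx = px + pw * tx        # shift the center, scaled by the proposal size
    by = py + ph * ty
    bw = pw * math.exp(tw)   # scale width and height in log space
    bh = ph * math.exp(th)
    return (bx, by, bw, bh)


# A zero transform leaves the proposal unchanged.
print(apply_bbox_transform((100, 50, 40, 20), (0, 0, 0, 0)))
```

Because the shifts are relative to the proposal size and the scales live in log space, the same transform means the same relative correction for large and small proposals.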
  4. Details (focusing on steps 3 and 4)

    1. Intersection over Union (IoU)
      $IoU=\frac{\text{Area of Intersection}}{\text{Area of Union}}$
      [image]
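As a minimal sketch, IoU can be computed as follows for boxes in the (x, y, width, height) format from the task definition, assuming (x, y) is the top-left corner:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 2, 2)))  # overlap 1, union 7
```

Identical boxes give 1.0 and disjoint boxes give 0.0.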

    2. Non-Max Suppression (NMS)

      1. Select the next highest-scoring box

      2. Eliminate every lower-scoring box whose IoU with the selected box exceeds a threshold (e.g. 0.7)

      3. If any boxes remain, GOTO 1

      Problem: NMS may eliminate "good" boxes when objects are highly overlapping:

[image]
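The NMS loop can be sketched in a few lines of plain Python. Boxes are assumed to be (x, y, width, height) with a top-left origin, and the function names are illustrative; this is a greedy reference version, not an optimized one.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Return indices of the boxes kept by greedy non-max suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Drop remaining boxes that overlap the selected box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 5, 5)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
```

The first two boxes have an IoU of about 0.68, so the second is suppressed at a threshold of 0.5 but survives at 0.7. This sensitivity to the threshold is the reason NMS struggles with heavily overlapping objects.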

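    3. Mean Average Precision (mAP)

      For each category, sort the detections by score and mark each one as a true positive if it matches a not-yet-matched ground-truth box with IoU above a threshold (e.g. 0.5), otherwise as a false positive. Plotting precision against recall as the sorted list is traversed gives a precision-recall curve; Average Precision (AP) is the area under this curve, and mAP is the mean of the AP over all categories. The original animation is not available here, only its final frame:

[image] For example, a detector might reach an mAP of 0.4 on the COCO dataset.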
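A minimal sketch of the AP computation for one category, assuming the detections are already sorted by descending score and already marked as true or false positives (for example by IoU matching against ground truth). It uses the area under the monotone precision-recall envelope. Benchmarks differ in the exact interpolation and IoU thresholds, so treat it as one possible variant.

```python
def average_precision(is_true_positive, num_ground_truth):
    """AP for one category; num_ground_truth must be positive."""
    precisions, recalls = [], []
    tp = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Replace each precision by the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the resulting step curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_category):
    """per_category: {name: (is_true_positive list, num_ground_truth)}."""
    aps = [average_precision(hits, n_gt) for hits, n_gt in per_category.values()]
    return sum(aps) / len(aps)


# Hypothetical detections: 3 ground-truth dogs, 2 ground-truth cats.
print(mean_average_precision({
    "dog": ([True, True, False, True], 3),
    "cat": ([True, True], 2),
}))
```

A perfect ranking with every ground-truth box found gives an AP of 1.0; false positives ranked above true positives pull the AP down.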

  5. Problem: Very slow! Need to do ~2k forward passes for each image!

    Solution: Run CNN before warping!

2.3 Fast R-CNN

  1. Architecture:

    image-20231120151757798
    • Most of the computation happens in the backbone network; this saves work for overlapping region proposals

    • Per-Region network is relatively lightweight

  2. Concrete architectures with AlexNet and ResNet backbones:

    image-20231120152141617 image-20231120152156583
  3. Details:

    How to crop features?

    image-20231120222841764

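    RoI Pooling introduces two quantization errors in this process:

    img

    Suppose the input image is downsampled by a factor of 32 through a series of convolutional layers, giving an 8x8 feature map, and that an RoI has its top-left and bottom-right corners (in x, y form) at (0, 100) and (198, 224). Mapped onto the feature map, these become (0, 100 / 32) and (198 / 32, 224 / 32). Because feature-map cells are discrete, the coordinates are rounded down to (0, 3) and (6, 7). This is the first quantization error.

    Suppose the RoI must then be turned into a fixed 2x2 output. The RoI is divided evenly into 2x2 bins, each with width (6 - 0 + 1) / 2 = 3.5 and height (7 - 3 + 1) / 2 = 2.5. Again, because cells are discrete, some bins get a width of 3 and others 4, while some get a height of 2 and others 3. This is the second quantization error.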
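The two rounding steps in this example can be reproduced with a short sketch. The helper names are hypothetical, and the bin-splitting rule is only one possible rounding scheme (real RoI Pool implementations may place bin edges differently).

```python
def quantized_roi(roi, stride):
    """First quantization: map image coordinates to feature cells, rounding down."""
    x1, y1, x2, y2 = roi
    return (x1 // stride, y1 // stride, x2 // stride, y2 // stride)


def split_into_bins(start, end, num_bins):
    """Second quantization: split the inclusive cell range [start, end]
    into num_bins chunks with integer boundaries."""
    length = end - start + 1
    edges = [start + (i * length) // num_bins for i in range(num_bins + 1)]
    return [(edges[i], edges[i + 1] - 1) for i in range(num_bins)]


x1, y1, x2, y2 = quantized_roi((0, 100, 198, 224), stride=32)
print((x1, y1, x2, y2))            # (0, 3, 6, 7)
print(split_into_bins(x1, x2, 2))  # [(0, 2), (3, 6)] -> widths 3 and 4
print(split_into_bins(y1, y2, 2))  # [(3, 4), (5, 7)] -> heights 2 and 3
```

The ideal bins would be 3.5 cells wide and 2.5 cells high; the integer bins of 3 or 4 and 2 or 3 cells are where the second misalignment comes from.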

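  4. RoI Align in Mask R-CNN

[image]

RoI Align removes both quantization steps: the RoI and its bins keep their fractional coordinates, and the feature value at each sampling point is computed by bilinear interpolation.

Note: RoI Align has a hyperparameter for the number of sampling points in each bin, which is usually set to 4.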
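A simplified single-channel sketch of RoI Align in plain Python follows. It assumes that `feature[row][col]` sits at continuous coordinates (col, row), that samples are averaged within each bin, and that `samples_per_side=2` gives the usual 4 sampling points per bin. Library implementations differ in details such as half-pixel offsets, so this illustrates the idea rather than reproducing any particular API.

```python
import math


def bilinear(feature, x, y):
    """Bilinearly interpolate a 2D list feature[row][col] at continuous (x, y)."""
    h, w = len(feature), len(feature[0])
    x = min(max(x, 0.0), w - 1.0)  # clamp to the feature map
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, stride, out_size=2, samples_per_side=2):
    """Pool an RoI (x1, y1, x2, y2 in image coordinates) to out_size x out_size."""
    # Map to feature coordinates WITHOUT rounding.
    x1, y1, x2, y2 = (c / stride for c in roi)
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    output = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            total = 0.0
            # Regularly spaced sampling points inside the bin.
            for sy in range(samples_per_side):
                for sx in range(samples_per_side):
                    px = x1 + (bx + (sx + 0.5) / samples_per_side) * bin_w
                    py = y1 + (by + (sy + 0.5) / samples_per_side) * bin_h
                    total += bilinear(feature, px, py)
            row.append(total / samples_per_side ** 2)
        output.append(row)
    return output


# Toy 8x8 feature map whose value is a linear function of position.
feature = [[x + 10 * y for x in range(8)] for y in range(8)]
print(roi_align(feature, (0, 100, 198, 224), stride=32))
```

On this linear toy feature map, each output equals the feature value at the exact (fractional) bin center, which shows that no rounding has crept in.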

  5. Speed

    Fast R-CNN is far faster than R-CNN. However, its runtime is now dominated by computing the region proposals.

2.4 Faster R-CNN: Learnable Region Proposals

  1. Architecture:

    Insert a Region Proposal Network (RPN) to predict proposals from the backbone features
    [image]

  2. Details:

[image]

At each point of the feature map, the RPN predicts whether the corresponding anchor box contains an object. The objectness scores are predicted with a conv layer and trained with a logistic regression (binary classification) loss.
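A rough sketch of the two ingredients mentioned here: placing anchor boxes at every feature-map location, and the logistic (binary cross-entropy) loss on one anchor's objectness score. The anchor sizes, the stride, and the function names are hypothetical values chosen for illustration, not settings from the notes.

```python
import math


def make_anchors(feat_h, feat_w, stride, sizes=((64, 64), (128, 128), (128, 64))):
    """Return anchors as (x, y, width, height), one per size at each location."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for w, h in sizes:
                anchors.append((cx - w / 2, cy - h / 2, w, h))
    return anchors


def objectness_loss(logit, is_object):
    """Logistic regression loss for one anchor's raw objectness score."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> probability of "object"
    return -math.log(p) if is_object else -math.log(1.0 - p)


anchors = make_anchors(feat_h=2, feat_w=3, stride=16)
print(len(anchors))                # 2 * 3 locations * 3 sizes = 18
print(objectness_loss(0.0, True))  # log(2): an undecided score costs about 0.693
```

In a real RPN the logits come from a conv layer applied to the backbone features, producing one score per anchor at each location.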

  3. Evaluation

[image]

  4. Improvement

    Faster R-CNN is a two-stage object detector: the first stage proposes regions, and the second stage classifies each proposal and refines its box.

    To make the detector end to end, we can eliminate the second stage by having the region proposal network predict the class label of each anchor directly, which gives a single-stage detector.
    [image]

2.5 Results of object detection

[image]

  • Two-stage methods (Faster R-CNN) get the best accuracy but are slower
  • Single-stage methods (SSD) are much faster but don't perform as well
  • Bigger backbones improve performance, but are slower
  • Diminishing returns for slower methods

[image]

These results are a few years old. Since then GPUs have gotten faster, and we've improved performance with many tricks:

  • Train longer!
  • Multiscale backbone: Feature Pyramid Networks
  • Better backbone: ResNeXt
  • Single-stage methods have improved
  • Very big models work better
  • Test-time augmentation pushes numbers up
  • Big ensembles, more data, etc.

3. Summary

Reference

[1] An Introduction to the RoI Pooling Family of Methods (source code at the end) - Zhihu (zhihu.com)

[2] Selective Search for Object Detection (C++ / Python) | LearnOpenCV
