Boosting Crowd Counting via Multifaceted Attention: Crowd Density Estimation in Practice

news2025/1/12 1:47:40

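With some free time this week, I came across a recently published paper on density estimation for dense crowds. I have worked on this problem before, but mainly with detection-based approaches. The method proposed here is quite interesting, so I decided to try it out.

The paper is available here; have a look if you are interested.

As you can see, it is quite interesting.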

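The dataset used here is jhu_crowd_v2.0, as shown below:

Taking train as an example:

The images directory looks like this:

The gt directory looks like this:

Sample annotation data: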

166 228 22 27 1 0
414 218 11 15 1 0
541 232 14 14 1 0
353 213 11 15 1 0
629 222 14 14 1 0
497 243 39 43 1 0
468 222 11 15 1 0
448 227 11 15 1 0
737 220 39 43 1 0
188 228 33 30 1 0
72 198 22 27 1 0
371 214 11 15 1 0
362 242 24 32 1 0
606 260 39 43 1 0
74 228 22 27 1 0
597 226 14 14 1 0
576 213 14 14 1 0
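Each row above holds six space-separated values which, according to the dataset README quoted further down, are x, y, w, h, o, b (head location, approximate head size, occlusion level, blur level). A minimal sketch of reading such rows follows; the names `HeadAnnotation` and `parse_gt_text` are hypothetical helpers introduced for illustration, not part of the dataset tooling.

```python
from typing import List, NamedTuple


class HeadAnnotation(NamedTuple):
    x: int          # head location, x coordinate
    y: int          # head location, y coordinate
    w: int          # approximate head width
    h: int          # approximate head height
    occlusion: int  # 1 = visible, 2 = partial occlusion, 3 = full occlusion
    blur: int       # 0 = no blur, 1 = blur


def parse_gt_text(text: str) -> List[HeadAnnotation]:
    """Parse the contents of one gt file (hypothetical helper)."""
    heads = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed rows
        heads.append(HeadAnnotation(*(int(float(p)) for p in parts)))
    return heads


sample = """166 228 22 27 1 0
414 218 11 15 1 0
541 232 14 14 1 0"""
heads = parse_gt_text(sample)
print(len(heads), heads[0])
```

The number of parsed rows is the head count for that image, which is the quantity a counting model is ultimately trained to predict.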

Dataset details:


This file contains information about the JHU-CROWD++ (v2.0) dataset. 

-----------------------------------------------------------------------------------------------------
INTRODUCTION
-----------------------------------------------------------------------------------------------------

JHU-CROWD++ is a comprehensive dataset with 4,372 images and 1.51 million annotations. In comparison 
to existing datasets, the proposed dataset is collected under a variety of diverse scenarios and 
environmental conditions. In addition, the dataset provides a comparatively richer set of annotations 
like dots, approximate bounding boxes, blur levels, etc.

-----------------------------------------------------------------------------------------------------
DIRECTORY INFO
-----------------------------------------------------------------------------------------------------
1. The dataset directory contains 3 sub-directories: train, val and test.

2. Each of these contains 2 sub-directories (images, gt) and a file "image_labels.txt".

3. The "images" directory contains images and the "gt" directory contains ground-truth files 
   corresponding to the images in the images directory.

4. The number of samples in train, val and test split are 2272, 500, 1600 respectively.

-----------------------------------------------------------------------------------------------------
GROUND-TRUTH ANNOTATIONS: "HEAD-LEVEL"
-----------------------------------------------------------------------------------------------------
1. Each ground-truth file in the "gt" directory contains "space" separated values with each row 
   indicating x,y,w,h,o,b 

2. x,y indicate the head location.

3. w,h indicate approximate width and height of the head.

4. o indicates occlusion-level and it can take 3 possible values: 1,2,3. 
   o=1 indicates "visible"
   o=2 indicates "partial-occlusion"
   o=3 indicates "full-occlusion"
   
5. b indicates blur-level and it can take 2 possible values: 0,1. 
   b=0 indicates no-blur 
   b=1 indicates blur
   
-----------------------------------------------------------------------------------------------------
GROUND-TRUTH ANNOTATIONS: "IMAGE-LEVEL"
-----------------------------------------------------------------------------------------------------
1. Each split in the dataset contains a file "image_labels.txt". This file contains image-level labels.

2. The values in the file are comma separated and each row indicates: 
    "filename, total-count, scene-type, weather-condition, distractor"
    
3. total-count indicates the total number of people in the image

4. scene-type is an image-level label describing the scene

5. weather-condition indicates the weather-degradation in the image and can take 4 values: 0,1,2,3
   weather-condition=0 indicates "no weather degradation"
   weather-condition=1 indicates "fog/haze"
   weather-condition=2 indicates "rain"
   weather-condition=3 indicates "snow"
   
6. distractor indicates if the image is a distractor. It can take 2 values: 0,1
   distractor=0 indicates "not a distractor"
   distractor=1 indicates "distractor"

-----------------------------------------------------------------------------------------------------
CITATION
-----------------------------------------------------------------------------------------------------   

If you find this dataset useful, please consider citing the following work:

@inproceedings{sindagi2019pushing,
title={Pushing the frontiers of unconstrained crowd counting: New dataset and benchmark method},
author={Sindagi, Vishwanath A and Yasarla, Rajeev and Patel, Vishal M},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={1221--1231},
year={2019}
}

@article{sindagi2020jhu-crowd,
title={JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method},
author={Sindagi, Vishwanath A and Yasarla, Rajeev and Patel, Vishal M},
journal={Tech Report},
year={2020}
}


-----------------------------------------------------------------------------------------------------
LICENSE
-----------------------------------------------------------------------------------------------------   

This dataset is for academic and non-commercial uses (such as academic research, teaching, scientific 
publications, or personal experimentation). All images of the JHU-CROWD++ are obtained from the Internet 
which are not property of VIU-Lab, The Johns Hopkins University (JHU). please contact us if you find 
yourself or your personal belongings in the data, and we (VIU-Lab) will immediately remove the concerned
 images from our servers. By downloading and/or using the dataset, you acknowledge that you have read, 
 understand, and agree to be bound by the following terms and conditions.

1. All images are obtained from the Internet. We are not responsible for the content/meaning of these 
   images.
2. Specific care has been taken to reduce labeling errors. Nevertheless, we do not accept any responsibility 
   for errors or omissions.
3. You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes, 
   any portion of the images and any portion of derived data.
4. You agree not to use the dataset or any derivative work for commercial purposes as, for example, licensing 
   or selling the data, or using the data with a purpose to procure a commercial gain.
5. All rights not expressly granted to you are reserved by us (VIU-Lab, JHU).
6. You acknowledge that the dataset is a valuable scientific resource and agree to appropriately reference 
   the following papers in any publication making use of the Data & Software:
   Sindagi et al., "Pushing the frontiers of unconstrained crowd counting: New dataset and benchmark method", 
   ICCV 2019.
   Sindagi et al., "JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method", Arxiv 2020.
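
The image-level label format described in the README above (comma-separated "filename, total-count, scene-type, weather-condition, distractor") could be read along these lines. This is a sketch only: `ImageLabel` and `parse_image_labels` are hypothetical names, and the two sample rows are made up to illustrate the format rather than copied from the dataset.

```python
import csv
import io
from typing import List, NamedTuple

# Weather codes as listed in the README.
WEATHER = {0: "no weather degradation", 1: "fog/haze", 2: "rain", 3: "snow"}


class ImageLabel(NamedTuple):
    filename: str
    total_count: int
    scene_type: str
    weather: str
    is_distractor: bool


def parse_image_labels(text: str) -> List[ImageLabel]:
    """Parse the contents of an image_labels.txt file (hypothetical helper)."""
    labels = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 5:
            continue  # skip blank or malformed rows
        name, count, scene, weather, distractor = (field.strip() for field in row)
        labels.append(
            ImageLabel(name, int(count), scene, WEATHER[int(weather)], distractor == "1")
        )
    return labels


# Made-up rows, for illustration only.
sample = "0001,120,street,0,0\n0002,0,water park,2,1\n"
labels = parse_image_labels(sample)
print(labels[1])
```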


First, preprocess the raw dataset as follows:

The result after preprocessing looks like this:
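
The post does not show the preprocessing script itself. As a rough sketch of what such a step commonly involves, the code below assumes that each image is rescaled so its sides fall within a size range and that the annotated head points are scaled by the same ratio. The functions `resize_ratio` and `scale_points` and the 512/2048 limits are assumptions for illustration. The lower bound is chosen to match the 512-pixel training crop used by the training script later in this post.

```python
from typing import List, Tuple


def resize_ratio(width: int, height: int, min_size: int = 512, max_size: int = 2048) -> float:
    """Return a scale factor for the image (assumed scheme, not the author's script).

    Shrinks images whose longer side exceeds max_size, enlarges images whose
    shorter side is below min_size, and leaves everything else unchanged.
    Extreme aspect ratios that violate both limits are not handled here.
    """
    short_side, long_side = min(width, height), max(width, height)
    if long_side > max_size:
        return max_size / long_side
    if short_side < min_size:
        return min_size / short_side
    return 1.0


def scale_points(points: List[Tuple[float, float]], ratio: float) -> List[Tuple[float, float]]:
    """Scale head coordinates by the same ratio applied to the image."""
    return [(x * ratio, y * ratio) for x, y in points]


ratio = resize_ratio(4096, 2048)
print(ratio, scale_points([(166, 228)], ratio))
```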

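After that, model training can start. Since no usable pretrained weights have been released, the only option is to train the model ourselves, as follows: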

import argparse
import os

import torch

from utils.regression_trainer_cosine import RegTrainer


def str2bool(value):
    """Parse a boolean flag; type=bool would treat any non-empty string, even 'False', as True."""
    if isinstance(value, bool):
        return value
    if value.lower() in ('true', '1', 'yes', 'y'):
        return True
    if value.lower() in ('false', '0', 'no', 'n'):
        return False
    raise argparse.ArgumentTypeError('expected a boolean value, got %r' % value)


def parse_args():
    parser = argparse.ArgumentParser(description='Train')
    # Model, data and checkpoint locations
    parser.add_argument('--model-name', default='vgg19_trans', help='the name of the model')
    parser.add_argument('--data-dir', default='./JHU-Train-Val-Test', help='training data directory')
    parser.add_argument('--save-dir', default='./model', help='directory to save models')
    parser.add_argument('--save-all', type=str2bool, default=False, help='whether to save all best models')
    parser.add_argument('--resume', default='', help='path of the checkpoint to resume training from')
    parser.add_argument('--max-model-num', type=int, default=1, help='max number of models to save')
    # Optimisation and schedule
    parser.add_argument('--lr', type=float, default=5e-6, help='the initial learning rate')
    parser.add_argument('--weight-decay', type=float, default=1e-5, help='the weight decay')
    parser.add_argument('--max-epoch', type=int, default=120, help='max training epoch')
    parser.add_argument('--val-epoch', type=int, default=5, help='validate every this many epochs')
    parser.add_argument('--val-start', type=int, default=60, help='the epoch at which validation starts')
    parser.add_argument('--batch-size', type=int, default=8, help='train batch size')
    parser.add_argument('--device', default='0', help='GPU id(s) to use')
    parser.add_argument('--num-workers', type=int, default=0, help='number of data-loading worker processes')
    # Input and loss settings
    parser.add_argument('--is-gray', type=str2bool, default=False, help='whether the input image is gray')
    parser.add_argument('--crop-size', type=int, default=512, help='the crop size of the train image')
    parser.add_argument('--downsample-ratio', type=int, default=16, help='downsample ratio')
    parser.add_argument('--use-background', type=str2bool, default=True, help='whether to use background modelling')
    parser.add_argument('--sigma', type=float, default=8.0, help='sigma for likelihood')
    parser.add_argument('--background-ratio', type=float, default=0.15, help='background ratio')
    return parser.parse_args()


if __name__ == '__main__':
    args = parse_args()
    torch.backends.cudnn.benchmark = True
    os.environ['CUDA_VISIBLE_DEVICES'] = args.device.strip()  # restrict the visible GPUs
    trainer = RegTrainer(args)
    trainer.setup()
    trainer.train()

Training takes quite a long time. After it finishes, the result files look like this:

The log output looks like this:

02-14 12:25:04 using 1 gpus
02-14 12:25:10 -----Epoch 0/119-----
02-14 12:27:03 Epoch 0 Train, Loss: 2914.89, MSE: 202.49 MAE: 81.49, Cost 112.6 sec
02-14 12:27:03 -----Epoch 1/119-----
02-14 12:28:45 Epoch 1 Train, Loss: 2691.07, MSE: 128.28 MAE: 44.69, Cost 102.0 sec
02-14 12:28:46 -----Epoch 2/119-----
02-14 12:30:28 Epoch 2 Train, Loss: 2687.40, MSE: 140.69 MAE: 43.30, Cost 102.5 sec
02-14 12:30:29 -----Epoch 3/119-----
02-14 12:32:11 Epoch 3 Train, Loss: 2688.95, MSE: 208.25 MAE: 45.59, Cost 102.1 sec
02-14 12:32:12 -----Epoch 4/119-----
02-14 12:33:55 Epoch 4 Train, Loss: 2682.65, MSE: 163.37 MAE: 39.28, Cost 103.2 sec
02-14 12:33:55 -----Epoch 5/119-----
02-14 12:35:37 Epoch 5 Train, Loss: 2677.02, MSE: 103.38 MAE: 33.43, Cost 102.0 sec
02-14 12:35:38 -----Epoch 6/119-----
02-14 12:37:15 Epoch 6 Train, Loss: 2677.04, MSE: 108.78 MAE: 34.17, Cost 96.5 sec
02-14 12:37:15 -----Epoch 7/119-----
02-14 12:38:58 Epoch 7 Train, Loss: 2676.39, MSE: 97.53 MAE: 33.18, Cost 103.1 sec
02-14 12:38:59 -----Epoch 8/119-----
02-14 12:40:41 Epoch 8 Train, Loss: 2675.40, MSE: 100.08 MAE: 31.75, Cost 102.4 sec
02-14 12:40:42 -----Epoch 9/119-----
02-14 12:42:24 Epoch 9 Train, Loss: 2676.26, MSE: 115.38 MAE: 33.94, Cost 101.8 sec
02-14 12:42:24 -----Epoch 10/119-----
02-14 12:44:07 Epoch 10 Train, Loss: 2674.91, MSE: 107.85 MAE: 31.79, Cost 102.7 sec
02-14 12:44:08 -----Epoch 11/119-----
02-14 12:45:49 Epoch 11 Train, Loss: 2675.62, MSE: 128.87 MAE: 31.46, Cost 101.5 sec
02-14 12:45:50 -----Epoch 12/119-----
02-14 12:47:32 Epoch 12 Train, Loss: 2672.00, MSE: 90.30 MAE: 27.87, Cost 102.0 sec
02-14 12:47:32 -----Epoch 13/119-----
02-14 12:49:14 Epoch 13 Train, Loss: 2671.85, MSE: 93.11 MAE: 28.77, Cost 101.6 sec
02-14 12:49:14 -----Epoch 14/119-----
02-14 12:50:57 Epoch 14 Train, Loss: 2674.60, MSE: 111.70 MAE: 31.27, Cost 102.4 sec
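
The log reports an MAE and an MSE value per epoch. The post does not show how the trainer computes them. The sketch below assumes they are errors between predicted and ground-truth counts per image, and that "MSE" follows the convention common in crowd counting code of logging the root of the mean squared error. Both points are assumptions, and `count_errors` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def count_errors(pred: Sequence[float], gt: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, RMSE) between predicted and ground-truth counts."""
    diffs = [p - g for p, g in zip(pred, gt)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# Toy counts for three images, for illustration only.
mae, rmse = count_errors([10, 20, 30], [12, 20, 27])
print(round(mae, 4), round(rmse, 4))
```

Because the squared term weights large misses more heavily, a few badly miscounted dense images can raise the second metric sharply while MAE moves much less, which would be consistent with the MSE column fluctuating more than the MAE column in the log above.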

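For a more intuitive analysis, I visualized the training log as follows:

Overall the training looks reasonably good: in the log excerpt above, the training MAE falls from 81.49 at epoch 0 to 27.87 at epoch 12.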
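
The plotting script is not included in the post. As a sketch of the first step, the per-epoch values can be pulled out of log lines in the format shown earlier with a regular expression. `parse_train_log` is a hypothetical helper, and the excerpt reuses two epochs from the log.

```python
import re

# Matches lines such as:
# "02-14 12:27:03 Epoch 0 Train, Loss: 2914.89, MSE: 202.49 MAE: 81.49, Cost 112.6 sec"
LINE_RE = re.compile(
    r"Epoch (\d+) Train, Loss: ([\d.]+), MSE: ([\d.]+) MAE: ([\d.]+), Cost ([\d.]+) sec"
)


def parse_train_log(text):
    """Extract one record per training-summary line; other lines are skipped."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        epoch, loss, mse, mae, cost = match.groups()
        records.append({
            "epoch": int(epoch),
            "loss": float(loss),
            "mse": float(mse),
            "mae": float(mae),
            "seconds": float(cost),
        })
    return records


log_excerpt = """02-14 12:25:10 -----Epoch 0/119-----
02-14 12:27:03 Epoch 0 Train, Loss: 2914.89, MSE: 202.49 MAE: 81.49, Cost 112.6 sec
02-14 12:27:03 -----Epoch 1/119-----
02-14 12:28:45 Epoch 1 Train, Loss: 2691.07, MSE: 128.28 MAE: 44.69, Cost 102.0 sec"""
records = parse_train_log(log_excerpt)
best = min(records, key=lambda r: r["mae"])
print(best["epoch"], best["mae"])
```

The resulting list of records can then be passed to any plotting library to draw loss, MSE and MAE against the epoch number.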

Next, draw the density heatmap:
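
The drawing code is not shown in the post. A minimal sketch of the idea, assuming the model outputs a 2-D density map whose sum is the estimated head count (the usual convention for density-based counting): the map is normalised to the 0-255 range so that it can be rendered with a colormap. `density_to_uint8` is a hypothetical helper and the 3x3 map is a toy example.

```python
def density_to_uint8(density):
    """Linearly rescale a 2-D density map (list of rows) to integers in 0..255."""
    flat = [value for row in density for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map has no contrast to show.
        return [[0 for _ in row] for row in density]
    return [[round(255 * (value - lo) / span) for value in row] for row in density]


# Toy density map, for illustration only.
density = [
    [0.0, 0.1, 0.0],
    [0.2, 1.5, 0.2],
    [0.0, 0.1, 0.0],
]
estimated_count = sum(sum(row) for row in density)
heat = density_to_uint8(density)
print(estimated_count, heat[1])
```

The normalised array can then be displayed with a colormap, for example through matplotlib's `imshow`, and overlaid on the input image after resizing it to the image resolution.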
