VQA evaluation code: GQA / A-OKVQA / VQAv2 / ScienceQA

news2024/10/7 12:29:32

VQA evaluation comes in several flavors. This post covers a few of them; the code is adapted from LAVIS and mmpretrain.

1. GQA evaluation (a single ground-truth answer per question)

Dataset download and format:

JSON annotations (the location used in BLIP)
Image download

# The GQA annotations have been reorganized; each question has exactly one entry in gt_answers
[{'image': 'n161313.jpg',
  'gt_answers': ['no'],
  'question': 'Is it overcast?',
  'question_id': 201307251},
 {'image': 'n235859.jpg',
  'gt_answers': ['women'],
  'question': 'Who is wearing the dress?',
  'question_id': 201640614},
   ...]

Evaluation code:

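# Reference: https://github.com/salesforce/LAVIS/blob/main/lavis/tasks/vqa.py
# `results` is a list of dicts, each with "pred_answer" and "gt_answers" keys.
# In LAVIS the flag below is self.inference_method on the task object; here it is
# assumed to be a plain variable so the snippet can run outside the class.
inference_method = "generate"

vqa_tool = VQAEval()  # defined in the next listing
acc = []
for res in results:
    pred = res["pred_answer"]
    gt_ans = res["gt_answers"]

    if isinstance(pred, list):
        pred = pred[0]

    # Answers produced by a generative language model are normalized first,
    # e.g. "three" becomes "3". Implementations differ on this step: BLIP
    # normalizes only pred, while mmpretrain normalizes both pred and gt.
    # All responses are made lowercase, numbers converted to digits, and
    # punctuation & articles removed.
    if inference_method == "generate":
        pred = vqa_tool.processPunctuation(pred)
        pred = vqa_tool.processDigitArticle(pred)

    # gt_ans holds exactly one answer, so this is an exact-match check
    vqa_acc = 1 if [pred] == gt_ans else 0

    acc.append(vqa_acc)
overall_acc = sum(acc) / len(acc) * 100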
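
The exact-match rule above can be tried in isolation. The following is a minimal, self-contained sketch, not the LAVIS code: the function name `exact_match_accuracy` and the toy predictions are made up for illustration, and normalization is left out on purpose.

```python
from typing import Dict, List


def exact_match_accuracy(results: List[Dict]) -> float:
    """Return the percentage of predictions equal to the single ground-truth answer."""
    if not results:
        return 0.0  # avoid dividing by zero on an empty result list
    hits = 0
    for res in results:
        pred = res["pred_answer"]
        if isinstance(pred, list):  # some models return a list of candidates
            pred = pred[0]
        # gt_answers holds exactly one string for GQA
        hits += int([pred] == res["gt_answers"])
    return hits / len(results) * 100


# Toy results; the predictions are made up for illustration
toy_results = [
    {"question_id": 201307251, "pred_answer": "no", "gt_answers": ["no"]},
    {"question_id": 201640614, "pred_answer": "woman", "gt_answers": ["women"]},
]
print(exact_match_accuracy(toy_results))  # 50.0
```

The second toy prediction ("woman" against "women") shows how strict exact match is: a near miss scores zero.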

The predictions are normalized as follows:

import re


class VQAEval:
    def __init__(self):
        self.contractions = {
            "aint": "ain't",
            "arent": "aren't",
            "cant": "can't",
            "couldve": "could've",
            "couldnt": "couldn't",
            "couldn'tve": "couldn't've",
            "couldnt've": "couldn't've",
            "didnt": "didn't",
            "doesnt": "doesn't",
            "dont": "don't",
            "hadnt": "hadn't",
            "hadnt've": "hadn't've",
            "hadn'tve": "hadn't've",
            "hasnt": "hasn't",
            "havent": "haven't",
            "hed": "he'd",
            "hed've": "he'd've",
            "he'dve": "he'd've",
            "hes": "he's",
            "howd": "how'd",
            "howll": "how'll",
            "hows": "how's",
            "Id've": "I'd've",
            "I'dve": "I'd've",
            "Im": "I'm",
            "Ive": "I've",
            "isnt": "isn't",
            "itd": "it'd",
            "itd've": "it'd've",
            "it'dve": "it'd've",
            "itll": "it'll",
            "let's": "let's",
            "maam": "ma'am",
            "mightnt": "mightn't",
            "mightnt've": "mightn't've",
            "mightn'tve": "mightn't've",
            "mightve": "might've",
            "mustnt": "mustn't",
            "mustve": "must've",
            "neednt": "needn't",
            "notve": "not've",
            "oclock": "o'clock",
            "oughtnt": "oughtn't",
            "ow's'at": "'ow's'at",
            "'ows'at": "'ow's'at",
            "'ow'sat": "'ow's'at",
            "shant": "shan't",
            "shed've": "she'd've",
            "she'dve": "she'd've",
            "she's": "she's",
            "shouldve": "should've",
            "shouldnt": "shouldn't",
            "shouldnt've": "shouldn't've",
            "shouldn'tve": "shouldn't've",
            "somebody'd": "somebodyd",
            "somebodyd've": "somebody'd've",
            "somebody'dve": "somebody'd've",
            "somebodyll": "somebody'll",
            "somebodys": "somebody's",
            "someoned": "someone'd",
            "someoned've": "someone'd've",
            "someone'dve": "someone'd've",
            "someonell": "someone'll",
            "someones": "someone's",
            "somethingd": "something'd",
            "somethingd've": "something'd've",
            "something'dve": "something'd've",
            "somethingll": "something'll",
            "thats": "that's",
            "thered": "there'd",
            "thered've": "there'd've",
            "there'dve": "there'd've",
            "therere": "there're",
            "theres": "there's",
            "theyd": "they'd",
            "theyd've": "they'd've",
            "they'dve": "they'd've",
            "theyll": "they'll",
            "theyre": "they're",
            "theyve": "they've",
            "twas": "'twas",
            "wasnt": "wasn't",
            "wed've": "we'd've",
            "we'dve": "we'd've",
            "weve": "we've",
            "werent": "weren't",
            "whatll": "what'll",
            "whatre": "what're",
            "whats": "what's",
            "whatve": "what've",
            "whens": "when's",
            "whered": "where'd",
            "wheres": "where's",
            "whereve": "where've",
            "whod": "who'd",
            "whod've": "who'd've",
            "who'dve": "who'd've",
            "wholl": "who'll",
            "whos": "who's",
            "whove": "who've",
            "whyll": "why'll",
            "whyre": "why're",
            "whys": "why's",
            "wont": "won't",
            "wouldve": "would've",
            "wouldnt": "wouldn't",
            "wouldnt've": "wouldn't've",
            "wouldn'tve": "wouldn't've",
            "yall": "y'all",
            "yall'll": "y'all'll",
            "y'allll": "y'all'll",
            "yall'd've": "y'all'd've",
            "y'alld've": "y'all'd've",
            "y'all'dve": "y'all'd've",
            "youd": "you'd",
            "youd've": "you'd've",
            "you'dve": "you'd've",
            "youll": "you'll",
            "youre": "you're",
            "youve": "you've",
        }
        self.manualMap = {
            "none": "0",
            "zero": "0",
            "one": "1",
            "two": "2",
            "three": "3",
            "four": "4",
            "five": "5",
            "six": "6",
            "seven": "7",
            "eight": "8",
            "nine": "9",
            "ten": "10",
        }
        self.articles = ["a", "an", "the"]

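        # Raw strings avoid invalid-escape warnings. "(?!<=\d)" is kept as in the
        # source; it looks like a typo for the negative lookbehind "(?<!\d)".
        self.periodStrip = re.compile(r"(?!<=\d)(\.)(?!\d)")
        self.commaStrip = re.compile(r"(\d)(,)(\d)")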
        self.punct = [
            ";",
            r"/",
            "[",
            "]",
            '"',
            "{",
            "}",
            "(",
            ")",
            "=",
            "+",
            "\\",
            "_",
            "-",
            ">",
            "<",
            "@",
            "`",
            ",",
            "?",
            "!",
        ]
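    def processPunctuation(self, inText):
        outText = inText
        for p in self.punct:
            # Delete the mark when it touches a space or the text contains a
            # digit,digit pattern; otherwise replace it with a space
            if (p + " " in inText or " " + p in inText) or (
                re.search(self.commaStrip, inText) is not None
            ):
                outText = outText.replace(p, "")
            else:
                outText = outText.replace(p, " ")
        # The third positional argument of Pattern.sub() is count, not flags,
        # so the original re.UNICODE there capped the number of replacements
        outText = self.periodStrip.sub("", outText)
        return outText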
    def processDigitArticle(self, inText):
        outText = []
        tempText = inText.lower().split()
        for word in tempText:
            # .get() instead of .setdefault() so lookups do not grow manualMap
            word = self.manualMap.get(word, word)
            if word not in self.articles:
                outText.append(word)
        for wordId, word in enumerate(outText):
            if word in self.contractions:
                outText[wordId] = self.contractions[word]
        outText = " ".join(outText)
        return outText
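
To see the effect of these steps on concrete strings, here is a simplified, self-contained sketch of the same idea. It is not equivalent to the class above: the function `normalize_answer`, the shortened number map, and the regular expressions are assumptions chosen for brevity, and contractions are not handled.

```python
import re

# Shortened lookup tables, for illustration only
NUMBER_WORDS = {"none": "0", "zero": "0", "one": "1", "two": "2", "three": "3"}
ARTICLES = {"a", "an", "the"}


def normalize_answer(text: str) -> str:
    """Lowercase, strip simple punctuation, map number words to digits, drop articles."""
    # Remove commas between digits: "100,978" -> "100978"
    text = re.sub(r"(?<=\d),(?=\d)", "", text)
    # Remove periods that are not followed by a digit (keeps decimals like "2.5")
    text = re.sub(r"\.(?!\d)", "", text)
    # Replace the remaining punctuation marks of interest with spaces
    text = re.sub(r'[;/\[\]"{}()=+\\_\-><@`,?!]', " ", text)
    words = [NUMBER_WORDS.get(w, w) for w in text.lower().split()]
    return " ".join(w for w in words if w not in ARTICLES)


print(normalize_answer("Three dogs."))  # 3 dogs
print(normalize_answer("The cat!"))     # cat
print(normalize_answer("100,978"))      # 100978
print(normalize_answer("2.5 meters"))   # 2.5 meters
```

Without this step a generative model that answers "Three dogs." would not match a ground truth of "3 dogs", which is why the evaluation scripts normalize before comparing.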

2. A-OKVQA evaluation (10 answers; the widely used scheme)

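Dataset download:

JSON annotations
Images: download them from the official COCO site (either the 2014 or the 2017 release works) or from the official A-OKVQA site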

# For open-ended (generative) evaluation, direct_answers is the ground truth; for multiple choice, choices and correct_choice_idx are the ground truth.
[{'split': 'train',
  'image_id': 299207,
  'question_id': '22MexNkBPpdZGX6sxbxVBH',
  'question': 'What is the man by the bags awaiting?',
  'choices': ['skateboarder', 'train', 'delivery', 'cab'],
  'correct_choice_idx': 3,
  'direct_answers': ['ride',
   'ride',
   'bus',
   'taxi',
   'travelling',
   'traffic',
   'taxi',
   'cab',
   'cab',
   'his ride'],
  'difficult_direct_answer': False,
  'rationales': ['A train would not be on the street, he would not have luggage waiting for a delivery, and the skateboarder is there and not paying attention to him so a cab is the only possible answer.',
   'He has bags as if he is going someone, and he is on a road waiting for vehicle that can only be moved on the road and is big enough to hold the bags.',
   'He looks to be waiting for a paid ride to pick him up.'],
  'image': 'val2014/COCO_val2014_000000299207.jpg',
  'dataset': 'aokvqa'}]

Evaluation code:

The A-OKVQA paper notes that this evaluation scheme follows "VQA: Visual Question Answering":
An answer is deemed 100% accurate if at least 3 workers provided that exact answer. Before comparison, all responses are made lowercase, numbers converted to digits, and punctuation & articles removed.

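# Reference: https://github.com/salesforce/LAVIS/blob/main/lavis/tasks/vqa.py
# See the GQA section for the VQAEval class.
# `results` is a list of dicts, each with "answer" and "gt_answers" keys.
# In LAVIS the flag below is self.inference_method on the task object; here it is
# assumed to be a plain variable so the snippet can run outside the class.
inference_method = "generate"

vqa_tool = VQAEval()
acc = []

for res in results:
    pred = res["answer"]
    gt_ans = res["gt_answers"]

    if isinstance(pred, list):
        pred = pred[0]

    # BLIP normalized pred here; the latest code has removed this step.
    # mmpretrain normalizes both pred and gt_ans.
    # All responses are made lowercase, numbers converted to digits, and
    # punctuation & articles removed.
    if inference_method == "generate":
        pred = vqa_tool.processPunctuation(pred)
        pred = vqa_tool.processDigitArticle(pred)

    # Full credit once at least 3 of the 10 annotators gave this answer
    num_match = sum(pred == gt for gt in gt_ans)
    vqa_acc = min(1.0, num_match / 3.0)

    acc.append(vqa_acc)

accuracy = sum(acc) / len(acc) * 100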
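
As a worked example of the min(1, matches / 3) rule, the sketch below scores a few candidate predictions against the `direct_answers` of the A-OKVQA record shown earlier. The function name `soft_vqa_accuracy` is made up for illustration, and normalization is skipped because the sample answers are already lowercase.

```python
from typing import List


def soft_vqa_accuracy(pred: str, gt_answers: List[str]) -> float:
    """min(1, matches / 3): full credit once 3 annotators gave the predicted answer."""
    num_match = sum(pred == gt for gt in gt_answers)
    return min(1.0, num_match / 3.0)


# direct_answers from the A-OKVQA record shown earlier
direct_answers = ["ride", "ride", "bus", "taxi", "travelling",
                  "traffic", "taxi", "cab", "cab", "his ride"]

print(soft_vqa_accuracy("cab", direct_answers))    # 2 matches -> 2/3
print(soft_vqa_accuracy("bus", direct_answers))    # 1 match   -> 1/3
print(soft_vqa_accuracy("train", direct_answers))  # 0 matches -> 0.0
```

Note that "cab", the correct multiple-choice option for this record, earns only 2/3 under the direct-answer metric because just two annotators wrote exactly that word.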

3. VQAv2 (10 answers)

Dataset download:

  1. JSON annotation download
  2. Images: COCO 2014, available from the official COCO or VQA site
[{'question_id': 262148000,
  'question': 'Where is he looking?',
  'answer': ['down',
   'down',
   'at table',
   'skateboard',
   'down',
   'table',
   'down',
   'down',
   'down',
   'down'],
  'image': 'val2014/COCO_val2014_000000262148.jpg',
  'dataset': 'vqa'},
 ...]

Evaluation code (10 answers)

We introduce a new evaluation metric which is robust to inter-human variability in phrasing the answers:
Acc(ans) = min{ (# humans that said ans) / 3, 1 }
In order to be consistent with ‘human accuracies’, machine accuracies are averaged over all 10 choose 9 sets of human annotators.
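(The handling here differs and is more involved. For each question, first drop the 1st ground-truth answer and score the prediction against the remaining 9 to get acc1; then drop the 2nd ground-truth answer and score against the other 9 to get acc2; and so on. This yields 10 accuracies, and their mean is the accuracy for that question. See the code for details.)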

Before evaluating machine generated answers, we do the following processing:
● Making all characters lowercase
● Removing periods except if it occurs as decimal
● Converting number words to digits
● Removing articles (a, an, the)
● Adding apostrophe if a contraction is missing it (e.g., convert “dont” to “don’t”)
● Replacing all punctuation (except apostrophe and colon) with a space character. We do not remove apostrophe because it can incorrectly change possessives to plural, e.g., “girl’s” to “girls” and colons because they often refer to time, e.g., 2:50 pm. In case of comma, no space is inserted if it occurs between digits, e.g., convert 100,978 to 100978. (This processing step is done for ground truth answers as well.)

# Reference: https://github.com/salesforce/LAVIS/blob/main/lavis/tasks/vqa.py#L259
import copy

# See the GQA section for the VQAEval class
vqa_tool = VQAEval()

accQA = []
for pred_ann in resfile:
    resAns = pred_ann['answer']
    if type(resAns) is list:
        resAns = resAns[0]

    resAns = resAns.replace("\n", " ")
    resAns = resAns.replace("\t", " ")
    resAns = resAns.strip()
    resAns = vqa_tool.processPunctuation(resAns)
    resAns = vqa_tool.processDigitArticle(resAns)
    # Only punctuation is normalized for the ground-truth answers here
    gtAnswers = []
    gtAcc = []
    for ansDic in pred_ann["gt_answers"]:
        gtAnswers.append(vqa_tool.processPunctuation(ansDic))
    # Leave one annotator out at a time and score against the remaining answers
    for gtAnsDatum in gtAnswers:
        otherGTAns = copy.deepcopy(gtAnswers)
        otherGTAns.remove(gtAnsDatum)
        matchingAns = [item for item in otherGTAns if item == resAns]
        acc = min(1, float(len(matchingAns)) / 3)
        gtAcc.append(acc)
    if gtAcc:
        avgGTAcc = float(sum(gtAcc)) / len(gtAcc)
        accQA.append(avgGTAcc)
    else:
        accQA.append(0)
overall_acc = round(100 * float(sum(accQA)) / len(accQA), 2)
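
The leave-one-out averaging is easier to follow on a concrete case. The sketch below applies it to the ground-truth answers of the VQAv2 record shown earlier; the function name `leave_one_out_accuracy` is made up for illustration, and normalization is omitted because the sample answers are already clean.

```python
from typing import List


def leave_one_out_accuracy(pred: str, gt_answers: List[str]) -> float:
    """Average min(1, matches / 3) over every subset that drops one annotator."""
    if not gt_answers:
        return 0.0
    scores = []
    for i in range(len(gt_answers)):
        others = gt_answers[:i] + gt_answers[i + 1:]  # drop the i-th annotator
        matches = sum(ans == pred for ans in others)
        scores.append(min(1.0, matches / 3.0))
    return sum(scores) / len(scores)


# Ground-truth answers from the VQAv2 record shown earlier
answers = ["down", "down", "at table", "skateboard", "down",
           "table", "down", "down", "down", "down"]

print(leave_one_out_accuracy("down", answers))   # 1.0
print(leave_one_out_accuracy("table", answers))  # 9 subsets score 1/3, 1 scores 0 -> about 0.3
```

For "table" the plain formula min(1, 1/3) would give about 0.33, while the averaged version gives about 0.3: the one subset that drops the matching annotator contributes zero.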

4. ScienceQA (multiple choice)

Dataset download:

Download: https://scienceqa.github.io/

Evaluation method:

See https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/evaluation/metrics/scienceqa.py
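
For a multiple-choice benchmark the score is plain accuracy over the chosen option. The following is a generic sketch of that idea and not the mmpretrain implementation: the field names `pred`, `choices`, and `answer_idx`, both function names, and the toy records are assumptions, and it only handles models that answer with an option letter.

```python
from typing import Dict, List, Optional


def parse_choice(pred: str, num_choices: int) -> Optional[int]:
    """Map a leading option letter such as "B" or "(b)" to a 0-based index."""
    cleaned = pred.strip().lstrip("(").upper()
    if cleaned and "A" <= cleaned[0] <= "Z":
        idx = ord(cleaned[0]) - ord("A")
        if idx < num_choices:
            return idx
    return None  # unparseable output is scored as wrong


def multiple_choice_accuracy(results: List[Dict]) -> float:
    """Percentage of questions whose predicted option index equals the gold index."""
    if not results:
        return 0.0
    correct = sum(
        parse_choice(r["pred"], len(r["choices"])) == r["answer_idx"]
        for r in results
    )
    return correct / len(results) * 100


# Made-up records, for illustration only
toy = [
    {"choices": ["solid", "liquid", "gas"], "answer_idx": 1, "pred": "B"},
    {"choices": ["solid", "liquid", "gas"], "answer_idx": 2, "pred": "(a)"},
]
print(multiple_choice_accuracy(toy))  # 50.0
```

A model that replies with the option text instead of a letter would need a different parser, for example matching the reply against each entry of `choices`.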
