Scene Recognition with a Bag-of-Words Model (Source Code Included!)

news2024/10/7 11:32:16

Contents

  • 1. Task requirements
  • 2. Dataset
  • 3. Algorithms implemented
  • 4. Experimental results
  • 5. Source code


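1. Task requirements

  • Input: given the test-set images, predict which of the 15 scene categories each image belongs to.
  • Tasks
    • Implement the tiny images representation.
    • Implement a nearest neighbor classifier.
    • Implement the SIFT bag-of-words representation.
  • Output
    • For both the tiny images representation and the SIFT bag-of-words representation, report the accuracy of each category and the mean accuracy.
    • For both approaches, pick examples of correct and incorrect predictions and visualize them.
    • Explore how different parameter settings affect the results and summarize the findings in a table.
    • Discuss, based on experiments, how the vocabulary size affects the classification results, for example which category has the highest/lowest accuracy and why.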
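
The "accuracy of each category and mean accuracy" asked for above can be computed directly from the true and predicted label lists. Below is a minimal sketch using only the standard library; the function name `per_category_accuracy` and the toy labels are made up for illustration and are not taken from the project's create_results.py.

```python
from collections import defaultdict


def per_category_accuracy(true_labels, predicted_labels):
    """Return ({category: accuracy}, mean of the per-category accuracies)."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")

    correct = defaultdict(int)  # correctly classified test images per category
    total = defaultdict(int)    # all test images per category
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    accuracy = {cat: correct[cat] / total[cat] for cat in total}
    mean_accuracy = sum(accuracy.values()) / len(accuracy)
    return accuracy, mean_accuracy


# Toy labels (hypothetical): two test images for each of three categories.
truth = ["Kitchen", "Kitchen", "Coast", "Coast", "Forest", "Forest"]
preds = ["Kitchen", "Store", "Coast", "Coast", "Coast", "Coast"]
acc, mean_acc = per_category_accuracy(truth, preds)
for cat, value in acc.items():
    print(f"{cat}: {value:.2f}")
print(f"mean accuracy: {mean_acc:.2f}")
```

Averaging the per-category accuracies weights every category equally. With the same number of test images per category (as main.py assumes) this equals the overall accuracy.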

2. Dataset

http://www.cad.zju.edu.cn/home/gfzhang/course/cv/Homework3.zip


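3. Algorithms implemented

  1. Tiny images representation
  2. SIFT bag-of-words representation
  3. Classification: nearest-neighbor search (NNS) and SVM, with HOG available as an alternative feature descriptor to SIFT.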
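
To make the first baseline concrete, here is a rough sketch of a tiny-image feature combined with a 1-nearest-neighbor lookup. It is a simplified illustration rather than the project's get_tiny_images.py or nearest_neighbor_classify.py. It assumes the image is already a grayscale list of rows, shrinks it by block averaging, and normalizes to zero mean and unit length (a common choice, but an assumption here). The names `tiny_image_feature` and `nearest_neighbor_label` and the toy images are hypothetical.

```python
import math


def tiny_image_feature(gray, h_size, w_size):
    """Shrink a grayscale image (list of rows) to h_size x w_size by block
    averaging, then flatten it to a zero-mean, unit-length vector."""
    rows, cols = len(gray), len(gray[0])
    pixels = []
    for i in range(h_size):
        r0 = i * rows // h_size
        r1 = max((i + 1) * rows // h_size, r0 + 1)
        for j in range(w_size):
            c0 = j * cols // w_size
            c1 = max((j + 1) * cols // w_size, c0 + 1)
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            pixels.append(sum(block) / len(block))
    mean = sum(pixels) / len(pixels)
    centered = [p - mean for p in pixels]
    norm = math.sqrt(sum(p * p for p in centered))
    return [p / norm for p in centered] if norm > 0 else centered


def nearest_neighbor_label(train_feats, train_labels, test_feat):
    """1-NN: return the label of the training feature closest in Euclidean distance."""
    best = min(range(len(train_feats)),
               key=lambda k: math.dist(train_feats[k], test_feat))
    return train_labels[best]


# Toy 4x4 "images" (hypothetical data): one bright on the left, one bright on top.
left_bright = [[9, 9, 0, 0] for _ in range(4)]
top_bright = [[9, 9, 9, 9], [9, 9, 9, 9], [0, 0, 0, 0], [0, 0, 0, 0]]
query = [[8, 7, 1, 0], [9, 8, 0, 1], [8, 8, 1, 1], [7, 9, 0, 0]]

train_feats = [tiny_image_feature(img, 2, 2) for img in (left_bright, top_bright)]
train_labels = ["Street", "Coast"]  # arbitrary labels for the toy images
query_feat = tiny_image_feature(query, 2, 2)
predicted = nearest_neighbor_label(train_feats, train_labels, query_feat)
print(predicted)
```

Plain lists keep the sketch dependency-free. A real implementation would more likely resize with an image library and compute all pairwise distances at once with NumPy.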

4. Experimental results

[Figure: Scene recognition with bag of words]


5. Source code

  • Full project: https://github.com/Jurio0304/Scene_Recognition_with_Bag_of_Words (if you use or cite it, a star would be much appreciated, thank you!)
  • The source of main.py is as follows:
#!/usr/bin/python
import sys

import numpy as np
import os
import argparse

from create_results import create_results
from get_image_path import get_image_paths
from get_tiny_images import get_tiny_images
from build_vocabulary import build_vocabulary, build_vocabulary_sift
from get_bags_of_words import get_bags_of_words, get_bags_of_words_sift
from svm_classify import svm_classify
from nearest_neighbor_classify import nearest_neighbor_classify


# from create_results_webpage import create_results_webpage


def scene_recognition(feature='chance_feature', feature_detector='sift', classifier='chance_classifier'):
    """
    For this project, you will need to report performance for three
    combinations of features / classifiers. We recommend that you code them in
    this order:
        1) Tiny image features and nearest neighbor classifier
        2) Bag of word features and nearest neighbor classifier
        3) Bag of word features and linear SVM classifier
    The starter code is initialized to 'chance_' just so that the starter
    code does not crash when run unmodified and you can get a preview of how
    results are presented.

    Interpreting your performance with 100 training examples per category:
     accuracy  =   0 -> Something is broken.
     accuracy ~= .07 -> Your performance is equal to chance.
                        Something is broken or you ran the starter code unchanged.
     accuracy ~= .20 -> Rough performance with tiny images and nearest
                        neighbor classifier. Performance goes up a few
                        percentage points with K-NN instead of 1-NN.
     accuracy ~= .20 -> Rough performance with tiny images and linear SVM
                        classifier. Although the accuracy is about the same as
                        nearest neighbor, the confusion matrix is very different.
     accuracy ~= .40 -> Rough performance with bag of word and nearest
                        neighbor classifier. Can reach .60 with K-NN and
                        different distance metrics.
     accuracy ~= .50 -> You've gotten things roughly correct with bag of
                        word and a linear SVM classifier.
     accuracy >= .70 -> You've also tuned your parameters well. E.g. number
                        of clusters, SVM regularization, number of patches
                        sampled when building vocabulary, size and step for
                        dense features.
     accuracy >= .80 -> You've added in spatial information somehow or you've
                        added additional, complementary image features. This
                        represents state of the art in Lazebnik et al 2006.
     accuracy >= .85 -> You've done extremely well. This is the state of the
                        art in the 2010 SUN database paper from fusing many
                        features. Don't trust this number unless you actually
                        measure many random splits.
     accuracy >= .90 -> You used modern deep features trained on much larger
                        image databases.
     accuracy >= .96 -> You can beat a human at this task. This isn't a
                        realistic number. Some accuracy calculation is broken
                        or your classifier is cheating and seeing the test
                        labels.
    """

    # Step 0: Set up parameters, category list, and image paths.
    FEATURE = feature
    CLASSIFIER = classifier

    # This is the path the script will look at to load images from.
    data_path = './data/'

    # This is the list of categories / directories to use. The categories are
    # somewhat sorted by similarity so that the confusion matrix looks more
    # structured (indoor and then urban and then rural).
    categories = ['Kitchen', 'Store', 'Bedroom', 'LivingRoom', 'Office',
                  'Industrial', 'Suburb', 'InsideCity', 'TallBuilding', 'Street',
                  'Highway', 'OpenCountry', 'Coast', 'Mountain', 'Forest']

    # This list of shortened category names is used later for visualization.
    abbr_categories = ['Kit', 'Sto', 'Bed', 'Liv', 'Off', 'Ind', 'Sub',
                       'Cty', 'Bld', 'St', 'HW', 'OC', 'Cst', 'Mnt', 'For']

    # Number of training examples per category to use. Max is 100.
    # For simplicity, we assume this is the number of test cases per category as well.
    num_train_per_cat = 100

    # This function returns string arrays containing the file path for each train and test image
    print('Getting paths and labels for all train and test data.')

    train_image_paths, test_image_paths, train_labels, test_labels = \
        get_image_paths(data_path, categories, num_train_per_cat)
    #   train_image_paths  1500x1   list
    #   test_image_paths   1500x1   list
    #   train_labels       1500x1   list
    #   test_labels        1500x1   list

    ############################################################################
    # Step 1: Represent each image with the appropriate feature
    # Each function to construct features should return an N x d matrix, where
    # N is the number of paths passed to the function and d is the
    # dimensionality of each image representation. See the starter code for
    # each function for more details.
    ############################################################################

    print('Using %s representation for images.' % FEATURE)

    if FEATURE.lower() == 'tiny_image':
        print('Loading tiny images...')
        h, w = 16, 32

        train_image_feats = get_tiny_images(train_image_paths, h_size=h, w_size=w)
        test_image_feats = get_tiny_images(test_image_paths, h_size=h, w_size=w)
        print('Tiny images loaded.')

    elif FEATURE.lower() == 'bag_of_words':
        # Because building the vocabulary takes a long time, we save the generated
        # vocab to a file and re-load it each time to make testing faster.

        # Larger values will work better (to a point), but are slower to compute
        vocab_size = 50
        if not os.path.isfile(f'{feature_detector}_vocab_{vocab_size}.npy'):
            print('No existing visual word vocabulary found. Computing one from training images.')

            if feature_detector.lower() == 'sift':
                vocab = build_vocabulary_sift(train_image_paths, vocab_size)
            else:
                vocab = build_vocabulary(train_image_paths, vocab_size)

            np.save(f'{feature_detector}_vocab_{vocab_size}.npy', vocab)

        if feature_detector.lower() == 'sift':
            train_image_feats = get_bags_of_words_sift(train_image_paths, vocab_size, feature_detector)
            test_image_feats = get_bags_of_words_sift(test_image_paths, vocab_size, feature_detector)
        else:
            train_image_feats = get_bags_of_words(train_image_paths, vocab_size)
            test_image_feats = get_bags_of_words(test_image_paths, vocab_size)

    elif FEATURE.lower() == 'chance_feature':
        train_image_feats = []
        test_image_feats = []

    else:
        raise ValueError('Unknown feature type!')

    ############################################################################
    # Step 2: Classify each test image by training and using the appropriate classifier
    # Each function to classify test features will return an N x 1 string array,
    # where N is the number of test cases and each entry is a string indicating
    # the predicted category for each test image. Each entry in
    # 'predicted_categories' must be one of the 15 strings in 'categories',
    # 'train_labels', and 'test_labels'. See the starter code for each function
    # for more details.
    ############################################################################

    print('Using %s classifier to predict test set categories.' % CLASSIFIER)

    if CLASSIFIER.lower() == 'nearest_neighbor':
        predicted_categories = nearest_neighbor_classify(train_image_feats, train_labels, test_image_feats)

    elif CLASSIFIER.lower() == 'support_vector_machine':
        predicted_categories = svm_classify(train_image_feats, train_labels, test_image_feats)

    elif CLASSIFIER.lower() == 'chance_classifier':
        # The placeholder classifier predicts a shuffled copy of the test labels, which scores at chance level
        random_permutation = np.random.permutation(len(test_labels))
        predicted_categories = [test_labels[i] for i in random_permutation]

    else:
        raise ValueError('Unknown classifier type')

    ############################################################################
    # Step 3: Build a confusion matrix and score the recognition system
    # You do not need to code anything in this section.

    # If we wanted to evaluate our recognition method properly we would train
    # and test on many random splits of the data. You are not required to do so
    # for this project.

    # This function will recreate results_webpage/index.html and various image
    # thumbnails each time it is called. View the webpage to help interpret
    # your classifier performance. Where is it making mistakes? Are the
    # confusions reasonable?
    ############################################################################
    result_path = f'results/{feature}_{classifier}'
    if not os.path.isdir('./results'):
        print('Making results directory.')
        os.mkdir('./results')
    if not os.path.isdir(result_path):
        os.mkdir(result_path)

    create_results(train_image_paths, test_image_paths, train_labels, test_labels, categories, abbr_categories,
                   predicted_categories, result_path)


if __name__ == '__main__':
    '''
    Command line usage:
    python main.py [-f | --feature <representation to use>]
                   [-fd | --feature_detector <sift or hog>]
                   [-c | --classifier <classifier method>]
    '''
    # create the command line parser
    parser = argparse.ArgumentParser()

    parser.add_argument('-f', '--feature', default='bag_of_words',
                        help='Either chance_feature, tiny_image, or bag_of_words')
    parser.add_argument('-fd', '--feature_detector', default='sift',
                        help='Either sift or hog')
    parser.add_argument('-c', '--classifier', default='support_vector_machine',
                        help='Either chance_classifier, nearest_neighbor, or support_vector_machine')

    args = parser.parse_args()

    # RUN THE MAIN SCRIPT
    scene_recognition(args.feature, args.feature_detector, args.classifier)

    sys.exit(0)
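
main.py only calls get_bags_of_words / get_bags_of_words_sift, so the core idea is not visible in the listing above. Roughly, each local descriptor of an image is assigned to its nearest visual word in the vocabulary, and the image is represented by the histogram of word counts. The sketch below illustrates that idea under assumptions: it is not the repository's implementation, the function name `bag_of_words_histogram` is hypothetical, the descriptors are toy 2-D points rather than 128-D SIFT vectors, and the L1 normalization is one possible choice.

```python
import math


def bag_of_words_histogram(descriptors, vocab):
    """Assign each descriptor to its nearest visual word (Euclidean distance)
    and return the L1-normalized histogram of word counts."""
    counts = [0] * len(vocab)
    for d in descriptors:
        nearest = min(range(len(vocab)), key=lambda k: math.dist(d, vocab[k]))
        counts[nearest] += 1
    total = sum(counts)
    # An image without descriptors gets an all-zero histogram.
    return [c / total for c in counts] if total else [0.0] * len(vocab)


# Toy vocabulary of 3 visual words and 4 descriptors from one image (hypothetical data).
vocab = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
descriptors = [[1.0, 1.0], [9.0, 0.5], [0.5, -1.0], [8.0, 2.0]]
hist = bag_of_words_histogram(descriptors, vocab)
print(hist)
```

The histogram length equals the vocabulary size (`vocab_size` in main.py). A larger vocabulary therefore gives a finer but sparser image representation, which is the trade-off the task asks you to explore.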

Writing this took effort, so a like and a follow would be appreciated!
