Performance Metrics in Evaluating Stable Diffusion Models

2024/9/23 6:22:19

1.Performance Metrics in Evaluating Stable Diffusion Models

Sources for these notes:
1.Performance Metrics in Evaluating Stable Diffusion Models
2.Denoising Diffusion Probabilistic Models
3.A simple explanation of the Inception Score
4.What is the inception score (IS)?
5.Kullback–Leibler divergence
6.Inception Score (IS) and Fréchet Inception Distance (FID)
7.Fréchet inception distance
8.Using CLIP Score to evaluated images

Figure source: Wikipedia

1.1 Inception Score (IS): Evaluating Realism Through Classification


IS feeds each generated image to a pre-trained image classifier and measures two things: how confidently the classifier assigns each image to a single class, and how varied the predicted classes are across all images.

A higher IS means the generated images are both more recognizable to the classifier and more diverse, which is taken as a sign that the model captures the essential features of real images.

Prerequisites
(1)Pre-trained Inception v3 Network: This model is used to classify the generated images.
(2)Generated Images: A diverse set of images generated by the Stable Diffusion model based on various text prompts.

Steps to Calculate Inception Score
(1)Generate Images
Use the Stable Diffusion model to generate a large number of images from diverse text prompts. The more diverse the text prompts, the better the evaluation.
(2)Preprocess Images
Ensure that the images are correctly sized (typically 299x299 pixels) and normalized to the format expected by the Inception v3 network.
(3)Pass Images Through Inception v3
Feed each generated image into the Inception v3 network to obtain the predicted label distribution $p(y \mid x)$. This provides a probability distribution over classes for each image ($x$: image, $y$: label).

(4)Compute Marginal Distribution
Calculate the marginal distribution $p(y)$ over all generated images.

(5)Calculate KL Divergence
Compute the Kullback-Leibler (KL) divergence between the conditional distribution $p(y \mid x)$ for each generated image and the marginal distribution $p(y)$ over all generated images. Average the KL divergences across all images.
(The KL divergence is a measure of how similar/different two probability distributions are.)

Figure source: Kullback–Leibler divergence

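The KL divergence measures how much two probability distributions differ; computing it tells us how similar the two distributions are.
The more the two distributions differ, the larger the KL divergence.
The less they differ, the smaller the KL divergence.
If the two distributions are identical, the KL divergence is 0.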
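The three properties above can be checked numerically. The following is a minimal sketch, assuming strictly positive discrete distributions; the helper name `kl_divergence` and the example distributions are made up for illustration. For normalized inputs, `scipy.stats.entropy(p, q)`, which the script further down uses, computes the same quantity.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions with strictly positive entries."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.5]
kl_same = kl_divergence(p, [0.5, 0.5])  # identical distributions
kl_near = kl_divergence(p, [0.6, 0.4])  # slightly different distributions
kl_far = kl_divergence(p, [0.9, 0.1])   # very different distributions

print(kl_same, kl_near, kl_far)
```

The value is zero for identical distributions and grows as the second distribution moves further away from the first.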


The following shows how the KL divergence changes with our two distributions:

(6)Exponentiation

The Inception Score is the exponential of the average KL divergence.
To get the final score, we first average the KL divergence over all of our images and then take the exponential of that average. The exponential only rescales the score to bigger numbers, which makes improvements easier to see. The result is the Inception Score!
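Written out, the steps above amount to the following. The indexing over $N$ generated images $x_1, \dots, x_N$ is notation assumed here for illustration; note that the average is taken before the exponential.

```latex
\mathrm{IS} = \exp\!\Big( \frac{1}{N} \sum_{i=1}^{N} D_{\mathrm{KL}}\big( p(y \mid x_i) \,\|\, p(y) \big) \Big),
\qquad
p(y) = \frac{1}{N} \sum_{i=1}^{N} p(y \mid x_i),
\qquad
D_{\mathrm{KL}}\big( p(y \mid x_i) \,\|\, p(y) \big) = \sum_{y} p(y \mid x_i) \log \frac{p(y \mid x_i)}{p(y)}
```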

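Summary of the computation:
(1)Use the Inception v3 network to obtain the probability distribution $p(y \mid x)$ for each generated image
(2)Compute the marginal distribution $p(y)$ over all generated images
(3)Compute the KL divergence between each image's distribution $p(y \mid x)$ and the marginal distribution $p(y)$; this yields one KL divergence value per image
(4)Average these KL divergence values
(5)Exponentiate the value from (4) to obtain the final Inception Score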

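A large KL divergence means that an individual generated image is of high quality and easy for the classifier to recognize.
A large IS means that the generated images are both diverse and of high quality.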
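To see how quality and diversity both enter the score, here is a minimal sketch on toy prediction matrices. The function name `inception_score_from_probs`, the `eps` guard against `log(0)`, and the three hand-made 4-class matrices are all assumptions made for illustration; they are not outputs of a real Inception network.

```python
import numpy as np

def inception_score_from_probs(probs, eps=1e-12):
    """IS from an (N, num_classes) array whose rows are p(y|x)."""
    probs = np.asarray(probs, dtype=float)
    p_y = probs.mean(axis=0, keepdims=True)  # marginal p(y)
    # Per-image KL divergence D_KL(p(y|x) || p(y))
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))  # average first, then exponentiate

# Confident predictions spread over 4 different classes: sharp and diverse
sharp_diverse = np.eye(4)
# Confident predictions, but every image lands in the same class: sharp, not diverse
sharp_collapsed = np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))
# Uniform predictions: the classifier cannot tell what any image shows
blurry = np.full((4, 4), 0.25)

is_diverse = inception_score_from_probs(sharp_diverse)
is_collapsed = inception_score_from_probs(sharp_collapsed)
is_blurry = inception_score_from_probs(blurry)
print(is_diverse, is_collapsed, is_blurry)
```

Only the first case scores well: with 4 classes the score approaches its maximum of 4, while both the collapsed and the blurry case stay at the minimum of 1.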

Computing IS for generated images. The code is taken from: Python implementation of Inception Score (reading your own generated images). See also: sbarratt/inception-score-pytorch

import torch
from torch import nn
from torch.nn import functional as F
import numpy as np
from torchvision.models.inception import inception_v3
from PIL import Image
import os
from scipy.stats import entropy
import argparse
from tqdm import tqdm


'''
(1)Generate Images
(2)Preprocess Images
    Ensure that the images are correctly sized (typically 299x299 pixels) 
    and normalized to the format expected by the Inception v3 network.
(3)Compute predicted label distributions p(y|x)
    Pass Images Through Inception v3 to obtain the predicted label distributions p(y|x). 
    This provides a probability distribution over classes for each image
(4)Compute Marginal Distribution p(y)
    Calculate the marginal distribution p(y) over all generated images.
(5)Calculate KL Divergence
    D_KL(p(y|x) || p(y))
(6)Average the KL divergences across all images.
    Expectation(D_KL(p(y|x) || p(y)))
(7)Exponentiation
    IS = Exp(Expectation(D_KL(p(y|x) || p(y))))
    Average first, then exponentiate; Expectation(Exp(...)) is a different, larger quantity.
'''

# (1) python Inception_score.py --input_image_dir path_of_your_generated_images
# Argument parser setup
parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--input_image_dir', type=str, default='./input_images', help='Directory containing input images')
parser.add_argument('--batch_size', type=int, default=1, help='Batch size for processing images')
parser.add_argument('--device', type=str, choices=["cuda:0", "cpu"], default="cuda:0", help='Device for computation')
args = parser.parse_args()

# (2) Preprocess images: Normalization
# ImageNet mean/std constants; these are the statistics torchvision's Inception v3
# expects only when it is loaded with transform_input=True
mean_inception = [0.485, 0.456, 0.406]  # Mean for normalization
std_inception = [0.229, 0.224, 0.225]  # Standard deviation for normalization

# image -> array
def imread(filename):
    """
    Loads an image file and converts it into a (height, width, 3) uint8 numpy array.
    Args:
        filename (str): Path to the image file.
    Returns:
        np.ndarray: Image data in a (height, width, 3) format.
    """
    # Convert to RGB so grayscale, palette, and RGBA files all yield 3 channels
    return np.asarray(Image.open(filename).convert('RGB'), dtype=np.uint8)

# calculate IS
def inception_score(batch_size=args.batch_size, resize=True, splits=1):
    """
    Computes the Inception Score for images in the specified directory.
    Args:
        batch_size (int): Number of images to process in each batch.
        resize (bool): Whether to resize images to the input size of the Inception model.
        splits (int): Number of subsets the predictions are divided into; one score is computed per subset.
    Returns:
        tuple: Maximum and mean of the per-split Inception Scores.
    """
    device = torch.device(args.device)  # Set computation device (CPU or GPU)

    # Load pre-trained Inception v3 model
    inception_model = inception_v3(pretrained=True, transform_input=False).to(device)
    inception_model.eval()  # Set model to evaluation mode
    # Ensure that the images are correctly sized (typically 299x299 pixels)
    # normalized to the format expected by the Inception v3 network.
    up = nn.Upsample(size=(299, 299), mode='bilinear', align_corners=False).to(device)
    # calculate p(y|x)
    # y:label of class, x: an image
    def get_pred(x):
        """
        Computes class probabilities using the Inception model.
        Args:
            x (torch.Tensor): Batch of images.
        Returns:
            np.ndarray: Class probabilities for each image.
        """
        if resize:
            x = up(x)  # Resize images to 299x299
        with torch.no_grad():  # Inference only, so no gradients are needed
            x = inception_model(x)  # Get class logits
        return F.softmax(x, dim=1).cpu().numpy()  # Apply softmax and move to CPU

    print('Computing predictions using Inception v3 model')

    files = read_dir()  # Get list of image files
    N = len(files)
    # store p(y|x) of each image
    # Initialize a numpy array to store predictions for all images
    # N is num of generated images
    # 1000 corresponds to the number of output classes in the Inception v3 model
    # Each row will store the prediction (class probabilities) for one image
    preds = np.zeros((N, 1000))  # Array to store predictions

    # Adjust batch size if it's larger than the number of images
    if batch_size > N:
        print('Warning: Batch size is larger than the number of images. Setting batch size to data size.')
        batch_size = N

    # Process images in batches
    for i in tqdm(range(0, N, batch_size)):  # Loop over the range of image indices in steps of batch_size
        start = i  # Start index for the current batch
        end = min(i + batch_size, N)  # End index for the current batch, ensuring it doesn't exceed the number of images
        # Convert the list of image arrays to a single numpy array
        # For each file in the current batch, read the image and convert it to a float32 numpy array
        images = np.array(
            [imread(f).astype(np.float32) for f in files[start:end]])  # Read and convert images to float32
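        # Rearrange the dimensions of the images to (n_images, 3, height, width)
        # and scale pixel values from [0, 255] to [-1, 1], the input range the
        # torchvision Inception v3 weights expect when transform_input=False
        # (all images in a batch must share the same height and width)
        images = images.transpose((0, 3, 1, 2)) / 127.5 - 1.0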
        # Convert the NumPy array to a PyTorch tensor of type FloatTensor and move it to the specified device
        batch = torch.from_numpy(images).type(torch.FloatTensor).to(device)
        # Compute class probabilities for the current batch using the Inception model and store the predictions
        # in the preds array at indices corresponding to the current batch
        preds[start:end] = get_pred(batch)  # Store predictions for the current batch
    # Sanity checks: the batch size must be positive, and there must be
    # at least one image per split for the scores below to be defined
    assert batch_size > 0
    assert N >= splits

    # Compute the Inception Score using KL Divergence
    print('Computing KL Divergence')
    # The split_scores list gathers the Inception Scores for each subset,
    # which are then averaged to obtain a final score.
    split_scores = []  # Initialize an empty list to store scores for each split
    for k in range(splits):
        part = preds[k * (N // splits): (k + 1) * (N // splits), :]  # Split predictions into equal parts
        # p(y)
        py = np.mean(part, axis=0)  # Compute the marginal probability by averaging predictions in the split
        # Compute the KL Divergence for each image's prediction against the marginal probability
        # D_KL(p(y|x) || p(y))
        scores = [entropy(pyx, py) for pyx in part]
        # Exp(Expectation(D_KL(p(y|x) || p(y)))): average the KL values first, then exponentiate
        split_scores.append(np.exp(np.mean(scores)))
    # Maximum and mean of the per-split Inception Scores
    return np.max(split_scores), np.mean(split_scores)


def read_dir():
    """
    Recursively reads all image files from the specified directory.
    Returns:
        list: List of file paths.
    """
    dirPath = args.input_image_dir  # Get the directory path from command-line arguments
    allFiles = []  # Initialize an empty list to store file paths

    if os.path.isdir(dirPath):  # Check if the specified path is a directory
        # Walk through the directory tree
        for root, _, files in os.walk(dirPath):
            for file in files:
                # For each file, construct the full path and add it to the list
                allFiles.append(os.path.join(root, file))
    else:
        # Print an error message if the specified path is not a directory
        print('Error: Specified path is not a directory.')

    return allFiles  # Return the list of file paths

# Splitting the Data: The splits parameter divides the predictions into multiple subsets.
# Computing one score per subset shows how much the Inception Score varies across subsets.
if __name__ == '__main__':
    max_is, avg_is = inception_score(splits=1)  # Compute Inception Scores
    print(f'MAX IS: {max_is:.4f}')
    print(f'Average IS: {avg_is:.4f}')

Six images were generated with a pre-trained model. A real IS evaluation needs a large number of images (e.g. 50,000); this is only a test.

The IS result is shown in the figure below. A larger IS indicates higher quality and greater diversity in the generated images.

Limitations of IS

(1) If you're learning to generate something not present in the classifier's training data (i.e. a category with no corresponding ImageNet class), then you may always get a low IS despite generating high-quality images, since those images don't get classified as a distinct class.

(2) If the classifier network cannot detect features relevant to your concept of image quality, then poor quality images may still get high scores.

1.2 Fréchet inception distance (FID): Assessing Image Distribution Similarity

Differences between IS and FID
Unlike the earlier inception score (IS), which evaluates only the distribution of generated images,
the FID compares the distribution of generated images with the distribution of a set of real images (“ground truth”).

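In the "worldview" of Inception V3, any data that does not resemble ImageNet is unrealistic, and a sharp prediction distribution cannot be guaranteed for it. To evaluate generative models better, we therefore need a more effective way to compute the distance between the real distribution and the generated samples. FID measures exactly this distance between generated samples and real-world samples. (Quoted from: Inception Score (IS) and Fréchet Inception Distance (FID))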

FID

FID measures the distance between the distribution of generated images and the distribution of real images, both represented by features from a pre-trained Inception network.

A lower FID means the generated images are statistically closer to the real ones, that is, the model mimics the real data distribution better.
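For reference, the usual closed form of this distance is given below, assuming each set of Inception features is summarized by a Gaussian. The symbols are notation introduced here: $\mu_r, \Sigma_r$ are the mean and covariance of the real-image features, and $\mu_g, \Sigma_g$ those of the generated-image features.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\big( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} \big)
```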

Figure source: Fréchet inception distance

(1)Generating Images with Prompts
Use your diffusion model to generate images from text prompts.

(2)Extract Features
Pass both the generated images and a set of reference images through a pre-trained Inception network to extract feature vectors. Usually, the Inception v3 model is used for this purpose.

(3)Compute FID Score
Calculate the FID score between the feature distributions of the generated images and the reference images.
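Step (3) can be sketched as follows, assuming the features have already been extracted into two arrays of shape (num_images, feature_dim). The function name `frechet_distance` is hypothetical, and the random arrays are placeholders standing in for real Inception features. This sketch gets the trace of the matrix square root from eigenvalues, using NumPy only; a dedicated matrix square root routine is another common choice.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two (N, D) feature arrays."""
    feats_real = np.asarray(feats_real, dtype=np.float64)
    feats_gen = np.asarray(feats_gen, dtype=np.float64)
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    # Tr((sigma_r sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_g, which are real and non-negative in exact
    # arithmetic; clip tiny negative values caused by rounding.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)

# Placeholder features (NOT real Inception activations), 8-dimensional for brevity
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
same = frechet_distance(real, real)           # identical feature sets
shifted = frechet_distance(real, real + 3.0)  # same spread, shifted mean
print(same, shifted)
```

Identical feature sets give a distance of (numerically) zero, and shifting every feature by 3 in 8 dimensions adds 8 × 3² = 72 through the mean term alone.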

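Code reference: mseitzer/pytorch-fid; its two main files are InceptionV3 and the FID Score computation.
After installing it, you can run it as a module to compute the score directly.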

pip install pytorch-fid

The generated images serve as the sample dataset, and the ImageNet dataset itself serves as the reference dataset.

python -m pytorch_fid path/to/dataset1 path/to/dataset2

1.3 CLIP Score

In text-guided image generation, a model such as StableDiffusionPipeline generates images from textual prompts, and the results are then evaluated with CLIP scores.

CLIP scores measure the fit between image-caption pairs. Higher scores signify better compatibility between the image and its associated caption.
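At its core, this fit is a cosine similarity between an image embedding and a text embedding. The following is a minimal sketch of that computation alone; the function name `cosine_similarity` and the small hand-written vectors are placeholders, not real CLIP embeddings.

```python
import numpy as np

def cosine_similarity(image_emb, text_emb):
    """Cosine similarity between two 1-D embedding vectors."""
    image_emb = np.asarray(image_emb, dtype=float)
    text_emb = np.asarray(text_emb, dtype=float)
    # Normalize both vectors to unit length, then take the dot product
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(image_emb @ text_emb)

aligned = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])         # unrelated directions
print(aligned, orthogonal)
```

Embeddings pointing in the same direction score 1 and orthogonal ones score 0, so a higher value indicates a closer match between image and caption.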


Practical Implementation
(1)Generating Images with Prompts
StableDiffusionPipeline generates images from multiple prompts, producing a diverse set of images aligned with the given textual cues.

(2)Computing CLIP Scores
After generating images, the CLIP scores are calculated to quantify the compatibility between each image and its corresponding prompt.

(3)Comparative Evaluation

Comparing Different Checkpoints: Generating images using different checkpoints, calculating CLIP scores for each set, and performing a comparative analysis assesses the performance differences between the versions. For example, comparing v1–4 and v1–5 checkpoints revealed improved performance in the latter.

The following site scores an image against its corresponding text directly: taesiri/CLIPScore

Code reference 1: CLIP Score for PyTorch

Install PyTorch

pip install torch  # Choose a version that suits your GPU

Install CLIP

pip install git+https://github.com/openai/CLIP.git

Install clip-score from PyPI

pip install clip-score

Usage

python -m clip_score path/to/image path/to/text

Code reference 2: Using CLIP Score to evaluated images

pip install -U torch torchvision
pip install -U git+https://github.com/openai/CLIP.git

import torch
import clip
from PIL import Image

def get_clip_score(image_path, text):
    # Load the pre-trained CLIP model and the image
    model, preprocess = clip.load('ViT-B/32')
    image = Image.open(image_path)

    # Preprocess the image and tokenize the text
    image_input = preprocess(image).unsqueeze(0)
    text_input = clip.tokenize([text])
    
    # Move the inputs to GPU if available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    image_input = image_input.to(device)
    text_input = text_input.to(device)
    model = model.to(device)
    
    # Generate embeddings for the image and text
    with torch.no_grad():
        image_features = model.encode_image(image_input)
        text_features = model.encode_text(text_input)
    
    # Normalize the features
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    
    # Calculate the cosine similarity to get the CLIP score
    clip_score = torch.matmul(image_features, text_features.T).item()
    
    return clip_score

image_path = "path/to/your/image.jpg"
text = "your text description"

score = get_clip_score(image_path, text)
print(f"CLIP Score: {score}")
