TensorFlow Case 6: Cat and Dog Recognition with VGG16 (99.8%+ Accuracy), plus tqdm and train_on

🍨 This post is a study-log blog from the 🔗365天深度学习训练营 (365-Day Deep Learning Training Camp)
🍖 Original author: K同学啊

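Preface

This case again focuses on learning the APIs and how to build a neural network. This time VGG16 is used to classify cats and dogs, and it works well: validation accuracy peaks at 99.8%+.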

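Contents

1. Introduction to tqdm, train_on_batch, and prediction
- tqdm
- train_on_batch
  - `train_on_batch` method signature
- Notes on prediction
2. Cat and dog recognition implementation
- 1. Data processing
  - 1. Import libraries
  - 2. Check the number of images and the classes
  - 3. Load the data
  - 4. Display sample images
  - 5. Normalize the data
  - 6. Memory caching for speed
- 2. Model construction
- 3. Model training
  - 1. Hyperparameter settings
  - 2. Model training
- 4. Results
- 5. Prediction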

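1. Introduction to tqdm, train_on_batch, and prediction

tqdm

tqdm is a wrapper API: it wraps a loop and renders a progress bar, used here to show training progress, like this:

tqdm(total=train_total, desc=f'Epoch {epoch + 1} / {epochs}', mininterval=1, ncols=100)

total: the expected number of iterations
desc: the prefix text shown before the bar
mininterval: the minimum interval, in seconds, between progress-bar refreshes
ncols: the width of the whole progress-bar line, in characters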
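
To see these parameters in action without a model, here is a minimal sketch of a manually updated bar. The batch count and the fake loss values are made up for illustration, and the bar is written into a string buffer via `file=` only to keep the output quiet; drop that argument to see it in the terminal.

```python
import io
from tqdm import tqdm

batches = 5                 # hypothetical number of batches in one epoch
log = io.StringIO()         # capture the bar here; drop `file=` to print to the terminal

with tqdm(total=batches, desc="Epoch 1 / 1", mininterval=0, ncols=100, file=log) as pbar:
    running_loss = 0.0
    for step in range(batches):
        running_loss += 1.0 / (step + 1)       # stand-in for a real per-batch loss
        # show the running mean loss at the right-hand side of the bar
        pbar.set_postfix({"loss": "%.4f" % (running_loss / (step + 1))})
        pbar.update(1)                         # advance the bar by one batch
    finished = pbar.n                          # number of completed iterations

print(finished)
```

Because the bar is created with `total=` instead of wrapping an iterable, it only advances when `pbar.update(1)` is called, which is what lets the training loop later in this post decide when a batch counts as done.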

train_on_batch

I learned PyTorch first. Training a model in PyTorch is flexible: you compute the loss and write the training loop yourself. When I started learning TensorFlow, I found that it has a function called model.fit(), which is very thoroughly encapsulated; coming from PyTorch, though, I still felt PyTorch was nicer to use. Today's case uses train_on_batch, an API that, like PyTorch, offers more flexibility. I prefer this approach, but I still like PyTorch more 🤠🤠🤠🤠

train_on_batch takes several important parameters, explained in detail below.

`train_on_batch` method signature

train_on_batch(self, x, y=None, sample_weight=None, class_weight=None, reset_metrics=True, return_dict=False)

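x: input data.
- Type: a NumPy array, a TensorFlow tensor, or a list or dict of them, depending on the model's input layers.
- Description: the model's input, normally one batch of data.
y: target data (labels).
- Type: NumPy array, list, dict, and so on.
- Description: the model's output targets, i.e. the labels you want the model to predict. It can be omitted if the model is unsupervised.
sample_weight: per-sample weights.
- Type: NumPy array.
- Description: an array with one weight per sample in x, used to weight samples differently when computing the loss. Its length should match the first dimension of y.
class_weight: per-class weights.
- Type: dict.
- Description: a dict whose keys are class indices (integers) and whose values are the corresponding weights, used to weight classes differently when computing the loss. This is especially useful for imbalanced datasets.
reset_metrics: whether to reset the model's metrics after each call.
- Type: bool.
- Description: if True, the model's metrics (such as accuracy) are reset after every call to train_on_batch, so the returned values describe this batch only. If False, the metrics accumulate across calls.
return_dict: whether to return the results as a dict.
- Type: bool.
- Description: if True, a dict containing the loss and all metrics is returned. If False, a list is returned whose first element is the loss and whose remaining elements are the metric values.

Key point: 👀👀👀👀👀👀 in practice, x and y are the only ones you need to care about here.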
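
To make the x / y contract concrete, here is a minimal NumPy sketch of what a single-batch training step does conceptually: one forward pass, one gradient update, and a returned [loss, accuracy] list. This is not how Keras implements it; the `TinyClassifier` class, its logistic-regression model, the synthetic data, and the learning rate are all assumptions made for illustration. The `test_on_batch` counterpart (Keras models have a method of the same name) reports the same metrics without touching the weights.

```python
import numpy as np

class TinyClassifier:
    """Hypothetical logistic-regression stand-in for a compiled Keras model."""

    def __init__(self, n_features, lr=0.5):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def _forward(self, x, y):
        p = 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))   # predicted P(class 1)
        eps = 1e-7
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        acc = np.mean((p > 0.5) == (y == 1))
        return p, float(loss), float(acc)

    def train_on_batch(self, x, y):
        """One gradient step on one batch; returns [loss, accuracy]."""
        p, loss, acc = self._forward(x, y)
        grad = p - y                                        # d(loss)/d(logit)
        self.w -= self.lr * (x.T @ grad) / len(y)
        self.b -= self.lr * grad.mean()
        return [loss, acc]

    def test_on_batch(self, x, y):
        """Same metrics, but the weights are left untouched."""
        _, loss, acc = self._forward(x, y)
        return [loss, acc]

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))              # one batch: 32 samples, 4 features
y = (x[:, 0] > 0).astype(float)           # labels depend on the first feature

model = TinyClassifier(n_features=4)
first = model.train_on_batch(x, y)
for _ in range(50):
    last = model.train_on_batch(x, y)

weights_before = model.w.copy()
val = model.test_on_batch(x, y)           # evaluation only, no update
print(first, last, val)
```

The returned list has the same shape as what the training loop in this post reads back: index 0 is the loss and index 1 is the accuracy.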

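Notes on prediction

Taking this case as an example:

for images, labels in val_ds.take(1):
    for i in range(10):
        plt.subplot(1, 10, i + 1)
        
        plt.imshow(images[i].numpy())
        
        # add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
        img_array = tf.expand_dims(images[i], 0)
        
        # predict
        predictions = model.predict(img_array)
        plt.title(image_classnames[np.argmax(predictions)])
        
        plt.axis("off")

Note: the extra dimension is needed because the model's input has a leading batch dimension by default, so even a single image must be passed as a batch of one.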
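
The shape change can be checked without a model. The sketch below uses `np.expand_dims`, which behaves like `tf.expand_dims` for this purpose; the all-zero image and the softmax output are made-up stand-ins, not results from the trained network.

```python
import numpy as np

image = np.zeros((224, 224, 3), dtype=np.float32)   # one image: (height, width, channels)
batch = np.expand_dims(image, 0)                     # add a leading batch axis

# Hypothetical softmax output for that one-image batch, shape (1, num_classes)
predictions = np.array([[0.1, 0.9]])
class_names = ['cat', 'dog']
predicted = class_names[int(np.argmax(predictions))]

print(image.shape, batch.shape, predicted)
# (224, 224, 3) (1, 224, 224, 3) dog
```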

2. Cat and dog recognition implementation

1. Data processing

1. Import libraries

import tensorflow as tf 
from tensorflow.keras import layers, models, datasets 
import numpy as np 

gpus = tf.config.list_physical_devices("GPU")

if gpus:
    gpu0 = gpus[0]
    tf.config.experimental.set_memory_growth(gpu0, True)   # allocate GPU memory on demand instead of reserving it all up front
    tf.config.set_visible_devices([gpu0], "GPU")          # use only the first GPU
    
gpus

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

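2. Check the number of images and the classes

import os, pathlib
# count the images
data_dir = "./data/"
data_dir = pathlib.Path(data_dir)

images_path =  data_dir.glob('*/*')
images_path_list = [str(path) for path in images_path]

images_num = len(images_path_list)

# sorted() keeps the same alphabetical class order that image_dataset_from_directory uses for its integer labels
image_classnames = sorted(os.listdir(data_dir))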

print("images have number: ", images_num)
print("images have classes: ", image_classnames)

images have number:  3400
images have classes:  ['cat', 'dog']

3. Load the data

# training set : validation set = 8 : 2

batch_size = 32
image_width = 224
image_height = 224

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    './data/',
    subset='training',
    validation_split=0.2,
    seed=42,
    batch_size=batch_size,
    shuffle=True,
    image_size=(image_height, image_width)   # image_size expects (height, width)
)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    './data/',
    validation_split=0.2,
    subset='validation',
    seed=42,
    batch_size=batch_size,
    shuffle=True,
    image_size=(image_height, image_width)   # image_size expects (height, width)
)

Found 3400 files belonging to 2 classes.
Using 2720 files for training.

Found 3400 files belonging to 2 classes.
Using 680 files for validation.
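
The split sizes above, and the number of batches each dataset yields, follow from simple arithmetic. A quick check using only the numbers already shown (3400 images, a 0.2 validation split, batch size 32):

```python
import math

total_images = 3400
validation_split = 0.2
batch_size = 32

val_count = int(total_images * validation_split)
train_count = total_images - val_count

# the last batch may be smaller than batch_size, so round up
train_batches = math.ceil(train_count / batch_size)
val_batches = math.ceil(val_count / batch_size)
last_val_batch = val_count - (val_batches - 1) * batch_size

print(train_count, val_count, train_batches, val_batches, last_val_batch)
# 2720 680 85 22 8
```

The 85 and 22 are the totals that the training and validation progress bars report later in this post; the final validation batch holds only 8 images.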

Inspect the data format

# show the format of one batch
batch_datas, data_labels = next(iter(train_ds))

print("[N, H, W, C]: ", batch_datas.shape)   # batch, height, width, channels
print("data_classes: ", data_labels)

[N, H, W, C]:  (32, 224, 224, 3)
data_classes:  tf.Tensor([0 1 0 1 1 1 1 1 1 1 1 0 0 0 0 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 0], shape=(32,), dtype=int32)

4. Display sample images

import matplotlib.pyplot as plt 

plt.figure(figsize=(20, 10))

for i in range(20):
    plt.subplot(5, 10, i + 1)
    
    plt.imshow(batch_datas[i].numpy().astype("uint8"))
    
    plt.title(image_classnames[data_labels[i]])
    
    plt.axis('off')

plt.show()

5. Normalize the data

Each dataset element is stored as a pair: image data + label.

# normalize pixel values to [0, 1]
normalization_layer = layers.experimental.preprocessing.Rescaling(1.0 / 255)

# apply the normalization to the training and validation sets
train_ds = train_ds.map(lambda x, y : (normalization_layer(x), y))
val_ds = val_ds.map(lambda x, y : (normalization_layer(x), y))
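
What the rescaling does to the pixel values can be reproduced with plain NumPy. The three sample pixel values below are made up; the multiplication is the same arithmetic as `Rescaling(1.0 / 255)`.

```python
import numpy as np

pixels = np.array([[0, 128, 255]], dtype=np.uint8)   # hypothetical raw pixel values
scaled = pixels.astype(np.float32) * (1.0 / 255)     # same arithmetic as Rescaling(1.0 / 255)

print(scaled.min(), scaled.max())
```

After this step the images are floats in [0, 1], which is why the prediction code later in this post can pass them to `plt.imshow` without casting back to `uint8`.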

6. Memory caching for speed

AUTOTUNE = tf.data.experimental.AUTOTUNE 

# cache the decoded images in memory, shuffle the training batches, and prefetch upcoming batches while the current one is being processed
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

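2. Model construction

There are only two classes, so the fully connected part of the VGG16 model does not need to be expanded any further; the model from the previous case is reused as-is, with only the output layer sized to the number of classes.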

def VGG16(class_num, input_shape):
    inputs = layers.Input(input_shape)
    
    # 1st block
    x = layers.Conv2D(64, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(inputs)
    x = layers.Conv2D(64, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)

    # 2nd block
    x = layers.Conv2D(128, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.Conv2D(128, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)

    # 3rd block
    x = layers.Conv2D(256, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.Conv2D(256, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.Conv2D(256, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)

    # 4th block
    x = layers.Conv2D(512, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.Conv2D(512, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.Conv2D(512, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)

    # 5th block
    x = layers.Conv2D(512, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.Conv2D(512, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.Conv2D(512, kernel_size=(3, 3), activation='relu', strides=(1, 1), padding='same')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)
    
    # fully connected layers; only the output layer is adapted to class_num
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation='relu')(x)
    x = layers.Dense(4096, activation='relu')(x)
    # the final layer uses a softmax activation
    out_shape = layers.Dense(class_num, activation='softmax')(x)
    
    # build the model
    model = models.Model(inputs=inputs, outputs=out_shape)
    
    return model
    
model = VGG16(len(image_classnames), (image_width, image_height, 3))
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 conv2d (Conv2D)             (None, 224, 224, 64)      1792      
                                                                 
 conv2d_1 (Conv2D)           (None, 224, 224, 64)      36928     
                                                                 
 max_pooling2d (MaxPooling2D  (None, 112, 112, 64)     0         
 )                                                               
                                                                 
 conv2d_2 (Conv2D)           (None, 112, 112, 128)     73856     
                                                                 
 conv2d_3 (Conv2D)           (None, 112, 112, 128)     147584    
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 56, 56, 128)      0         
 2D)                                                             
                                                                 
 conv2d_4 (Conv2D)           (None, 56, 56, 256)       295168    
                                                                 
 conv2d_5 (Conv2D)           (None, 56, 56, 256)       590080    
                                                                 
 conv2d_6 (Conv2D)           (None, 56, 56, 256)       590080    
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 28, 28, 256)      0         
 2D)                                                             
                                                                 
 conv2d_7 (Conv2D)           (None, 28, 28, 512)       1180160   
                                                                 
 conv2d_8 (Conv2D)           (None, 28, 28, 512)       2359808   
                                                                 
 conv2d_9 (Conv2D)           (None, 28, 28, 512)       2359808   
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 14, 14, 512)      0         
 2D)                                                             
                                                                 
 conv2d_10 (Conv2D)          (None, 14, 14, 512)       2359808   
                                                                 
 conv2d_11 (Conv2D)          (None, 14, 14, 512)       2359808   
                                                                 
 conv2d_12 (Conv2D)          (None, 14, 14, 512)       2359808   
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 7, 7, 512)        0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 25088)             0         
                                                                 
 dense (Dense)               (None, 4096)              102764544 
                                                                 
 dense_1 (Dense)             (None, 4096)              16781312  
                                                                 
 dense_2 (Dense)             (None, 2)                 8194      
                                                                 
=================================================================
Total params: 134,268,738
Trainable params: 134,268,738
Non-trainable params: 0
_________________________________________________________________
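
The parameter counts in the summary can be reproduced by hand: a 3x3 convolution has (3 * 3 * in_channels + 1) * out_channels parameters, and a dense layer has (inputs + 1) * outputs, where the +1 is the bias. A small check, with the layer sizes read off the summary above:

```python
def conv_params(kernel, c_in, c_out):
    # each filter has kernel*kernel*c_in weights plus one bias
    return (kernel * kernel * c_in + 1) * c_out

def dense_params(n_in, n_out):
    # one weight per input plus one bias, for every output unit
    return (n_in + 1) * n_out

# (input channels, output channels) for the 13 convolution layers in the summary
conv_layers = [(3, 64), (64, 64),
               (64, 128), (128, 128),
               (128, 256), (256, 256), (256, 256),
               (256, 512), (512, 512), (512, 512),
               (512, 512), (512, 512), (512, 512)]
# Flatten yields 7 * 7 * 512 = 25088 features
dense_layers = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 2)]

total = (sum(conv_params(3, i, o) for i, o in conv_layers)
         + sum(dense_params(i, o) for i, o in dense_layers))
print(f"{total:,}")
# 134,268,738
```

Most of the total comes from the first dense layer (102,764,544 parameters), which is what the 7 x 7 x 512 feature map costs once it is flattened and connected to 4096 units.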

3. Model training

1. Hyperparameter settings

model.compile(
    optimizer = "adam",
    loss = 'sparse_categorical_crossentropy',
    metrics = ['accuracy']
)

2. Model training

from tqdm import tqdm 
import tensorflow.keras.backend as K

learn_rate = 1e-4
epochs = 10

history_train_loss = []
history_train_accuracy = []
history_val_loss = []
histoty_val_accuracy = []

for epoch in range(epochs):
    train_total = len(train_ds)
    val_total = len(val_ds)
    
    with tqdm(total=train_total, desc=f'Epoch {epoch + 1} / {epochs}', mininterval=1, ncols=100) as pbar:
        learn_rate = learn_rate * 0.92   # decay the learning rate by 8% each epoch
        K.set_value(model.optimizer.lr, learn_rate)
        
        # accumulators for this epoch's loss and accuracy
        train_loss, train_accuracy = 0, 0 
        batch_num = 0
        
        for image, label in train_ds:
            history = model.train_on_batch(image, label)   # core step: train on one batch
            
            train_loss += history[0]
            train_accuracy += history[1]
            batch_num += 1
            
            # the bar shows the running sums, not the averages
            pbar.set_postfix({"loss": "%.4f"%train_loss,
                              "accuracy":"%.4f"%train_accuracy,
                              "lr": K.get_value(model.optimizer.lr)})
            
            pbar.update(1)
        
        # record the epoch's mean loss and accuracy
        history_train_loss.append(train_loss / batch_num)    
        history_train_accuracy.append(train_accuracy / batch_num)
        
    print("Start validation!!")
    
    with tqdm(total=val_total, desc=f'Epoch {epoch + 1} / {epochs}', mininterval=0.3, ncols=100) as pbar:
        
        val_loss, val_accuracy = 0, 0
        batch_num = 0
        
        for image, label in val_ds:
            # core step: evaluate only. test_on_batch computes the loss and metrics without updating weights.
            # (The run logged below called train_on_batch here, so it also trained on the validation batches.)
            history = model.test_on_batch(image, label)
            
            val_loss += history[0] 
            val_accuracy += history[1]
            batch_num += 1   # count the validation batches
            
            pbar.set_postfix({"loss": "%.4f"%val_loss,
                              "accuracy":"%.4f"%val_accuracy})
            
            pbar.update(1)
            
        # record the mean validation loss and accuracy
        history_val_loss.append(val_loss / batch_num)
        histoty_val_accuracy.append(val_accuracy / batch_num)
        
    print('Validation finished!')
    print("Mean validation loss: %.4f"%(val_loss / batch_num))
    print("Mean validation accuracy: %.4f"%(val_accuracy / batch_num))

Epoch 1 / 10:   0%|                                                          | 0/85 [00:00<?, ?it/s]2024-11-15 17:02:53.513057: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8101
2024-11-15 17:02:56.690008: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
Epoch 1 / 10: 100%|██████| 85/85 [00:21<00:00,  3.98it/s, loss=51.2915, accuracy=55.0000, lr=9.2e-5]

Start validation!!

Epoch 1 / 10: 100%|██████████████████| 22/22 [00:05<00:00,  4.39it/s, loss=5.7842, accuracy=19.8125]

Validation finished!
Mean validation loss: 0.2629
Mean validation accuracy: 0.9006

Epoch 2 / 10: 100%|█████| 85/85 [00:13<00:00,  6.32it/s, loss=13.8654, accuracy=80.2500, lr=8.46e-5]

Start validation!!

Epoch 2 / 10: 100%|██████████████████| 22/22 [00:03<00:00,  6.17it/s, loss=1.9731, accuracy=21.3125]

Validation finished!
Mean validation loss: 0.0897
Mean validation accuracy: 0.9688

Epoch 3 / 10: 100%|██████| 85/85 [00:13<00:00,  6.29it/s, loss=6.9991, accuracy=82.6562, lr=7.79e-5]

Start validation!!

Epoch 3 / 10: 100%|██████████████████| 22/22 [00:03<00:00,  6.54it/s, loss=1.1199, accuracy=21.5938]

Validation finished!
Mean validation loss: 0.0509
Mean validation accuracy: 0.9815

Epoch 4 / 10: 100%|██████| 85/85 [00:13<00:00,  6.35it/s, loss=4.5513, accuracy=83.4688, lr=7.16e-5]

Start validation!!

Epoch 4 / 10: 100%|██████████████████| 22/22 [00:03<00:00,  6.66it/s, loss=0.7666, accuracy=21.7812]

Validation finished!
Mean validation loss: 0.0348
Mean validation accuracy: 0.9901

Epoch 5 / 10: 100%|██████| 85/85 [00:13<00:00,  6.49it/s, loss=4.4772, accuracy=83.7188, lr=6.59e-5]

Start validation!!

Epoch 5 / 10: 100%|██████████████████| 22/22 [00:03<00:00,  6.61it/s, loss=0.5379, accuracy=21.8125]

Validation finished!
Mean validation loss: 0.0245
Mean validation accuracy: 0.9915

Epoch 6 / 10: 100%|██████| 85/85 [00:13<00:00,  6.39it/s, loss=1.5206, accuracy=84.4375, lr=6.06e-5]

Start validation!!

Epoch 6 / 10: 100%|██████████████████| 22/22 [00:03<00:00,  6.73it/s, loss=0.2960, accuracy=21.9062]

Validation finished!
Mean validation loss: 0.0135
Mean validation accuracy: 0.9957

Epoch 7 / 10: 100%|██████| 85/85 [00:13<00:00,  6.33it/s, loss=1.9587, accuracy=84.4062, lr=5.58e-5]

Start validation!!

Epoch 7 / 10: 100%|██████████████████| 22/22 [00:03<00:00,  6.31it/s, loss=0.2803, accuracy=21.9375]

Validation finished!
Mean validation loss: 0.0127
Mean validation accuracy: 0.9972

Epoch 8 / 10: 100%|██████| 85/85 [00:13<00:00,  6.43it/s, loss=0.6910, accuracy=84.7812, lr=5.13e-5]

Start validation!!

Epoch 8 / 10: 100%|██████████████████| 22/22 [00:03<00:00,  6.47it/s, loss=0.1591, accuracy=21.9375]

Validation finished!
Mean validation loss: 0.0072
Mean validation accuracy: 0.9972

Epoch 9 / 10: 100%|██████| 85/85 [00:13<00:00,  6.51it/s, loss=0.6709, accuracy=84.7812, lr=4.72e-5]

Start validation!!

Epoch 9 / 10: 100%|██████████████████| 22/22 [00:03<00:00,  6.64it/s, loss=0.1658, accuracy=21.9688]

Validation finished!
Mean validation loss: 0.0075
Mean validation accuracy: 0.9986

Epoch 10 / 10: 100%|█████| 85/85 [00:13<00:00,  6.43it/s, loss=0.5763, accuracy=84.7500, lr=4.34e-5]

Start validation!!

Epoch 10 / 10: 100%|█████████████████| 22/22 [00:03<00:00,  6.44it/s, loss=0.1906, accuracy=21.8750]

Validation finished!
Mean validation loss: 0.0087
Mean validation accuracy: 0.9943
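
Two things in this log are easy to misread, and both can be checked with plain arithmetic. The lr shown in each training bar follows 1e-4 * 0.92^n for epoch n, because the loop applies the decay before the first batch. The loss and accuracy inside the bars are sums over the batches seen so far, so they have to be divided by the batch count to match the printed means. A short check using the epoch 1 numbers from the log:

```python
base_lr = 1e-4
decay = 0.92
epochs = 10

# the loop multiplies by 0.92 before training, so epoch n (1-based) uses base_lr * 0.92**n
schedule = [base_lr * decay ** n for n in range(1, epochs + 1)]
print(["%.3g" % lr for lr in schedule])

# epoch 1 validation bar: cumulative loss 5.7842 and accuracy 19.8125 over 22 batches
val_batches = 22
mean_loss = 5.7842 / val_batches
mean_acc = 19.8125 / val_batches
print("%.4f %.4f" % (mean_loss, mean_acc))
# 0.2629 0.9006
```

The schedule runs from 9.2e-05 in epoch 1 down to 4.34e-05 in epoch 10, matching the lr column of the bars.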

4. Results

epochs_range = range(epochs)

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, history_train_accuracy, label='Training Accuracy')
plt.plot(epochs_range, histoty_val_accuracy, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, history_train_loss, label='Training Loss')
plt.plot(epochs_range, history_val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

5. Prediction

# predict a few images from one validation batch
plt.figure(figsize=(18 ,3))

for images, labels in val_ds.take(1):
    for i in range(10):
        plt.subplot(1, 10, i + 1)
        
        plt.imshow(images[i].numpy())
        
        # add a batch dimension
        img_array = tf.expand_dims(images[i], 0)
        
        # predict
        predictions = model.predict(img_array)
        plt.title(image_classnames[np.argmax(predictions)])
        
        plt.axis("off")
        
plt.show()

1/1 [==============================] - 0s 29ms/step
1/1 [==============================] - 0s 27ms/step
1/1 [==============================] - 0s 34ms/step
1/1 [==============================] - 0s 29ms/step
1/1 [==============================] - 0s 29ms/step
1/1 [==============================] - 0s 32ms/step
1/1 [==============================] - 0s 28ms/step
1/1 [==============================] - 0s 33ms/step
1/1 [==============================] - 0s 29ms/step
1/1 [==============================] - 0s 29ms/step
