TensorFlow Deep Learning in Practice: Building a Convolutional Neural Network for CIFAR-10 Image Classification

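- 0. Preface
- 1. Introduction to the CIFAR-10 Dataset
- 2. CIFAR-10 Image Classification
- 3. Improving Model Performance
  - 3.1 Increasing Network Depth
  - 3.2 Data Augmentation
- 4. Testing the Model
- Related Links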

0. Preface

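We have already covered the basic concepts of convolutional neural networks (CNNs) and used TensorFlow to build a CNN for MNIST handwritten digit classification. In this section, we go a step further and use a CNN to classify the images of the CIFAR-10 dataset. We first build a simple CNN model for CIFAR-10 classification and then apply model optimization techniques to improve its accuracy, walking through a complete image classification workflow from data preparation to model evaluation.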

1. Introduction to the CIFAR-10 Dataset

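Unlike the simple grayscale images of the MNIST dataset, the CIFAR-10 dataset contains 60,000 color images of 32 x 32 pixels, each with three channels, divided into 10 classes. Each class contains 6,000 images; the training set contains 50,000 images and the test set 10,000 images. A random selection of samples from CIFAR-10: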

CIFAR-10 data samples

The goal of training is to predict the class of CIFAR-10 images that the model has not seen during training.
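The dataset figures above can be cross-checked with a few lines of arithmetic. The sketch below needs no download; it only assumes the array layout returned by `datasets.cifar10.load_data()` (uint8 images of shape `(N, 32, 32, 3)` and integer labels of shape `(N, 1)`), and the variable names are illustrative.

```python
# Sanity-check the CIFAR-10 numbers quoted above (pure arithmetic, no download).
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6_000
TRAIN_SIZE, TEST_SIZE = 50_000, 10_000
HEIGHT, WIDTH, CHANNELS = 32, 32, 3

total = NUM_CLASSES * IMAGES_PER_CLASS
assert total == TRAIN_SIZE + TEST_SIZE  # 60,000 images in total

# Shapes of the arrays returned by cifar10.load_data() for the training split
x_train_shape = (TRAIN_SIZE, HEIGHT, WIDTH, CHANNELS)
y_train_shape = (TRAIN_SIZE, 1)

# Pixels are stored as uint8, i.e. one byte per value
train_bytes = TRAIN_SIZE * HEIGHT * WIDTH * CHANNELS
print(total, x_train_shape, y_train_shape)
print(f"training images as uint8: {train_bytes / 1e6:.1f} MB")
print(f"as float32 after normalization: {4 * train_bytes / 1e6:.1f} MB")
```

The last line is a reminder that converting the images to floating point multiplies their memory footprint (by 4 for float32), which is worth keeping in mind when normalizing the whole training set at once.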

2. CIFAR-10 Image Classification

(1) First, we import the required libraries and define the constants (the dataset itself is loaded in step (3)):

import tensorflow as tf
from tensorflow.keras import datasets, layers, models, optimizers

# CIFAR_10 is a set of 60K images 32x32 pixels on 3 channels
IMG_CHANNELS = 3
IMG_ROWS = 32
IMG_COLS = 32

#constant
BATCH_SIZE = 128
EPOCHS = 20
CLASSES = 10
VERBOSE = 2
VALIDATION_SPLIT = 0.2
OPTIM = tf.keras.optimizers.RMSprop()

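(2) The convolutional layer uses 32 filters of size 3 x 3, with the ReLU activation function to introduce non-linearity. Because no padding is specified (Keras defaults to padding='valid'), the output is 30 x 30 x 32, slightly smaller than the 32 x 32 input. It is followed by a 2 x 2 max pooling layer (MaxPooling) and Dropout with a rate of 25%: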

#define the convnet 
def build(input_shape, classes):
    model = models.Sequential() 
    model.add(layers.Convolution2D(32, (3, 3), activation='relu',
              input_shape=input_shape))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Dropout(0.25))

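Next, after flattening the feature maps, we add a fully connected layer with 512 neurons and ReLU activation, followed by Dropout with a rate of 50%. Finally, we add a fully connected layer with Softmax activation; it has 10 neurons, one output per class: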

    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(classes, activation='softmax'))
    return model
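To see where the numbers in `model.summary()` come from, the shapes and parameter counts of `build()` can be reproduced by hand. This is a minimal sketch using hypothetical helper functions; it assumes the Keras defaults used above (`padding='valid'`, pooling strides equal to `pool_size`) and that Dropout, MaxPooling and Flatten add no parameters.

```python
# Hypothetical helpers: reproduce by hand the shapes and parameter counts
# that model.summary() reports for build() above.

def conv2d_out(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a Conv2D layer along one dimension."""
    if padding == "same":
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # 'valid': no padding

def pool_out(size, pool=2):
    """MaxPooling2D with strides == pool_size and 'valid' padding."""
    return (size - pool) // pool + 1

def conv2d_params(in_ch, filters, kernel):
    return (kernel * kernel * in_ch + 1) * filters  # weights + one bias per filter

def dense_params(n_in, n_out):
    return n_in * n_out + n_out                     # weights + biases

h = conv2d_out(32, 3)            # 32 - 3 + 1 = 30
h = pool_out(h)                  # 30 // 2 = 15
flat = h * h * 32                # Flatten -> 7200 features

params = {
    "conv2d": conv2d_params(3, 32, 3),
    "dense_512": dense_params(flat, 512),
    "dense_out": dense_params(512, 10),
}
print(h, flat, params, sum(params.values()))
```

Almost all of the roughly 3.69 million parameters sit in the 7200 x 512 dense layer, which is one reason the deeper model in section 3.1 reduces the spatial size with more pooling before flattening.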

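(3) After defining the network, we train the model. In addition to the training and test sets, part of the training data is held out as a validation set (validation_split=0.2). The training set is used to fit the model, the validation set to choose the best-performing configuration, and the test set to check how the best model performs on unseen data: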

# data: shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = datasets.cifar10.load_data()
# normalize
X_train, X_test = X_train / 255.0, X_test / 255.0
# convert to categorical
# convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, CLASSES)
y_test = tf.keras.utils.to_categorical(y_test, CLASSES)

model=build((IMG_ROWS, IMG_COLS, IMG_CHANNELS), CLASSES)
model.summary()

# use TensorBoard, princess Aurora!
callbacks = [
    # Write TensorBoard logs to `./logs` directory
    tf.keras.callbacks.TensorBoard(log_dir='./logs')
]

# train
model.compile(loss='categorical_crossentropy', optimizer=OPTIM,
              metrics=['accuracy'])
 
model.fit(X_train, y_train, batch_size=BATCH_SIZE,
          epochs=EPOCHS, validation_split=VALIDATION_SPLIT, 
          verbose=VERBOSE, callbacks=callbacks) 
score = model.evaluate(X_test, y_test, batch_size=BATCH_SIZE, verbose=VERBOSE)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])
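Two preprocessing steps in the code above are easy to misread: `to_categorical` turns each integer label into a one-hot vector, and `validation_split=0.2` does not sample randomly; according to the Keras documentation, the validation data is taken from the last samples of the arrays, before shuffling. The NumPy sketch below imitates both behaviours with hypothetical helpers (`to_one_hot`, `split_like_validation_split`); it is an illustration, not the Keras implementation.

```python
import numpy as np

def to_one_hot(labels, num_classes):
    """Minimal stand-in for tf.keras.utils.to_categorical on integer labels."""
    labels = np.asarray(labels).reshape(-1)          # (N, 1) -> (N,)
    one_hot = np.zeros((labels.size, num_classes), dtype="float32")
    one_hot[np.arange(labels.size), labels] = 1.0
    return one_hot

def split_like_validation_split(x, y, fraction):
    """Hold out the *last* `fraction` of the samples, without shuffling first."""
    split_at = int(np.floor(len(x) * (1.0 - fraction)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

y = np.array([[3], [5], [0], [9], [3]])              # same (N, 1) shape as load_data()
Y = to_one_hot(y, 10)
(x_tr, y_tr), (x_val, y_val) = split_like_validation_split(np.arange(5), Y, 0.2)
print(Y[0])                  # class 3 -> a single 1 at index 3
print(len(x_tr), len(x_val))
```

With the 50,000 CIFAR-10 training images, the same rule leaves 40,000 images for training and keeps the last 10,000 for validation.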

Run the code. After 20 epochs of training, the model reaches an accuracy of 67.8% on the test set:

Monitoring model training

The accuracy and loss of the model over the course of training on CIFAR-10 are shown below:

Training accuracy and loss curves

3. Improving Model Performance

3.1 Increasing Network Depth

One way to improve performance is to define a deeper network with multiple convolutional layers. We will use the following network blocks:

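1st block: CONV + CONV + MaxPool + Dropout
2nd block: CONV + CONV + MaxPool + Dropout
3rd block: CONV + CONV + MaxPool + Dropout

These blocks are followed by a fully connected output layer. All layers except the output layer use the ReLU activation function. A BatchNormalization() layer is added after each convolution; it normalizes the activations of each mini-batch, which stabilizes and speeds up training and also has a mild regularizing effect: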

import tensorflow as tf
from tensorflow.keras import datasets, layers, models, regularizers, optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np

def build_model(): 
    model = models.Sequential()
    
    #1st block
    # CIFAR-10 images: 32 x 32 pixels, 3 channels (no dependency on a global x_train)
    model.add(layers.Conv2D(32, (3,3), padding='same', 
        input_shape=(32, 32, 3), activation='relu'))
    model.add(layers.BatchNormalization())
    model.add(layers.Conv2D(32, (3,3), padding='same', activation='relu'))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D(pool_size=(2,2)))
    model.add(layers.Dropout(0.2))

    #2nd block
    model.add(layers.Conv2D(64, (3,3), padding='same', activation='relu'))
    model.add(layers.BatchNormalization())
    model.add(layers.Conv2D(64, (3,3), padding='same', activation='relu'))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D(pool_size=(2,2)))
    model.add(layers.Dropout(0.3))

    #3rd block 
    model.add(layers.Conv2D(128, (3,3), padding='same', activation='relu'))
    model.add(layers.BatchNormalization())
    model.add(layers.Conv2D(128, (3,3), padding='same', activation='relu'))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D(pool_size=(2,2)))
    model.add(layers.Dropout(0.4))

    #dense  
    model.add(layers.Flatten())
    model.add(layers.Dense(NUM_CLASSES, activation='softmax'))
    return model
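Because every convolution uses `padding='same'`, only the pooling layers shrink the feature maps, so the shapes and parameter counts of `build_model()` can again be traced by hand. The sketch below is pure arithmetic with hypothetical helpers; it assumes the Keras convention that each BatchNormalization layer holds 4 values per channel (trainable gamma and beta, plus non-trainable moving mean and variance).

```python
# Hypothetical shape/parameter trace for build_model() (pure arithmetic).

def conv_same_params(in_ch, filters, k=3):
    return (k * k * in_ch + 1) * filters     # kernel weights + biases

def bn_params(channels):
    return 4 * channels   # gamma, beta (trainable) + moving mean/variance (not trainable)

size, in_ch = 32, 3
total = 0
for filters in (32, 64, 128):                # the three CONV+CONV+MaxPool+Dropout blocks
    for _ in range(2):                       # 'same'-padded convs keep the spatial size
        total += conv_same_params(in_ch, filters) + bn_params(filters)
        in_ch = filters
    size //= 2                               # 2x2 max pooling halves height and width
    print(f"after block with {filters} filters: {size}x{size}x{filters}")

flat = size * size * in_ch                   # 4 * 4 * 128 = 2048
total += flat * 10 + 10                      # final Dense softmax layer
non_trainable = 2 * (2 * 32 + 2 * 64 + 2 * 128)   # moving mean/variance of each BN layer
print(flat, total, total - non_trainable)
```

Despite having six convolutional layers, this model has roughly 0.31 million parameters, far fewer than the simple model, because the final feature map is only 4 x 4 x 128 when it is flattened.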

Load the dataset and standardize it (zero mean, unit variance):

EPOCHS=50
NUM_CLASSES = 10
BATCH_SIZE = 128
    

def load_data():
    (x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
 
    #normalize 
    mean = np.mean(x_train,axis=(0,1,2,3))
    std = np.std(x_train,axis=(0,1,2,3))
    x_train = (x_train-mean)/(std+1e-7)
    x_test = (x_test-mean)/(std+1e-7)
 
    y_train =  tf.keras.utils.to_categorical(y_train,NUM_CLASSES)
    y_test =  tf.keras.utils.to_categorical(y_test,NUM_CLASSES)

    return x_train, y_train, x_test, y_test
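In `load_data()`, reducing over `axis=(0,1,2,3)` yields a single scalar mean and standard deviation for the whole training set, and the test set is transformed with those same training statistics. The sketch below checks this on small random arrays that merely stand in for CIFAR-10 (an assumption for illustration), and shows the per-channel alternative for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the CIFAR-10 arrays (much smaller)
x_train = rng.integers(0, 256, size=(100, 32, 32, 3)).astype("float32")
x_test = rng.integers(0, 256, size=(20, 32, 32, 3)).astype("float32")

# Reducing over all four axes gives ONE scalar mean/std for the whole training set
mean = np.mean(x_train, axis=(0, 1, 2, 3))
std = np.std(x_train, axis=(0, 1, 2, 3))

x_train_n = (x_train - mean) / (std + 1e-7)
x_test_n = (x_test - mean) / (std + 1e-7)   # test data reuses the *training* statistics

print(mean.shape, round(float(x_train_n.mean()), 4), round(float(x_train_n.std()), 4))

# A per-channel alternative keeps the channel axis: 3 means (and 3 stds), one per color
ch_mean = x_train.mean(axis=(0, 1, 2))
print(ch_mean.shape)
```

The same mean and standard deviation must also be applied to any new image at prediction time; otherwise the model sees inputs on a different scale from the one it was trained on.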

Once the network is defined, we train it for 50 epochs; it reaches an accuracy of 82.60% on the test set:

(x_train, y_train, x_test, y_test) = load_data()
model = build_model()
model.compile(loss='categorical_crossentropy', 
              optimizer='RMSprop', 
              metrics=['accuracy'])

#train
batch_size = 64
model.fit(x_train, y_train, batch_size=batch_size,
          epochs=EPOCHS, validation_data=(x_test,y_test)) 
score = model.evaluate(x_test, y_test,
                       batch_size=BATCH_SIZE)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])

Compared with the simple network, this is an improvement of roughly 15 percentage points (from 67.8% to 82.60%).
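An accuracy gain can be reported in several ways, and they differ considerably. Using the rounded figures quoted in this article (67.8% for the simple model and 82.60% for the deeper one), the short calculation below contrasts the absolute gain in percentage points, the relative gain, and the fraction of the baseline's errors that disappear.

```python
baseline, deeper = 67.8, 82.60   # test accuracies (%) reported in this article

abs_gain = deeper - baseline                        # percentage points
rel_gain = abs_gain / baseline * 100                # percent relative to the baseline
err_reduction = abs_gain / (100 - baseline) * 100   # share of the baseline's errors removed

print(f"{abs_gain:.1f} points, {rel_gain:.1f}% relative, {err_reduction:.1f}% fewer errors")
```

The error rate drops from 32.2% to 17.4%, i.e. the deeper model removes almost half of the simple model's mistakes, which is a more telling figure than the raw point difference.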

3.2 Data Augmentation

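Another way to improve performance is to train the model on more, and more varied, data. We can take the standard CIFAR-10 training set and expand it with several types of transformations, such as rotation, horizontal or vertical flips, zooming, and shifting. The configuration below uses random rotations, horizontal and vertical shifts, and horizontal flips: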

#image augmentation
datagen = ImageDataGenerator(
            rotation_range=30,
            width_shift_range=0.2,
            height_shift_range=0.2,
            horizontal_flip=True,
)
datagen.fit(x_train)

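rotation_range is a value in degrees (0-180) within which images are randomly rotated; width_shift_range and height_shift_range are the ranges (as a fraction of the total width or height) within which images are randomly shifted horizontally or vertically; horizontal_flip randomly flips images horizontally, each with probability 0.5. ImageDataGenerator also supports zoom_range, which randomly zooms images, and fill_mode, which determines how the empty pixels that can appear after a rotation or shift are filled; neither is set above, so fill_mode keeps its default value 'nearest'.
More formally, data augmentation applies various transformations to the training data (such as rotation, scaling, flipping, and cropping) to generate more diverse samples, thereby improving the model's generalization and robustness.
With augmentation in place, many additional training images can be generated from the standard CIFAR-10 set: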

Data augmentation results
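To make two of these transformations concrete, the NumPy sketch below implements a horizontal flip and a horizontal shift whose empty columns are filled with the nearest edge pixels, which is the idea behind `fill_mode='nearest'`. The functions `horizontal_flip`, `width_shift` and `random_augment` are hypothetical simplifications written for illustration; ImageDataGenerator's own implementation (sub-pixel shifts, rotations, combined affine transforms) is more general.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror an (H, W, C) image left-to-right."""
    return img[:, ::-1, :]

def width_shift(img, dx):
    """Shift an (H, W, C) image right by dx pixels (left if dx < 0).
    Columns that fall outside are filled with the nearest edge column,
    which mimics fill_mode='nearest'."""
    w = img.shape[1]
    src_cols = np.clip(np.arange(w) - dx, 0, w - 1)
    return img[:, src_cols, :]

def random_augment(img, rng, shift_range=0.2):
    """Hypothetical helper: random flip (p=0.5) plus a random integer width shift."""
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    max_dx = int(shift_range * img.shape[1])        # 0.2 * 32 -> up to 6 pixels
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return width_shift(img, dx)

rng = np.random.default_rng(42)
img = np.arange(4 * 4 * 1).reshape(4, 4, 1)   # tiny 4x4 single-channel "image"
print(horizontal_flip(img)[0, :, 0])           # first row reversed
print(width_shift(img, 1)[0, :, 0])            # first row shifted right, edge repeated
print(random_augment(np.zeros((32, 32, 3)), rng).shape)
```

Each call to `random_augment` produces a slightly different version of the same image, which is exactly what lets augmentation add variety without collecting new data.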

Now train the model. We reuse the ConvNet defined in the previous subsection, generate more images with data augmentation, and train on them:

#train
batch_size = 64
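# fit() accepts the generator directly; fit_generator() is deprecated in TF 2.x
model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
          epochs=EPOCHS,
          verbose=2, validation_data=(x_test, y_test))
#save to disk (same file names as those loaded in section 4)
model_json = model.to_json()
with open('cifar10_architecture.json', 'w') as json_file:
    json_file.write(model_json)
model.save_weights('cifar10_weights.h5') 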

#test
scores = model.evaluate(x_test, y_test, batch_size=128, verbose=1)
print('\nTest result: %.3f loss: %.3f' % (scores[1]*100,scores[0]))
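It helps to picture what `datagen.flow()` feeds to `fit()`: an endless stream of shuffled batches in which every image is transformed on the fly. One epoch therefore still covers the 50,000 training images (782 batches of 64), but the model never sees exactly the same pixels twice. The generator below is a simplified, hypothetical stand-in (`augmented_batches`, flips only), not the Keras iterator itself.

```python
import math
import numpy as np

def augmented_batches(x, y, batch_size, rng):
    """Simplified stand-in for datagen.flow(): endlessly yields shuffled batches,
    augmenting every image on the fly (here: only a random horizontal flip)."""
    n = len(x)
    while True:
        order = rng.permutation(n)                  # reshuffle at every pass
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb = x[idx].copy()
            flip = rng.random(len(idx)) < 0.5       # flip roughly half the images
            xb[flip] = xb[flip][:, :, ::-1, :]      # reverse the width axis of (N, H, W, C)
            yield xb, y[idx]

print(math.ceil(50_000 / 64))   # batches needed to see every training image once

rng = np.random.default_rng(0)
x = rng.random((10, 32, 32, 3)).astype("float32")
y = np.eye(10, dtype="float32")
gen = augmented_batches(x, y, batch_size=4, rng=rng)
sizes = [len(next(gen)[0]) for _ in range(3)]
print(sizes)                    # one pass over 10 images: batches of 4, 4 and 2
```

This is also why training slows down: the extra cost comes from transforming every batch on the CPU, not from a larger number of images per epoch.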

Training takes longer than before, because every batch is augmented on the fly. After 50 epochs of training, the model reaches an accuracy of 84% on the test set:

Training process

The accuracy and loss of the model over the course of training on CIFAR-10 are shown below:

Monitoring model performance

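The performance of a range of deep learning models trained on CIFAR-10 can be viewed online. At the time of writing, the best model listed there reaches an accuracy of 96.53%.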

Model accuracy

4. Testing the Model

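Suppose we want to use the CIFAR-10 deep learning model we just trained to classify new images. Because we saved the model architecture and weights, we don't need to retrain it every time: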

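import numpy as np

from skimage.transform import resize
from imageio import imread

from tensorflow.keras import datasets
from tensorflow.keras.models import model_from_json

model_architecture = "cifar10_architecture.json"
model_weights = "cifar10_weights.h5"
with open(model_architecture) as f:
    model = model_from_json(f.read())
model.load_weights(model_weights)

# recompute the training-set mean/std that load_data() used during training
(x_train, _), _ = datasets.cifar10.load_data()
mean = np.mean(x_train, dtype=np.float64)
std = np.std(x_train, dtype=np.float64)

img_names = ["cat.jpg", "dog.jpg"]
# preserve_range=True keeps pixel values in [0, 255], like the raw CIFAR-10 data
imgs = [resize(imread(img_name), (32, 32), preserve_range=True).astype("float32")
        for img_name in img_names]
imgs = ((np.array(imgs) - mean) / (std + 1e-7)).astype("float32")
print("imgs.shape:", imgs.shape)

# compile() is not required for inference with predict()
predictions = np.argmax(model.predict(imgs), axis=1)
print("predictions:", predictions)

We load the images with imageio's imread and resize them to 32 × 32 pixels with preserve_range=True, so each image tensor has shape (32, 32, 3) and keeps its pixel values in the 0-255 range. The list of image tensors is then stacked into a single tensor and standardized with the same mean and standard deviation as the training data, because the model was trained on standardized inputs. Finally, we rebuild the model from its JSON architecture, load the trained weights, and run the prediction.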
For the two images below, the expected predictions are class 3 (cat) and class 5 (dog).

Sample images
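The model only returns class indices, so it is convenient to map them back to names. The sketch below uses the standard CIFAR-10 label order (0 airplane through 9 truck) and a hypothetical `decode_predictions` helper that lists the top-k classes with their softmax scores. The probabilities in `fake_probs` are made up to stand in for `model.predict(imgs)`; they are not results from the trained model.

```python
import numpy as np

# Label order used by CIFAR-10 (index -> class name)
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def decode_predictions(probs, top_k=3):
    """Hypothetical helper: turn an (N, 10) softmax output into readable top-k labels."""
    probs = np.asarray(probs)
    top = np.argsort(probs, axis=1)[:, ::-1][:, :top_k]   # indices, highest score first
    return [[(CIFAR10_CLASSES[i], float(p[i])) for i in row] for row, p in zip(top, probs)]

# Made-up softmax outputs standing in for model.predict(imgs)
fake_probs = np.array([
    [0.01, 0.01, 0.05, 0.70, 0.02, 0.15, 0.03, 0.01, 0.01, 0.01],
    [0.01, 0.01, 0.02, 0.20, 0.02, 0.68, 0.03, 0.01, 0.01, 0.01],
])
for name, preds in zip(["cat.jpg", "dog.jpg"], decode_predictions(fake_probs)):
    print(name, preds)
```

Looking at the runner-up classes is useful in practice: cats and dogs are among the most frequently confused CIFAR-10 classes, so a low margin between them is a hint that the prediction is uncertain.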