TensorFlow图像多标签分类实例

接下来，我们将从零开始讲解一个基于TensorFlow的图像多标签分类实例，这里以图片验证码为例进行讲解。

在我们访问某个网站的时候，经常会遇到图片验证码。图片验证码的主要目的是区分爬虫程序和人类，并将爬虫程序阻挡在外。

下面的程序就是模拟人类识别验证码，从而使网站无法区分是爬虫程序还是人类在网站登录。

以图10.5所示的图片验证码为例，将这幅验证码图片标记为label=[3,8,8,7]。我们知道分类网络一般一次只能识别出一个目标，那么如何识别这个多标签的序列数据呢?

通过下面的TFRecord结构可以构建多标签训练数据集，从而实现多标签数据识别。

图10.5 图片验证码

以下为构造TFRecord多标签训练数据集的代码：

import tensorflow as tf
# 定义对整型特征的处理    
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
# 定义对字节特征的处理
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
# 定义对浮点型特征的处理
def _floats_feature(value):
    return tf.train_Feature(float_list=tf.train.floatList(value=[value]))
# 对数据进行转换    
def convert_to_record(name, image, label, map):
    filename = os.path.join(params.TRAINING_RECORDS_DATA_DIR,
        name + '.' + params.DATA_EXT)
    writer = tf.python_io.TFRecordWriter(filename)
    image_raw = image.tostring()
    map_raw = map.tostring()
    label_raw = label.tostring()
    example = tf.train.Example(feature=tf.train.Feature(feature={
        'image_raw': _bytes_feature(image_raw),
        'map_raw': _bytes_feature(map_raw),
        '1abel_raw': _bytes_feature(label_raw)
    }))
    writer.write(example.SerializeToString())
    writer.close()

通过上面的代码，我们构建了一条支持多标签的TFRecord记录，多幅验证码图片可以构建一个验证码的多标签数据集，用于后续的多标签分类训练。

10.4.2 构建多标签分类网络

通过前一步操作，我们得到了用于多标签分类的验证码数据集，现在需要构建多标签分类网络。

我们选择VGG网络作为特征提取网络骨架。通常越复杂的网络，对噪声的鲁棒性就越强。验证码中的噪声主要来自形变、粘连以及人工添加，VGG网络对这些噪声具有好的鲁棒性，代码如下：

import tensorflow as tf
tf.enable_eager_execution ()
def model_vgg(x, training = False):
# 第一组第一个卷积使用64个卷积核，核大小为3
conv1_1 = tf.layers.conv2d(inputs=x, filters=64,name="conv1_1",
    kernel_size=3, activation=tf.nn.relu, padding="same")
# 第一组第二个卷积使用64个卷积核，核大小为3
convl_2 = tf.layers.conv2d(inputs=conv1_1,filters=64, name="conv1_2",
    kernel_size=3, activation=tf.nn.relu,padding="same")
# 第一个pool操作核大小为2，步长为2
pooll = tf.layers.max_pooling2d(inputs=conv1_2, pool_size=[2, 2],
    strides=2, name= 'pool1')
# 第二组第一个卷积使用128个卷积核，核大小为3
conv2_1 = tf.layers.conv2d(inputs=pool1, filters=128, name="conv2_1",
    kernel_size=3, activation=tf.nn.relu, padding="same")
# 第二组第二个卷积使用64个卷积核，核大小为3
conv2_2 = tf.layers.conv2d(inputs=conv2_1, filters=128,name="conv2_2",
    kernel_size=3, activation=tf.nn.relu, padding="same")
# 第二个pool操作核大小为2，步长为2
pool2 = tf.layers.max_pooling2d(inputs=conv2_2, pool_size=[2， 2],
    strides=2, name="pool1")
# 第三组第一个卷积使用128个卷积核，核大小为3
conv3_1 = tf.layers.conv2d(inputs=pool2, filters=128, name="conv3_1", 
    kernel_size=3, activation=tf.nn.relu, padding="same")
# 第三组第二个卷积使用128个卷积核，核大小为3
conv3_2 = tf.layers.conv2d(inputs=conv3_1, filters=128, name="conv3_2", 
    kernel_size=3, activation=tf.nn.relu, padding="same")
# 第三组第三个卷积使用128个卷积核，核大小为3
conv3_3 = tf.layers.conv2d(inputs=conv3_2, filters=128, name="conv3_3", 
    kernel_size=3, activation=tf.nn.relu, padding=" same")
# 第三个pool 操作核大小为2，步长为2
pool3 = tf.layers.max_pooling2d(inputs=conv3_3, pool_size=[2, 2], 
    strides=2,name='pool3')
# 第四组第一个卷积使用256个卷积核，核大小为3
conv4_1 = tf.layers.conv2d(inputs-pool3, filters=256, name="conv4_1", 
    kernel_size=3, activation=tf.nn.relu, padding="same")
# 第四组第二个卷积使用128个卷积核，核大小为3
conv4_2 = tf.layers.conv2d(inputs=conv4_1, filters=128, name="conv4_2", 
    kernel_size=3, activation=tf.nn.relu, padding="same")
# 第四组第三个卷积使用128个卷积核，核大小为3
conv4_3 = tf.layers.conv2d(inputs=conv4_2, filters=128, name="cov4_3", 
    kernel_size=3, activation=tf.nn.relu, padding="same" )
# 第四个pool操作核大小为2，步长为2
pool4 = tf.layers.max.pooling2d(inputs=conv4_3, pool_size=[2,2], 
    strides=2, name='pool4')
# 第五组第一个卷积使用512个卷积核，核大小为3
conv5_1 = tf.layers.conv2d(inputs=pool4, filters=512, name="conv5_1", 
    kernel_size=3, activation=tf.nn.relu, padding=" same")
# 第五组第二个卷积使用512个卷积核，核大小为3
conv5_2 = t.layers.conv2d(inputs=conv5_1, filters=512, name="conv5_2", 
    kernel_size=3, activation=tf.nn.relu, padding="same")
# 第五组第三个卷积使用512个卷积核，核大小为3
conv5_3 = tf.layers.conv2d(inputs-conv5_2, filters=512, name="conv5_3", 
    kernel_size=3, activation=tf.nn.relu, padding="same"
    )
# 第五个pool操作核大小为2，步长为2
pool5 = tf.layers.max_pooling2d(inputs=conv5_3, pool_size=[2, 2], 
    strides=2, name='pool5')
flatten = tf.layers.flatten(inputs=poo15, name="flatten")

上面是VGG网络的单标签分类TensorFlow代码，但这里我们需要实现的是多标签分类，因此需要对VGG网络进行相应的改进，代码如下：

# 构建输出为4096的全连接层
fc6 = tf.layers.dense(inputs=flatten, units=4096,
activation=tf.nn.relu, name='fc6')
# 为了防止过拟合，引入dropout操作
drop1 = tf.layers.dropout(inputs=fc6,rate=0.5, training=training)
# 构建输出为4096的全连接层
fc7 = tf.layers.dense(inputs=drop1, units=4096,
activation=tf.nn.relu, name='fc7')
# 为了防止过报合，引入dropout操作
drop2 = tf.layers.dropout(inputs=fc7, rate=0.5, training=training)
# 为第一个标签构建分类器
fc8_1 = tf.layers.dense(inputs=drop2, units=10,
activation=tf.nn.sigmoid, name='fc8_1')
# 为第二个标签构建分类器
fc8_2 = tf.layers.dense(inputs=drop2, units=10,
activation=tf.nn.sigmoid, name='fc8_2')
# 为第三个标签构建分类器
fc8_3 = tf.layers.dense(inputs=drop2, units=10,
activation=tf.nn.sigmoid, name='fc8_3')
# 为第四个标签构建分类器
fc8_4 = tf.layers.dense(inputs=drop2,units=10,
activation=tf.nn.sigmoid, name='fc8_4')
# 将四个标签的结果进行拼接操作
fc8 = tf.concat([fc8_1,fc8_2,fc8_3,fc8_4], 0)

这里的fc6和fc7全连接层是对网络的卷积特征进行进一步的处理，在经过fc7层后，我们需要生成多标签的预测结果。由于一幅验证码图片中存在4个标签，因此需要构建4个子分类网络。这里假设图片验证码中只包含10 个数字，因此每个网络输出的预测类别就是10类，最后生成4个预测类别为10的子网络。如果每次训练时传入64幅验证码图片进行预测，那么通过4个子网络后，分别生成(64,10)、(64,10)、(64,10)、(64,10) 4个张量。如果使用Softmax分类器的话，就需要想办法将这4个张量进行组合，于是使用tf.concat函数进行张量拼接操作。

以下是TensorFlow中tf.concat函数的传参示例：

tf.concat (
values,
axis,
name='concat'
)

通过fc8=tf.concat([fc8_1,fc8_2,fc8_3,fc8_4], 0)的操作，可以将前面的4个(64.10)张量变换成(256.10)这样的单个张量，生成单个张量后就能进行后面的Softmax分类操作了。

10.4.3 多标签训练模型

模型训练的第一个步骤就是读取数据，读取方式有两种：一种是直接读取图片进行操作，另一种是转换为二进制文件格式后再进行操作。前者实现起来简单，但速度较慢；后者实现起来复杂，但读取速度快。这里我们以后者二进制的文件格式介绍如何实现多标签数据的读取操作，下面是相关代码。

首先读取TFRecord文件内容：

tfr = TFrecorder()
def input_fn_maker(path, data_info_path, shuffle=False, batch_size = 1,
epoch = 1, padding = None) :    
def input_fn():
    filenames = tfr.get_filenames(path=path, shuffle=shuffle)
    dataset=tfr.get_dataset(paths=filenames,
        data_info=data_info_path, shuffle = shuffle,
        batch_size = batch_size, epoch = epoch, padding = padding)
    iterator = dataset.make_one_shot_iterator ()
    return iterator.get_next()
return input_fn
# 原始图片信息
padding_info = ({'image':[30, 100,3,], 'label':[]})
# 测试集
test_input_fn = input_fn_maker('captcha_data/test/',
'captcha_tfrecord/data_info.csv',
batch_size = 512, padding = padding_info)
# 训练集
train_input_fn = input_fn_maker('captcha_data/train/',
'captcha_tfrecord/data_info.csv',
shuffle=True, batch_size = 128,padding = padding_info)
# 验证集
train_eval_fn = input_fn_maker('captcha_data/train/',
'captcha_tfrecord/data_info.csv',
batch_size = 512,adding = padding_info)

然后是模型训练部分：

def model_fn(features, net, mode):
features['image'] = tf.reshape(features['image'], [-1, 30, 100, 3])
# 获取基于net网络的模型预测结果
predictions = net(features['image'])
# 判断是预测模式还是训练模式
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode,
        predictions=predictions)
# 因为是多标签的Softmax，所以需要提前对标签的维度进行处理
lables = tf.reshape(features['label'], features['label'].shape[0]*4,))
# 初始化softmaxloss
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels,
    logits=logits)
# 训练模式下的模型结果获取
if mode ==tf.estimator.ModeKeys.TRAIN:
    # 声明模型使用的优化器类型
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
        train_op = optimizer.minimize(
            loss=loss,global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode,
        loss=loss, train_op=train_op)
# 生成评价指标
eval_metric_ops = {"accuracy": tf.metrics.accuracy(
    labels=features['label'],predictions=predictions["classes"]) }
return tf.estimator.EstimatorSpec(mode=mode, loss=loss,
    eval_metric_ops= eval_metric_ops)

多标签的模型训练流程与普通单标签的模型训练流程非常相似，唯一的区别就是需要将多标签的标签值拼接成一个张量，以满足Softmax分类操作的维度要求。

本文节选自《Python深度学习原理、算法与案例》。