Loading the Data

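MNIST is a classic handwritten-digit recognition dataset assembled by Yann LeCun and colleagues. It contains 70,000 grayscale images of real handwritten digits: 60,000 are used for training and 10,000 for testing. The training images were written by roughly 250 different people.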

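Each image is 28x28 pixels, and the digits have been size-normalized and centered so that they can be fed directly into a machine learning model. Each image also carries a label indicating which digit (0 to 9) it shows.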

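Keras makes it easy to get the MNIST dataset. Its simple API downloads the data on first use, caches it locally, and loads it as NumPy arrays.

Download and load the dataset: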

from tensorflow.keras.datasets import mnist

# Returns NumPy arrays: images with shape (N, 28, 28) and integer labels with shape (N,)
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Check the dataset size:

print(x_train.shape)
# (60000, 28, 28)
print(x_test.shape)
# (10000, 28, 28)

- 60,000 training images, each 28 x 28 pixels
- 10,000 test images, each 28 x 28 pixels


Use matplotlib to view images from the MNIST dataset:

import matplotlib.pyplot as plt

# Show the first 25 images in the training set
fig, axes = plt.subplots(nrows=5, ncols=5, figsize=(10,10))
axes = axes.ravel()

for i in range(25):
    axes[i].imshow(x_train[i], cmap='gray')
    axes[i].set_title(y_train[i])
    axes[i].axis('off')

plt.subplots_adjust(hspace=0.5)
plt.show()

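cmap='gray' displays the images in grayscale.
axis('off') hides the axes so that the images are easier to see.
plt.subplots_adjust(hspace=0.5) sets the vertical gap between rows of subplots to 50% of the average subplot height, which leaves room for each title.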

The result is a 5 x 5 grid of the first 25 training digits, each titled with its label.

Training the Model

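Model structure:

The model consists of an input layer, two hidden layers, and an output layer:

Input layer:
x_train contains 60,000 images of 28 x 28 pixels each; every image is flattened into a vector of 784 values.
Hidden layers:
There are two hidden layers: the first has 25 neurons and the second has 15.
Both hidden layers use the ReLU activation function.
Output layer:
The output layer has 10 neurons, one for each digit from 0 to 9.
It can use either a linear activation or a Softmax activation; the two options are compared here.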

Model parameters:

layer1:
- $w^{[1]}$ shape: (784, 25)
- $b^{[1]}$ shape: (25,)
layer2:
- $w^{[2]}$ shape: (25, 15)
- $b^{[2]}$ shape: (15,)
layer3:
- $w^{[3]}$ shape: (15, 10)
- $b^{[3]}$ shape: (10,)
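
The shapes above determine how many trainable parameters the network has. As a quick sanity check, the following sketch counts them by hand; the helper name dense_param_count is made up for this illustration and is not part of Keras.

```python
# Layer sizes from the text: 784 inputs -> 25 -> 15 -> 10 outputs.
layer_sizes = [28 * 28, 25, 15, 10]


def dense_param_count(n_in, n_out):
    """Parameters in one fully connected layer: n_in * n_out weights plus n_out biases."""
    return n_in * n_out + n_out


# Pair each layer's input size with its output size and count its parameters.
per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(per_layer)

for i, count in enumerate(per_layer, start=1):
    print(f"layer{i}: {count} parameters")
print("total:", total_params)
```

These counts should match the per-layer "Param #" column that model.summary() reports for the three Dense layers.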

Code to create the model

from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

model = Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(25, activation='relu', name='layer1'),
    layers.Dense(15, activation='relu', name='layer2'),
    layers.Dense(10, activation='softmax', name='layer3')
], name='MNIST_Model'
)

model.summary()

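layers.Dense is the Keras class that defines a fully connected layer. A fully connected layer is also called a dense layer because each of its neurons is connected to every neuron in the previous layer: it multiplies all of the previous layer's outputs by its weights and adds a bias to produce its own outputs. Its main arguments are:
- units: the number of neurons in the layer;
- activation: the layer's activation function; if none is given, no activation is applied (a linear layer);
- name: the name of the layer.
model.summary() prints a summary of the model, including each layer's name, output shape, and parameter count. It is a quick way to check the architecture and size of a model when debugging or tuning it.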
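
To make the description of a dense layer concrete, here is a minimal NumPy sketch of the computation it performs, activation(x @ W + b). The function dense_forward and the random stand-in data are assumptions for illustration only; Keras handles all of this internally.

```python
import numpy as np


def dense_forward(x, W, b, activation=None):
    """Compute activation(x @ W + b), the operation a fully connected layer performs."""
    z = x @ W + b
    if activation == "relu":
        return np.maximum(z, 0.0)
    return z  # no activation given: the output stays linear


rng = np.random.default_rng(0)
x = rng.random((4, 784))                 # 4 flattened 28x28 images (random stand-ins)
W1 = rng.normal(size=(784, 25)) * 0.01   # same shape as layer1's weight matrix
b1 = np.zeros(25)                        # same shape as layer1's bias vector

a1 = dense_forward(x, W1, b1, activation="relu")
print(a1.shape)
```

Each of the 4 input rows produces 25 outputs, one per neuron, and ReLU guarantees that none of them is negative.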


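Code to compile the model

import tensorflow as tf

model.compile(
    # The output layer already applies softmax, so the loss receives probabilities, not logits
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001)
)

The SparseCategoricalCrossentropy loss function:
- Sparse means the labels are integer class indices (a single integer per sample) rather than one-hot vectors; Categorical means the labels are categories; Crossentropy means the loss is computed with cross-entropy.
- from_logits tells the loss whether the model outputs raw scores (logits) or probabilities. If the output Dense layer uses Softmax as its activation, as it does in the model above, set from_logits=False; if the output layer is linear (no Softmax), set from_logits=True.
The Adam optimizer updates the model's weights, with the learning rate learning_rate set to 0.001.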
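
The effect of the from_logits setting can be checked by hand. The NumPy sketch below is an illustration under stated assumptions, not the Keras implementation: the helper names are made up, and the logits are arbitrary example values. It shows that the loss on raw scores equals the loss on softmax probabilities, and that applying softmax twice (a softmax output layer combined with from_logits=True) gives a different value.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_ce_from_probs(labels, probs):
    """Mean cross-entropy when the model already outputs probabilities."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked).mean()


def sparse_ce_from_logits(labels, logits):
    """Mean cross-entropy when the model outputs raw scores (logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])  # integer class indices, not one-hot vectors

loss_logits = sparse_ce_from_logits(labels, logits)          # linear output, from_logits=True
loss_probs = sparse_ce_from_probs(labels, softmax(logits))   # softmax output, from_logits=False
loss_double = sparse_ce_from_logits(labels, softmax(logits)) # softmax applied twice (mismatch)

print(np.isclose(loss_logits, loss_probs))
print(np.isclose(loss_logits, loss_double))
```

The two consistent configurations agree, while the mismatched one reports a different loss, which is why the flag has to match the output layer's activation.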

Code to train the model

history = model.fit(
    x_train, y_train,
    epochs=40
)

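model.fit() is the Keras method for training a model. It trains on the given data and returns a History object that records the loss, and any metrics passed to compile(), for every epoch. During training, model.fit() updates the model's parameters so that its predictions move closer to the true labels, which improves the model's performance.

x_train and y_train are the input features and labels of the training data;
epochs is the number of passes over the training data;
history holds the training record, including the loss for each epoch. No metrics were passed to compile() here, so it contains only the loss.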

Plot the loss curve from the recorded loss values:

import matplotlib.pyplot as plt

# Get the per-epoch training loss from the history (no validation set was passed to fit())
train_loss = history.history['loss']

# Plot the loss curve
plt.plot(train_loss, label='Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()


Code to make predictions

from sklearn.metrics import accuracy_score
import numpy as np

pred_y = model.predict(x_test)
pred_labels = np.argmax(pred_y, axis=1)
acc = accuracy_score(y_test, pred_labels)
print('Test accuracy:', acc)

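model.predict() runs the trained network on the inputs and returns one row of 10 class scores per image;
np.argmax(pred_y, axis=1) takes the index of the largest score in each row, which is the predicted label;
accuracy_score is a scikit-learn function that computes classification accuracy, the fraction of predictions that match the true labels.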
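
The argmax and accuracy steps can be traced on a tiny example. The scores and labels below are made up for illustration (4 samples, 3 classes instead of 10), and the accuracy is computed with plain NumPy as a stand-in for accuracy_score.

```python
import numpy as np

# Hypothetical model outputs: one row of class probabilities per sample.
pred_y = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
y_true = np.array([1, 0, 2, 1])  # hypothetical true labels

pred_labels = np.argmax(pred_y, axis=1)    # index of the largest value in each row
accuracy = np.mean(pred_labels == y_true)  # fraction of predictions that are correct

print(pred_labels)
print(accuracy)
```

The last sample is predicted as class 0 while its true label is 1, so three of the four predictions are correct.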

Complete Code

import tensorflow as tf
from tensorflow import keras

# # # Load the data
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape)

# # # Show the first 25 images in the training set
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=5, ncols=5, figsize=(10,10))
axes = axes.ravel()

for i in range(25):
    axes[i].imshow(x_train[i], cmap='gray')
    axes[i].set_title(y_train[i])
    axes[i].axis('off')

plt.subplots_adjust(hspace=0.5)
plt.show()

# # # Define the neural network model
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

model = Sequential([
    layers.Flatten(input_shape=(28, 28)),  # each image is (28, 28); flatten it to 784 values
    layers.Dense(25, activation='relu', name='layer1'),
    layers.Dense(15, activation='relu', name='layer2'),
    layers.Dense(10, activation='softmax', name='layer3')
], name='MNIST_Model'
)

model.summary()

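# # # Compile the neural network
model.compile(
    # The output layer already applies softmax, so the loss receives probabilities, not logits
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)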

# # # Train the model
history = model.fit(
    x_train, y_train,
    epochs=40
)

# # # Plot the loss curve (matplotlib.pyplot was already imported above as plt)

train_loss = history.history['loss']

plt.plot(train_loss, label='Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

# # # Make predictions and measure test accuracy
from sklearn.metrics import accuracy_score
import numpy as np

pred_y = model.predict(x_test)
pred_labels = np.argmax(pred_y, axis=1)
acc = accuracy_score(y_test, pred_labels)
print('Test accuracy:', acc)