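Case objective
Given a labeled Excel dataset, train a suitable model to predict the neurological-deficit classification of children (the Class column, with labels TD and ADHD).
Reference article: 机器学习——5.案例: 乳腺癌预测-CSDN博客 (Machine Learning, 5. Case Study: Breast Cancer Prediction, CSDN blog)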
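Code logic steps
- Read the data
- Split into training and test sets
- Standardize the features
- Convert the data to PyTorch tensors
- Reshape the labels
- Define the model
- Define the loss function
- Define the optimizer
- Define the gradient-descent function
- Train the model (forward pass, loss computation, backpropagation, parameter update, gradient zeroing)
- Test the model
- Compute accuracy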
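The data-preparation steps in the list above (split, standardize, convert to tensors, reshape the labels) can be sketched without the spreadsheet. This is a minimal sketch on synthetic data: the random arrays and the `_t` variable names are assumptions made for illustration, and only the shapes (8 features, binary labels) mirror the case.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the spreadsheet: 100 samples, 8 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 8))
Y = rng.integers(0, 2, size=100)

# Split first, so the test set never influences preprocessing
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=5
)

# Fit the scaler on the training set only, then reuse it on the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Convert the features to float32 tensors
X_train_t = torch.from_numpy(X_train.astype(np.float32))
X_test_t = torch.from_numpy(X_test.astype(np.float32))

# Reshape the labels to (N, 1) so they match the model output
Y_train_t = torch.from_numpy(Y_train.astype(np.float32)).view(-1, 1)
Y_test_t = torch.from_numpy(Y_test.astype(np.float32)).view(-1, 1)

print(X_train_t.shape, Y_train_t.shape)
print(X_test_t.shape, Y_test_t.shape)
```

The label reshape matters because `torch.nn.BCELoss` expects the prediction and the target to have the same shape, and a model ending in `Linear(n, 1)` produces output of shape (N, 1).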
Code implementation
import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Read the data: the first 8 columns are the features, "Class" is the label
df = pd.read_excel('/Users/guojun/Desktop/Learning/machine_learning/Preprocess_Without_WDE_Channels_Data.xlsx')
X = df[df.columns[0:8]].values
# Encode the string labels as integers
mapping = {"TD":0,"ADHD":1}
Y = df["Class"].map(mapping)
# Split into training and test sets
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2,random_state=5)
Y_train = Y_train.to_numpy()
Y_test = Y_test.to_numpy()
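# Standardize the features: fit the scaler on the training set only,
# then apply the same mean and variance to the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)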
# Convert to tensors
X_train = torch.from_numpy(X_train.astype(np.float32))
X_test = torch.from_numpy(X_test.astype(np.float32))
Y_train = torch.from_numpy(Y_train.astype(np.float32))
Y_test = torch.from_numpy(Y_test.astype(np.float32))
# Reshape the labels to 2-D, shape (N, 1), to match the model output
Y_train = Y_train.view(Y_train.shape[0],-1)
Y_test = Y_test.view(Y_test.shape[0],-1)
# Define the model: logistic regression (one linear layer followed by a sigmoid)
class Model(torch.nn.Module):
    def __init__(self,n_input_features):
        super(Model,self).__init__()
        self.linear = torch.nn.Linear(n_input_features,1)
    def forward(self,x):
        return torch.sigmoid(self.linear(x))
model = Model(X_train.shape[1])
# Define the loss function (binary cross-entropy)
loss = torch.nn.BCELoss()
# Define the optimizer
learning_rate = 0.001
optimizer = torch.optim.Adam(model.parameters(),lr=learning_rate)
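# One gradient-descent step over the full training set
def gradient_descent():
    # Forward pass: predict Y
    pre_y = model(X_train)
    # Compute the loss
    l = loss(pre_y,Y_train)
    # Backpropagation
    l.backward()
    # Update the parameters
    optimizer.step()
    # Zero the gradients
    optimizer.zero_grad()
    return l,list(model.parameters())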
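# Train the model
for i in range(10000):
    l,p = gradient_descent()
    # Report the loss every 1000 iterations instead of on every step
    if (i+1) % 1000 == 0:
        print(i+1,l.item())
# Final learned parameters
print(p)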
# Test the model on one random test sample
mapping = {0:"TD",1:"ADHD"}
index = np.random.randint(0,X_test.shape[0])
# No gradients are needed at inference time
with torch.no_grad():
    pre_y = model(X_test[index])
pre_y = mapping[int(pre_y.round().item())]
gt_y = mapping[int(Y_test[index].item())]
print(pre_y,gt_y)
# Compute the accuracy on the test set
with torch.no_grad():
    pres_y = model(X_test).round()
# Fraction of test samples whose rounded prediction equals the label
ac = (pres_y==Y_test).float().mean().item()
print(ac)
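An accuracy figure is easier to interpret next to the score of always predicting the most frequent class. The sketch below shows one way to compute both with NumPy. The helper names `accuracy` and `majority_baseline` and the sample values are hypothetical, not part of the original code.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of samples whose thresholded probability matches the label."""
    # Flatten both inputs so (N, 1) and (N,) arrays compare element by element
    preds = (np.asarray(probs).ravel() > 0.5).astype(int)
    return float(np.mean(preds == np.asarray(labels).ravel()))

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    labels = np.asarray(labels).ravel()
    positive_rate = float(np.mean(labels == 1))
    return max(positive_rate, 1.0 - positive_rate)

# Made-up probabilities and labels, for illustration only
probs = np.array([0.9, 0.2, 0.6, 0.4, 0.7, 0.1])
labels = np.array([1, 0, 0, 0, 1, 1])

print(accuracy(probs, labels))
print(majority_baseline(labels))
```

With the tensors from the case, the inputs could be obtained by calling `.numpy()` on the detached predictions and on `Y_test`. If the model's accuracy is close to the baseline, it has learned little beyond the class proportions.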
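Even after tuning the hyperparameters, the loss stops decreasing at around 0.68.
The final accuracy is only 54%-60%. In a later note I will retrain with a deep neural network to improve the model's accuracy.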
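As a sanity check on the plateau: a loss near 0.68 is only slightly below ln 2 ≈ 0.693, which is the binary cross-entropy of a model that outputs 0.5 for every sample. The sketch below computes that reference value in plain Python. The `bce` helper and the labels are hypothetical and serve only as illustration.

```python
import math

def bce(probs, labels):
    """Mean binary cross-entropy over a list of probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

# Made-up labels, for illustration only
labels = [1, 0, 1, 0]

# A model that always outputs 0.5 scores ln 2, whatever the labels are
chance = bce([0.5] * len(labels), labels)
print(chance, math.log(2))
```

A loss that stays near this value suggests the linear model is extracting little signal from the eight features, which is consistent with the accuracy reported above.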