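Requirement:
Predict students' scores from the amount of time they spend studying: train a model on the existing data, then use it to predict scores for new data.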
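Analysis:
There is a single input (study time) and a continuous target (score), so a linear regression model is used.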
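For intuition about what a one-feature linear regression fit computes, here is a minimal sketch of the ordinary least-squares formulas in plain Python. The helper `fit_line` and the `hours`/`scores` values are made up for illustration; they are not taken from `studentscores.csv`.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on score = 10 * hours + 2
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [12.0, 22.0, 32.0, 42.0, 52.0]

slope, intercept = fit_line(hours, scores)

# Predict the score for a new, unseen study time
new_hours = 6.0
predicted = slope * new_hours + intercept
print(slope, intercept, predicted)
```

With real, noisy data the points will not lie exactly on the line, and the same formulas give the line that minimizes the sum of squared vertical errors.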
Code:
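import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the data: the first column is the feature (study time), the second is the target (score)
dataset = pd.read_csv('studentscores.csv')
X = dataset.iloc[:, :1].values  # 2-D array of shape (n_samples, 1), as scikit-learn expects
Y = dataset.iloc[:, 1].values   # 1-D array of targets

# Hold out a quarter of the rows for testing; random_state makes the split reproducible
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=1/4, random_state=0)

# Fit the model on the training set, then predict on the test set
regressor = LinearRegression()
regressor.fit(X_train, Y_train)
Y_pred = regressor.predict(X_test)

# Training data (red points) and the fitted line (blue)
plt.scatter(X_train, Y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
# Test data: actual scores (blue points) and predicted scores (green points, joined by a yellow line)
plt.scatter(X_test, Y_test, color='blue')
plt.scatter(X_test, Y_pred, color='green')
plt.plot(X_test, Y_pred, color='yellow')
plt.show()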
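Result:
Conclusion:
As the plot shows, the predicted scores (green) lie close to the actual test scores (blue), and the test points follow the same linear trend as the fitted line, so the prediction is considered successful.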
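The visual check can be backed by numbers. The sketch below computes two common regression metrics, mean absolute error and the coefficient of determination (R²), in plain Python. The helpers `mae` and `r_squared` and the sample values are hypothetical and do not come from the dataset used here.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual variation over total variation."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up actual and predicted scores, standing in for Y_test and Y_pred
y_test_example = [20.0, 40.0, 60.0, 80.0]
y_pred_example = [22.0, 38.0, 63.0, 77.0]

error = mae(y_test_example, y_pred_example)
r2 = r_squared(y_test_example, y_pred_example)
print(error, r2)
```

scikit-learn provides the same quantities as `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.r2_score`, which could be applied directly to `Y_test` and `Y_pred` from the code above. An R² close to 1 together with a small error would support the conclusion drawn from the plot.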