🎯 Key Points
- Regression model evaluation metrics
- Evaluating a salary prediction model
- Evaluating an employee burnout rate model
- Evaluating a generative adversarial model for atmospheric analysis
- Tracking model error metrics with performance estimation when target values are missing
- Assessing the accuracy of downscaled atmospheric simulation models
- Evaluating a protein–chromatin interaction model
Regression Error Metrics in Python
Mean Absolute Error (MAE) is the average of the absolute differences between the actual and predicted values in a dataset. It measures the average magnitude of the residuals.
$$MAE=\frac{1}{N}\sum_{i=1}^{N}\left|y_i-\hat{y}_i\right|$$
Method 1: Using the formula
actual = [2, 3, 5, 5, 9]
calculated = [3, 3, 8, 7, 6]
n = len(actual)
abs_error_sum = 0
# accumulate the absolute differences between actual and predicted values
for i in range(n):
    abs_error_sum += abs(actual[i] - calculated[i])
error = abs_error_sum / n
# display
print("Mean absolute error : " + str(error))
Method 2: Using sklearn
from sklearn.metrics import mean_absolute_error as mae
actual = [2, 3, 5, 5, 9]
calculated = [3, 3, 8, 7, 6]
error = mae(actual, calculated)
print("Mean absolute error : " + str(error))
Mean Squared Error (MSE) is the average of the squared differences between the original and predicted values in a dataset. It measures the variance of the residuals.
$$MSE=\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2$$
Example: given the data points (1,1), (2,1), (3,2), (4,2), (5,4) and the regression line Y = 0.7X – 0.1:
| X | Y | $\hat{Y}_i$ |
| --- | --- | --- |
| 1 | 1 | 0.6 |
| 2 | 1 | 1.29 |
| 3 | 2 | 1.99 |
| 4 | 2 | 2.69 |
| 5 | 4 | 3.4 |
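Working through the MSE formula with the table values step by step confirms the result reported below:

$$MSE=\frac{(1-0.6)^2+(1-1.29)^2+(2-1.99)^2+(2-2.69)^2+(4-3.4)^2}{5}=\frac{0.16+0.0841+0.0001+0.4761+0.36}{5}=\frac{1.0803}{5}=0.21606$$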
Method 1: Using sklearn
from sklearn.metrics import mean_squared_error
Y_true = [1,1,2,2,4]
Y_pred = [0.6,1.29,1.99,2.69,3.4]
print(mean_squared_error(Y_true, Y_pred))
Output: 0.21606
Method 2: Using NumPy
import numpy as np
Y_true = [1,1,2,2,4]
Y_pred = [0.6,1.29,1.99,2.69,3.4]
MSE = np.square(np.subtract(Y_true, Y_pred)).mean()
print(MSE)
Output: 0.21606
Root Mean Squared Error (RMSE) is the square root of the mean squared error. It measures the standard deviation of the residuals.
$$RMSE=\sqrt{MSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2}$$
Using scikit-learn
from sklearn.metrics import mean_squared_error
import numpy as np
# Example arrays (replace with your data)
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
print(f"Root Mean Square Error (RMSE): {rmse}")
Output: 0.6123724356957945
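RMSE can also be computed without scikit-learn, mirroring the NumPy approach used for MSE above. A minimal sketch, reusing the same `y_true` and `y_pred` arrays:

import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# square root of the mean squared residual
rmse_np = np.sqrt(np.mean(np.square(y_true - y_pred)))
print(rmse_np)  # 0.6123724356957945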
Evaluating a regression model
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
boston = fetch_openml(data_id=531)
data = pd.DataFrame(boston.data, columns=boston.feature_names)
data['PRICE'] = boston.target
X = data.drop('PRICE', axis=1).values # Convert to NumPy array
y = data['PRICE'].values # Convert to NumPy array
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate RMSE (Root Mean Squared Error)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Root Mean Squared Error: {rmse}")
Output: 4.928602182665333
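RMSE is easier to interpret alongside R-squared, introduced next. As a short sketch that reuses the `model`, `X_test`, and `y_test` variables from the block above (the exact value depends on the fetched data and the split), the test-set R-squared can be reported like this:

from sklearn.metrics import r2_score

# proportion of variance in the test targets explained by the model
r2 = r2_score(y_test, model.predict(X_test))
print(f"R-squared on the test set: {r2}")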
The coefficient of determination, or R-squared, represents the proportion of the variance in the dependent variable that is explained by the linear regression model. It is a scale-free score: regardless of how small or large the values are, R-squared is always at most one.
$$R^2=1-\frac{\sum\left(y_i-\hat{y}_i\right)^2}{\sum\left(y_i-\bar{y}\right)^2}$$
from sklearn.metrics import r2_score
y =[10, 20, 30]
f =[10, 20, 30]
r2 = r2_score(y, f)
print('r2 score for perfect model is', r2)
r2 score for perfect model is 1.0
y =[10, 20, 30]
f =[20, 20, 20]
r2 = r2_score(y, f)
print('r2 score for a model which predicts mean value always is', r2)
r2 score for a model which predicts mean value always is 0.0
y = [10, 20, 30]
f = [30, 10, 20]
r2 = r2_score(y, f)
print('r2 score for a worse model is', r2)
r2 score for a worse model is -2.0
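To see where the negative score comes from, the R-squared formula can be evaluated by hand with NumPy. The sketch below reuses the last example: the residual sum of squares (600) is three times the total sum of squares around the mean (200), so R² = 1 − 600/200 = −2. A negative score simply means the model does worse than always predicting the mean of y.

import numpy as np

y = np.array([10, 20, 30])
f = np.array([30, 10, 20])

ss_res = np.sum((y - f) ** 2)         # residual sum of squares: 600
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares: 200
r2_manual = 1 - ss_res / ss_tot
print('manual r2 for the worse model is', r2_manual)  # -2.0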