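Logistic Regression, despite its name, is a classification model rather than a regression model, and is most commonly used for binary classification.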
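The essence of logistic regression: assume the data follow the logistic distribution, i.e. P(y=1 | x) = 1 / (1 + e^(-(w·x + b))), and then estimate the parameters w and b by maximum likelihood.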
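To make the idea concrete, here is a minimal NumPy sketch (not how scikit-learn implements it) that fits w and b by minimizing the mean negative log-likelihood with plain batch gradient descent. The two-blob synthetic data, the learning rate and the iteration count are assumptions chosen only for illustration, and the helper names `sigmoid`, `fit_logistic_mle` and `neg_log_likelihood` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_mle(X, y, lr=0.1, n_iter=2000):
    """Hypothetical helper: maximize the Bernoulli log-likelihood
    (i.e. minimize the mean negative log-likelihood) by batch
    gradient descent. y must contain 0/1 labels."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)               # P(y=1 | x) under current parameters
        grad_w = X.T @ (p - y) / n_samples   # gradient of mean NLL w.r.t. w
        grad_b = np.mean(p - y)              # gradient of mean NLL w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def neg_log_likelihood(X, y, w, b):
    # clip to avoid log(0)
    p = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# synthetic data: two Gaussian blobs (illustrative assumption)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.5, 1.0, size=(100, 2)),
               rng.normal(1.5, 1.0, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = fit_logistic_mle(X, y)
acc = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print("w =", w, "b =", b)
print("NLL:", neg_log_likelihood(X, y, w, b), "train accuracy:", acc)
```

Minimizing the negative log-likelihood is the same as maximizing the likelihood; its gradient with respect to w is X^T(p - y)/n, which is exactly the update applied in the loop.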
Logistic regression API
sklearn.linear_model.LogisticRegression(solver='liblinear', penalty='l2', C=1.0)
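solver: the algorithm used for the optimization problem. Options: {'liblinear', 'sag', 'saga', 'newton-cg', 'lbfgs'}. The default was 'liblinear' in older scikit-learn releases; since version 0.22 it is 'lbfgs'.
For small datasets, 'liblinear' is a good choice, while 'sag' and 'saga' are faster on large datasets.
For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' can handle the multinomial loss; 'liblinear' is limited to one-versus-rest classification.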
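A small sketch of the multiclass point above, using the 3-class Iris data bundled with scikit-learn (an illustrative choice, not the article's dataset). Because 'liblinear' only supports one-versus-rest, it is wrapped explicitly in `OneVsRestClassifier`; recent scikit-learn releases deprecate handing a multiclass target to 'liblinear' directly, so the wrapper is the safer spelling. The other four solvers fit a single multinomial model. The `max_iter=5000` value is an assumption to give 'sag'/'saga' room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # 3 classes, bundled with scikit-learn

# 'liblinear' is one-versus-rest only: train one binary model per class
ovr = make_pipeline(StandardScaler(),
                    OneVsRestClassifier(LogisticRegression(solver="liblinear")))

# these solvers can fit one multinomial (softmax) model for all classes
multinomial = {
    s: make_pipeline(StandardScaler(), LogisticRegression(solver=s, max_iter=5000))
    for s in ["lbfgs", "newton-cg", "sag", "saga"]
}

scores = {"liblinear (OvR)": ovr.fit(X, y).score(X, y)}
for name, model in multinomial.items():
    scores[name] = model.fit(X, y).score(X, y)

for name, acc in scores.items():
    print(f"{name:16s} training accuracy: {acc:.3f}")
```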
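penalty: the type of regularization (e.g. 'l1' or 'l2'; the default is 'l2')
C: the inverse of the regularization strength; smaller values mean stronger regularization (default 1.0)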
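To see what C does, the sketch below fits the default L2-penalized model for several values of C and prints the norm of the learned weights. It assumes scikit-learn's bundled breast cancer dataset and standardized features, both illustrative choices; smaller C means stronger regularization and therefore smaller weights.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # binary dataset bundled with scikit-learn
X = StandardScaler().fit_transform(X)

norms = {}
for C in [0.01, 1.0, 100.0]:
    # the default penalty is L2; C is the inverse regularization strength
    clf = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    norms[C] = np.linalg.norm(clf.coef_)
    print(f"C={C:>6}: ||w|| = {norms[C]:.3f}, training accuracy = {clf.score(X, y):.3f}")
```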
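By convention, the class with fewer samples is usually treated as the positive class. Note that in scikit-learn the "positive" class of a binary model is simply classes_[1], i.e. the larger of the two labels after sorting.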
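The sketch below shows how scikit-learn picks the "positive" class of a binary model. The tiny one-feature dataset is made up for illustration; the labels use the same 2 = benign / 4 = malignant coding as the dataset in the next section. `classes_` is sorted, so column 1 of `predict_proba` and a positive `decision_function` value both refer to label 4.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# made-up toy data: 2 = benign, 4 = malignant
X = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y = np.array([2, 2, 2, 4, 4, 4])

clf = LogisticRegression().fit(X, y)
print(clf.classes_)  # labels in sorted order

proba = clf.predict_proba([[8.5]])
# column 1 of predict_proba corresponds to clf.classes_[1]
print("P(label 4):", proba[0, 1])
# decision_function > 0 means classes_[1] would be predicted
print("decision_function:", clf.decision_function([[8.5]]))
```

In the Wisconsin data used below, malignant (4) is both the larger label and the minority class, so the two conventions happen to agree there.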
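LogisticRegression is roughly equivalent to SGDClassifier(loss="log_loss", penalty="l2") (older scikit-learn versions spell the loss "log"): both minimize the regularized log loss, but SGDClassifier uses plain stochastic gradient descent, whereas LogisticRegression uses dedicated solvers, one of which is SAG (stochastic average gradient, solver='sag').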
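Using the logistic regression API
Use the Breast Cancer Wisconsin (Original) dataset to predict the cancer class with logistic regression.
Dataset download: UCI Machine Learning Repository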
import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
cols = ['Sample-code-number', 'Clump-Thickness', 'Uniformity-of-Cell-Size', 'Uniformity-of-Cell-Shape', 'Marginal-Adhesion', 'Single-Epithelial-Cell-Size', 'Bare-Nuclei', 'Bland-Chromatin', 'Normal-Nucleoli', 'Mitoses', 'Class']
bcw = pd.read_csv('./data/breast-cancer-wisconsin-original/breast-cancer-wisconsin.data', names=cols)
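# missing values (in Bare-Nuclei) are encoded as '?': mark them as NaN and drop those rows
bcw = bcw.replace(to_replace='?', value=np.nan)
bcw = bcw.dropna()
# the 9 feature columns; Bare-Nuclei was read as strings, so cast everything to int
x = bcw.iloc[:, 1:10].astype(int)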
y = bcw['Class']
# split the data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=10)
# standardize: fit the scaler on the training set only, then apply the same transform to the test set
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
# logistic regression
estimator = LogisticRegression()
estimator.fit(x_train, y_train)
print("score:\n", estimator.score(x_test, y_test))
# save the estimator (new data must also be scaled with the fitted `transfer` before predicting)
joblib.dump(value=estimator, filename='./data/breast-cancer-wisconsin-original/bcw.pkl')
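One way to avoid saving and reapplying the scaler separately is to put it in a Pipeline with the classifier, then dump and reload the whole thing with joblib. This sketch uses scikit-learn's bundled `load_breast_cancer` data (the Wisconsin Diagnostic dataset, 30 features) rather than the 9-feature Original file above, purely so it runs without a download; the temporary file path is also an assumption.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# NOTE: Wisconsin *Diagnostic* data, not the "original" file used above
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

# the pipeline fits StandardScaler on the training split only and reuses
# those statistics at predict/score time, so the scaler travels with the model
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(x_train, y_train)
score = model.score(x_test, y_test)
print("score:", score)

# save the fitted pipeline, then load it back (temporary path for the example)
path = os.path.join(tempfile.mkdtemp(), "bcw_pipeline.pkl")
joblib.dump(model, path)
restored = joblib.load(path)
print("restored score:", restored.score(x_test, y_test))
```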
Output: