Finally figured out how AUC is computed!

news2025/2/25 16:31:25

1. The axes of the ROC curve

Vertical axis: sensitivity, also called the true positive rate (TPR)

TPR = TP / (TP + FN)

Horizontal axis: the false positive rate (FPR), which equals 1 - specificity

FPR = FP / (FP + TN)
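To make the two axes concrete, here is a minimal sketch that computes a single (FPR, TPR) point of the ROC curve at one threshold. The function name `roc_point` and the `threshold` parameter are illustrative assumptions, not part of the original post; a sample is treated as predicted positive when its score is greater than or equal to the threshold.

```python
def roc_point(y_true, y_score, threshold):
    """Return (FPR, TPR) when samples with score >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn)  # sensitivity: share of positives that were caught
    fpr = fp / (fp + tn)  # 1 - specificity: share of negatives flagged by mistake
    return fpr, tpr


y_true = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
y_score = [0.1, 0.4, 0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9]
print(roc_point(y_true, y_score, 0.7))  # (0.5, 0.6666666666666666)
```

Sweeping the threshold from high to low traces out the whole curve, from (0, 0) to (1, 1).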

2. Calculation methods

2.1 Method 1

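import numpy as np


def get_roc_auc(y_true, y_score):
    """
    AUC as the probability that a positive sample scores higher than a negative one.
    Every positive sample is compared with every negative sample:
    1. Form all (positive, negative) pairs.
    2. Score each pair by comparing pos_score with neg_score:
        pos_score > neg_score counts as 1
        pos_score == neg_score counts as 0.5
        pos_score < neg_score counts as 0
    With M positives and N negatives there are M × N pairs, so the time complexity is O(M × N).
    :param y_true: ground-truth labels, 1 for positive and 0 for negative
    :param y_score: predicted scores, in the same order as y_true
    :return: the AUC, i.e. the mean score over all pairs
    """
    gt_pred = list(zip(y_true, y_score))
    probs = []
    pos_samples = [x for x in gt_pred if x[0] == 1]
    neg_samples = [x for x in gt_pred if x[0] == 0]

    # Compare each positive sample with each negative sample
    for pos in pos_samples:
        for neg in neg_samples:
            if pos[1] > neg[1]:
                probs.append(1)
            elif pos[1] == neg[1]:
                probs.append(0.5)
            else:
                probs.append(0)
    return np.mean(probs)


y_true = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
y_score = [0.1, 0.4, 0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9]

print(get_roc_auc(y_true, y_score))  # 0.75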
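
The O(M × N) cost noted in the docstring can be avoided without changing the definition. The sketch below is one possible variant, not the original author's code: it sorts the negative scores once and uses binary search to count, for each positive, how many negatives score lower and how many tie. The name `pairwise_auc_sorted` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def pairwise_auc_sorted(y_true, y_score):
    """Same pairwise definition as Method 1, in O((M + N) log N) time."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = sorted(s for t, s in zip(y_true, y_score) if t == 0)
    total = 0.0
    for s in pos:
        below = bisect_left(neg, s)          # negatives with a strictly lower score
        tied = bisect_right(neg, s) - below  # negatives with exactly the same score
        total += below + 0.5 * tied          # a win counts 1, a tie counts 0.5
    return total / (len(pos) * len(neg))


y_true = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
y_score = [0.1, 0.4, 0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9]
print(pairwise_auc_sorted(y_true, y_score))  # 0.75
```

On the sample data the six positives collect 2 + 2 + 2.5 + 3.5 + 4 + 4 = 18 points out of 6 × 4 = 24 pairs, which gives 0.75.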

2.2 Method 2

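Compute the AUC from the rank-sum formula. Here rank_i is the rank of sample i when all samples are sorted by score in ascending order (ranks start at 1), M is the number of positive samples, and N is the number of negative samples:

$$\mathrm{AUC} = \frac{\sum_{i \in \text{positive}} \mathrm{rank}_i - \frac{M(M+1)}{2}}{M \times N}$$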

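def get_roc_auc(y_true, y_score):
    # Sort by score and assign 1-based ranks; tied scores share their average rank,
    # so a tied positive-negative pair counts as 0.5, as in Method 1
    pairs = sorted(zip(y_true, y_score), key=lambda x: x[1])
    rank_sum, count = {}, {}
    for rank, (_, score) in enumerate(pairs, start=1):
        rank_sum[score] = rank_sum.get(score, 0) + rank
        count[score] = count.get(score, 0) + 1
    pos_ranks = [rank_sum[s] / count[s] for label, s in pairs if label == 1]
    M = sum(y_true)       # number of positive samples
    N = len(y_true) - M   # number of negative samples
    auc = (sum(pos_ranks) - M * (M + 1) / 2) / (M * N)
    return auc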
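
Why the rank formula matches the pairwise definition of Method 1 can be sketched as follows, assuming for simplicity that no two scores are equal. Sort the positives by rank, r_(1) < r_(2) < ... < r_(M). The j-th positive has r_(j) - 1 samples ranked below it, of which j - 1 are positives, so it beats exactly r_(j) - j negatives. Summing over all positives and dividing by the number of pairs gives:

```latex
\sum_{j=1}^{M} \bigl(r_{(j)} - j\bigr)
  = \sum_{j=1}^{M} r_{(j)} - \frac{M(M+1)}{2},
\qquad
\mathrm{AUC} = \frac{\sum_{j=1}^{M} r_{(j)} - \frac{M(M+1)}{2}}{M \times N}
```

When scores are tied, the formula agrees with the pairwise definition only if tied samples share their average rank, which gives each tied positive-negative pair a credit of 0.5.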

2.3 Method 3

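import numpy as np
from sklearn.metrics import roc_auc_score

# 1693 positives and 8307 negatives with uniformly random scores,
# so the AUC should come out close to 0.5
y_true = np.array([1] * 1693 + [0] * 8307)
y_score = np.random.rand(10000)
auc = roc_auc_score(y_true, y_score)
print(auc)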
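
As a cross-check that connects the methods back to the axes in section 1, the AUC can also be obtained literally as the area under the ROC curve. The sketch below is an illustrative assumption, not taken from the original post: it lowers the threshold one distinct score at a time, tracks the TP and FP counts, and adds up trapezoids. The name `trapezoid_auc` is hypothetical, and labels are assumed to be 0 or 1.

```python
from itertools import groupby


def trapezoid_auc(y_true, y_score):
    """Area under the ROC curve by the trapezoidal rule."""
    pairs = sorted(zip(y_score, y_true), reverse=True)  # highest score first
    M = sum(y_true)         # number of positives
    N = len(y_true) - M     # number of negatives
    tp = fp = 0
    area = 0.0              # accumulated in (FP count) x (TP count) units
    # Each group holds all samples sharing one score, i.e. one threshold step
    for _, group in groupby(pairs, key=lambda p: p[0]):
        labels = [label for _, label in group]
        new_tp = tp + sum(labels)
        new_fp = fp + len(labels) - sum(labels)
        area += (new_fp - fp) * (tp + new_tp) / 2  # trapezoid between two curve points
        tp, fp = new_tp, new_fp
    return area / (M * N)   # rescale both axes from counts to rates


y_true = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
y_score = [0.1, 0.4, 0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9]
print(trapezoid_auc(y_true, y_score))  # 0.75
```

Grouping tied scores into a single step draws a diagonal segment across the tie, which is what makes the area agree with the 0.5 credit that Method 1 gives to tied pairs.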