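Notes on Li Hang's 《统计学习方法》 (Statistical Learning Methods), from Principles to Implementation (in Python): Chapter 5, Decision Trees (Python Practice)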

2025/4/17 15:55:00

Contents

Chapter 5 Decision Trees: Python Practice
- Problem 5.1 from the book
- Building a decision tree with the ID3 algorithm (Example 5.3)
- scikit-learn example


Chapter 5 Decision Trees: Python Practice

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from collections import Counter
import math
from math import log
import pprint

Problem 5.1 from the book


def create_data():
    datasets = [['青年', '否', '否', '一般', '否'],
               ['青年', '否', '否', '好', '否'],
               ['青年', '是', '否', '好', '是'],
               ['青年', '是', '是', '一般', '是'],
               ['青年', '否', '否', '一般', '否'],
               ['中年', '否', '否', '一般', '否'],
               ['中年', '否', '否', '好', '否'],
               ['中年', '是', '是', '好', '是'],
               ['中年', '否', '是', '非常好', '是'],
               ['中年', '否', '是', '非常好', '是'],
               ['老年', '否', '是', '非常好', '是'],
               ['老年', '否', '是', '好', '是'],
               ['老年', '是', '否', '好', '是'],
               ['老年', '是', '否', '非常好', '是'],
               ['老年', '否', '否', '一般', '否'],
               ]
    labels = ['年龄', '有工作', '有自己的房子', '信贷情况', '类别']
    # Return the dataset and the name of each column
    return datasets, labels

datasets, labels = create_data()
train_data = pd.DataFrame(datasets, columns=labels)
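
As a quick sanity check on the table above, the empirical entropy of the class column can be computed directly from its value counts. The snippet below is a minimal sketch, not the post's own implementation: the helper name `empirical_entropy` and the variable `class_column` are hypothetical, and it assumes the base-2 logarithm. The last column of the table contains 9 rows labelled '是' and 6 labelled '否'.

```python
from collections import Counter
from math import log2


def empirical_entropy(values):
    """Return H(D) = -sum(p_k * log2(p_k)) over the distinct values.

    Hypothetical helper for illustration; assumes a base-2 logarithm.
    """
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Class column of the loan table: 9 rows labelled '是', 6 rows labelled '否'.
class_column = ['是'] * 9 + ['否'] * 6
print(round(empirical_entropy(class_column), 3))  # 0.971
```

A value close to 1 bit reflects that the two classes are nearly balanced; a column with a single value would give an entropy of 0.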

# Entropy
def calc_ent(datasets):
    data_length = len(datasets)
    label_count = {}
    for i in range(data_length):
        label = datasets[i][-1]
        if