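Introduction
This post grew out of an earlier task: I needed to draw box plots in Python, the requirements kept piling up, and I went through revision after revision. Today it occurred to me that this is worth writing up. I also need to decide what to work on next and have some free time, so I am taking the chance to review the basic concepts properly.
How box plots work
For the underlying principle, I recommend two well-written posts on this site:
Matplotlib - 箱线图、箱型图 boxplot () 所有用法详解 (Matplotlib: a detailed guide to every use of boxplot() for box plots)
Python 箱型图的绘制并提取特征值 (Drawing box plots in Python and extracting their characteristic values)
I used both posts as references here. The box plot itself is as shown in the schematic from the second post: a box spanning the lower and upper quartiles, a line at the median, whiskers reaching out to the lower and upper limits, and outliers drawn as separate points.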
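To make the schematic concrete, here is a minimal sketch of the quantities a box plot draws, assuming the common convention of whiskers limited to 1.5 times the interquartile range. The function name `box_summary` and the sample numbers are made up for illustration and are not from the referenced posts.

```python
import numpy as np

def box_summary(values, whis=1.5):
    """Return the quantities a box plot draws for one 1-D sample."""
    x = np.sort(np.asarray(values, dtype=float))
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # Fences: points outside them are treated as outliers (fliers)
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    return {
        "q1": q1, "med": med, "q3": q3,
        "whislo": inside.min(),  # lowest point still inside the fences
        "whishi": inside.max(),  # highest point still inside the fences
        "fliers": x[(x < lo_fence) | (x > hi_fence)],
    }

sample = [1, 2, 3, 4, 5, 20]  # made-up numbers with one obvious outlier
summary = box_summary(sample)
print(summary)
```

Note that the whiskers end at actual data points, not at the fences themselves, which is why an extreme value can end up as an isolated outlier.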
To draw a box plot in Python, the relevant function signature in the matplotlib source is:
# Autogenerated by boilerplate.py.  Do not edit as changes will be lost.
@_copy_docstring_and_deprecators(Axes.boxplot)
def boxplot(
        x, notch=None, sym=None, vert=None, whis=None,
        positions=None, widths=None, patch_artist=None,
        bootstrap=None, usermedians=None, conf_intervals=None,
        meanline=None, showmeans=None, showcaps=None, showbox=None,
        showfliers=None, boxprops=None, labels=None, flierprops=None,
        medianprops=None, meanprops=None, capprops=None,
        whiskerprops=None, manage_ticks=True, autorange=False,
        zorder=None, capwidths=None, *, data=None):
    return gca().boxplot(
        x, notch=notch, sym=sym, vert=vert, whis=whis,
        positions=positions, widths=widths, patch_artist=patch_artist,
        bootstrap=bootstrap, usermedians=usermedians,
        conf_intervals=conf_intervals, meanline=meanline,
        showmeans=showmeans, showcaps=showcaps, showbox=showbox,
        showfliers=showfliers, boxprops=boxprops, labels=labels,
        flierprops=flierprops, medianprops=medianprops,
        meanprops=meanprops, capprops=capprops,
        whiskerprops=whiskerprops, manage_ticks=manage_ticks,
        autorange=autorange, zorder=zorder, capwidths=capwidths,
        **({"data": data} if data is not None else {}))
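(Quoted from: https://github.com/matplotlib/matplotlib/blob/v3.7.1/lib/matplotlib/pyplot.py#L2473-L2494)
Based on the explanations in the two posts above, I revised some of the parameter descriptions:
Parameter | Description | Parameter | Description |
---|---|---|---|
x | The data to plot; either a single dataset or several; | showcaps | Whether to draw the caps at the ends of the whiskers; shown by default; |
notch | Whether to draw the box with a notch; the default is a plain rectangle | showbox | Whether to draw the box itself; shown by default; |
sym | Marker for outlier points; with the default None the style comes from flierprops (recent matplotlib defaults draw a circle, older releases a blue +); | showfliers | Whether to show outliers; shown by default; |
vert | Whether the boxes are vertical; vertical by default, False for horizontal; | boxprops | Box style, such as edge color and fill color; |
whis | How far the whiskers may reach beyond the lower and upper quartiles; 1.5 times the interquartile range by default; | labels | Tick labels for the boxes, one per dataset (axis labels, not a legend) |
positions | Positions of the boxes; range(1, N+1) by default, where N is the number of boxes; | flierprops | Outlier style, such as marker shape, size and fill color; |
widths | Width of the boxes; 0.5 by default; | medianprops | Median style, such as line type and thickness; |
patch_artist | Whether to draw the boxes as fillable patches; False by default; | meanprops | Mean style, such as marker size and color; |
meanline | Whether to draw the mean as a line instead of a point; only takes effect when showmeans is True; | capprops | Cap style, such as color and thickness; |
showmeans | Whether to show the mean; not shown by default; | whiskerprops | Whisker style, such as color, thickness and line type; |
manage_ticks | Whether to adjust tick locations and labels to match the box positions; True by default; | autorange | When True and the 25th and 75th percentiles are equal, the whiskers extend to the minimum and maximum of the data; False by default; |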
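As a quick illustration of how `whis` changes the picture, the sketch below draws the same sample twice: once with the default 1.5 × IQR rule and once with `whis=(0, 100)`, which stretches the whiskers to the minimum and maximum. The sample values are made up, and the `Agg` backend is only selected so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

values = [5.6, 8.0, 8.4, 8.8, 9.2, 9.9, 14.0]  # made-up sample with far-out points

fig, (ax_default, ax_full) = plt.subplots(1, 2, sharey=True)
# Default rule: whiskers stop at the last point within 1.5 * IQR of the box
default_parts = ax_default.boxplot(values, widths=0.4)
# Percentile pair: whiskers span the 0th to 100th percentile, i.e. min to max
full_parts = ax_full.boxplot(values, widths=0.4, whis=(0, 100))

# Each box contributes two whisker lines: lower first, then upper.
# The second y value of a whisker line is the whisker end.
default_top = default_parts["whiskers"][1].get_ydata()[1]
full_top = full_parts["whiskers"][1].get_ydata()[1]
print(default_top, full_top)
```

With the default rule the upper whisker stops short of the largest value, which is then drawn as an outlier; with `whis=(0, 100)` no point is left outside the whiskers.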
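With that, let's go straight to the hands-on part.
Drawing the box plot
Here is a simplified version. My points were extracted from people detected in a UAV video stream, so I skip the earlier details. The first step extracts each pedestrian's track and computes how long that pedestrian takes to pass through the region of interest: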
def throw_time(array, start_x, end_x, y):
    # Columns appear to be: frame, person id, x, y, width, height
    indexs = []
    index = 1
    person_throw_time = []
    # Person ids start at 1; the upper bound is max id + 1 so the last id is included
    for i in range(1, int(array[:, 1].max()) + 1):
        each_person_data = array[array[:, 1] == i]
        # Keep only the detections inside the region of interest
        each_person_data = each_person_data[each_person_data[:, 2] > start_x]
        each_person_data = each_person_data[each_person_data[:, 2] < end_x]
        each_person_data = each_person_data[each_person_data[:, 3] > y]
        if each_person_data.shape[0] < 4:
            continue
        # Move from the box corner to the box center
        each_person_data[:, 2] = each_person_data[:, 2] + (each_person_data[:, 4] / 2)
        each_person_data[:, 3] = each_person_data[:, 3] + (each_person_data[:, 5] / 2)
        # Frame span times 0.04 s per frame gives the time in seconds
        person_time = (each_person_data[-1, 0] - each_person_data[0, 0]) * 0.04
        print("person time = ", person_time)
        # Ignore tracks shorter than 5 s
        if person_time < 5:
            continue
        person_throw_time.append(person_time)
        indexs.append(index)
        index = index + 1
    return indexs, person_throw_time
indexs1,person_throw_time1 = throw_time(array1,500,1400,400)
# print(person_throw_time1)
# [10.36, 9.76, 9.48, 9.56, 6.16, 8.36, 8.6, 8.76, 5.6000000000000005, 9.84, 8.0, 9.88, 8.36, 9.16, 8.0, 8.92, 8.32, 9.68, 7.6000000000000005, 8.24, 7.08, 8.8, 8.6, 9.88, 9.64, 9.36, 10.16, 9.56, 7.4, 9.32, 8.48, 9.88, 9.16, 9.48, 9.64, 8.76]
indexs2,person_throw_time2 = throw_time(array2,500,1400,400)
indexs3,person_throw_time3 = throw_time(array3,450,1300,400)
indexs4,person_throw_time4 = throw_time(array4,600,1400,400)
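Since the tracking arrays themselves are not shown, here is a small self-contained sketch of the same idea on synthetic data. It assumes the column layout frame, id, x, y, w, h (inferred from how the function above indexes the array) and the same 0.04 s per frame. The function name `crossing_times` and all values are hypothetical, and the sketch leaves out the 5-second filter.

```python
import numpy as np

FRAME_SECONDS = 0.04  # per-frame duration, the same factor the post uses

def crossing_times(tracks, x_min, x_max, y_min, min_frames=4):
    """Seconds each track id spends inside the region, keyed by id."""
    times = {}
    for pid in np.unique(tracks[:, 1]).astype(int):
        rows = tracks[tracks[:, 1] == pid]
        # Keep detections inside the horizontal band and below the y threshold line
        keep = (rows[:, 2] > x_min) & (rows[:, 2] < x_max) & (rows[:, 3] > y_min)
        rows = rows[keep]
        if rows.shape[0] < min_frames:
            continue
        times[pid] = (rows[-1, 0] - rows[0, 0]) * FRAME_SECONDS
    return times

# One synthetic walker moving 5 px per frame to the right (made-up values)
frames = np.arange(0, 200)
walker = np.column_stack([
    frames,                 # frame index
    np.full(200, 1),        # person id
    500 + 5 * frames,       # x
    np.full(200, 450),      # y
    np.full(200, 40),       # w
    np.full(200, 80),       # h
])
demo_times = crossing_times(walker, 500, 1400, 400)
print(demo_times)
```

The walker is inside the band from frame 1 to frame 179, so the reported time is 178 frames times 0.04 s.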
This yields a series of data points together with their indices, which are then used for plotting:
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rc("font", family='Times New Roman')
# Plot
ax = plt.subplot()
ax.set_ylabel('time(s)', fontsize=18)
ax.boxplot(
    [person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4],
    widths=0.4, patch_artist=True, showfliers=False,
    boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
    meanline=True, meanprops={'color': 'red', 'linewidth': 3})
# Set the tick labels on the x axis
ax.set_xticklabels(['List 1', 'List 2', 'List 3', 'List 4'], fontsize=14)
plt.show()
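The code I chose here creates a box plot with four boxes, one for each of the lists [person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4]. The boxes are filled in sky blue with black edges, and outliers are hidden. meanline=True and meanprops ask for a red line at the mean of each box, but matplotlib only draws the mean when showmeans=True is passed as well, so add that argument if the mean line should actually appear.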
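The interaction between `meanline` and `showmeans` is easy to check from the dictionary that `boxplot` returns. The sketch below uses made-up samples and a headless backend; it compares the same call with and without `showmeans=True`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Made-up samples standing in for two of the timing lists
samples = [[5.6, 8.0, 8.4, 9.2, 9.9, 10.4], [6.2, 7.4, 8.6, 8.8, 9.6, 10.2]]
mean_style = {'color': 'red', 'linewidth': 3}

fig, (ax_left, ax_right) = plt.subplots(1, 2, sharey=True)
# meanline alone: no mean artists are created
without_means = ax_left.boxplot(samples, meanline=True, meanprops=mean_style)
# showmeans + meanline: one red line per box
with_means = ax_right.boxplot(samples, showmeans=True, meanline=True,
                              meanprops=mean_style)

print(len(without_means["means"]), len(with_means["means"]))
```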
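Most people would probably be done at this point, and at first I thought I was too: the plot above is already a second version that fixed some color deviations and legend errors in the first one. In the end, though, I got to version 6 and reworked the plotting logic along the way.
Drawing the box plot from its definition
Then this happened: the requester brought another dataset of unknown origin and wanted a comparison plot. That data consisted directly of the five points of each box plot, with no disguise at all, and it landed on me without any warning as an Excel sheet. So I organized the data: I converted my lists [person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4] from above into a DataFrame and used describe to get the corresponding five-number summary. Because the real data involves some security concerns, simple numbers stand in for it here: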
import pandas as pd
# Suppose these are your four lists
person_throw_time1 = [1, 2, 3, 4, 5]
person_throw_time2 = [6, 7, 8, 9, 10]
person_throw_time3 = [11, 12, 13, 14, 15]
person_throw_time4 = [16, 17, 18, 19, 20]
# Combine the four lists into one DataFrame
data = pd.DataFrame({'data1': person_throw_time1, 'data2': person_throw_time2,
                     'data3': person_throw_time3, 'data4': person_throw_time4})
# Compute the summary statistics with describe
statistics = data.describe()
print(statistics)
This gives the corresponding statistics:
           data1      data2      data3      data4
count   5.000000   5.000000   5.000000   5.000000
mean    3.000000   8.000000  13.000000  18.000000
std     1.581139   1.581139   1.581139   1.581139
min     1.000000   6.000000  11.000000  16.000000
25%     2.000000   7.000000  12.000000  17.000000
50%     3.000000   8.000000  13.000000  18.000000
75%     4.000000   9.000000  14.000000  19.000000
max     5.000000  10.000000  15.000000  20.000000
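To get from the describe output to one row of five numbers per dataset, the relevant index labels can be selected directly. This is a minimal sketch with two of the placeholder columns; the variable names `five_rows` and `summary_rows` are my own.

```python
import pandas as pd

frame = pd.DataFrame({"data1": [1, 2, 3, 4, 5], "data2": [6, 7, 8, 9, 10]})
stats = frame.describe()

# The five rows of describe() that make up the five-number summary
five_rows = ["min", "25%", "50%", "75%", "max"]
# One row per column: [name, min, q1, median, q3, max]
summary_rows = [[name] + stats.loc[five_rows, name].tolist()
                for name in stats.columns]
print(summary_rows)
```

Note that min and max here are the true extremes of the data, which is not necessarily where a 1.5 × IQR whisker would end.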
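I reorganized the data, and the three sets of experimental results put together look like this (PS: I made some modifications, so these are not a standard five-number summary):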
[
["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]
That makes little difference, though. Plotting again with the data above:
import matplotlib.pyplot as plt
import matplotlib
data = [
["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]
# Extract the data and the labels
labels = [row[0] for row in data]
box_data = [row[1:] for row in data]
# Set the font
matplotlib.rc("font", family='Times New Roman')
# Draw the box plot
fig, ax = plt.subplots()
ax.boxplot(box_data, widths=0.4, patch_artist=True, showfliers=False,
           boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
           meanline=True, meanprops={'color': 'red', 'linewidth': 3})
# Set the axis labels
ax.set_ylabel('time(s)', fontsize=18)
ax.set_xticklabels(labels, rotation=45, fontsize=12)
plt.show()
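After plotting there was still a problem: some of the boxes had lost their upper and lower limits. The cause is that boxplot treats each row of five values as raw samples and recomputes the statistics from them. When the smallest or largest value lies more than 1.5 times the interquartile range away from the box, it is classified as an outlier, the whisker stops at the nearest remaining value (often the quartile itself), and showfliers=False then hides the outlier. To draw the given values exactly as they are, matplotlib's other box plot interface is needed, which takes each box as a dictionary, so the data above is converted into dictionaries: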
data = [
["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]
def convert_to_dict(data):
    draw_data = []
    for row in data:
        draw_data.append({
            "whislo": row[1],
            "q1": row[2],
            "med": row[3],
            "q3": row[4],
            "whishi": row[5]
        })
    return draw_data
draw_data = convert_to_dict(data)
print(draw_data)
# [{'whislo': 6.1, 'q1': 9.15, 'med': 9.84, 'q3': 10.44, 'whishi': 11.16}, {'whislo': 7.0, 'q1': 9.47, 'med': 10.05, 'q3': 10.81, 'whishi': 12.02}, {'whislo': 14.16, 'q1': 18.41, 'med': 20.19, 'q3': 21.08, 'whishi': 25.42}, {'whislo': 6.54, 'q1': 8.65, 'med': 9.1, 'q3': 9.39, 'whishi': 10.08}, {'whislo': 7.31, 'q1': 9.1, 'med': 9.5, 'q3': 10.31, 'whishi': 10.86}, {'whislo': 10.32, 'q1': 14.18, 'med': 15.42, 'q3': 18.08, 'whishi': 20.72}, {'whislo': 6.14, 'q1': 8.1, 'med': 8.44, 'q3': 9.1, 'whishi': 9.82}, {'whislo': 6.22, 'q1': 8.3, 'med': 8.7, 'q3': 9.2, 'whishi': 10.12}, {'whislo': 8.72, 'q1': 10.61, 'med': 12.71, 'q3': 16.11, 'whishi': 17.91}, {'whislo': 7.1, 'q1': 8.75, 'med': 8.84, 'q3': 9.1, 'whishi': 10.96}, {'whislo': 7.3, 'q1': 8.85, 'med': 9.04, 'q3': 9.1, 'whishi': 11.19}, {'whislo': 7.6, 'q1': 8.3, 'med': 8.4, 'q3': 9.0, 'whishi': 12.55}]
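Coming back to the missing whiskers for a moment: one way to see what happened is to ask matplotlib which statistics it computes when a five-value row is passed in as raw data. The sketch below does this for the "List 4" row with `matplotlib.cbook.boxplot_stats`, the helper that computes box plot statistics from samples; the variable names are my own.

```python
from matplotlib import cbook

row = [7.1, 8.75, 8.84, 9.1, 10.96]  # the "List 4" values from the table above

# boxplot_stats returns one dict per dataset; there is only one here
recomputed = cbook.boxplot_stats(row)[0]

print("q1, q3:", recomputed["q1"], recomputed["q3"])
print("whiskers:", recomputed["whislo"], recomputed["whishi"])
print("fliers:", recomputed["fliers"])
```

The recomputed whiskers coincide with the quartiles, so they have zero length, and the intended limits 7.1 and 10.96 are reported as outliers instead.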
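With the list converted into dictionaries, ax.boxplot() is replaced by ax.bxp(). boxplot computes the statistics from raw samples and then draws them, whereas bxp draws directly from statistics that are already computed, where each box is described by five values (lower whisker end, lower quartile, median, upper quartile and upper whisker end). So the code becomes: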
import matplotlib.pyplot as plt
import matplotlib
data = [
["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]
def convert_to_dict(data):
    draw_data = []
    for row in data:
        draw_data.append({
            "whislo": row[1],
            "q1": row[2],
            "med": row[3],
            "q3": row[4],
            "whishi": row[5]
        })
    return draw_data
draw_data = convert_to_dict(data)
matplotlib.rc("font", family='Times New Roman')
ax = plt.subplot()
ax.set_ylabel('time(s)', fontsize=18)
# bxp draws the given statistics as they are, without recomputing them
ax.bxp(draw_data, widths=0.4, patch_artist=True, showfliers=False,
       boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
       meanline=True, meanprops={'color': 'red', 'linewidth': 3})
# Use the first column of data as the tick labels
ax.set_xticklabels([row[0] for row in data], fontsize=14)
plt.show()
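To see how the two interfaces relate, the sketch below draws the same raw samples once with `boxplot` and once by computing the statistics with `matplotlib.cbook.boxplot_stats` and handing them to `bxp`. The samples are made up, and the comparison of whisker and median coordinates is just a sanity check of my own, not something from the original workflow.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib import cbook

# Made-up raw samples
raw = [[5.6, 8.0, 8.4, 9.2, 9.9, 10.4], [6.2, 7.4, 8.6, 8.8, 9.6, 14.0]]

fig, (ax_a, ax_b) = plt.subplots(1, 2, sharey=True)
# Path 1: boxplot computes the statistics and draws them
via_boxplot = ax_a.boxplot(raw)
# Path 2: compute the statistics explicitly, then let bxp draw them
via_bxp = ax_b.bxp(cbook.boxplot_stats(raw))

# Both paths should place whiskers and medians at the same y coordinates
same_whiskers = all(
    list(a.get_ydata()) == list(b.get_ydata())
    for a, b in zip(via_boxplot["whiskers"], via_bxp["whiskers"])
)
same_medians = all(
    list(a.get_ydata()) == list(b.get_ydata())
    for a, b in zip(via_boxplot["medians"], via_bxp["medians"])
)
print(same_whiskers, same_medians)
```

When only summary values are available, as with the Excel sheet above, the second path is the one to use, with hand-built dictionaries in place of the `boxplot_stats` output.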