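Table of Contents
- 4. Examples
- 4.1 Histograms with Plotly Express
- 4.1.1 Basic histogram
- 4.1.2 Using a column that contains categorical data
- 4.1.3 Choosing the number of bins
- 4.1.4 Histogram of date data
- 4.1.5 Histogram of categorical data
- 4.1.6 Accessing the counts (y-axis) values
- 4.1.7 Type of normalization
- 4.1.8 Histogram appearance
- 4.1.9 Several histograms for the different values of one column
- 4.1.10 Aggregating with functions other than count
- 4.1.11 Categorical and binned numeric data on the x-axis
- 4.1.12 Histograms with patterns
- 4.1.13 Visualizing the distribution
- 4.1.14 Adding text labels
- 4.1.15 Histograms in Dash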
4. Examples
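In statistics, a histogram is a representation of the distribution of numerical data, where the data are binned and the count for each bin is represented. More generally, in Plotly a histogram is an aggregated bar chart with several possible aggregation functions (e.g. sum, average, count...), which can be used to visualize data on categorical and date axes as well as on linear axes.
Alternatives to histograms for visualizing distributions include violin plots, box plots, ECDF plots and strip charts.
If you are looking instead for bar charts, i.e. raw, unaggregated data represented with rectangular bars, go to the bar chart tutorial.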
4.1 Histograms with Plotly Express
4.1.1 Basic histogram
import plotly.express as px
df = px.data.tips()
print(df)
'''
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
.. ... ... ... ... ... ... ...
239 29.03 5.92 Male No Sat Dinner 3
240 27.18 2.00 Female Yes Sat Dinner 2
241 22.67 2.00 Male Yes Sat Dinner 2
242 17.82 1.75 Male No Sat Dinner 2
243 18.78 3.00 Female No Thur Dinner 2
[244 rows x 7 columns]
'''
fig = px.histogram(df, x="total_bill")
fig.show()
4.1.2 Using a column that contains categorical data
import plotly.express as px
df = px.data.tips()
# Here we use a column that contains categorical data
fig = px.histogram(df, x="day")
fig.show()
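4.1.3 Choosing the number of bins
By default, the number of bins is chosen so that this number is comparable to the typical number of samples in a bin. This number can be customized, as can the range of values.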
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", nbins=20)
fig.show()
4.1.4 Histogram of date data
In addition to numeric data, Plotly histograms also bin date data automatically:
import plotly.express as px
df = px.data.stocks()
print(df)
'''
date GOOG AAPL AMZN FB NFLX MSFT
0 2018-01-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 2018-01-08 1.018172 1.011943 1.061881 0.959968 1.053526 1.015988
2 2018-01-15 1.032008 1.019771 1.053240 0.970243 1.049860 1.020524
3 2018-01-22 1.066783 0.980057 1.140676 1.016858 1.307681 1.066561
4 2018-01-29 1.008773 0.917143 1.163374 1.018357 1.273537 1.040708
.. ... ... ... ... ... ... ...
100 2019-12-02 1.216280 1.546914 1.425061 1.075997 1.463641 1.720717
101 2019-12-09 1.222821 1.572286 1.432660 1.038855 1.421496 1.752239
102 2019-12-16 1.224418 1.596800 1.453455 1.104094 1.604362 1.784896
103 2019-12-23 1.226504 1.656000 1.521226 1.113728 1.567170 1.802472
104 2019-12-30 1.213014 1.678000 1.503360 1.098475 1.540883 1.788185
[105 rows x 7 columns]
'''
fig = px.histogram(df, x="date")
fig.update_layout(bargap=0.2)
fig.show()
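4.1.5 Histogram of categorical data
Plotly histograms automatically bin numeric or date data, but they can also be used on raw categorical data, as in the following example, where the x-axis values come from the categorical "day" variable: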
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="day", category_orders=dict(day=["Thur", "Fri", "Sat", "Sun"]))
fig.show()
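Since the counting happens in the browser, it can be useful to reproduce the bar heights with pandas. A minimal sketch on a small hypothetical day column (toy data, not the tips dataset): value_counts() counts each category, and reindex puts the result in the same order as category_orders, filling categories that never occur with 0.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
day = pd.Series(["Sun", "Sat", "Sun", "Thur", "Sat", "Sun"])

order = ["Thur", "Fri", "Sat", "Sun"]
# Count rows per category, then align to the display order;
# "Fri" does not occur, so its count is filled with 0.
counts = day.value_counts().reindex(order, fill_value=0)
print(counts.tolist())  # [1, 0, 2, 3]
```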
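4.1.6 Accessing the counts (y-axis) values
The y-axis (count) values are computed on the fly by JavaScript in the browser, so they are not available in the fig object. You can compute them manually with np.histogram: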
import plotly.express as px
import numpy as np
df = px.data.tips()
# Create the bins: edges 0, 5, ..., 55
counts, bins = np.histogram(df.total_bill, bins=range(0, 60, 5))
# Convert the bin edges to bin centers for plotting
bins = 0.5 * (bins[:-1] + bins[1:])
fig = px.bar(x=bins, y=counts, labels={'x':'total_bill', 'y':'count'})
fig.show()
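One detail matters when reproducing counts with np.histogram: every bin is half-open [a, b) except the last one, which also includes its right edge. A minimal sketch on a hypothetical toy array, converting the edges to centers in the same way as the code above:

```python
import numpy as np

# Hypothetical toy values, for illustration only
values = np.array([3.0, 7.5, 8.0, 12.1, 14.9, 15.0, 22.3, 25.0])

# Edges 0, 5, 10, 15, 20, 25 give five bins
counts, edges = np.histogram(values, bins=np.arange(0, 30, 5))
# 15.0 falls in [15, 20); 25.0 falls in the closed last bin [20, 25]
centers = 0.5 * (edges[:-1] + edges[1:])  # midpoint of each bin

print(counts.tolist())   # [1, 2, 2, 1, 2]
print(centers.tolist())  # [2.5, 7.5, 12.5, 17.5, 22.5]
```

Values outside the first and last edges are simply dropped, so choosing a range that covers the data (as range(0, 60, 5) does for total_bill) keeps every sample counted.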
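4.1.7 Type of normalization
The default mode is to represent the count of samples in each bin. With the histnorm argument, it is also possible to represent the percentage or fraction of samples in each bin (histnorm='percent' or 'probability'), a density histogram (the sum of all bar areas equals the total number of sample points, 'density'), or a probability density histogram (the sum of all bar areas equals 1, 'probability density').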
import plotly.express as px
import numpy as np
df = px.data.tips()
fig = px.histogram(df, x="total_bill", histnorm='probability density')
fig.show()
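The four histnorm options can be written in terms of the bin counts, the bin widths and the sample size N. A minimal NumPy sketch on hypothetical toy data that follows the definitions in the paragraph above; it reproduces the formulas, not Plotly's internal code. Unequal bin widths are used on purpose so that "density" and "probability" visibly differ:

```python
import numpy as np

# Hypothetical toy sample, for illustration only
values = np.array([1.0, 1.5, 2.2, 2.8, 3.1, 4.9, 5.0, 7.5])

counts, edges = np.histogram(values, bins=[0, 2, 4, 8])  # unequal widths
widths = np.diff(edges)
n = counts.sum()

percent = 100 * counts / n             # histnorm='percent'
probability = counts / n               # histnorm='probability'
density = counts / widths              # histnorm='density'
prob_density = counts / (n * widths)   # histnorm='probability density'

# Total bar area: N for 'density', 1 for 'probability density'
print((density * widths).sum(), (prob_density * widths).sum())
```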
4.1.8 Histogram appearance
import plotly.express as px
import numpy as np
df = px.data.tips()
fig = px.histogram(df, x="total_bill",
                   title='Histogram of bills',
                   labels={'total_bill':'total bill'}, # a label can be specified for each df column
                   opacity=0.8,
                   log_y=True, # represent the bars on a log scale
                   color_discrete_sequence=['indianred'] # color of the histogram bars
                   )
fig.show()
4.1.9 Several histograms for the different values of one column
import plotly.express as px
import numpy as np
df = px.data.tips()
fig = px.histogram(df, x="total_bill", color="sex")
fig.show()
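4.1.10 Aggregating with functions other than count
For each bin of x, a function of the data can be computed with histfunc. The argument of histfunc is the dataframe column given as the y argument. The plot below shows that the average tip increases with the total bill.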
import plotly.express as px
import numpy as np
df = px.data.tips()
fig = px.histogram(df, x="total_bill", y="tip", histfunc='avg')
fig.show()
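With histfunc='avg', each bar is the mean of the y values whose x falls in that bin. A minimal NumPy sketch of that computation on hypothetical toy data: np.histogram with weights=y gives the per-bin sum of y, and dividing by the per-bin count gives the mean. In this sketch empty bins become NaN; how Plotly draws empty bins is not covered here.

```python
import numpy as np

# Hypothetical (total_bill, tip) pairs, for illustration only
x = np.array([8.0, 12.0, 14.0, 21.0, 26.0, 28.0])
y = np.array([1.0, 2.0, 3.0, 3.0, 4.0, 6.0])

edges = np.array([0, 10, 20, 30, 40])
counts, _ = np.histogram(x, bins=edges)           # like histfunc='count'
sums, _ = np.histogram(x, bins=edges, weights=y)  # like histfunc='sum'
with np.errstate(invalid="ignore", divide="ignore"):
    avg = sums / counts                           # like histfunc='avg'
```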
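4.1.11 Categorical and binned numeric data on the x-axis
The default histfunc is sum if y is given, and it works with both categorical and binned numeric data on the x-axis: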
import plotly.express as px
import numpy as np
df = px.data.tips()
fig = px.histogram(df, x="day", y="total_bill", category_orders=dict(day=["Thur", "Fri", "Sat", "Sun"]))
fig.show()
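When y is given without histfunc, each bar is the sum of y over its category. A minimal pandas sketch of the same aggregation on a hypothetical toy frame, reordered to match category_orders:

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "day": ["Sun", "Sat", "Sun", "Thur", "Sat"],
    "total_bill": [10.0, 20.0, 15.5, 8.25, 4.0],
})

order = ["Thur", "Fri", "Sat", "Sun"]
# Sum total_bill per day (the default histfunc when y is set),
# then align to the display order; "Fri" has no rows, so it becomes 0.
sums = df.groupby("day")["total_bill"].sum().reindex(order, fill_value=0)
print(sums.tolist())  # [8.25, 0.0, 24.0, 25.5]
```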
4.1.12 Histograms with patterns
New in v5.0
In addition to color, histograms can also use patterns (also known as hatching or texture):
import plotly.express as px
import numpy as np
df = px.data.tips()
fig = px.histogram(df, x="sex", y="total_bill", color="sex", pattern_shape="smoker")
fig.show()
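4.1.13 Visualizing the distribution
With the marginal keyword, a marginal plot is drawn alongside the histogram to visualize the distribution. See the distplot page for more examples of combined statistical representations.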
import plotly.express as px
import numpy as np
df = px.data.tips()
fig = px.histogram(df,
x="total_bill",
color="sex",
marginal="rug", # can be `box`, `violin`
hover_data=df.columns)
fig.show()
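4.1.14 Adding text labels
New in v5.5
You can add text to the histogram bars with the text_auto argument. Setting it to True displays the values on the bars, and setting it to a d3-format formatting string controls the output format.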
import plotly.express as px
import numpy as np
df = px.data.tips()
fig = px.histogram(df, x="total_bill", y="tip", histfunc="avg", nbins=8, text_auto=True)
fig.show()
4.1.15 Histograms in Dash
import dash
from dash import html, dcc, Input, Output
import plotly.express as px
import numpy as np

np.random.seed(2020)

app = dash.Dash(__name__)

app.layout = html.Div([
    dcc.Graph(id="graph"),
    html.P("Mean:"),
    dcc.Slider(id="mean", min=-3, max=3, value=0,
               marks={-3: '-3', 3: '3'}),
    html.P("Standard Deviation:"),
    dcc.Slider(id="std", min=1, max=3, value=1,
               marks={1: '1', 3: '3'}),
])


@app.callback(
    Output("graph", "figure"),
    [Input("mean", "value"),
     Input("std", "value")])
def display_histogram(mean, std):
    # Draw 500 samples from a normal distribution with the chosen mean and std
    data = np.random.normal(mean, std, size=500)
    fig = px.histogram(data, nbins=30, range_x=[-10, 10])
    return fig


if __name__ == "__main__":
    app.run(debug=True)