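Course source: Bilibili uploader 【蚂蚁学python】
【Course link: 【【数据可视化】Python数据图表可视化入门到实战】 ([Data Visualization] Python Data Charts: From Beginner to Practice)】
【Course materials link: 【link】】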
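Plotting a scatter chart in Python to examine the relationship between BMI and insurance charges
Scatter plot:
- Two sets of data form a set of coordinate points. Examining how those points are distributed lets you judge whether the two variables are related, or summarize the pattern the points follow.
- The core value of a scatter plot is revealing relationships between variables, which can then support predictive analysis and well-grounded decisions.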
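A scatter plot shows a relationship visually. Two common numeric companions are the Pearson correlation coefficient (how strongly the points follow a straight line) and a least-squares line (a simple predictive model). The sketch below computes both in plain Python on the BMI and charges values of the first 10 rows shown by `df.head(10)` in step 1. Ten points are far too few for any real conclusion, and Pearson's r only captures *linear* association, so treat this as an illustration of the formulas rather than an analysis of the dataset. The helper names `pearson_r` and `fit_line` are hypothetical.

```python
import math

# BMI and charges of the first 10 rows from df.head(10) in step 1
# (illustration only: far too few points to draw conclusions)
bmi = [27.900, 33.770, 33.000, 22.705, 28.880, 25.740, 33.440, 27.740, 29.830, 25.840]
charges = [16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520,
           3756.62160, 8240.58960, 7281.50560, 6406.41070, 28923.13692]


def pearson_r(xs, ys):
    """Hypothetical helper: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Hypothetical helper: ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    intercept = mean_y - slope * mean_x
    return slope, intercept


r = pearson_r(bmi, charges)
slope, intercept = fit_line(bmi, charges)
print(f"r = {r:.3f}")
# A "prediction" from the fitted line, e.g. for BMI = 30
print(f"predicted charges at BMI 30: {slope * 30 + intercept:.2f}")
```

On the full 1338-row dataset the same idea applies, but the scatter plot is still worth drawing first. A single r value can hide clusters or non-linear patterns that the chart makes obvious.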
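Example: the relationship between body mass index (BMI) and individual medical charges in a personal medical cost dataset.
Original dataset: https://www.kaggle.com/mirichoi0218/insurance/home
1. Read the insurance dataset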
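```python
import pandas as pd

# The path is relative to the course notebook; adjust it to wherever insurance.csv is saved
df = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/insurance/insurance.csv")
df.head(10)
```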
| | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.900 | 0 | yes | southwest | 16884.92400 |
| 1 | 18 | male | 33.770 | 1 | no | southeast | 1725.55230 |
| 2 | 28 | male | 33.000 | 3 | no | southeast | 4449.46200 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.880 | 0 | no | northwest | 3866.85520 |
| 5 | 31 | female | 25.740 | 0 | no | southeast | 3756.62160 |
| 6 | 46 | female | 33.440 | 1 | no | southeast | 8240.58960 |
| 7 | 37 | female | 27.740 | 3 | no | northwest | 7281.50560 |
| 8 | 37 | male | 29.830 | 2 | no | northeast | 6406.41070 |
| 9 | 60 | female | 25.840 | 0 | no | northwest | 28923.13692 |
```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   age       1338 non-null   int64
 1   sex       1338 non-null   object
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64
 4   smoker    1338 non-null   object
 5   region    1338 non-null   object
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB
```
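`df.info()` shows that pandas inferred numeric dtypes on its own: `bmi` and `charges` are `float64`, `age` and `children` are `int64`, and the text columns are `object`. It also shows that none of the 1338 rows has missing values. To make that inference step visible, the sketch below does the same conversion by hand with the standard-library `csv` module. It assumes the file's header is `age,sex,bmi,children,smoker,region,charges`, matching the columns above. The inline `SAMPLE` text and the `CONVERTERS`/`parse_rows` names are hypothetical stand-ins, with values taken from the `df.head(10)` table.

```python
import csv
import io

# Hypothetical inline sample in the same column layout as insurance.csv,
# using values from the df.head(10) table; in practice you would open the real file
SAMPLE = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.924
18,male,33.77,1,no,southeast,1725.5523
28,male,33,3,no,southeast,4449.462
"""

# Column -> converter, mirroring the int64 / float64 dtypes reported by df.info()
CONVERTERS = {"age": int, "children": int, "bmi": float, "charges": float}


def parse_rows(text):
    """Parse CSV text into dicts, converting numeric columns; text columns stay str (pandas "object")."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({key: CONVERTERS.get(key, str)(value) for key, value in raw.items()})
    return rows


rows = parse_rows(SAMPLE)
bmi = [row["bmi"] for row in rows]
charges = [row["charges"] for row in rows]
print(bmi, charges)
```

pandas does all of this, plus missing-value handling, in a single `read_csv` call. That is why the notes simply read the file and move on to plotting.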
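2. Plot the scatter chart with pyecharts
```python
# Sort the rows by bmi in ascending order
df.sort_values(by="bmi", inplace=True)  # inplace=True modifies df itself instead of returning a sorted copy
df.head()
```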
| | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 172 | 18 | male | 15.960 | 0 | no | northeast | 1694.79640 |
| 428 | 21 | female | 16.815 | 1 | no | northeast | 3167.45585 |
| 1226 | 38 | male | 16.815 | 2 | no | northeast | 6640.54485 |
| 412 | 26 | female | 17.195 | 2 | yes | northeast | 14455.64405 |
| 1286 | 28 | female | 17.290 | 0 | no | northeast | 3732.62510 |
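`sort_values` reorders whole rows, so every `bmi` value keeps its own `charges` value. This is what keeps the two lists produced by `to_list()` aligned point by point. With both axes later set to `type_="value"`, ECharts positions each point by its values, so the sort order mainly helps when reading the data. What must hold is that the pairing survives. The plain-Python sketch below uses the first five (bmi, charges) pairs from `df.head(10)` to contrast sorting whole rows with the mistake of sorting each column on its own.

```python
# (bmi, charges) pairs for the first five rows of df.head(10) in step 1
rows = [(27.900, 16884.92400), (33.770, 1725.55230), (33.000, 4449.46200),
        (22.705, 21984.47061), (28.880, 3866.85520)]

# Sort whole rows by bmi, as df.sort_values(by="bmi") does: each bmi keeps its own charges
rows_sorted = sorted(rows, key=lambda row: row[0])
bmi = [b for b, _ in rows_sorted]
charges = [c for _, c in rows_sorted]

# Mistake: sorting the two columns independently scrambles the (bmi, charges) pairs
bmi_wrong = sorted(b for b, _ in rows)
charges_wrong = sorted(c for _, c in rows)

print(set(zip(bmi, charges)) == set(rows))              # pairs preserved
print(set(zip(bmi_wrong, charges_wrong)) == set(rows))  # pairs broken
```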
```python
# Convert the two columns to plain Python lists for pyecharts
bmi = df["bmi"].to_list()
charges = df["charges"].to_list()
```
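```python
import pyecharts.options as opts
from pyecharts.charts import Scatter

scatter = (
    Scatter()
    # x values: BMI
    .add_xaxis(xaxis_data=bmi)
    # y values: insurance charges, paired with the x values in the same order
    .add_yaxis(
        series_name="",
        y_axis=charges,
        symbol_size=4,  # small markers so overlapping points stay readable
        label_opts=opts.LabelOpts(is_show=False),  # don't print a value label on every point
    )
    .set_global_opts(
        # Both axes are numeric value axes, so points are placed by their actual values
        xaxis_opts=opts.AxisOpts(type_="value"),
        yaxis_opts=opts.AxisOpts(type_="value"),
        title_opts=opts.TitleOpts(title="BMI vs. Insurance Charges", pos_left="center"),
    )
)
```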
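pyecharts is a Python wrapper around ECharts. The chain of `add_xaxis`, `add_yaxis`, and `set_global_opts` calls builds an ECharts configuration ("option") that ends up embedded in the generated HTML. To show roughly what that configuration contains, the sketch below writes an equivalent option by hand using standard ECharts keys (`xAxis.type`, `series[].type = "scatter"`, `symbolSize`, `label.show`, `title.left`). It is an approximation, not pyecharts' exact output, which includes more defaults. The function name `build_scatter_option` is hypothetical. The values are the first three rows of the sorted table above.

```python
import json


def build_scatter_option(xs, ys, title):
    """Hypothetical helper: a hand-written ECharts option for a plain scatter chart."""
    return {
        "title": {"text": title, "left": "center"},
        # Numeric axes: each point is positioned by its values
        "xAxis": {"type": "value"},
        "yAxis": {"type": "value"},
        "series": [{
            "type": "scatter",
            "name": "",
            "symbolSize": 4,
            "label": {"show": False},
            # Each point is an [x, y] pair, which is why x and y must stay aligned
            "data": [[x, y] for x, y in zip(xs, ys)],
        }],
    }


option = build_scatter_option([15.960, 16.815, 16.815],
                              [1694.79640, 3167.45585, 6640.54485],
                              "BMI vs. Insurance Charges")
print(json.dumps(option, ensure_ascii=False, indent=2))
```

pyecharts chart objects also offer `dump_options()`, which returns the real generated option as a JSON string. You can use it to compare against this sketch.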
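```python
from IPython.display import HTML

# scatter.render() writes the chart to an HTML file (render.html by default)
# and returns that file's path as a string; read the file back in
with open(scatter.render(), "r", encoding="utf-8") as file:
    html_content = file.read()

# Render the HTML directly in JupyterLab
HTML(html_content)
```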