数据处理的思路:
1 各表使用情况:
- 汽车分厂商每月销售表,该表主要分析展示top10销量的厂商销量、占比变化情况(柱形图、饼图);
- 中国汽车分车型每月销售量表,该表主要分析展示top20销量的车型销量变化情况以及平均售价(散点图,折线图,柱形图);
- 中国汽车总体销量表,该表主要分析展示整体销量及变化的趋势(折线图、柱形图)
2 模块使用情况
- 此次数据集相关可视化展示,均使用pyecharts绘制
1 包导入
In [1]:
import pandas as pd from pyecharts import options as opts from pyecharts.charts import Bar, Pie, Line, Scatter, Timeline, Grid from pyecharts.options import GridOpts import warnings warnings.filterwarnings("ignore")
2 分析中国汽车分厂商每月销售表
In [2]:
df_manufacturer = pd.read_excel('/home/mw/input/car3784/中国汽车分厂商每月销售表.xlsx') df_manufacturer.head()
年份 | 月份 | 排名 | 厂商LOGO | 厂商 | 销量 | 占销量份额 | |
---|---|---|---|---|---|---|---|
0 | 2023 | 1 | 1 | https://i.img16888.com/dealer/flogo/57329.gif | 比亚迪 | 133317 | 10.29% |
1 | 2023 | 1 | 2 | https://i.img16888.com/dealer/flogo/57379.gif | 长安汽车 | 90067 | 6.95% |
2 | 2023 | 1 | 3 | https://i.img16888.com/dealer/flogo/57412.gif | 上汽大众 | 78000 | 6.02% |
3 | 2023 | 1 | 4 | https://i.img16888.com/dealer/flogo/57420.gif | 一汽-大众 | 70004 | 5.41% |
4 | 2023 | 1 | 5 | https://i.img16888.com/dealer/flogo/57605.gif | 吉利汽车 | 67479 | 5.21% |
In [3]:
df_manufacturer.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 11104 entries, 0 to 11103 Data columns (total 7 columns): 年份 11104 non-null int64 月份 11104 non-null int64 排名 11104 non-null int64 厂商LOGO 11104 non-null object 厂商 11099 non-null object 销量 11104 non-null int64 占销量份额 11104 non-null object dtypes: int64(4), object(3) memory usage: 607.3+ KB
In [4]:
# 空值直接删除 df_manufacturer = df_manufacturer.dropna()
In [5]:
# 先组合一个date日期字段,便于后续的可视化 df_manufacturer['日期'] = df_manufacturer['年份'].astype(str) + '-' + df_manufacturer['月份'].astype(str)
In [6]:
df_manufacturer_top10 = df_manufacturer[df_manufacturer['排名']<11] df_manufacturer_top10 = df_manufacturer_top10.sort_values(by=['日期','排名'])
In [7]:
df_manufacturer_top10['占销量份额'] = df_manufacturer_top10['占销量份额'].apply(lambda x:x[:-1]).astype('float')
In [8]:
# 数据分别获取 dates = df_manufacturer_top10['日期'].unique().tolist() groups = {date:[] for date in dates} sales = {date:[] for date in dates} percentage = {date:[] for date in dates} for d in dates: date = d sales[date] = df_manufacturer_top10[df_manufacturer_top10['日期']==d]['销量'].tolist() groups[date] = df_manufacturer_top10[df_manufacturer_top10['日期']==d]['厂商'].tolist() percentage[date] = df_manufacturer_top10[df_manufacturer_top10['日期']==d]['占销量份额'].tolist()
In [9]:
def create_bar(i): bar = Bar() bar.add_xaxis(groups[dates[i]]) bar.add_yaxis("",sales[dates[i]]) bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True)) bar.set_global_opts( title_opts=opts.TitleOpts(title="每月top10厂商销量"), xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-30)), ) return bar
In [10]:
def create_pie(i): pie = Pie() pie.add("", [list(z) for z in zip(groups[dates[i]], percentage[dates[i]])]) pie.set_global_opts( title_opts=opts.TitleOpts(title="每月top10厂商销量占比"), legend_opts=opts.LegendOpts(orient="vertical", pos_bottom="5%", pos_left="left") ) pie.set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}%")) return pie
In [11]:
timeline = Timeline() timeline.add_schema( orient="vertical", is_auto_play=True, # 设置自动播放 play_interval=1000, # 播放间隔(毫秒) is_loop_play=True, # 是否循环播放 pos_right="2%", width="70", height="500", label_opts=opts.LabelOpts(is_show=True,position='left') ) for i in range(len(dates)): bar = create_bar(i) timeline.add(bar, dates[i]) timeline.render_notebook()
- 上述x轴没变,不知道为啥,而且用组合图饼图会无法显示出错,这里分开展示。
In [12]:
timeline = Timeline() timeline.add_schema( orient="vertical", is_auto_play=True, # 设置自动播放 play_interval=1000, # 播放间隔(毫秒) is_loop_play=True, # 是否循环播放 pos_right="2%", width="70", height="500", label_opts=opts.LabelOpts(is_show=True,position='left') ) for i in range(len(dates)): pie = create_pie(i) timeline.add(pie, dates[i]) timeline.render_notebook()
3 分析中国汽车分车型每月销售量表
In [13]:
df = pd.read_excel('/home/mw/input/car3784/中国汽车分车型每月销售量.xlsx')
In [14]:
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 49344 entries, 0 to 49343 Data columns (total 7 columns): 年份 49344 non-null int64 月份 49344 non-null int64 排名 49344 non-null int64 车型 49318 non-null object 厂商 49318 non-null object 销量 49344 non-null int64 售价(万元) 49344 non-null object dtypes: int64(4), object(3) memory usage: 2.6+ MB
In [15]:
# 空值直接删除 df = df.dropna()
In [16]:
# 先组合一个date日期字段,便于后续的可视化 df['日期'] = df['年份'].astype(str) + '-' + df['月份'].astype(str)
In [17]:
# 售价拆分 df['售价max'] = df['售价(万元)'].apply(lambda x: float(x.split('-')[1])) df['售价min'] = df['售价(万元)'].apply(lambda x: float(x.split('-')[0])) df['均价'] = (df['售价max']+df['售价min'])/2
In [34]:
# 先看下整体销量车型的排列,top20 df_mode_sales = df.groupby('车型').agg({'销量':'sum','售价max':'mean','售价min':'mean','均价':'mean'} )[['销量','售价max','售价min','均价']].sort_values('销量',ascending=False).reset_index() df_mode_sales.head()
车型 | 销量 | 售价max | 售价min | 均价 | |
---|---|---|---|---|---|
0 | RAV4荣放 | 4032667 | 26.38 | 17.58 | 21.980 |
1 | 轩逸 | 3752787 | 17.49 | 9.98 | 13.735 |
2 | 朗逸 | 3734558 | 15.19 | 9.40 | 12.295 |
3 | 哈弗H6 | 3487282 | 15.70 | 9.89 | 12.795 |
4 | 五菱宏光 | 3446316 | 5.99 | 4.60 | 5.295 |
In [19]:
bar = Bar() bar.add_xaxis(df_mode_sales['车型'].tolist()[:20]) bar.add_yaxis("",df_mode_sales['销量'].tolist()[:20]) bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True)) bar.set_global_opts( title_opts=opts.TitleOpts(title="各车型累计销量"), xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-30)), ) bar.render_notebook()
In [39]:
# 上述有售价为不公布的,我们也直接删掉 df_mode_sales = df_mode_sales[df_mode_sales['均价']>0]
In [41]:
x_data = df_mode_sales['销量'].tolist() y_data = df_mode_sales['均价'].tolist() names = df_mode_sales['车型'].tolist()
In [43]:
scatter = Scatter() scatter.add_xaxis(x_data) scatter.add_yaxis("销量与均价", y_data, label_opts=opts.LabelOpts(is_show=False), symbol_size=8) scatter.set_global_opts( xaxis_opts=opts.AxisOpts(name="销量"), yaxis_opts=opts.AxisOpts(name="均价") ) scatter.render_notebook()
- 国内车型的 均价级基本处于30w以下,部分车型虽然价格较高,但是销量却不低。
In [67]:
line = Line() line.add_xaxis(names[:20]) line.add_yaxis("均价", y_data[:20], label_opts=opts.LabelOpts(is_show=False)) line.set_global_opts( xaxis_opts = opts.AxisOpts(is_show=False), yaxis_opts=opts.AxisOpts(name="均价"), legend_opts=opts.LegendOpts(pos_left="40%") ) bar = Bar() bar.add_xaxis(names[:20]) bar.add_yaxis("销量", x_data[:20], label_opts=opts.LabelOpts(is_show=False), yaxis_index=1) bar.set_global_opts( yaxis_opts=opts.AxisOpts(name="销量", position="right"), ) grid = Grid() grid.add(line, grid_opts=opts.GridOpts()) grid.add(bar, grid_opts=opts.GridOpts()) grid.render_notebook()
- 整体销量看,RAV4荣芳的合计销量最大;
- 从均价看,宝马5系虽然均价较高,但是销量还是比较考前的;
- 对比RAV4荣放和五菱宏光,虽然两者整体销量差异不大,但是均价差异却很大,说明消费者购买车辆,价格只是其中考虑的一部分。
4 分析中国汽车总体销量表
In [87]:
data = pd.read_excel('/home/mw/input/car3784/中国汽车总体销量.xlsx')
In [88]:
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 201 entries, 0 to 200 Data columns (total 3 columns): 时间 201 non-null datetime64[ns] 销量 201 non-null int64 同比 201 non-null object dtypes: datetime64[ns](1), int64(1), object(1) memory usage: 4.8+ KB
In [89]:
data['时间'] = data['时间'].dt.date
In [90]:
data = data.sort_values(by='时间')
In [91]:
line = Line() line.add_xaxis(data['时间'].tolist()) line.add_yaxis("销量", data['销量'].tolist(),markline_opts=opts.MarkLineOpts(data=[opts.MarkLineItem(type_="average")])) line.set_global_opts( yaxis_opts=opts.AxisOpts(name="销量"), datazoom_opts=[ opts.DataZoomOpts(type_="inside"), opts.DataZoomOpts(type_="slider")] ) line.render_notebook()
In [94]:
data['月'] = pd.to_datetime(data['时间']).dt.month data_month = data.groupby('月').mean().sort_values('销量',ascending=False) data_month
销量 | |
---|---|
月 | |
12 | 1.921240e+06 |
11 | 1.829474e+06 |
9 | 1.730767e+06 |
10 | 1.730715e+06 |
1 | 1.722805e+06 |
3 | 1.666897e+06 |
6 | 1.522088e+06 |
5 | 1.486374e+06 |
4 | 1.482906e+06 |
8 | 1.479686e+06 |
7 | 1.377379e+06 |
2 | 1.146462e+06 |
- 汽车销量整体呈上升趋势;
- 每年的1-8月是淡季,9-12月份是旺季,12月份是销量最好的月份;
- 2020年2月,汽车销量受疫情影响比较大;