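When visualizing data, the two metrics we want to compare may have very different value ranges. If both are drawn against a single y-axis, the metric with the smaller range gets flattened and becomes hard to read. So here is a record of the approach I used in my project: a chart with two y-axes.
First, a look at the data source I used: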
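For readers who do not have the dataset at hand, here is a small stand-in that can be used to try the code in this post. The four column names are the ones the cleaning code relies on; the rows themselves are invented purely for illustration and are not real match results.

```python
import pandas as pd

# Hypothetical rows, made up only to show the expected column layout.
# The real match_data comes from the project's own data source.
match_data = pd.DataFrame({
    "Home Team Name": ["Brazil", "Brazil", "Germany", "Italy"],
    "Away Team Name": ["Germany", "Italy", "Brazil", "Germany"],
    "Home Team Goals": [2, 1, 3, 0],
    "Away Team Goals": [1, 1, 2, 2],
})

print(match_data.head())
```

With only three teams the "top 10" filter later on simply keeps every row, which is enough to check that the pipeline runs.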
import pandas as pd

# match_data is assumed to be already loaded as a pandas DataFrame
# Start with data cleaning
# Get the country names
country = match_data['Home Team Name'].unique().tolist()
# Two lists to store each country's goals scored at home and away
Home_goals, Away_goals = [], []
for i in country:
    h_goals = match_data.loc[match_data['Home Team Name'] == i, 'Home Team Goals'].sum()
    Home_goals.append(h_goals)
    a_goals = match_data.loc[match_data['Away Team Name'] == i, 'Away Team Goals'].sum()
    Away_goals.append(a_goals)
goals_data = pd.DataFrame({'Country': country, 'Home_goals': Home_goals, 'Away_goals': Away_goals})
goals_data['total_goals'] = goals_data['Home_goals'] + goals_data['Away_goals']
# Share of a country's goals that were scored at home
goals_data['Home_rate'] = goals_data['Home_goals'] / goals_data['total_goals']
# Sort by total goals in descending order and keep the top 10 countries
top_goals_data = goals_data.sort_values(by='total_goals', ascending=False)[:10]
print(top_goals_data)
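The per-country loop above can also be written with `groupby`, which avoids filtering the whole table once per country. This is only a sketch of an alternative, not what I used in the project. The function name `summarize_goals` and the demo rows are hypothetical. One difference to be aware of: this version also keeps teams that only ever appear as the away team, whereas the loop only covers names found in `Home Team Name`.

```python
import pandas as pd


def summarize_goals(match_data: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Aggregate home and away goals per country and keep the top_n by total goals."""
    home = match_data.groupby("Home Team Name")["Home Team Goals"].sum().rename("Home_goals")
    away = match_data.groupby("Away Team Name")["Away Team Goals"].sum().rename("Away_goals")
    # Outer-align the two Series on team name; a team missing on one side gets 0
    goals = pd.concat([home, away], axis=1).fillna(0)
    goals["total_goals"] = goals["Home_goals"] + goals["Away_goals"]
    goals["Home_rate"] = goals["Home_goals"] / goals["total_goals"]
    goals = goals.rename_axis("Country").reset_index()
    return goals.sort_values("total_goals", ascending=False).head(top_n)


# Hypothetical demo rows, invented for illustration only
demo = pd.DataFrame({
    "Home Team Name": ["Brazil", "Brazil", "Germany", "Italy"],
    "Away Team Name": ["Germany", "Italy", "Brazil", "Germany"],
    "Home Team Goals": [2, 1, 3, 0],
    "Away Team Goals": [1, 1, 2, 2],
})
result = summarize_goals(demo)
print(result)
```

The returned frame has the same columns as `top_goals_data`, so it can be passed to the plotting code unchanged.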
Plotting:
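import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize=(15, 5))
ax2 = ax1.twinx()  # second y-axis that shares the x-axis with ax1
# Bars for the goal counts go on the left y-axis
top_goals_data.plot(x='Country', y=['Home_goals', 'Away_goals', 'total_goals'], kind='bar', ax=ax1)
ax1.set_xlabel("Country")
ax1.set_ylabel("Goals")
# Draw the second plot (home goal rate) on the right y-axis;
# a distinct color keeps the line from blending in with the first bar series
ax2.plot(top_goals_data.Country, top_goals_data.Home_rate, marker='o', color='tab:red')
ax2.set_ylabel("Home_rate")
# Show value labels on the line; ax2.text places them in the right-axis coordinates
for i, v in zip(top_goals_data.Country, top_goals_data.Home_rate):
    ax2.text(i, v, round(v, 4))
plt.show()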
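One thing the plot above does not handle is the legend: the bars get a legend from pandas, but the line on the second axis is not listed in it. Below is a rough sketch of one way to merge both into a single legend, by collecting the handles from each axis. The function name `plot_goals_with_rate` and the demo numbers are hypothetical. The line is drawn at integer positions 0..n-1, which is where pandas places the bars of a bar chart.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_goals_with_rate(df: pd.DataFrame):
    """Bar chart of goals (left axis) plus Home_rate line (right axis), one shared legend."""
    fig, ax1 = plt.subplots(figsize=(15, 5))
    ax2 = ax1.twinx()
    # legend=False: the combined legend is built by hand further down
    df.plot(x="Country", y=["Home_goals", "Away_goals", "total_goals"],
            kind="bar", ax=ax1, legend=False)
    ax1.set_ylabel("Goals")

    positions = range(len(df))  # pandas draws the bars at 0, 1, 2, ...
    ax2.plot(positions, df["Home_rate"], color="tab:red", marker="o", label="Home_rate")
    ax2.set_ylabel("Home_rate")
    for x, v in zip(positions, df["Home_rate"]):
        # Offset each label a few points above its marker
        ax2.annotate(f"{v:.4f}", (x, v), textcoords="offset points",
                     xytext=(0, 6), ha="center")

    # Collect handles from both axes and draw a single legend on the top axes
    h1, l1 = ax1.get_legend_handles_labels()
    h2, l2 = ax2.get_legend_handles_labels()
    ax2.legend(h1 + h2, l1 + l2, loc="upper right")
    return fig, ax1, ax2


# Hypothetical numbers, invented for illustration only
demo = pd.DataFrame({
    "Country": ["Germany", "Brazil", "Italy"],
    "Home_goals": [3, 3, 0],
    "Away_goals": [3, 2, 1],
    "total_goals": [6, 5, 1],
    "Home_rate": [0.5, 0.6, 0.0],
})
fig, ax1, ax2 = plot_goals_with_rate(demo)
```

Call `plt.show()` afterwards to display the figure, or `fig.savefig(...)` to write it to a file.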
The result looks like this:
Done~