Error message:
ValueError: ('Lengths must match to compare', (19,), (1,))
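Example: build a custom dataset and, for every day on which a consultant closed a deal, compute that consultant's cumulative sales amount over the last three days (that day plus the two days before it), as follows.
Computing the 3-day cumulative sales amount: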
import numpy as np
import pandas as pd
from datetime import timedelta

# Build a 20-day sample dataset; columns: 销售额 (sales amount), 其他 (other)
np.random.seed(666)
data = np.random.randint(60, 100, [20, 2])
index = pd.date_range('2024-01-01', periods=20, freq='D')
column = ['销售额', '其他']
tb = pd.DataFrame(data=data, index=index, columns=column)
tb.index.name = '日期'  # date
# 姓名 (consultant name): the first 8 rows are 张三, the remaining 12 are 李四
xm = list(np.repeat('张三', 8)) + list(np.repeat('李四', 12))
tb['姓名'] = xm
# Convert the dates to strings and rename two of them, so the dates are no longer one per day
tb.index = tb.index.astype('str')
tb.rename(index={'2024-01-04': '2024-01-02', '2024-01-12': '2024-01-05'}, inplace=True)
print(tb)
# Total sales per consultant per day
tb = tb.reset_index()
tb1 = tb.groupby(['姓名', '日期'])['销售额'].sum().reset_index()
tb1['日期'] = pd.to_datetime(tb1['日期'])
# 日期1: start of the 3-day window (two days before 日期)
tb1['日期1'] = tb1['日期'] + timedelta(-2)
print(tb1)
# Version that raises the ValueError (riqi is a one-element array, not a scalar):
# for (i,j) in tuple(tb1.loc[:,['姓名','日期']].values):
#     idx=tb1[(tb1['姓名']==i)&(tb1['日期']==j)].index.values
#     riqi=tb1.loc[idx,'日期1'].values
#     print(i,j,idx,riqi)
#     tb1.loc[idx,'统计']=tb1[(tb1['姓名']==i)&(tb1['日期'].between(riqi,j,inclusive='both'))]['销售额'].sum()
# print(tb1)
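# Working version: riqi[0] takes the scalar out of the one-element array
for (i,j) in tuple(tb1.loc[:,['姓名','日期']].values):
    idx=tb1[(tb1['姓名']==i)&(tb1['日期']==j)].index.values
    riqi=tb1.loc[idx,'日期1'].values
    print(i,j,idx,riqi[0])
    # 统计 (total): sum of this consultant's sales from 日期1 through 日期, both ends included
    tb1.loc[idx,'统计']=tb1[(tb1['姓名']==i)&(tb1['日期'].between(riqi[0],j,inclusive='both'))]['销售额'].sum()
print(tb1)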
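Replacing the commented-out loop with the uncommented loop below it lets the code run and fill in the 统计 (total) column: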
A case with the same error: python - ValueError: ('Lengths must match to compare', (229025,), (1,)) - Stack Overflow
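Cause of the error:
Because idx is an array, tb1.loc[idx,'日期1'].values returns a one-element array, not a scalar. Passing that array to between (or to >=, < and similar comparisons) makes pandas compare the 19-row column against a length-1 array, which raises the length-mismatch error. Append [0] to take the first element, i.e. strip the surrounding brackets, and then use between or >=, < and so on to filter the rows.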
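The mismatch can be reproduced on a much smaller example. The following is a minimal sketch only; the toy series `dates` and the variable names are made up for illustration and are not part of the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: five consecutive days.
dates = pd.Series(pd.date_range('2024-01-01', periods=5, freq='D'))

idx = np.array([3])                                   # length-1 index array, like idx in the loop
start_arr = (dates - pd.Timedelta(days=2)).loc[idx].values  # one-element array, not a scalar
end = dates.loc[3]                                    # scalar Timestamp

raised = False
message = ''
try:
    # Series of length 5 compared against an array of length 1
    dates.between(start_arr, end)
except ValueError as exc:
    raised = True
    message = str(exc)

# Taking element [0] gives a scalar, so the comparison broadcasts normally.
start = start_arr[0]
mask = dates.between(start, end, inclusive='both')
selected = dates[mask].dt.strftime('%Y-%m-%d').tolist()
print(raised, message)
print(selected)
```

Selecting with a scalar label instead of an array (for example `tb1.loc[idx[0],'日期1']`) would also return a scalar directly, which is another way to avoid the problem.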
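Because the loop builds the filter by hand, the same approach works for custom sums and other aggregations over any window you define.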
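As a rough sketch of how the loop could be wrapped for reuse, the hypothetical helper `trailing_sum` below (its name, parameters and the `demo` data are assumptions for illustration) takes the window length as an argument. For the plain-sum case, pandas' time-based `rolling('3D')` on a date index is shown as a cross-check; it is a different technique from the loop above, not the author's method.

```python
import pandas as pd

def trailing_sum(df, days=3, key='姓名', date_col='日期', value_col='销售额'):
    """For each row, sum value_col over the same key's rows whose date falls
    in the trailing window of `days` days, both ends included."""
    out = []
    for name, day in zip(df[key], df[date_col]):
        start = day - pd.Timedelta(days=days - 1)   # scalar window start
        mask = (df[key] == name) & df[date_col].between(start, day, inclusive='both')
        out.append(df.loc[mask, value_col].sum())   # swap .sum() for another aggregation if needed
    return pd.Series(out, index=df.index)

# Hypothetical demo data with gaps between dates.
demo = pd.DataFrame({
    '姓名': ['张三', '张三', '张三', '李四', '李四'],
    '日期': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-05',
                          '2024-01-01', '2024-01-03']),
    '销售额': [10, 20, 30, 5, 7],
})
demo['统计'] = trailing_sum(demo, days=3)
print(demo)

# Cross-check with a time-based rolling window; dates must be sorted within each group.
rolled = demo.set_index('日期').groupby('姓名')['销售额'].rolling('3D').sum()
print(rolled)
```

The loop is quadratic in the number of rows, so for large tables the rolling version is likely the better fit when a plain sum is all that is needed; the loop stays useful when the filter is more unusual than a fixed trailing window.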