pandas: group by selected groups of days of the week
I have this DataFrame:
import numpy as np
import pandas as pd
rng = pd.date_range(start='2018-01-01', end='2018-01-21')
rnd_values = np.random.rand(len(rng))+3
df = pd.DataFrame({'time':rng.to_list(),'value':rnd_values})
Suppose I want to group it by day of the week and compute the mean:
df['span'] = np.where(df['time'].dt.day_of_week <= 2, 'Mn-Wd', 'Th-Sn')
df['wkno'] = df['time'].dt.isocalendar().week.shift(fill_value=0)
df.groupby(['wkno','span']).mean()
However, I would like to make this procedure more general.
Say I define the following days of the week:
days=['Monday','Thursday']
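Is there a way to do what I did above using `days`? I imagine I would have to compute the number of days between 'Monday' and 'Thursday' and then use that number. And what about the following case?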
days=['Monday','Thursday','Friday']
I am thinking of setting up a dictionary such as:
days={'Monday':0,'Thursday':3,'Friday':4}
and then
idays = list(days.values())[:]
How can I now use idays inside np.where? In this case I actually have three intervals.
Thanks
If you want to use multiple thresholds, you need np.searchsorted.
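To see why np.searchsorted fits here, a minimal sketch follows. It assumes the thresholds are the day-of-week numbers of the chosen start days (Monday=0 through Sunday=6, the numbering used by dt.day_of_week). With side='right', each day is mapped to the number of thresholds less than or equal to it, which can serve as the index of the span it falls in.

```python
import numpy as np

# Start days of each span as day-of-week numbers: Monday, Thursday, Friday.
day_split = [0, 3, 4]

# All seven days of the week, Monday=0 ... Sunday=6.
dow = np.arange(7)

# side='right' counts how many thresholds are <= each day,
# so a start day itself belongs to the span it opens.
span = np.searchsorted(day_split, dow, side='right')
print(span)  # [1 1 1 2 3 3 3]
```

Unlike np.where, which only separates two cases per call, this handles any number of sorted thresholds in one call.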
The resulting function looks like
def groupby_daysspan_week(dfc, days):
    df = dfc.copy()
    day_to_dayofweek = {'Monday': 0, 'Tuesday': 1, 'Wednesday': 2,
                        'Thursday': 3, 'Friday': 4, 'Saturday': 5, 'Sunday': 6}
    short_dict = {0: 'Mn', 1: 'Tu', 2: 'Wd', 3: 'Th', 4: 'Fr', 5: 'St', 6: 'Sn'}
    # Day-of-week numbers at which each span starts (must be in ascending order).
    day_split = [day_to_dayofweek[d] for d in days]
    df['wkno'] = df['time'].dt.isocalendar().week
    df['dow'] = df['time'].dt.day_of_week
    # Span index = number of start days <= the row's day of week.
    df['span'] = np.searchsorted(day_split, df['dow'], side='right')
    # Label each span by its start day and the next start day (Sunday for the last span).
    span_name_dict = {i+1: short_dict[day_split[i]] + '-' + short_dict[(day_split + [6])[i+1]]
                      for i in range(len(day_split))}
    df_agg = df.groupby(['wkno', 'span'])['value'].mean()
    df_agg = df_agg.rename(index=span_name_dict, level=1)
    return df_agg
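As a usage sketch, the same idea can be applied to the DataFrame from the question with three start days. The block below is self-contained, so it repeats the binning step inline rather than calling groupby_daysspan_week. The fixed seed and the extra size column are assumptions added only for illustration.

```python
import numpy as np
import pandas as pd

# Same three weeks of daily data as in the question (seed added for reproducibility).
rng = pd.date_range(start='2018-01-01', end='2018-01-21')
values = np.random.default_rng(0).random(len(rng)) + 3
df = pd.DataFrame({'time': rng, 'value': values})

# Spans start on Monday (0), Thursday (3) and Friday (4).
day_split = [0, 3, 4]

df['wkno'] = df['time'].dt.isocalendar().week
df['span'] = np.searchsorted(day_split, df['time'].dt.day_of_week, side='right')

# Mean and number of rows per (week, span) pair.
summary = df.groupby(['wkno', 'span'])['value'].agg(['mean', 'size'])
print(summary)
```

Each week should contribute three rows, holding 3, 1 and 3 days respectively (Monday to Wednesday, Thursday alone, Friday to Sunday).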