pandas: group by spans defined by selected days of the week

I have this dataframe:

import numpy as np
import pandas as pd

rng = pd.date_range(start='2018-01-01', end='2018-01-21')
rnd_values = np.random.rand(len(rng)) + 3

df = pd.DataFrame({'time': rng.to_list(), 'value': rnd_values})

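Suppose I want to group it by part of the week (Monday to Wednesday versus Thursday to Sunday) within each week and compute the mean: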

# day_of_week is 0 for Monday, so <= 2 selects Monday to Wednesday
df['span'] = np.where(df['time'].dt.day_of_week <= 2, 'Mn-Wd', 'Th-Sn')
df['wkno'] = df['time'].dt.isocalendar().week
df.groupby(['wkno', 'span'])['value'].mean()

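However, I would like to make this procedure more general.

Say I define the following days of the week as the points where a span starts: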

days=['Monday','Thursday']

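Is there a way to use `days` to do what I did above? I suppose I would have to count the number of days between 'Monday' and 'Thursday' and then use that number. And what about a case like this one:

days=['Monday','Thursday','Friday']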

I was thinking of setting up a dictionary:

days={'Monday':0,'Thursday':3,'Friday':4}

and then

idays = list(days.values())

How can I now use idays in np.where? In this case I actually have 3 intervals.

Thanks

If you want to use multiple thresholds, np.where is no longer enough and you need np.searchsorted. The resulting function looks like this:

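import numpy as np

def groupby_daysspan_week(dfc, days):
    """Mean of 'value' per ISO week and per span opened by each day in `days`."""
    df = dfc.copy()
    day_to_dayofweek = {'Monday': 0, 'Tuesday': 1, 'Wednesday': 2,
                        'Thursday': 3, 'Friday': 4, 'Saturday': 5, 'Sunday': 6}
    short_dict = {0: 'Mn', 1: 'Tu', 2: 'Wd', 3: 'Th', 4: 'Fr', 5: 'St', 6: 'Sn'}
    # Day-of-week numbers that open each span; np.searchsorted needs them sorted
    day_split = sorted(day_to_dayofweek[d] for d in days)
    df['wkno'] = df['time'].dt.isocalendar().week
    df['dow'] = df['time'].dt.day_of_week
    # side='right' puts a boundary day into the span it opens, so spans are
    # numbered 1..len(day_split); days before the first boundary get span 0
    df['span'] = np.searchsorted(day_split, df['dow'], side='right')
    # Label span i+1 by its opening day and the next boundary (Sunday for the last span)
    span_name_dict = {i + 1: short_dict[day_split[i]] + '-' + short_dict[(day_split + [6])[i + 1]]
                      for i in range(len(day_split))}
    df_agg = df.groupby(['wkno', 'span'])['value'].mean()
    df_agg = df_agg.rename(index=span_name_dict, level=1)
    return df_agg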
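
To see how the span numbers inside the function come about, here is a minimal sketch of the np.searchsorted step on its own. It assumes the boundaries from the question's third example (Monday, Thursday, Friday, i.e. 0, 3, 4); the variable names are only illustrative.

```python
import numpy as np

# Boundaries for days=['Monday', 'Thursday', 'Friday'] as day-of-week numbers
day_split = [0, 3, 4]
dow = np.arange(7)  # Monday=0 ... Sunday=6

# side='right' counts how many boundaries are <= each day,
# so a boundary day falls into the span that it opens
span = np.searchsorted(day_split, dow, side='right')
print(span.tolist())  # [1, 1, 1, 2, 3, 3, 3]

# side='left' would instead leave each boundary day in the previous span
span_left = np.searchsorted(day_split, dow, side='left')
print(span_left.tolist())  # [0, 1, 1, 1, 2, 3, 3]
```

Monday to Wednesday end up in span 1, Thursday in span 2, and Friday to Sunday in span 3, which is the three-interval split asked about in the question.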