Create a rolling sum of a column which is grouped by weeks on a rolling basis in Python pandas

Question

I have the following pandas DataFrame:

   status1  status2 location1   datetime1           grouping    service capacity 
0   xx      xx      xx          01-01-2020 11:50:00 xx          xx       150
1   xx      xx      xx          01-01-2020 11:57:00 xx          xx       200
2   xx      xx      xx          01-01-2020 11:59:00 xx          xx       200
3   xx      xx      xx          01-01-2020 13:59:00 xx          xx       200
...
x   xx      xx      xx          01-02-2020 13:59:00 xx          xx       300
x   xx      xx      xx          01-03-2020 13:04:00 xx          xx       300
...
x   xx      xx      xx          07-03-2021 13:04:00 xx          xx       400
x   xx      xx      xx          07-03-2021 13:04:00 xx          xx       300
x   xx      xx      xx          07-03-2021 13:04:00 xx          xx       300

I want to sum the capacity for each week on a rolling basis.

For example, I would like:

   
   WeekStartingSunday   countofstatus1      sumofcapacity
0                   1               50               3000
1                   2               30               2000
2                   3              ...                ...
3                   4              ...                ...
...

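So week 1 contains the sum over all dates in the first week of 2020, with the week starting on Sunday. I would also like to create tables for weeks starting on other days, such as Monday, Tuesday, and so on.

I tried df.groupby('capacity').rolling(7).sum(), but I think it just sums every 7 rows.

I also tried: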

group = pd.pivot_table(df,columns='capacity', index='datetime1')
group2 = group.resample('D').sum().rolling(7).sum()
group2.sort_index().head(15)

But the result looks like this:

capacity    1.0   2.0   2.25 2.40 3.0....
datetime1
2020-01-01  NaN   NaN   NaN  NaN NaN ...
2021-01-02  NaN   NaN   NaN  NaN NaN ...
...
2021-01-07  322.1 326.5 117  0.0 275.2 ...
...

Can this be done in pandas?

Answer 1

You can try the following:

Convert the date strings to datetime format, assuming the dates are in dd-mm-YYYY:

# Use dayfirst=True for dates in dd-mm-YYYY
df['datetimea1'] = pd.to_datetime(df['datetimea1'], dayfirst=True)
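If the layout of the strings is known in advance, an explicit format is an alternative to dayfirst=True. The sketch below assumes every value looks like "dd-mm-YYYY HH:MM:SS"; the two sample rows are taken from the question, and checking the month is just one quick way to confirm the parse was day-first.

```python
import pandas as pd

# Two sample values in dd-mm-YYYY order (1 February 2020 and 7 March 2021).
df = pd.DataFrame({"datetimea1": ["01-02-2020 13:59:00", "07-03-2021 13:04:00"]})

# Assumption: all strings follow exactly this layout.
# An explicit format leaves no room for day/month ambiguity.
df["datetimea1"] = pd.to_datetime(df["datetimea1"], format="%d-%m-%Y %H:%M:%S")

# The months should be February and March, not January and July.
print(df["datetimea1"].dt.month.tolist())
```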

Define a column with the ISO calendar week using .dt.isocalendar().week:

df['WeekStartingMonday'] = df['datetimea1'].dt.isocalendar().week

Group on the new column WeekStartingMonday with .groupby() and aggregate the entries for each week with .agg(), as follows:

df_out = (df.groupby('WeekStartingMonday', as_index=False)
            .agg(countofstatus1=('status1', 'count'), sumofcapacity=('capacity', 'sum'))
         )

Input:

  status1 status2 location1           datetimea1 grouping service  capacity
0      xx      xx        xx  01-01-2020 11:50:00       xx      xx       150
1      xx      xx        xx  01-01-2020 11:57:00       xx      xx       200
2      xx      xx        xx  01-01-2020 11:59:00       xx      xx       200
3      xx      xx        xx  01-01-2020 13:59:00       xx      xx       200
4      xx      xx        xx  01-02-2020 13:59:00       xx      xx       300
5      xx      xx        xx  01-03-2020 13:04:00       xx      xx       300
6      xx      xx        xx  07-03-2021 13:04:00       xx      xx       400
7      xx      xx        xx  07-03-2021 13:04:00       xx      xx       300
8      xx      xx        xx  07-03-2021 13:04:00       xx      xx       300

Output:

print(df_out)

   WeekStartingMonday  countofstatus1  sumofcapacity
0                   1               4            750
1                   5               1            300
2                   9               4           1300
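Note that in the output above week 9 combines one row from March 2020 with three rows from March 2021, because the week number alone does not carry the year. If the data spans several years, one possible variation (a sketch, not part of the original answer) is to group on the ISO year as well. The column name IsoYear is made up for this illustration.

```python
import pandas as pd

# Same sample rows as the input table above (only the columns that are used).
df = pd.DataFrame({
    "status1": ["xx"] * 9,
    "datetimea1": [
        "01-01-2020 11:50:00", "01-01-2020 11:57:00", "01-01-2020 11:59:00",
        "01-01-2020 13:59:00", "01-02-2020 13:59:00", "01-03-2020 13:04:00",
        "07-03-2021 13:04:00", "07-03-2021 13:04:00", "07-03-2021 13:04:00",
    ],
    "capacity": [150, 200, 200, 200, 300, 300, 400, 300, 300],
})
df["datetimea1"] = pd.to_datetime(df["datetimea1"], dayfirst=True)

# isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns.
iso = df["datetimea1"].dt.isocalendar()
df["IsoYear"] = iso["year"]              # hypothetical column name
df["WeekStartingMonday"] = iso["week"]

# Grouping on (year, week) keeps week 9 of 2020 apart from week 9 of 2021.
df_by_year = (df.groupby(["IsoYear", "WeekStartingMonday"], as_index=False)
                .agg(countofstatus1=("status1", "count"),
                     sumofcapacity=("capacity", "sum")))
print(df_by_year)
```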

Edit

1) Other options for Sunday/Monday as the first day of the week:

You can use .dt.strftime() with various format strings to get the week number:

%U week number of year, with Sunday as first day of week (00..53).

%V ISO week number, with Monday as first day of week (01..53).

%W week number of year, with Monday as first day of week (00..53).

df['WeekStartingSunday'] = df['datetimea1'].dt.strftime('%U')

Or:

df['WeekStartingMonday'] = df['datetimea1'].dt.strftime('%V')

Or:

df['WeekStartingMonday'] = df['datetimea1'].dt.strftime('%W')

For more information, see the official documentation of strftime() format codes.
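To see how the three format codes differ, the sketch below applies them to a Saturday, a Sunday and a Monday at the start of 2020. The dates are chosen only for illustration. Since .dt.strftime() returns strings, the values are cast to integers here, which is optional. %V support depends on the platform's C library.

```python
import pandas as pd

# 2020-01-04 is a Saturday, 2020-01-05 a Sunday, 2020-01-06 a Monday.
s = pd.Series(pd.to_datetime(["2020-01-04", "2020-01-05", "2020-01-06"]))

weeks = pd.DataFrame({
    "date": s,
    "U": s.dt.strftime("%U").astype(int),  # Sunday-based, days before the first Sunday are week 0
    "V": s.dt.strftime("%V").astype(int),  # ISO week, Monday-based
    "W": s.dt.strftime("%W").astype(int),  # Monday-based, days before the first Monday are week 0
})
print(weeks)
```

The %U number changes between Saturday and Sunday, while %V and %W change between Sunday and Monday.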

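There are no corresponding format codes for weeks starting on Tuesday through Saturday. If you really need such custom weeks, you can get them with the workarounds below.

2) Define any day of the week as the first day of the week:

You can convert the dates to a Period representing a week that ends on a specific day of the week. For example, to get distinct week numbers for weeks starting on Tuesday (and ending on Monday), you can use: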

df['Period'] = df['datetimea1'].dt.to_period('W-MON')     # W-MON is a custom week ending on MON (i.e. starting on TUE)
df['WeekStartingTuesday'] = df['Period'].dt.week

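Note that with this approach, the week containing the first day of a year, e.g. 2020-01-01, may get week number 1 or 2. Even so, it should still work well for use cases that only need a distinct week number for each run of 7 consecutive days.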
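If the exact week number matters less than having an unambiguous label, one option is to label each week by its start date instead. The sketch below is an illustration rather than the answer's method: it uses 'W-SAT' (weeks ending on Saturday, so starting on Sunday, as the question asks) and made-up sample rows. The column name WeekStart is hypothetical.

```python
import pandas as pd

# Made-up rows: a Saturday, the following Sunday and Saturday, and the Sunday after that.
df = pd.DataFrame({
    "datetimea1": pd.to_datetime([
        "2020-01-04 10:00", "2020-01-05 10:00",
        "2020-01-11 10:00", "2020-01-12 10:00",
    ]),
    "capacity": [100, 200, 300, 400],
})

# W-SAT periods end on Saturday, so each period starts on a Sunday.
week = df["datetimea1"].dt.to_period("W-SAT")
df["WeekStart"] = week.dt.start_time   # hypothetical column: the Sunday that opens the week

out = df.groupby("WeekStart", as_index=False).agg(sumofcapacity=("capacity", "sum"))
print(out)
```

Swapping 'W-SAT' for another anchor such as 'W-MON' would give weeks starting on a different day, and the start date also keeps weeks from different years apart.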

There is no direct way to set a bi-weekly frequency that starts or ends on a specific day of the week. A possible workaround is:

df['Period'] = df['datetimea1'].dt.to_period('W-MON')       # W-MON is a custom week ending on MON (i.e. starting on TUE)
df['BiWeeklyStartingTuesday'] = df['Period'].dt.week // 2   # Get bi-weekly number
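Because week numbers restart every year, the integer division above can produce uneven blocks around a year boundary. A different technique, sketched here under the assumption that you can pick a fixed anchor date yourself, is to count 14-day blocks from that anchor. The anchor 2020-01-07 (a Tuesday) and the sample dates are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "datetimea1": pd.to_datetime([
        "2020-01-07", "2020-01-20", "2020-01-21", "2020-02-03", "2020-02-04",
    ]),
})

# Hypothetical anchor: the Tuesday on which the first two-week block starts.
anchor = pd.Timestamp("2020-01-07")

# Whole days elapsed since the anchor, ignoring the time of day.
days = (df["datetimea1"].dt.normalize() - anchor).dt.days

# Every run of 14 consecutive days gets the same block number.
df["BiWeeklyStartingTuesday"] = days // 14
print(df)
```

Dates before the anchor get negative block numbers, which still group correctly.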
