Using moving average on a sub-data frame?

Question

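I have a sub-data frame made up of a range of dates taken from a larger data frame. I want to compute a moving average within that same sub-data frame and plot it on the figure I already have, which shows the number of cases per day in the sub-frame. The moving average needs to run from March 7 to July 10, with window=7 (one week).

Sample data: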

sex         country      date_report
M           Canada       03-01-2020
F           Canada       03-01-2020
M           Canada       03-02-2020
F           Canada       03-02-2020
M           Canada       03-02-2020
M           Canada       03-03-2020
F           Canada       03-03-2020
M           Canada       03-04-2020
F           Canada       03-04-2020
M           Canada       03-04-2020

The code I have so far:

import datetime
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# df is the full data frame, loaded elsewhere
day_first = datetime.date(2020, 3, 1)
day_last = datetime.date(2020, 7, 10)
delta = day_last - day_first
print(delta.days)

for i in range(delta.days + 1):
    all_dates = day_first + datetime.timedelta(days=i)
    print(all_dates)    # This gives me the range of dates I am looking for.

# Keep only the rows inside the date range, then count cases per day
sub_df = df.loc[df['date_report'].between(day_first, day_last), :]
date_count = sub_df.groupby('date_report').date_report.count()

# months was not defined in the snippet; assumed here to be a month locator
months = mdates.MonthLocator()
ax = date_count.plot(kind='line')
ax.xaxis.set_major_locator(months)
plt.xlabel("March 1/2020 to July 10/2020")
plt.ylabel("Number of Cases")
plt.show()

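This produces the following plot:

I just need to compute the weekly moving average within the same sub-data frame and then plot it on the same chart. Thanks in advance for your help, and sorry for the screenshot. I am new to Stack Overflow and could not add the image any other way!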

Answer 1

I think you want rolling, which can be used to create a moving average. Here is an example that tries to mimic yours, with rolling used to add a second curve:

import pandas as pd
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# the range of dates you mention
dr = pd.date_range('3-7-2020', '7-10-2020', freq='D')

# df with a date_report column which is 1000 randomly chosen dates from dr
df = pd.DataFrame(np.random.choice(dr, size=1000), columns=['date_report'])

# your groupby operation and plotting
counts = df.groupby('date_report').date_report.count()
counts.index = counts.index.date
ax = counts.plot(kind='line')
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))

# now creating a rolling 7 day window and plotting it on the same axes
counts.index = pd.to_datetime(counts.index)
rolling = counts.rolling('7D').mean()
rolling.plot(ax=ax)
plt.show()

I edited some of the x-axis date formatting. There may be a better way, but you can see I did some annoying date conversions on counts to get it to work (see my other post here).
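One detail that may matter for the "window=7" requirement: a groupby count only has rows for days that actually had reports, so `rolling(7)` (seven rows) and `rolling('7D')` (seven calendar days) can give different results when days are missing. The sketch below is only an illustration under assumptions: the counts are made-up numbers, and the names `daily`, `weekly_rows`, `weekly_time` and `weekly_from_mar7` are hypothetical. It fills the missing days with zero before averaging.

```python
import pandas as pd

# Made-up daily case counts (not real data). March 5 and 6 have no reports,
# so they are absent from the index, as they would be after a groupby count.
dates = pd.to_datetime(
    ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04",
     "2020-03-07", "2020-03-08", "2020-03-09", "2020-03-10"]
)
counts = pd.Series([2, 3, 2, 3, 4, 1, 5, 2], index=dates, name="cases")

# Insert the missing calendar days with zero cases.
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq="D")
daily = counts.reindex(full_range, fill_value=0)

# Row-based window: exactly 7 rows, NaN until 7 days are available.
weekly_rows = daily.rolling(window=7).mean()

# Time-based window: every row in the trailing 7 calendar days,
# so the first values are averages over fewer than 7 days.
weekly_time = daily.rolling("7D").mean()

# Keep only the part of the moving average from March 7 onward.
weekly_from_mar7 = weekly_rows.loc["2020-03-07":]
print(weekly_from_mar7)
```

With data starting on March 1, the first complete 7-day window ends on March 7, which would fit the range in the question. Either series can then be drawn on the existing axes with something like `weekly_from_mar7.plot(ax=ax)`.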

python

datetime

matplotlib

moving-average

pandas