How to compute the average of an index for x days before a date (if the day is not a holiday) and merge it into a dataframe?
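I have a dataset that contains the traffic index of a location on a given date.
For a given date, I want to compute the average of all traffic indices over the 30 days before that date, counting only the days in that 30-day window that are not holidays.
I want to do this calculation in Python. The screenshot below illustrates what I need.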
Explanation of the screenshot
On April 1, 2019:
I want to calculate the 30-day non-holiday traffic index average
for a given location and map it to a new column with a similar column name.
The column weekend_holiday is a boolean column that is true (1) for days that are public holidays or weekends.
Such entries must be ignored when computing the location's average traffic index.
Link to sample dataset: https://gist.github.com/skwolvie/f01c027de0816c28337870286ee61a9d
Please suggest pandas techniques to achieve this result.
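You can use pandas' rolling to compute a rolling mean; it accepts both fixed window sizes and time-based lengths.
The following code computes the average for every row of the dataframe: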
import pandas as pd

# Time-based rolling needs a sorted DatetimeIndex, so parse Date and use it as the index
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date').sort_index()
# Mask weekends/holidays (mean skips NaN), then average over the 30-day window ending on each date
workday = df['weekend_or_holiday'] == 0
df['DELHI'] = df['New Delhi'].where(workday).rolling('30D').mean()
df['MUMBAI'] = df['Mumbai'].where(workday).rolling('30D').mean()
# Restore Date as a regular column
df = df.reset_index()
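Note that a time-based window such as rolling('30D') includes the current row's date by default. If "30 days before" should exclude the date itself, passing closed='left' is one way to get that. The sketch below tries this on a small synthetic frame; the values, the shortened date range, and the output column name "New Delhi 30D avg" are made up for illustration and are not taken from the linked dataset.

```python
import pandas as pd

# Synthetic stand-in for the real data (values are invented for illustration).
# 2019-03-30 and 2019-03-31 are a weekend, so they are flagged with 1.
df = pd.DataFrame({
    "Date": pd.date_range("2019-03-28", periods=6, freq="D"),
    "New Delhi": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "weekend_or_holiday": [0, 0, 1, 1, 0, 0],
})

df = df.set_index("Date").sort_index()

# Replace weekend/holiday values with NaN so the rolling mean skips them.
workday_values = df["New Delhi"].where(df["weekend_or_holiday"] == 0)

# closed="left" drops the current date from its own window, so each row
# averages only non-holiday values from the 30 days strictly before it.
# "New Delhi 30D avg" is a hypothetical column name.
df["New Delhi 30D avg"] = workday_values.rolling("30D", closed="left").mean()

df = df.reset_index()
print(df)
```

In this toy frame the row for 2019-04-01 averages only the two earlier working days (10 and 20), and the first row is NaN because nothing precedes it. Weekend and holiday rows still receive a value computed from the preceding working days; mask them afterwards if they should stay empty.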