Pandas：如何计算一个值到另一个值的平均值（连续平均值）

Question

想象一个如下所示的数据集：


result  country          start      end
5       A              2/14/2022    2/21/2022
10      A              2/21/2022    2/28/2022
30      B              2/28/2022    3/7/2022
50      C               1/3/2022    1/10/2022
60      C              1/10/2022    1/17/2022
70      D              1/17/2022    1/24/2022
40      E              1/24/2022    1/31/2022
20      E              1/31/2022    2/7/2022
30      A              2/7/2022     2/14/2022
20      B              2/14/2022    2/21/2022

预期输出

我需要进行 groupby（国家、开始和结束），结果列应将现有值与上述值相加，并需要填充平均值列。例如：

groupby country, start, and end with result and average column is nothing but 5, 5+10/2, 10+30/2, 30+50/2, 50+60/2

result   average 
5         5            eg: (5)     
10        7.5           (5+10/2)  #resultcol of existingvalue + abovevalue divided by 2 = average
30        20            (10+30/2)
50        40            (30+50/2)
60        55            (50+60/2)
70        65            ...
40        55            ...
20        30            ...
30        25            ...
20        25            ...

Answer 1

尝试此解决方案按国家和日期分组，但如果子集中没有足够的数据（即大于 2），它可能会引发错误：

df_data['average'] = df_data.groupby(['country', 'date'])['result'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)

如果您想仅按国家/地区分组

df_data['average'] = df_data.groupby(['country'])['result'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)

df_data
  country       date  result  average
0       A  2/14/2022       5      5.0
1       A  2/21/2022      10      7.5
2       B  2/28/2022      30     30.0
3       C   1/3/2022      50     50.0
4       C  1/10/2022      60     55.0
5       D  1/17/2022      70     70.0
6       E  1/24/2022      40     40.0
7       E  1/31/2022      20     30.0
8       A   2/7/2022      30     20.0
9       B  2/14/2022      20     25.0

Pandas：如何计算一个值到另一个值的平均值（连续平均值）

Pandas: How to calculate average one value to after another (succeeding average)

python

average

dataframe

python-3.x

pandas