Pandas: How to calculate the average of each value and the one before it (rolling average)
Imagine a dataset like the following:
result country start end
5 A 2/14/2022 2/21/2022
10 A 2/21/2022 2/28/2022
30 B 2/28/2022 3/7/2022
50 C 1/3/2022 1/10/2022
60 C 1/10/2022 1/17/2022
70 D 1/17/2022 1/24/2022
40 E 1/24/2022 1/31/2022
20 E 1/31/2022 2/7/2022
30 A 2/7/2022 2/14/2022
20 B 2/14/2022 2/21/2022
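Expected output
I need to group by country, start, and end. For each row, the result value should be added to the value in the row above it, and the average of the two should fill a new average column.
For example:
Grouped by country, start, and end, the average column is simply 5, (5+10)/2, (10+30)/2, (30+50)/2, (50+60)/2, and so on.
result average
5 5 e.g. (5)
10 7.5 ((5+10)/2) # current result value + value above it, divided by 2 = average
30 20 ((10+30)/2)
50 40 ((30+50)/2)
60 55 ((50+60)/2)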
70 65 ...
40 55 ...
20 30 ...
30 25 ...
20 25 ...
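The expected averages listed above match a plain rolling mean over the result column with a window of two rows and no grouping at all. The following is a minimal sketch of that reading, assuming the data is loaded into a DataFrame called `df` (a name chosen here for illustration) and that the rows are already in the desired order:

```python
import pandas as pd

# Sample data copied from the question; only the `result` column is needed here.
df = pd.DataFrame({"result": [5, 10, 30, 50, 60, 70, 40, 20, 30, 20]})

# A window of 2 rows covers the current value and the one above it.
# min_periods=1 lets the first row be averaged over itself alone,
# instead of producing NaN.
df["average"] = df["result"].rolling(2, min_periods=1).mean()

print(df)
```

Under these assumptions the average column comes out as 5, 7.5, 20, 40, 55, 65, 55, 30, 25, 25, which is the sequence shown in the expected output.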
Try this solution, which groups by country and date. It may raise an error if a subset does not contain enough data (i.e. more than 2 rows):
df_data['average'] = df_data.groupby(['country', 'date'])['result'].rolling(2, min_periods=1).mean().reset_index(level=[0, 1], drop=True)
If you want to group by country only:
df_data['average'] = df_data.groupby(['country'])['result'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)
df_data
country date result average
0 A 2/14/2022 5 5.0
1 A 2/21/2022 10 7.5
2 B 2/28/2022 30 30.0
3 C 1/3/2022 50 50.0
4 C 1/10/2022 60 55.0
5 D 1/17/2022 70 70.0
6 E 1/24/2022 40 40.0
7 E 1/31/2022 20 30.0
8 A 2/7/2022 30 20.0
9 B 2/14/2022 20 25.0
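For reference, the country-only version can be run end to end as follows. This is a sketch that rebuilds `df_data` by hand from the table above, assuming a single `date` column as in the answer's output rather than the `start` and `end` columns from the question:

```python
import pandas as pd

# Data rebuilt from the output table above (the `date` column is an assumption
# taken from the answer; the question uses `start` and `end` instead).
df_data = pd.DataFrame({
    "country": ["A", "A", "B", "C", "C", "D", "E", "E", "A", "B"],
    "date": ["2/14/2022", "2/21/2022", "2/28/2022", "1/3/2022", "1/10/2022",
             "1/17/2022", "1/24/2022", "1/31/2022", "2/7/2022", "2/14/2022"],
    "result": [5, 10, 30, 50, 60, 70, 40, 20, 30, 20],
})

# Rolling mean of the current and previous row *within each country*.
# groupby().rolling() prepends the group key to the index, so level 0 is
# dropped to get back the original row index before assigning.
df_data["average"] = (
    df_data.groupby("country")["result"]
    .rolling(2, min_periods=1)
    .mean()
    .reset_index(level=0, drop=True)
)

print(df_data)
```

Note that the "previous" row here is the previous row of the same country in the frame's current order. For example, row 8 (country A, result 30) is averaged with row 1 (country A, result 10) to give 20.0, even though its date is earlier. Sorting by date before grouping may be needed if chronological order matters.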