按列/年的移动平均数 - python、pandas
Moving average by column / year - python, pandas
我需要按国家 [noc] 在列 "total_medals" 上建立一个移动平均线,用于所有前几年 - 我的数据如下:
medal Bronze Gold Medal Silver **total_medals**
noc year
ALG 1984 2.0 NaN NaN NaN 2.0
1992 4.0 2.0 NaN NaN 6.0
1996 2.0 1.0 4.0 7.0
ANZ 1984 2.0 15.0 NaN 2.0 19.0
1992 3.0 5.0 NaN 2.0 10.0
1996 1.0 2.0 2.0 5.0
ARG 1984 2.0 6.0 NaN 3.0 11.0
1992 5.0 3.0 NaN 24.0 32.0
1992 3.0 7.0 NaN 5.0 15.0
我想要每个国家/地区和年份的移动平均值(即对于 ALG:1984 年平均 (total_medals)=2.0;1992 年平均(total_medals) = (2.0+6.0)/2 = 4.0; 1996 Acg(total_medals) = (2.0+6.0+7.0)/3 = 5.0) - 移动平均线应出现在新列中(total_medals 旁边)。
此外,对于每个国家和年份的组合,名为 "performance" 的新列应该是 "total_medals" 除以 "moving average"
的分数
示例数据帧:
print(df)
medal Bronze Gold Medal Silver
noc year
ALG 1984 2.0 NaN NaN NaN 2.0
1992 4.0 2.0 NaN NaN 6.0
1996 2.0 1.0 NaN 4.0 7.0
ANZ 1984 2.0 15.0 NaN 2.0 19.0
1992 3.0 5.0 NaN 2.0 10.0
1996 1.0 2.0 NaN 2.0 5.0
ARG 1984 2.0 6.0 NaN 3.0 11.0
1992 5.0 3.0 NaN 24.0 32.0
1992 3.0 7.0 NaN 5.0 15.0
使用DataFrame.groupby
+ expanding
:
df['total_mean']=df.groupby(level=0,sort=False).Silver.apply(lambda x: x.expanding(1).mean())
print(df)
medal Bronze Gold Medal Silver total_medals
noc year
ALG 1984 2.0 NaN NaN NaN 2.0 2.000000
1992 4.0 2.0 NaN NaN 6.0 4.000000
1996 2.0 1.0 NaN 4.0 7.0 5.000000
ANZ 1984 2.0 15.0 NaN 2.0 19.0 19.000000
1992 3.0 5.0 NaN 2.0 10.0 14.500000
1996 1.0 2.0 NaN 2.0 5.0 11.333333
ARG 1984 2.0 6.0 NaN 3.0 11.0 11.000000
1992 5.0 3.0 NaN 24.0 32.0 21.500000
1992 3.0 7.0 NaN 5.0 15.0 19.333333
bonze 滞后
s=df.groupby('noc').apply(lambda x: x['Bronze']/x['total_medals'].shift())
s.index=s.index.droplevel()
df['bronze_lagged']=s
您可以为此创建一个函数...
def lagged_medals(type_of_medal):
s=df.groupby('noc').apply(lambda x: x[type_of_medal]/x['total_medals'].shift())
s.index=s.index.droplevel()
df[f'{type_of_medal}_lagged']=s
lagged_medals('Silver')
#print(df)
我需要按国家 [noc] 在列 "total_medals" 上建立一个移动平均线,用于所有前几年 - 我的数据如下:
medal Bronze Gold Medal Silver **total_medals**
noc year
ALG 1984 2.0 NaN NaN NaN 2.0
1992 4.0 2.0 NaN NaN 6.0
1996 2.0 1.0 4.0 7.0
ANZ 1984 2.0 15.0 NaN 2.0 19.0
1992 3.0 5.0 NaN 2.0 10.0
1996 1.0 2.0 2.0 5.0
ARG 1984 2.0 6.0 NaN 3.0 11.0
1992 5.0 3.0 NaN 24.0 32.0
1992 3.0 7.0 NaN 5.0 15.0
我想要每个国家/地区和年份的移动平均值(即对于 ALG:1984 年平均 (total_medals)=2.0;1992 年平均(total_medals) = (2.0+6.0)/2 = 4.0; 1996 Acg(total_medals) = (2.0+6.0+7.0)/3 = 5.0) - 移动平均线应出现在新列中(total_medals 旁边)。
此外,对于每个国家和年份的组合,名为 "performance" 的新列应该是 "total_medals" 除以 "moving average"
的分数示例数据帧:
print(df)
medal Bronze Gold Medal Silver
noc year
ALG 1984 2.0 NaN NaN NaN 2.0
1992 4.0 2.0 NaN NaN 6.0
1996 2.0 1.0 NaN 4.0 7.0
ANZ 1984 2.0 15.0 NaN 2.0 19.0
1992 3.0 5.0 NaN 2.0 10.0
1996 1.0 2.0 NaN 2.0 5.0
ARG 1984 2.0 6.0 NaN 3.0 11.0
1992 5.0 3.0 NaN 24.0 32.0
1992 3.0 7.0 NaN 5.0 15.0
使用DataFrame.groupby
+ expanding
:
df['total_mean']=df.groupby(level=0,sort=False).Silver.apply(lambda x: x.expanding(1).mean())
print(df)
medal Bronze Gold Medal Silver total_medals
noc year
ALG 1984 2.0 NaN NaN NaN 2.0 2.000000
1992 4.0 2.0 NaN NaN 6.0 4.000000
1996 2.0 1.0 NaN 4.0 7.0 5.000000
ANZ 1984 2.0 15.0 NaN 2.0 19.0 19.000000
1992 3.0 5.0 NaN 2.0 10.0 14.500000
1996 1.0 2.0 NaN 2.0 5.0 11.333333
ARG 1984 2.0 6.0 NaN 3.0 11.0 11.000000
1992 5.0 3.0 NaN 24.0 32.0 21.500000
1992 3.0 7.0 NaN 5.0 15.0 19.333333
bonze 滞后
s=df.groupby('noc').apply(lambda x: x['Bronze']/x['total_medals'].shift())
s.index=s.index.droplevel()
df['bronze_lagged']=s
您可以为此创建一个函数...
def lagged_medals(type_of_medal):
s=df.groupby('noc').apply(lambda x: x[type_of_medal]/x['total_medals'].shift())
s.index=s.index.droplevel()
df[f'{type_of_medal}_lagged']=s
lagged_medals('Silver')
#print(df)