Interpolate annual data for each group separately
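I have a dataframe with 20 years of data for several stations. I want to interpolate the observations for each station separately. I used the following line, but it does not work.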
df.groupby('stations')['observations'].interpolate(method='linear')
Here is a sample of my data:
station year observations
0 3939 2000 0.346518
1 3939 2001 0.278250
2 3939 2002 1.096147
3 3939 2003 0.423948
4 3939 2004 0.000000
5 3939 2005 0.000000
6 3939 2006 0.000000
7 3939 2007 0.663922
8 3939 2008 0.000000
9 3939 2009 0.000000
10 3939 2010 0.000000
11 3939 2011 2.921322
12 3939 2012 1.463399
13 3939 2013 1.402697
14 3939 2014 0.000000
15 3939 2015 0.000000
16 3939 2016 0.000000
17 3939 2017 0.000000
18 3939 2018 0.000000
19 3939 2019 2.599236
20 3939 2020 1.428136
21 3953 2000 5.893202
22 3953 2001 7.227092
23 3953 2002 6.489147
24 3953 2003 4.961213
25 3953 2004 0.000000
26 3953 2005 0.000000
27 3953 2006 5.273121
28 3953 2007 0.000000
29 3953 2008 0.000000
30 3953 2009 0.000000
31 3953 2010 5.591221
32 3953 2011 0.000000
33 3953 2012 0.000000
34 3953 2013 4.797106
35 3953 2014 8.109661
36 3953 2015 0.000000
37 3953 2016 1.798583
38 3953 2017 0.000000
39 3953 2018 0.000000
40 3953 2019 0.000000
41 3953 2020 6.440142
42 3977 2000 14.236954
43 3977 2001 17.216910
44 3977 2002 10.210559
45 3977 2003 0.000000
46 3977 2004 0.000000
47 3977 2005 10.463710
48 3977 2006 0.000000
49 3977 2007 0.000000
Thanks

Thanks to Henry Ecker, who answered my question in a comment:
df['observations'] = (
    df['observations']
    .mask(df['observations'].eq(0))  # Replace 0 with NaN
    .groupby(df['station'])  # Group by station
    .transform(pd.Series.interpolate, method='linear')  # Interpolate within each group
)
He also suggested a reference for more information.
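To see why the zeros have to be masked first, here is a minimal sketch on a made-up two-station frame (the values are invented for illustration and are not taken from the sample above). `interpolate` only fills NaN, so zeros are left as they are unless they are converted first. Grouping by station also keeps one station's values from being used to fill another's.

```python
import pandas as pd

# Hypothetical toy data: zeros stand in for missing observations.
df = pd.DataFrame({
    "station": [3939] * 4 + [3953] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "observations": [1.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 6.0],
})

filled = (
    df["observations"]
    .mask(df["observations"].eq(0))   # zeros become NaN so they can be filled
    .groupby(df["station"])           # interpolate each station on its own
    .transform(pd.Series.interpolate, method="linear")
)

print(df.assign(observations=filled))
```

The leading gap of the second station stays NaN, because linear interpolation has no earlier value within that group to start from. Without the groupby it would have been filled using the last value of the previous station.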
You can also use .apply() with a lambda function:
interp = lambda g: g.replace(0, np.nan).interpolate(method='linear')
df.groupby('station')['observations'].apply(interp)
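A self-contained sketch of the `.apply()` variant, again on invented toy data. One assumption worth flagging: in recent pandas versions, as far as I can tell, `apply` may prepend the group key to the result index, which would break assigning the result back to the frame. Passing `group_keys=False` is used here to keep the original row index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: zeros stand in for missing observations.
df = pd.DataFrame({
    "station": [3939] * 4 + [3953] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "observations": [1.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 6.0],
})

def interp(g):
    # Treat zeros as missing, then fill linearly within the group.
    return g.replace(0, np.nan).interpolate(method="linear")

# group_keys=False keeps the original row index so the result aligns with df.
result = df.groupby("station", group_keys=False)["observations"].apply(interp)
df["observations"] = result

print(df)
```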