Interpolate annual data for each group separately

Question

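I have a data frame with 20 years of data for several stations. I want to interpolate the observations for each station separately. I used the following line, but it doesn't work: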

df.groupby('stations')['observations'].interpolate(method='linear')

Here is a sample of my data:

    station year    observations
0   3939    2000    0.346518
1   3939    2001    0.278250
2   3939    2002    1.096147
3   3939    2003    0.423948
4   3939    2004    0.000000
5   3939    2005    0.000000
6   3939    2006    0.000000
7   3939    2007    0.663922
8   3939    2008    0.000000
9   3939    2009    0.000000
10  3939    2010    0.000000
11  3939    2011    2.921322
12  3939    2012    1.463399
13  3939    2013    1.402697
14  3939    2014    0.000000
15  3939    2015    0.000000
16  3939    2016    0.000000
17  3939    2017    0.000000
18  3939    2018    0.000000
19  3939    2019    2.599236
20  3939    2020    1.428136
21  3953    2000    5.893202
22  3953    2001    7.227092
23  3953    2002    6.489147
24  3953    2003    4.961213
25  3953    2004    0.000000
26  3953    2005    0.000000
27  3953    2006    5.273121
28  3953    2007    0.000000
29  3953    2008    0.000000
30  3953    2009    0.000000
31  3953    2010    5.591221
32  3953    2011    0.000000
33  3953    2012    0.000000
34  3953    2013    4.797106
35  3953    2014    8.109661
36  3953    2015    0.000000
37  3953    2016    1.798583
38  3953    2017    0.000000
39  3953    2018    0.000000
40  3953    2019    0.000000
41  3953    2020    6.440142
42  3977    2000    14.236954
43  3977    2001    17.216910
44  3977    2002    10.210559
45  3977    2003    0.000000
46  3977    2004    0.000000
47  3977    2005    10.463710
48  3977    2006    0.000000
49  3977    2007    0.000000

Thanks.
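A minimal reproduction, built from the first eight rows of station 3939 in the sample above, suggests why the line has no visible effect: interpolate() fills only NaN values, and the gaps in this data are stored as 0.0, which pandas treats as ordinary numbers. The line also groups on 'stations' while the column is named 'station', so as written it would raise a KeyError. This is an editorial sketch that assumes the zeros stand for missing observations.

```python
import numpy as np
import pandas as pd

# First eight rows of station 3939 from the sample above
df = pd.DataFrame({
    "station": [3939] * 8,
    "year": list(range(2000, 2008)),
    "observations": [0.346518, 0.278250, 1.096147, 0.423948,
                     0.0, 0.0, 0.0, 0.663922],
})

# The zeros are real values, not NaN, so per-group interpolation changes nothing
unchanged = df.groupby("station")["observations"].transform(pd.Series.interpolate)
print(unchanged.equals(df["observations"]))  # True

# Grouping on a column name that does not exist raises KeyError
raised = False
try:
    df.groupby("stations")
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```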

Answer 1

Thanks to Henry Ecker, who answered this in a comment on my question:

import pandas as pd
df['observations'] = (
    df['observations']
        .mask(df['observations'].eq(0))  # Replace 0 with NaN
        .groupby(df['station'])  # Group by station
        .transform(pd.Series.interpolate, method='linear')  # Interpolate within each station
)
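A small usage sketch of this approach on a hypothetical two-station frame (station IDs 1 and 2 and all values are made up for illustration). It assumes every 0 is a missing observation rather than a genuine zero reading. It also shows two edge effects of pandas' default limit_direction='forward' with method='linear': a run of zeros at the end of a station is filled with that station's last valid value, while a run at the start stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical two-station frame in the same layout as the question's sample
df = pd.DataFrame({
    "station": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "year": [2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003],
    "observations": [1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 0.0, 6.0],
})

df["observations"] = (
    df["observations"]
        .mask(df["observations"].eq(0))  # assumption: every 0 means "missing"
        .groupby(df["station"])  # never interpolate across station boundaries
        .transform(pd.Series.interpolate, method="linear")
)

# Station 1 becomes 1, 2, 3, 4, 4 (trailing gap takes the last valid value);
# station 2 keeps a leading NaN, then 2, 4, 6
print(df)
```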

He also pointed to further reading on this.

Answer 2

You can also use .apply() with a lambda function:

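import numpy as np
interp = lambda g: g.replace(0, np.nan).interpolate(method='linear')  # 0 -> NaN, then interpolate
# group_keys=False keeps the original row index, so the result aligns with df
df['observations'] = df.groupby('station', group_keys=False)['observations'].apply(interp)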
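To check that .apply() and transform() give the same result, here is a sketch on a hypothetical frame (stations 1 and 2, made-up values). One thing to watch with .apply(): in pandas 2.0 and later, as far as I know, a per-group function that returns a like-indexed Series gets the group key prepended to the result index unless group_keys=False is passed, which can break assigning the result back to df. transform() avoids this because it always returns a result indexed like its input.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same layout as the question, values made up
df = pd.DataFrame({
    "station": [1, 1, 1, 1, 2, 2, 2],
    "observations": [1.0, 0.0, 0.0, 4.0, 3.0, 0.0, 5.0],
})

# Same per-group function as the answer: 0 -> NaN, then linear interpolation
interp = lambda g: g.replace(0, np.nan).interpolate(method="linear")

# group_keys=False keeps the original row index on the apply() result
via_apply = df.groupby("station", group_keys=False)["observations"].apply(interp)

# transform() returns a Series indexed exactly like df
via_transform = df.groupby("station")["observations"].transform(interp)

print(via_apply.equals(via_transform))  # True
df["observations"] = via_transform
```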
