Adding Missing Rows and Interpolating their Values
I am working with the following DataFrame:
altitude density east_wind north_wind
0 5 0.020567 39.714397 6.795392
1 7 0.016871 41.171996 6.852655
2 9 0.013839 42.629594 6.909918
3 11 0.011351 44.087193 6.967182
4 13 0.009311 45.544791 7.024445
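I would like altitude to contain consecutive integer values instead of only the odd ones, then fill in the missing values with pandas' .interpolate(method='linear') and extend the interpolated altitude values up to 20.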
Expected output:
altitude density east_wind north_wind
0 5 0.020567 39.714397 6.795392
1 6 0.018871 41.171996 6.852655
2 7 0.015839 42.629594 6.909918
3 8 0.013351 44.087193 6.967182
4 9 0.010311 45.544791 7.024445
...
...
9 19 0.000351 50.087193 11.967182
10 20 0.000311 51.544791 12.024445
Please advise.
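Interpolation is relatively easy in pandas, but extrapolation is a bit harder. So we "cheat": we compute the altitude=21 row by hand, continuing the straight line through the last two rows, and then call reindex and interpolate.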
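To see why the manual step is needed: with the default settings, pandas' linear interpolate() only draws lines between known values. NaNs after the last known value are filled by repeating that value rather than by continuing the line. A minimal illustration on a toy Series (the values are made up for this sketch and are not from the question):

```python
import numpy as np
import pandas as pd

# Toy series: two known points followed by two missing ones (made-up values)
s = pd.Series([1.0, 2.0, np.nan, np.nan])

# Default linear interpolation: trailing NaNs just repeat the last valid value
filled = s.interpolate()
print(filled.tolist())  # [1.0, 2.0, 2.0, 2.0]

# limit_area="inside" only fills NaNs surrounded by valid values, so the tail stays NaN
inside = s.interpolate(limit_area="inside")
print(inside.isna().tolist())  # [False, False, True, True]
```

Neither setting produces 3.0, 4.0 for the tail, which is what a true linear extrapolation would give.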
First we load the data:
from io import StringIO
import pandas as pd
data = StringIO(
"""
altitude density east_wind north_wind
0 5 0.020567 39.714397 6.795392
1 7 0.016871 41.171996 6.852655
2 9 0.013839 42.629594 6.909918
3 11 0.011351 44.087193 6.967182
4 13 0.009311 45.544791 7.024445
""")
# The header has one field fewer than the data rows; index_col=0 uses the unnamed first column as the index
df = pd.read_csv(data, sep=r'\s+', index_col=0)
df
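The long assignment in the next step is ordinary linear extrapolation: it continues the straight line through the last two rows (altitudes 11 and 13) out to 21. Written out for a column $y$, with density as a worked example:

$$y(21) = y(13) + (21 - 13)\,\frac{y(13) - y(11)}{13 - 11}$$

$$\text{density}(21) = 0.009311 + 8 \cdot \frac{0.009311 - 0.011351}{2} = 0.009311 - 0.008160 = 0.001151$$

This is the value that appears in the altitude 21 row of the output.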
Then:
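last_index = 21  # altitude of the row to extrapolate
df2 = df.set_index('altitude')
# last row + (21 - 13) * slope per unit altitude, with the slope taken from the last two rows (11 and 13)
df2.loc[last_index] = df2.loc[df2.index[-1]] + (last_index - df2.index[-1])*(df2.loc[df2.index[-1]] - df2.loc[df2.index[-2]])/(df2.index[-1] - df2.index[-2])
# Add NaN rows for every integer altitude from 5 to 21, then fill them by linear interpolation
df2.reindex(range(5,22)).interpolate().reset_index()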
which gives:
altitude density east_wind north_wind
-- ---------- --------- ----------- ------------
0 5 0.020567 39.7144 6.79539
1 6 0.018719 40.4432 6.82402
2 7 0.016871 41.172 6.85266
3 8 0.015355 41.9008 6.88129
4 9 0.013839 42.6296 6.90992
5 10 0.012595 43.3584 6.93855
6 11 0.011351 44.0872 6.96718
7 12 0.010331 44.816 6.99581
8 13 0.009311 45.5448 7.02445
9 14 0.008291 46.2736 7.05308
10 15 0.007271 47.0024 7.08171
11 16 0.006251 47.7312 7.11034
12 17 0.005231 48.46 7.13897
13 18 0.004211 49.1888 7.1676
14 19 0.003191 49.9176 7.19623
15 20 0.002171 50.6464 7.22487
16 21 0.001151 51.3752 7.2535
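The same idea can be wrapped in a small reusable helper. This is a sketch, not part of the original answer: the function name `fill_altitudes` and its `stop` parameter are hypothetical, and it assumes the altitudes are sorted integers with at least two rows. It extrapolates straight to `stop` (20 here) instead of 21, so no extra row has to be dropped afterwards.

```python
import pandas as pd


def fill_altitudes(df: pd.DataFrame, stop: int, col: str = "altitude") -> pd.DataFrame:
    """Hypothetical helper: one row per integer altitude up to `stop`,
    linearly interpolated inside the data and linearly extrapolated past it.
    Assumes `col` holds sorted integers and there are at least two rows."""
    d = df.set_index(col).astype(float)
    x0, x1 = d.index[-2], d.index[-1]
    if stop > x1:
        # Continue the line through the last two rows out to `stop`
        slope = (d.loc[x1] - d.loc[x0]) / (x1 - x0)
        d.loc[stop] = d.loc[x1] + (stop - x1) * slope
    # reindex with a plain range keeps the index name, so reset_index restores the column
    out = d.reindex(range(int(d.index[0]), stop + 1)).interpolate()
    return out.reset_index()


# Usage with the data from the question
data = {
    "altitude": [5, 7, 9, 11, 13],
    "density": [0.020567, 0.016871, 0.013839, 0.011351, 0.009311],
    "east_wind": [39.714397, 41.171996, 42.629594, 44.087193, 45.544791],
    "north_wind": [6.795392, 6.852655, 6.909918, 6.967182, 7.024445],
}
result = fill_altitudes(pd.DataFrame(data), stop=20)
print(result.tail(3))
```

Because the extrapolated point lies on the same line as the last segment, the rows for altitudes 14 to 20 match the output above; only the altitude 21 row is absent.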