Fill Timestamp NaT with a linear interpolation

I have a DataFrame df like this:

                                  t        pos
frame
0     2015-11-21 14:46:32.843517000   0.000000
1                               NaT   0.000000
2                               NaT   0.000000
3                               NaT   0.000000
4                               NaT   0.000000
5                               NaT   0.000000
6                               NaT   0.000000
7                               NaT   0.000000
8                               NaT   0.000000
9                               NaT   0.000000
10                              NaT   0.000000
11                              NaT   0.000000
12                              NaT   0.000000
13                              NaT   0.000000
14                              NaT   0.000000
15                              NaT   0.000000
16                              NaT   0.000000
17                              NaT   0.000000
18                              NaT   0.000000
19                              NaT   0.000000
...                             ...        ...
304   2015-11-21 14:46:54.255383750  12.951807
305   2015-11-21 14:46:54.312271250   5.421687
306   2015-11-21 14:46:54.343288000   3.614458
307   2015-11-21 14:46:54.445307000   1.204819
308   2015-11-21 14:46:54.477091000   0.000000
309                             NaT   0.000000
310                             NaT   0.000000
311                             NaT   0.000000
312                             NaT   0.000000
313                             NaT   0.000000
314   2015-11-21 14:46:54.927361000   1.204819
315   2015-11-21 14:46:55.003917250   4.819277
316   2015-11-21 14:46:55.058081500  12.048193
317   2015-11-21 14:46:55.112070500  24.698795
318   2015-11-21 14:46:55.167366000  34.538153
319   2015-11-21 14:46:55.252116750  29.718876
320   2015-11-21 14:46:55.325177750  16.064257
321   2015-11-21 14:46:55.396772000   6.927711
322   2015-11-21 14:46:55.448250000   3.614458
323   2015-11-21 14:46:55.559872500   0.602410

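I would like to fill the NaT entries by linear interpolation, so that they become pandas.tslib.Timestamp values.

I found http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.fillna.html

but I can't find a method option there that does this.

There may be a workaround, though.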

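You're right, the interpolate method does not currently work with Timestamp. One solution is to convert the column to floats, interpolate those, and convert the result back to Timestamp: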

In [63]:

print(df)
   pos                             t
0    0 2015-11-21 14:46:54.445307000
1    1 2015-11-21 14:46:54.477091000
2    2                           NaT
3    3                           NaT
4    4                           NaT
5    5                           NaT
6    6 2015-11-21 14:46:54.927361000
7    7 2015-11-21 14:46:55.003917250
In [64]:

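# Mask NaT explicitly: recent pandas versions map NaT to a sentinel integer in to_numeric, not to NaN
pd.to_datetime(pd.to_numeric(df.t).where(df.t.notnull()).interpolate())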
Out[64]:
0   2015-11-21 14:46:54.445306880
1   2015-11-21 14:46:54.477091072
2   2015-11-21 14:46:54.567144960
3   2015-11-21 14:46:54.657199104
4   2015-11-21 14:46:54.747252992
5   2015-11-21 14:46:54.837307136
6   2015-11-21 14:46:54.927361024
7   2015-11-21 14:46:55.003917312
Name: t, dtype: datetime64[ns]
In [65]:

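print(df)
# Overwrite only the missing rows so the original timestamps keep their full precision
mask = df.t.isnull()
df.loc[mask, 't'] = pd.to_datetime(pd.to_numeric(df.t).where(~mask).interpolate())[mask]
print(df)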
   pos                             t
0    0 2015-11-21 14:46:54.445307000
1    1 2015-11-21 14:46:54.477091000
2    2 2015-11-21 14:46:54.567144960
3    3 2015-11-21 14:46:54.657199104
4    4 2015-11-21 14:46:54.747252992
5    5 2015-11-21 14:46:54.837307136
6    6 2015-11-21 14:46:54.927361000
7    7 2015-11-21 14:46:55.003917250

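Note, however, that the converted values are slightly off, by about 1e-7 seconds in the output above (for example 54.445306880 instead of 54.445307000). I suspect this is due to loss of precision in the float conversion. It is therefore probably wise to fill only the NaT entries with the interpolated values and leave the existing timestamps untouched.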
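
If the precision loss matters, one possible variation is to interpolate offsets from the earliest timestamp instead of absolute nanoseconds since the epoch. The sketch below is only an illustration, not part of the original answer: the helper name interpolate_timestamps and the sample Series are made up, and it assumes a pandas version whose interpolate accepts limit_area. The offsets are small numbers, so float64 holds them exactly as long as the whole column spans less than roughly 100 days.

```python
import pandas as pd


def interpolate_timestamps(s: pd.Series) -> pd.Series:
    """Linearly interpolate NaT gaps in a datetime Series (hypothetical helper)."""
    origin = s.min()  # earliest valid timestamp; NaT entries are skipped
    # Offsets from the origin in nanoseconds, as floats; NaT becomes NaN here.
    offsets = (s - origin) / pd.Timedelta(1, unit="ns")
    # limit_area="inside" fills only gaps that have valid values on both sides,
    # so leading or trailing NaT entries are left alone.
    filled = offsets.interpolate(method="linear", limit_area="inside")
    # Round to whole nanoseconds and shift back onto the original time axis.
    return origin + pd.to_timedelta(filled.round(), unit="ns")


# Made-up sample: two known timestamps, a gap of four NaT, and a trailing NaT.
times = pd.Series(pd.to_datetime([
    "2015-11-21 14:46:54.477091",
    None, None, None, None,
    "2015-11-21 14:46:54.927361",
    None,
]))
result = interpolate_timestamps(times)
print(result)
```

Whether the trailing NaT should stay missing is a design choice. Dropping limit_area="inside" makes interpolate carry the last known value forward instead.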