Fill Timestamp NaT with a linear interpolation
I have a DataFrame df like this:
t pos
frame
0 2015-11-21 14:46:32.843517000 0.000000
1 NaT 0.000000
2 NaT 0.000000
3 NaT 0.000000
4 NaT 0.000000
5 NaT 0.000000
6 NaT 0.000000
7 NaT 0.000000
8 NaT 0.000000
9 NaT 0.000000
10 NaT 0.000000
11 NaT 0.000000
12 NaT 0.000000
13 NaT 0.000000
14 NaT 0.000000
15 NaT 0.000000
16 NaT 0.000000
17 NaT 0.000000
18 NaT 0.000000
19 NaT 0.000000
... ... ...
304 2015-11-21 14:46:54.255383750 12.951807
305 2015-11-21 14:46:54.312271250 5.421687
306 2015-11-21 14:46:54.343288000 3.614458
307 2015-11-21 14:46:54.445307000 1.204819
308 2015-11-21 14:46:54.477091000 0.000000
309 NaT 0.000000
310 NaT 0.000000
311 NaT 0.000000
312 NaT 0.000000
313 NaT 0.000000
314 2015-11-21 14:46:54.927361000 1.204819
315 2015-11-21 14:46:55.003917250 4.819277
316 2015-11-21 14:46:55.058081500 12.048193
317 2015-11-21 14:46:55.112070500 24.698795
318 2015-11-21 14:46:55.167366000 34.538153
319 2015-11-21 14:46:55.252116750 29.718876
320 2015-11-21 14:46:55.325177750 16.064257
321 2015-11-21 14:46:55.396772000 6.927711
322 2015-11-21 14:46:55.448250000 3.614458
323 2015-11-21 14:46:55.559872500 0.602410
I want to fill the NaT values with pandas.tslib.Timestamp values.
I found http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.fillna.html
but I can't find a method for this.
Maybe there is a workaround.
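You are right, the interpolate method does not currently work for Timestamp. One solution is to convert the column to floats, interpolate those, and convert the result back to Timestamp: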
In [63]:
print(df)
pos t
0 0 2015-11-21 14:46:54.445307000
1 1 2015-11-21 14:46:54.477091000
2 2 NaT
3 3 NaT
4 4 NaT
5 5 NaT
6 6 2015-11-21 14:46:54.927361000
7 7 2015-11-21 14:46:55.003917250
In [64]:
pd.to_datetime(pd.to_numeric(df.t).interpolate())
Out[64]:
0 2015-11-21 14:46:54.445306880
1 2015-11-21 14:46:54.477091072
2 2015-11-21 14:46:54.567144960
3 2015-11-21 14:46:54.657199104
4 2015-11-21 14:46:54.747252992
5 2015-11-21 14:46:54.837307136
6 2015-11-21 14:46:54.927361024
7 2015-11-21 14:46:55.003917312
Name: t, dtype: datetime64[ns]
In [65]:
# Fill only the missing timestamps; .loc replaces the removed .ix indexer
df.loc[df.t.isnull(), 't'] = pd.to_datetime(pd.to_numeric(df.t).interpolate())[df.t.isnull()]
print(df)
pos t
0 0 2015-11-21 14:46:54.445307000
1 1 2015-11-21 14:46:54.477091000
2 2 2015-11-21 14:46:54.567144960
3 3 2015-11-21 14:46:54.657199104
4 4 2015-11-21 14:46:54.747252992
5 5 2015-11-21 14:46:54.837307136
6 6 2015-11-21 14:46:54.927361000
7 7 2015-11-21 14:46:55.003917250
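Note, however, that the interpolated numbers are slightly off (by about a tenth of a microsecond in the output above), presumably because of precision loss in the float conversion. It is probably wise to fill only the NaT entries with interpolated values and leave the non-null timestamps as they are.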
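A possible way to sidestep the precision loss is to interpolate nanosecond offsets from the earliest timestamp instead of nanoseconds since the epoch: the offsets are small enough for float64 to represent them exactly, and the approach does not rely on how `pd.to_numeric` treats `NaT`, which may differ between pandas versions. The sketch below is only an illustration; the helper name `interpolate_timestamps` is hypothetical, and the sample frame is rebuilt from the values shown in the answer.

```python
import pandas as pd


def interpolate_timestamps(t: pd.Series) -> pd.Series:
    """Linearly interpolate NaT entries by row position (hypothetical helper)."""
    origin = t.min()  # earliest valid timestamp; NaT is skipped
    # Offsets from origin as float64 nanoseconds; NaT becomes NaN.
    offset_ns = (t - origin) / pd.Timedelta(1, unit="ns")
    # Interpolate the small offsets, then round to whole nanoseconds.
    interpolated = offset_ns.interpolate().round()
    filled = origin + pd.to_timedelta(interpolated, unit="ns")
    # Only the missing entries are replaced; existing values stay untouched.
    return t.fillna(filled)


# Sample data rebuilt from the answer's example frame.
df = pd.DataFrame({
    "pos": range(8),
    "t": [
        pd.Timestamp("2015-11-21 14:46:54.445307000"),
        pd.Timestamp("2015-11-21 14:46:54.477091000"),
        pd.NaT, pd.NaT, pd.NaT, pd.NaT,
        pd.Timestamp("2015-11-21 14:46:54.927361000"),
        pd.Timestamp("2015-11-21 14:46:55.003917250"),
    ],
})

result = interpolate_timestamps(df["t"])
df["t"] = result
print(df)
```

As with `interpolate()` in general, rows before the first valid timestamp would stay NaT, since there is nothing to interpolate from.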