Problems with replacing NaT in pandas correctly

I have a data frame that contains some NaT values.

                       Date Value
6312957 2012-01-01 23:58:00   -49
6312958 2012-01-01 23:59:00   -49
6312959                 NaT   -48
6312960 2012-01-02 00:01:00   -47
6312961 2012-01-02 00:02:00   -46

I tried to replace these NaT values by adding one minute to the previous entry.

indices_of_NAT = np.flatnonzero(pd.isna(df.loc[:, "Date"]))
df.loc[indices_of_NAT, "Date"] = df.loc[indices_of_NAT - 1, "Date"] + pd.Timedelta(minutes=1)

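This generates the correct timestamps and indices, which I checked manually. The only problem is that, for whatever reason, the NaT values are not replaced. I wonder whether something is wrong with the indexing in my last line of code. Is there something obvious I am missing?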

You can use fillna with the shifted values plus 1 minute:

df['Date'] = df['Date'].fillna(df['Date'].shift().add(pd.Timedelta('1min')))
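The one-liner above sidesteps the indexing in the question. As for why the original `.loc` assignment appeared to do nothing, a likely cause is index alignment. This is an assumption, since the full data frame is not shown. When a Series is assigned through `.loc`, pandas matches it to the target rows by label, and the computed values carry the labels of the previous rows rather than the NaT rows. The sketch below reproduces this on a small hypothetical frame with a default RangeIndex. The names `nat_pos`, `new_values`, `aligned` and `fixed` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a default RangeIndex (0, 1, 2, ...).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-01-01 23:58:00", "2012-01-01 23:59:00", None, "2012-01-02 00:01:00",
    ]),
    "Value": [-49, -49, -48, -47],
})

nat_pos = np.flatnonzero(df["Date"].isna())
# new_values is labelled with the previous rows (label 1), not the NaT rows (label 2).
new_values = df.loc[nat_pos - 1, "Date"] + pd.Timedelta(minutes=1)

# Assigning the Series aligns on labels; no label matches, so NaT is written back.
aligned = df.copy()
aligned.loc[nat_pos, "Date"] = new_values
still_missing = aligned["Date"].isna().sum()

# Dropping the labels with .to_numpy() assigns the values in order instead.
fixed = df.copy()
fixed.loc[nat_pos, "Date"] = new_values.to_numpy()
```

With a non-default index, such as the one printed in the question, the positions from `np.flatnonzero` would also have to be used with `.iloc` rather than `.loc`.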

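Another approach is interpolate. For this you need to convert temporarily to numeric. This way you can fill multiple gaps, the increment is computed automatically, and there are many useful interpolation methods (see the documentation):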

df['Date'] = (pd.to_datetime(pd.to_numeric(df['Date'])
                               .mask(df['Date'].isna())
                               .interpolate('linear'))
              )

Example:

                 Date  Value               shift         interpolate
0 2012-01-01 23:58:00    -49 2012-01-01 23:58:00 2012-01-01 23:58:00
1 2012-01-01 23:59:00    -49 2012-01-01 23:59:00 2012-01-01 23:59:00
2                 NaT    -48 2012-01-02 00:00:00 2012-01-02 00:00:00
3 2012-01-02 00:01:00    -47 2012-01-02 00:01:00 2012-01-02 00:01:00
4                 NaT    -48 2012-01-02 00:02:00 2012-01-02 00:01:20
5                 NaT    -48                 NaT 2012-01-02 00:01:40
6 2012-01-02 00:02:00    -46 2012-01-02 00:02:00 2012-01-02 00:02:00
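For anyone who wants to reproduce the comparison, here is a minimal sketch that rebuilds the example frame and computes both columns. The explicit cast to `datetime64[ns]` is my own addition: it assumes the numeric round trip should happen in nanoseconds, which is the unit `pd.to_datetime` uses by default when it reads numbers. The intermediate name `as_number` is illustrative only.

```python
import pandas as pd

# Sample data matching the Date and Value columns of the example table.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-01-01 23:58:00", "2012-01-01 23:59:00", None,
        "2012-01-02 00:01:00", None, None, "2012-01-02 00:02:00",
    ]),
    "Value": [-49, -49, -48, -47, -48, -48, -46],
})
# Assumption: force nanosecond resolution so the integer view is in nanoseconds.
df["Date"] = df["Date"].astype("datetime64[ns]")

# Approach 1: previous row + 1 minute (only fills the first NaT of a run).
df["shift"] = df["Date"].fillna(df["Date"].shift() + pd.Timedelta("1min"))

# Approach 2: convert to numbers, blank out the NaT placeholders, interpolate, convert back.
as_number = pd.to_numeric(df["Date"]).mask(df["Date"].isna())
df["interpolate"] = pd.to_datetime(as_number.interpolate("linear"))

print(df)
```

Because the interpolation runs on floating-point numbers, results that do not fall on round values may be off by a few hundred nanoseconds. Rounding with `.dt.round("s")` afterwards is one way to tidy that up.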

Use Series.fillna with the shifted values and add 1 minute:

df['Date'] = df['Date'].fillna(df['Date'].shift() + pd.Timedelta(minutes=1))

Or forward fill the missing values and add 1 minute:

df['Date'] = df['Date'].fillna(df['Date'].ffill() + pd.Timedelta(minutes=1))

You can see the difference between the two on another set of data, where two NaT values occur in a row:

df['Date'] = pd.to_datetime(df['Date'])

df['Date1'] = df['Date'].fillna(df['Date'].shift() + pd.Timedelta(minutes=1))
df['Date2'] = df['Date'].fillna(df['Date'].ffill() + pd.Timedelta(minutes=1))
print(df)
                       Date  Value               Date1               Date2
6312957 2012-01-01 23:58:00    -49 2012-01-01 23:58:00 2012-01-01 23:58:00
6312958 2012-01-01 23:59:00    -49 2012-01-01 23:59:00 2012-01-01 23:59:00
6312959                 NaT    -48 2012-01-02 00:00:00 2012-01-02 00:00:00
6312960 2012-01-02 00:01:00    -47 2012-01-02 00:01:00 2012-01-02 00:01:00
6312961 2012-01-02 00:02:00    -46 2012-01-02 00:02:00 2012-01-02 00:02:00
6312962                 NaT    -47 2012-01-02 00:03:00 2012-01-02 00:03:00
6312963                 NaT    -47                 NaT 2012-01-02 00:03:00
6312967 2012-01-02 00:01:00    -47 2012-01-02 00:01:00 2012-01-02 00:01:00
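In the output above, the shift variant leaves the second consecutive NaT unfilled, and the ffill variant gives both missing rows the same timestamp. If the intent is for each consecutive missing row to advance by a further minute (an assumption about the desired result, not something stated in the question), one possible sketch counts the position inside each run of NaT values. The names `run_id` and `steps` are illustrative only.

```python
import pandas as pd

# Hypothetical series with two NaT values in a row.
dates = pd.Series(pd.to_datetime([
    "2012-01-02 00:02:00", None, None, "2012-01-02 00:01:00",
]))

is_missing = dates.isna()
# A new run starts at every valid timestamp.
run_id = (~is_missing).cumsum()
# Position inside the run: 0 for the valid row, then 1, 2, ... for the NaT rows that follow.
steps = is_missing.astype(int).groupby(run_id).cumsum()

# Last valid timestamp plus one minute per step.
filled = dates.fillna(dates.ffill() + pd.to_timedelta(steps, unit="min"))
print(filled)
```

A NaT at the very start of the series has no previous value to build on, so it stays NaT with this approach.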