Problems with replacing NaT in pandas correctly
I have a DataFrame with some NaT values.
                        Date  Value
6312957  2012-01-01 23:58:00    -49
6312958  2012-01-01 23:59:00    -49
6312959                  NaT    -48
6312960  2012-01-02 00:01:00    -47
6312961  2012-01-02 00:02:00    -46
I tried to replace these NaTs by adding one minute to the previous entry:
indices_of_NAT = np.flatnonzero(pd.isna(df.loc[:, "Date"]))
df.loc[indices_of_NAT, "Date"] = df.loc[indices_of_NAT - 1, "Date"] + pd.Timedelta(minutes=1)
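This produces the correct timestamps and indices, which I checked manually. The only problem is that, for whatever reason, they do not replace the NaT values. I wonder whether something is wrong with the indexing in my last line of code. Is there something obvious I am missing?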
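Neither answer below spells out why the assignment above leaves the NaT in place. A likely cause, sketched here on a small frame with a default RangeIndex (an assumption; the question's frame has labels like 6312957): the right-hand side is a Series indexed by the previous rows' labels, and assigning a Series through .loc aligns it on the index, so label 1 never matches the target label 2 and NaT is written back. Note also that np.flatnonzero returns positions, while .loc selects by label. The names nat_pos, attempt and fixed are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample with a default RangeIndex, so positions and labels coincide
df = pd.DataFrame({
    "Date": pd.to_datetime(["2012-01-01 23:58", "2012-01-01 23:59", None,
                            "2012-01-02 00:01", "2012-01-02 00:02"]),
    "Value": [-49, -49, -48, -47, -46],
})

nat_pos = np.flatnonzero(df["Date"].isna().to_numpy())  # positions of the NaTs

# Original attempt: the right-hand side is a Series labelled with the previous row (1).
# .loc aligns it to the target label (2), finds no match and writes NaT back.
attempt = df.copy()
attempt.loc[nat_pos, "Date"] = attempt.loc[nat_pos - 1, "Date"] + pd.Timedelta(minutes=1)
print(attempt)

# Possible fix: select by position with iloc and drop the index with .to_numpy(),
# so no alignment can happen. Assumes the first row is not NaT (nat_pos - 1 would be -1).
fixed = df.copy()
date_col = fixed.columns.get_loc("Date")
prev_plus_one = fixed["Date"].iloc[nat_pos - 1] + pd.Timedelta(minutes=1)
fixed.iloc[nat_pos, date_col] = prev_plus_one.to_numpy()
print(fixed)
```

With consecutive NaTs this positional fix behaves like the shift approach below: the second NaT looks at a row that is itself NaT and stays NaT.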
You can fillna using the shifted values + 1 minute:
df['Date'] = df['Date'].fillna(df['Date'].shift().add(pd.Timedelta('1min')))
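A quick check of this line on the question's own sample (the frame is rebuilt by hand from the printed table, so its construction is an assumption). shift moves values down by position but keeps the original index, so fillna lines the two Series up row by row and only the NaT row changes, even with labels like 6312957.

```python
import pandas as pd

# The question's sample, keeping its non-default index labels
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2012-01-01 23:58", "2012-01-01 23:59", None,
                                "2012-01-02 00:01", "2012-01-02 00:02"]),
        "Value": [-49, -49, -48, -47, -46],
    },
    index=[6312957, 6312958, 6312959, 6312960, 6312961],
)

# shift() keeps df's index, so fillna replaces only the NaT, using the row above + 1 minute
df["Date"] = df["Date"].fillna(df["Date"].shift().add(pd.Timedelta("1min")))
print(df)
```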
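Another approach is interpolate. For this you need to temporarily convert the dates to numbers. This way you can fill several consecutive gaps, the increments are computed automatically, and there are many useful interpolation methods (see the documentation):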
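# to_numeric views the datetimes as int64 nanoseconds (NaT becomes the minimum int64),
# mask turns those rows into NaN, interpolate fills them, to_datetime converts back
df['Date'] = (pd.to_datetime(pd.to_numeric(df['Date'])
                             .mask(df['Date'].isna())
                             .interpolate('linear'))
             )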
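The interpolate one-liner, split into its steps on a hand-built sample with a run of two NaTs (the same dates as the example table). The intermediate names as_int and as_float are illustrative only. Note that 'linear' spaces the filled values evenly by row position and ignores the index.

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.to_datetime([
    "2012-01-01 23:58", "2012-01-01 23:59", None, "2012-01-02 00:01",
    None, None, "2012-01-02 00:02",
]))

# 1. View the datetimes as int64 nanoseconds since the epoch.
#    NaT is stored as the smallest int64, not as a missing value.
as_int = pd.to_numeric(dates)

# 2. Turn that sentinel into NaN so interpolate treats it as missing
#    (the result is float64).
as_float = as_int.mask(dates.isna())

# 3. Linear interpolation by row position, then convert back to datetimes.
filled = pd.to_datetime(as_float.interpolate("linear"))
print(filled)
```

Without step 2, interpolate would see no missing values at all and the NaT rows would come back unchanged.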
Example:
                  Date  Value                shift          interpolate
0  2012-01-01 23:58:00    -49  2012-01-01 23:58:00  2012-01-01 23:58:00
1  2012-01-01 23:59:00    -49  2012-01-01 23:59:00  2012-01-01 23:59:00
2                  NaT    -48  2012-01-02 00:00:00  2012-01-02 00:00:00
3  2012-01-02 00:01:00    -47  2012-01-02 00:01:00  2012-01-02 00:01:00
4                  NaT    -48  2012-01-02 00:02:00  2012-01-02 00:01:20
5                  NaT    -48                  NaT  2012-01-02 00:01:40
6  2012-01-02 00:02:00    -46  2012-01-02 00:02:00  2012-01-02 00:02:00
Use Series.fillna with the shifted values plus 1 minute:
df['Date'] = df['Date'].fillna(df['Date'].shift() + pd.Timedelta(minutes=1))
Or forward-fill the missing values and add 1 minute:
df['Date'] = df['Date'].fillna(df['Date'].ffill() + pd.Timedelta(minutes=1))
You can see the difference on other data, with two consecutive NaTs:
df['Date'] = pd.to_datetime(df['Date'])
df['Date1'] = df['Date'].fillna(df['Date'].shift() + pd.Timedelta(minutes=1))
df['Date2'] = df['Date'].fillna(df['Date'].ffill() + pd.Timedelta(minutes=1))
print(df)
                        Date  Value                Date1                Date2
6312957  2012-01-01 23:58:00    -49  2012-01-01 23:58:00  2012-01-01 23:58:00
6312958  2012-01-01 23:59:00    -49  2012-01-01 23:59:00  2012-01-01 23:59:00
6312959                  NaT    -48  2012-01-02 00:00:00  2012-01-02 00:00:00
6312960  2012-01-02 00:01:00    -47  2012-01-02 00:01:00  2012-01-02 00:01:00
6312961  2012-01-02 00:02:00    -46  2012-01-02 00:02:00  2012-01-02 00:02:00
6312962                  NaT    -47  2012-01-02 00:03:00  2012-01-02 00:03:00
6312963                  NaT    -47                  NaT  2012-01-02 00:03:00
6312967  2012-01-02 00:01:00    -47  2012-01-02 00:01:00  2012-01-02 00:01:00
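On consecutive NaTs the two variants differ: shift leaves the second NaT empty, and ffill gives both rows the same timestamp (00:03 twice). If each missing row should instead be one minute after the row above it, one option (not from either answer) is to count the position inside each run of NaTs and add that many minutes to the forward-filled value. A minimal sketch, assuming a fixed one-minute step; the names run_id and steps are illustrative.

```python
import pandas as pd

# Same dates and index as the table above (Value column omitted)
dates = pd.Series(
    pd.to_datetime([
        "2012-01-01 23:58", "2012-01-01 23:59", None, "2012-01-02 00:01",
        "2012-01-02 00:02", None, None, "2012-01-02 00:01",
    ]),
    index=[6312957, 6312958, 6312959, 6312960, 6312961, 6312962, 6312963, 6312967],
)

missing = dates.isna()
# Increases at every valid timestamp, so each NaT shares the id of the last valid row above it
run_id = (~missing).cumsum()
# 1, 2, 3, ... inside each run of NaTs; 0 on valid rows
steps = missing.astype(int).groupby(run_id).cumsum()

# Last valid timestamp plus one minute per step into the run
filled = dates.ffill() + pd.to_timedelta(steps, unit="min")
print(filled)
```

A NaT in the very first row has nothing to forward-fill from and stays NaT.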