Pandas str.fullmatch unusual behaviour with NaN

When a column of a pandas DataFrame contains only NaN, str.fullmatch raises:

AttributeError: Can only use .str accessor with string values!

The following two cases behave as expected:

import numpy as np
import pandas as pd

# String first, NaN second
data1 = [['2022-03-15 00:00:00'], [np.nan]]
df = pd.DataFrame(data1, columns=['Date'])
df = df.loc[df.Date.str.fullmatch(r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00', na=True)]
print(df)

# NaN first, string second
data1 = [[np.nan], ['2022-03-15 00:00:00']]
df = pd.DataFrame(data1, columns=['Date'])
df = df.loc[df.Date.str.fullmatch(r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00', na=True)]
print(df)

The error is raised only when the column consists entirely of NaN:

data1 = [[np.nan], [np.nan]]
df = pd.DataFrame(data1, columns=['Date'])
dateRegex = r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00'
df = df.loc[df.Date.str.fullmatch(dateRegex, na=True)]  # raises AttributeError

Shouldn't it fill the NaN values with True, so that loc accepts the result just like in the other two cases above?

When you create a Series that contains only NaN values, its dtype is float64, because NaN is a float:

>>> s = pd.Series([np.nan, np.nan])
>>> s.dtype
dtype('float64')

>>> s.str
...
AttributeError: Can only use .str accessor with string values!
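As a minimal sketch of why the two mixed cases in the question work while the all-NaN case fails, the snippet below compares the inferred dtypes. The variable names `mixed` and `all_nan` are illustrative and not from the original post, and the exact dtype reported for the mixed column may depend on the pandas version (typically `object`), so only the float64 result is relied on here.

```python
import numpy as np
import pandas as pd

# A column with at least one string is not inferred as a float column,
# so the .str accessor is available.
mixed = pd.Series(['2022-03-15 00:00:00', np.nan])

# A column holding only NaN is inferred as float64, so .str is rejected.
all_nan = pd.Series([np.nan, np.nan])

print("mixed dtype:  ", mixed.dtype)
print("all_nan dtype:", all_nan.dtype)

for name, series in [("mixed", mixed), ("all_nan", all_nan)]:
    try:
        series.str  # accessing the accessor is enough to trigger validation
        print(name, "-> .str accessor available")
    except AttributeError as exc:
        print(name, "-> AttributeError:", exc)
```

The error therefore comes from the accessor check on the column's dtype, before fullmatch or its na argument is ever evaluated.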

You have to convert it to the object dtype (not necessarily str ;) before you can use .str:

>>> s.astype(object).str
<pandas.core.strings.accessor.StringMethods at 0x122deb1c0>
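To connect this back to the question's expectation about na=True, here is a small sketch (the names `pattern`, `mask_true` and `mask_false` are illustrative) showing that once the Series has been cast to object, the na argument of fullmatch decides what the NaN entries become in the resulting mask.

```python
import numpy as np
import pandas as pd

pattern = r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00'

# All-NaN Series, cast to object so that the .str accessor is accepted
s = pd.Series([np.nan, np.nan]).astype(object)

# na controls the value used in the mask for missing entries
mask_true = s.str.fullmatch(pattern, na=True)
mask_false = s.str.fullmatch(pattern, na=False)

print(mask_true.tolist())
print(mask_false.tolist())
```

With na=True every NaN row should be marked True and kept by loc; with na=False every NaN row should be marked False and dropped.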

So...

data1 = [[np.nan], [np.nan]]
df = pd.DataFrame(data1, columns=['Date'])
dateRegex = r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00'
df.Date = df.Date.astype(object)  # <--- Add this line
df = df.loc[df.Date.str.fullmatch(dateRegex, na=True)]

Output:

>>> df
  Date
0  NaN
1  NaN
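If the same filter has to run on columns that may or may not be entirely NaN, the cast can be wrapped in a small helper. This is only a sketch under the assumption that the column holds strings and/or NaN; `filter_fullmatch` is a hypothetical name, not a pandas function.

```python
import numpy as np
import pandas as pd

def filter_fullmatch(df, column, pattern, na=True):
    """Keep rows whose `column` fully matches `pattern`.

    Hypothetical helper: casts the column to object first, so the .str
    accessor is usable even when the column was inferred as float64
    because it holds nothing but NaN.
    """
    values = df[column].astype(object)
    mask = values.str.fullmatch(pattern, na=na)
    return df.loc[mask]

date_regex = r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00'

all_nan_df = pd.DataFrame({'Date': [np.nan, np.nan]})
mixed_df = pd.DataFrame({'Date': ['2022-03-15 00:00:00', np.nan, 'not a date']})

print(filter_fullmatch(all_nan_df, 'Date', date_regex))
print(filter_fullmatch(mixed_df, 'Date', date_regex))
```

The helper returns a filtered copy and leaves the dtype of the original column untouched; whether that is preferable to converting the column once up front depends on how often the filter is applied.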