Pandas str.fullmatch unusual behaviour with NaN
When a column of a pandas DataFrame contains only NaN, str.fullmatch raises:
AttributeError: Can only use .str accessor with string values!
The following two cases behave as expected:
import numpy as np
import pandas as pd

# Case 1: a string first, then NaN
data1 = [['2022-03-15 00:00:00'], [np.nan]]
df = pd.DataFrame(data1, columns=['Date'])
df = df.loc[df.Date.str.fullmatch(r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00', na=True)]
print(df)

# Case 2: NaN first, then a string
data1 = [[np.nan], ['2022-03-15 00:00:00']]
df = pd.DataFrame(data1, columns=['Date'])
df = df.loc[df.Date.str.fullmatch(r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00', na=True)]
print(df)
The error is raised only when the column consists entirely of NaN:
data1 = [[np.nan], [np.nan]]
df = pd.DataFrame(data1, columns=['Date'])
dateRegex = r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00'
df = df.loc[df.Date.str.fullmatch(dateRegex, na=True)]  # raises AttributeError
Shouldn't it fill the NaN values with True, so that loc accepts the result just as in the two cases above?
When you create a Series that contains only NaN values, its dtype is float64, because NaN is a float:
>>> s = pd.Series([np.nan, np.nan])
>>> s.dtype
dtype('float64')
>>> s.str
...
AttributeError: Can only use .str accessor with string values!
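The two working cases in the question succeed for the same reason: as soon as the column holds at least one string, pandas no longer infers float64, so the .str accessor is available. A minimal sketch of that contrast follows; the variable names `mixed` and `all_nan` are illustrative only, and the exact dtype printed for the mixed column may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# One string plus NaN: inferred as a non-float dtype
# (object in many pandas versions), so .str can be used.
mixed = pd.Series(['2022-03-15 00:00:00', np.nan])
print(mixed.dtype)
print(hasattr(mixed, 'str'))

# Only NaN: inferred as float64. Accessing .str raises
# AttributeError, which hasattr reports as False.
all_nan = pd.Series([np.nan, np.nan])
print(all_nan.dtype)
print(hasattr(all_nan, 'str'))
```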
You need to cast it to object dtype (not necessarily str) before you can use .str:
>>> s.astype(object).str
<pandas.core.strings.accessor.StringMethods at 0x122deb1c0>
So...
data1 = [[np.nan], [np.nan]]
df = pd.DataFrame(data1, columns=['Date'])
dateRegex = r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00'
df.Date = df.Date.astype(object)  # <--- Add this line
df = df.loc[df.Date.str.fullmatch(dateRegex, na=True)]
Output:
>>> df
  Date
0  NaN
1  NaN
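If the same filter is applied in several places, the cast can be wrapped in a small helper. This is only a sketch: `filter_fullmatch` and `DATE_REGEX` are hypothetical names, not part of pandas, and the helper casts a temporary copy of the column rather than reassigning it as the line above does.

```python
import numpy as np
import pandas as pd

DATE_REGEX = r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00'


def filter_fullmatch(df, column, pattern, na=True):
    """Keep rows whose `column` fully matches `pattern`.

    Missing values are treated as `na` (True keeps them, False drops them).
    """
    # Cast to object so .str exists even when the column is all NaN (float64).
    mask = df[column].astype(object).str.fullmatch(pattern, na=na)
    return df.loc[mask]


all_nan = pd.DataFrame({'Date': [np.nan, np.nan]})
mixed = pd.DataFrame({'Date': ['2022-03-15 00:00:00', 'not a date', np.nan]})

print(filter_fullmatch(all_nan, 'Date', DATE_REGEX))
print(filter_fullmatch(mixed, 'Date', DATE_REGEX))
```

Because the cast happens on a temporary Series, the dtype of the original column is left unchanged.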