Pandas: Checking for NaN using a rolling function
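I have a dataframe with a variable "A", and I want to create a rolling NaN checker such that the new variable "rolling_nan" = 1 if ALL 3 cells (3 seconds: the current cell and the previous two) are NaN, and "rolling_nan" = 0 otherwise.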
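I'm applying a custom function because pandas .rolling objects don't support isna(). However, I get the output below. I'm also not sure how to include the current row's value in the NaN check.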
import pandas as pd
import numpy as np
idx = pd.date_range('2018-01-01', periods=10, freq='S')
df = pd.DataFrame({"A":[1,2,3,np.nan,np.nan,np.nan,6,7,8,9]}, index = idx)
df
def isna_func(x):
    # x is one rolling window of column A; return 1 if every value in it is NaN
    return 1 if pd.isna(x).all() == True else 0
df['rolling_nan'] = df['A'].rolling(3).apply(isna_func)
df
A rolling_nan
2018-01-01 00:00:00 1.0 NaN
2018-01-01 00:00:01 2.0 NaN
2018-01-01 00:00:02 3.0 0.0
2018-01-01 00:00:03 NaN NaN
2018-01-01 00:00:04 NaN NaN
2018-01-01 00:00:05 NaN NaN
2018-01-01 00:00:06 6.0 NaN
2018-01-01 00:00:07 7.0 NaN
2018-01-01 00:00:08 8.0 0.0
2018-01-01 00:00:09 9.0 0.0
In the example above, rolling_nan should equal 1 only at timestamp 2018-01-01 00:00:05 and 0 everywhere else.
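A minimal check of why the attempt returns NaN, assuming pandas 1.x or later. With an integer window, `rolling(3)` defaults `min_periods` to the window size, so any window holding fewer than three non-NaN values is reported as NaN instead of the function's result. That is exactly the set of rows whose window touches a NaN above, so an all-NaN window can never come back as 1. The same check shows that the trailing window already ends at the current row (its last element is the current value). The series `s` is just column A without the datetime index.

```python
import numpy as np
import pandas as pd

# Column A from the question, without the datetime index
s = pd.Series([1, 2, 3, np.nan, np.nan, np.nan, 6, 7, 8, 9])

# The trailing window ends at the current row, so w[-1] is the current value
last = s.rolling(3).apply(lambda w: w[-1], raw=True)

# min_periods defaults to 3 here: windows with fewer than 3 non-NaN values
# come back as NaN, so an all-NaN window can never produce 1
all_nan = s.rolling(3).apply(lambda w: float(np.isnan(w).all()), raw=True)

print(last.tolist())
print(all_nan.tolist())
```

Rolling over `isna()` or `notna()` instead of over column A sidesteps this, because those series contain no NaN.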
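You can approach it the other way around: flag every non-NaN value with notna, then take the rolling max. A window whose max is 0 contains no valid values, which means all three values are NaN.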
df.A.notna().rolling(3).max()==0
Out[316]:
2018-01-01 00:00:00 False
2018-01-01 00:00:01 False
2018-01-01 00:00:02 False
2018-01-01 00:00:03 False
2018-01-01 00:00:04 False
2018-01-01 00:00:05 True
2018-01-01 00:00:06 False
2018-01-01 00:00:07 False
2018-01-01 00:00:08 False
2018-01-01 00:00:09 False
Freq: S, Name: A, dtype: bool
Assign it back:
df['rollingnan']=(df.A.notna().rolling(3).max()==0).astype(int)
df
Out[320]:
A rollingnan
2018-01-01 00:00:00 1.0 0
2018-01-01 00:00:01 2.0 0
2018-01-01 00:00:02 3.0 0
2018-01-01 00:00:03 NaN 0
2018-01-01 00:00:04 NaN 0
2018-01-01 00:00:05 NaN 1
2018-01-01 00:00:06 6.0 0
2018-01-01 00:00:07 7.0 0
2018-01-01 00:00:08 8.0 0
2018-01-01 00:00:09 9.0 0
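Because the question describes the window as "3 (seconds)", it may matter whether you mean the last three rows or the last three seconds. A sketch, assuming pandas 1.x or later: wrapping the notna/max expression in a helper lets you pass either an integer row count or a time offset such as "3s" to `rolling`. `rolling_all_nan` is a hypothetical name, not a pandas function, and casting the flags to int before rolling is a choice of this sketch. On the regular 1-second index above both window types give the same result; with a gap in the timestamps they can differ.

```python
from typing import Union

import numpy as np
import pandas as pd


def rolling_all_nan(s: pd.Series, window: Union[int, str]) -> pd.Series:
    """Hypothetical helper (not part of pandas): 1 where every value in the
    trailing window ending at the current row is NaN, else 0."""
    # 1 for a valid value, 0 for NaN
    valid = s.notna().astype(int)
    # A window whose max is 0 holds no valid values. With an integer window,
    # incomplete leading windows yield NaN, and NaN == 0 is False, so they become 0.
    return (valid.rolling(window).max() == 0).astype(int)


idx = pd.date_range("2018-01-01", periods=10, freq="s")
a = pd.Series([1, 2, 3, np.nan, np.nan, np.nan, 6, 7, 8, 9], index=idx)

print(rolling_all_nan(a, 3).tolist())     # last 3 rows
print(rolling_all_nan(a, "3s").tolist())  # last 3 seconds (needs a DatetimeIndex)

# Drop 00:00:04 to create a gap: the two definitions now diverge at 00:00:05
b = a.drop(a.index[4])
print(rolling_all_nan(b, 3).tolist())
print(rolling_all_nan(b, "3s").tolist())
```

With a time-based window pandas defaults `min_periods` to 1, so the first rows are evaluated on partial windows instead of returning NaN. Which definition of "the current cell and the previous two" is right depends on whether your timestamps can have gaps.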
Or, following your original idea, use all:
df['A'].isna().rolling(3).apply(lambda x : x.all(),raw=True)
Out[323]:
2018-01-01 00:00:00 NaN
2018-01-01 00:00:01 NaN
2018-01-01 00:00:02 0.0
2018-01-01 00:00:03 0.0
2018-01-01 00:00:04 0.0
2018-01-01 00:00:05 1.0
2018-01-01 00:00:06 0.0
2018-01-01 00:00:07 0.0
2018-01-01 00:00:08 0.0
2018-01-01 00:00:09 0.0
Freq: S, Name: A, dtype: float64
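The `apply` version still leaves NaN in the first two rows, because those windows are not full yet (`min_periods` defaults to 3). A minimal way to get the 0/1 integer column the question asks for, assuming incomplete leading windows should count as 0, is to fill them and cast; the result then matches the notna/max version.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2018-01-01", periods=10, freq="s")
df = pd.DataFrame({"A": [1, 2, 3, np.nan, np.nan, np.nan, 6, 7, 8, 9]}, index=idx)

# isna() gives a True/False series with no NaN, so every full window reaches
# the lambda; raw=True passes each window as a NumPy array
flags = df["A"].isna().rolling(3).apply(lambda x: x.all(), raw=True)

# The first two windows are incomplete and come back as NaN; treat them as 0
df["rolling_nan"] = flags.fillna(0).astype(int)

# Same result as the notna/max formulation
alt = (df["A"].notna().rolling(3).max() == 0).astype(int)
print(df["rolling_nan"].equals(alt))
```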