Having a hard time checking for NaN values, did isna/isnull break?

So I have this code (P.S. sorry, I can't provide the DataFrame for confidentiality reasons), but maybe I'm missing something here:

new_df = None
new_fn = None
prev_df = None
prev_fn = None
while 1:
    msg = conn.recv()
    if len(msg) > 1:
        df = msg[0]
        file_name = msg[1]

        df['col2'] = ''
        df['col2'] = df['col2'].apply(pd.to_numeric).astype('Int64')

        if new_fn is None:
            new_df = df
            new_fn = file_name
            new_df['col2'] = new_df['col1']
        else:
            prev_df = new_df
            prev_fn = new_fn
            new_df = df
            new_fn = file_name

            new_df = prev_df.merge(new_df, on='main', how='outer', suffixes=('_prev', '_new'))
            new_df = new_df.assign(**{col: new_df[col].fillna(new_df[col.replace("_new", "_prev")])
                                      for col in new_df.columns if "_new" in col})

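Then the code reaches the block below. It works on random DataFrames with the same characteristics that I tested, but not when combined with the code above: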

np.where(new_df['col2_new'].isna(),
         new_df['col2_new'].fillna(new_df['col1_new']),
         new_df['col2_new'])

For some reason fillna doesn't work, and col2_new is left with many NA values:

print(new_df.isna().sum())                        print(new_df.dtypes)

main                            0          main                          object
col1_prev                     158          col1_prev                     Int64
col2_prev                     158          col2_prev                     Int64
col1_new                        0          col1_new                      Int64
col2_new                      158          col2_new                      Int64

dtype: int64                               dtype: object

I've also run into some problems with isna/isnull, which seems to be the underlying issue:

df = pd.DataFrame({'col1': str(randint(10, 100)), 'col2': randint(10, 100), 'col3': ""}, index=range(0, 3))
np.where(df['col3'].isna, df['col3'].fillna(df['col1']), df['col3'])

Until recently it gave the correct output, but now it feels like something broke:

print(df.count())            print(df.isna().sum())         print(df)

col1    3                    col1    0                        col1  col2 col3
col2    3                    col2    0                      0   33    38
col3    3                    col3    0                      1   33    38
dtype: int64                 dtype: int64                   2   33    38

Is it just me? Am I doing something wrong? Is it the interpreter?

Any help is appreciated, thanks!

np.where() does not change anything in place; it returns a new array. The result needs to be assigned back to new_df['col2_new']:

new_df['col2_new'] = np.where(
    new_df['col2_new'].isna(),
    new_df['col2_new'].fillna(new_df['col1_new']),
    new_df['col2_new'])

Also, I believe you can simplify this to just fillna():

new_df['col2_new'] = new_df['col2_new'].fillna(new_df['col1_new'])
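To make the difference concrete, here is a minimal sketch on a small made-up frame. The data is hypothetical (the real DataFrame was not shared); only the column names col1_new and col2_new come from the question. It shows that calling np.where() on its own leaves the column untouched, while assigning the fillna() result back removes the NA values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the merged frame; the values are made up.
new_df = pd.DataFrame({
    "col1_new": pd.array([1, 2, 3], dtype="Int64"),
    "col2_new": pd.array([10, None, None], dtype="Int64"),
})

# np.where builds and returns a new array; new_df itself is not modified.
filled = np.where(new_df["col2_new"].isna(),
                  new_df["col1_new"],
                  new_df["col2_new"])
na_after_where = int(new_df["col2_new"].isna().sum())

# Assigning the fillna result back is what actually updates the column.
new_df["col2_new"] = new_df["col2_new"].fillna(new_df["col1_new"])
na_after_assign = int(new_df["col2_new"].isna().sum())

print(na_after_where, na_after_assign)
```

The first count is taken after the bare np.where() call and the second after the assignment, so comparing them shows which step changed the column.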

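You can also use inplace=True to modify the data in place (though in recent pandas versions, calling it on a selected column like this may trigger a chained-assignment warning, so assigning the result back as above is the safer form):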

new_df['col2_new'].fillna(new_df['col1_new'], inplace=True)
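As for the second snippet in the question, two things there may explain the output, independent of any pandas version. First, df['col3'].isna is written without parentheses, so it is a method object rather than a boolean mask. Second, col3 holds empty strings, which pandas does not count as missing, so isna() reports 0 and fillna() has nothing to fill. The sketch below uses made-up values and swaps in Series.mask as one possible way to replace empty strings; it is an illustration, not necessarily what the original code intended.

```python
import pandas as pd

# Hypothetical frame shaped like the one in the question; values are made up.
df = pd.DataFrame({"col1": ["33"] * 3, "col2": [38] * 3, "col3": [""] * 3})

# Without parentheses, .isna is a bound method, and a method object is truthy.
isna_attr_is_callable = callable(df["col3"].isna)
isna_attr_truthy = bool(df["col3"].isna)

# Empty strings are not treated as missing values, so nothing is flagged.
empty_flagged_as_na = int(df["col3"].isna().sum())

# One possible fix: replace empty strings with the value from col1.
df["col3"] = df["col3"].mask(df["col3"] == "", df["col1"])

print(isna_attr_is_callable, isna_attr_truthy, empty_flagged_as_na)
print(df["col3"].tolist())
```

If the empty strings should become real missing values instead, converting them to NA before calling fillna() would be another option.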