Having a hard time checking for NaN values: did isna/isnull break?
So I have this code (P.S. sorry, I can't provide the dataframe for confidentiality reasons), but maybe I'm missing something here:
new_df = None
new_fn = None
prev_df = None
prev_fn = None
while 1:
    msg = conn.recv()
    if len(msg) > 1:
        df = msg[0]
        file_name = msg[1]
        df['col2'] = ''
        df['col2'] = df['col2'].apply(pd.to_numeric).astype('Int64')
        if new_fn is None:
            new_df = df
            new_fn = file_name
            new_df['col2'] = new_df['col1']
        else:
            prev_df = new_df
            prev_fn = new_fn
            new_df = df
            new_fn = file_name
            new_df = prev_df.merge(new_df, on='main', how='outer', suffixes=('_prev', '_new'))
            new_df = new_df.assign(**{col: new_df[col].fillna(new_df[col.replace("_new", "_prev")])
                                      for col in new_df.columns if "_new" in col})
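Then the code reaches the block below. It works on random dataframes with the same characteristics that I tested, but not when it is tied to the code above: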
np.where(new_df['col2_new'].isna(),
         new_df['col2_new'].fillna(new_df['col1_new']),
         new_df['col2_new'])
For some reason, fillna does not work and col2_new is left with many NA values:
print(new_df.isna().sum())      print(new_df.dtypes)
main           0                main         object
col1_prev    158                col1_prev     Int64
col2_prev    158                col2_prev     Int64
col1_new       0                col1_new      Int64
col2_new     158                col2_new      Int64
dtype: int64                    dtype: object
I have also run into some problems with isna/isnull, which seems to be where the issue lies:
df = pd.DataFrame({'col1': str(randint(10, 100)), 'col2': randint(10, 100), 'col3': ""}, index=range(0, 3))
np.where(df['col3'].isna, df['col3'].fillna(df['col1']), df['col3'])
It gave the correct output until recently, but now it feels like something is broken:
print(df.count())    print(df.isna().sum())    print(df)
col1    3            col1    0                   col1  col2 col3
col2    3            col2    0                 0   33    38
col3    3            col3    0                 1   33    38
dtype: int64         dtype: int64              2   33    38
Is it just me? Am I doing something wrong? Is it the interpreter?
Any help is appreciated, thanks!
np.where() does not change anything in place. The result needs to be assigned back to new_df['col2_new']:
new_df['col2_new'] = np.where(
    new_df['col2_new'].isna(),
    new_df['col2_new'].fillna(new_df['col1_new']),
    new_df['col2_new'])
Also, I believe you can simplify this to just fillna():
new_df['col2_new'] = new_df['col2_new'].fillna(new_df['col1_new'])
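To make the point about assignment concrete, here is a minimal sketch. The three-row frame is a made-up stand-in for the merged data (the real dataframe was not shared), and the names na_before and na_after are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the merged frame; the real data is not available.
new_df = pd.DataFrame({
    "main": ["a", "b", "c"],
    "col1_new": pd.array([1, 2, 3], dtype="Int64"),
    "col2_new": pd.array([10, None, None], dtype="Int64"),
})

# Computing the filled values without assigning them leaves new_df untouched.
new_df["col2_new"].fillna(new_df["col1_new"])
na_before = int(new_df["col2_new"].isna().sum())

# Assigning the result back is what actually updates the column.
new_df["col2_new"] = new_df["col2_new"].fillna(new_df["col1_new"])
na_after = int(new_df["col2_new"].isna().sum())

print(na_before, na_after)
```

The NA count only drops after the second call, because that is the one whose result is stored.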
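You can also use inplace=True to modify the data in place, although on a column selected as new_df['col2_new'] this may not update new_df when pandas Copy-on-Write is enabled, so assigning the result back is the safer form: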
new_df['col2_new'].fillna(new_df['col1_new'], inplace=True)
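As for the second snippet in the question, two things there look like they could explain the output, independent of any pandas change: col3 holds empty strings, which are not NA, and .isna is written without parentheses. The sketch below is only an illustration under those assumptions. The frame uses fixed values instead of randint, and using mask() is one possible way to fill, not necessarily what the asker intended.

```python
import pandas as pd

# Hypothetical frame mirroring the question's second snippet, with fixed values.
df = pd.DataFrame({
    "col1": ["33", "33", "33"],
    "col2": [38, 38, 38],
    "col3": ["", "", ""],
})

# An empty string is an ordinary value, so isna() reports no missing entries.
empty_is_na = int(df["col3"].isna().sum())

# Without parentheses, .isna is the bound method itself, not a boolean mask.
is_method = callable(df["col3"].isna)

# One option: treat empty strings as missing and take col1 where col3 is empty.
filled = df["col3"].mask(df["col3"] == "", df["col1"])

print(empty_is_na, is_method, filled.tolist())
```

If the real col3 is meant to hold missing values, creating it with pd.NA or np.nan rather than "" would let isna() and fillna() behave as expected.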