Dropna when another row has the missing data OR drop_duplicates with NaN matching all data

Question

I have data like this:

Index  ID    data1  data2 ...
0      123   0      NaN   ...
1      123   0      1     ...
2      456   NaN    0     ...
3      456   NaN    0     ...
...

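I need to drop every row that contains less than or the same information as another row with the same ID.

In the example above, row 0 should be dropped, and so should exactly one of rows 2 and 3 (they are identical, so either one, but not both).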
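The rule as stated can also be checked directly, without filling anything in. The sketch below is an alternative technique (a pairwise "is this row covered by another row" check), not the asker's or the answerer's code. The function name drop_dominated_rows is hypothetical, and it assumes the key column is called ID and the index is unique. A row is dropped when each of its values is either NaN or equal to the value in the same column of another row with the same ID.

```python
import numpy as np
import pandas as pd


def drop_dominated_rows(df: pd.DataFrame, key: str = "ID") -> pd.DataFrame:
    """Keep only rows that are not covered by another row with the same key.

    Row A is covered by row B when every value of A is NaN or equal to B's
    value in that column. Exact duplicates are collapsed to one copy first.
    (Hypothetical helper for illustration.)
    """
    # drop_duplicates treats NaN as equal to NaN, so identical rows collapse
    df = df.drop_duplicates()
    cols = df.columns.drop(key)
    keep = []
    for _, group in df.groupby(key, sort=False):
        values = group[cols]
        for i in range(len(values)):
            row = values.iloc[i]
            # covered if some other row agrees on every non-NaN value of `row`
            covered = any(
                (row.isna() | (row == values.iloc[j])).all()
                for j in range(len(values))
                if j != i
            )
            if not covered:
                keep.append(values.index[i])
    # boolean mask keeps the original row order
    return df[df.index.isin(keep)]


df = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "data1": [0, 0, np.nan, np.nan],
    "data2": [np.nan, 1, 0, 0],
})
result = drop_dominated_rows(df)
print(result)  # keeps rows 1 and 2
```

Unlike a forward/backward fill, this never merges two partial rows: rows such as (0, NaN) and (NaN, 1) are both kept, because neither covers the other. It compares every pair of rows within a group, so it is quadratic in the group size.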

My best attempt so far is rather slow, and it does not work correctly:

df.groupby(by='ID').fillna(method='ffill',inplace=True).fillna(method='bfill',inplace=True)
df.drop_duplicates(inplace=True)

What is the best way to do this?

Answer 1

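Your approach looks fine, except that the in-place assignment does not work here, because you are assigning to a copy of the data rather than to df itself. Assign the result instead: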

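cols = df.columns.drop('ID')  # value columns; the ID key is left untouched
# forward- then backward-fill within each ID, so values never cross groups
df[cols] = df.groupby('ID')[cols].transform(lambda g: g.ffill().bfill())

df.drop_duplicates()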

   ID   data1  data2
0  123    0.0    1.0
2  456    NaN    0.0
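One thing to watch with this approach: the backward fill has to run inside each group too. A fill over the whole frame can copy a value from one ID into the rows of the ID before it. The data in this sketch is made up for illustration (it is not from the question).

```python
import numpy as np
import pandas as pd

# Hypothetical data: ID 123 never has data1, but ID 456 does.
df = pd.DataFrame({
    "ID": [123, 123, 456],
    "data1": [np.nan, np.nan, 5.0],
})

# Backward fill over the whole column: crosses the ID boundary.
global_fill = df["data1"].bfill()

# Backward fill within each ID only.
grouped_fill = df.groupby("ID")["data1"].bfill()

print(global_fill.tolist())   # [5.0, 5.0, 5.0]
print(grouped_fill.tolist())  # [nan, nan, 5.0]
```

ID 123 has no data1 at all, so its rows should stay NaN; only the grouped fill keeps them that way.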

python

nan

pandas

drop-duplicates