Dropping rows that have only one non-zero value from a pandas DataFrame in Python

Question

I have a pandas DataFrame like this:

[Image: pandas DataFrame]

I want to drop the rows that have exactly one non-zero value. What is the most efficient way to do this?

Answer 1

Try boolean indexing:

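import numpy as np
import pandas as pd

# sample data: a 10x10 frame of zeros
df = pd.DataFrame(np.zeros((10, 10)), columns=list('abcdefghij'))
df.iloc[2:5, 3] = 1   # rows 2, 3 and 4 get a 1 in column 'd'
df.iloc[4:5, 4] = 1   # row 4 also gets a 1 in column 'e'

# boolean indexing: keep rows whose count of non-zero values is not 1
df[df.ne(0).sum(axis=1).ne(1)]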

Only rows 2 and 3 are dropped: row 4 has two non-zero values and every other row has no non-zero values, so rows 2 and 3 are the only ones with exactly one.

df.ne(0).sum(axis=1)

0    0
1    0
2    1
3    1
4    2
5    0
6    0
7    0
8    0
9    0
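The same per-row counts can also be computed directly on the underlying NumPy array with `np.count_nonzero`, which avoids building an intermediate boolean DataFrame. This is a minimal sketch reusing the sample data from this answer; it ends with `DataFrame.drop` so the matching rows are removed from `df` itself rather than only shown in a filtered result.

```python
import numpy as np
import pandas as pd

# Same sample data as above
df = pd.DataFrame(np.zeros((10, 10)), columns=list('abcdefghij'))
df.iloc[2:5, 3] = 1
df.iloc[4:5, 4] = 1

# Number of non-zero entries in each row, computed on the raw array
nonzero_per_row = np.count_nonzero(df.to_numpy(), axis=1)

# Index labels of the rows with exactly one non-zero value
to_drop = df.index[nonzero_per_row == 1]

# Remove those rows from df itself
df = df.drop(index=to_drop)
print(df.index.tolist())  # [0, 1, 4, 5, 6, 7, 8, 9]
```

Like `ne(0)`, `np.count_nonzero` counts a NaN as non-zero, so both versions give the same result on data with missing values.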

Answer 2

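Not sure whether this is the most efficient, but I would try:

df[[(row != 0).sum() != 1 for _, row in df.iterrows()]]

This makes two passes over each row: one to test != 0 and another to sum the booleans, even though the scan could stop as soon as a second non-zero value is found.

Otherwise, you can define a custom function that checks each row in a single pass and stops early:

def check(values):
    # True if `values` contains exactly one non-zero entry;
    # returns False as soon as a second non-zero entry is seen
    already_has_one = False
    for value in values:
        if value != 0:
            if already_has_one:
                return False
            already_has_one = True
    return already_has_one

Then keep only the rows for which check returns False:

df[[not check(row) for _, row in df.iterrows()]]

This can be much faster than the first version, since it skips the second pass and stops scanning a row early.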
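Below is a self-contained sketch of the early-exit check applied row by row with `DataFrame.apply(..., axis=1)`, compared against the vectorized mask from Answer 1. The sample data is the made-up frame from Answer 1; whether the early exit actually beats the vectorized version depends on the shape of the real data, so it is worth timing both.

```python
import numpy as np
import pandas as pd

def check(values):
    """Return True if `values` contains exactly one non-zero entry.

    Stops scanning as soon as a second non-zero entry is found.
    """
    already_has_one = False
    for value in values:
        if value != 0:
            if already_has_one:
                return False
            already_has_one = True
    return already_has_one

# Sample data: rows 2 and 3 have one non-zero value, row 4 has two
df = pd.DataFrame(np.zeros((10, 10)), columns=list('abcdefghij'))
df.iloc[2:5, 3] = 1
df.iloc[4:5, 4] = 1

# Row-wise check: True means "keep this row"
keep_loop = df.apply(lambda row: not check(row), axis=1)

# Vectorized mask from Answer 1, for comparison
keep_vectorized = df.ne(0).sum(axis=1).ne(1)
assert (keep_loop == keep_vectorized).all()

result = df[keep_loop]
print(result.index.tolist())  # [0, 1, 4, 5, 6, 7, 8, 9]
```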

Answer 3

Or like this:

# astype(bool) marks each non-zero entry True (like applymap(bool)); keep rows whose count is not 1
df[df.astype(bool).sum(axis=1) != 1]
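One thing to keep in mind with the `bool(x)` style test: `bool(float('nan'))` is `True`, so a missing value counts as a non-zero entry (the `ne(0)` mask in Answer 1 behaves the same way, since NaN != 0). A minimal sketch, assuming missing values should instead be treated as zeros, fills them before counting; the frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value in row 1
df = pd.DataFrame({'a': [0.0, 5.0, 0.0],
                   'b': [0.0, np.nan, 0.0],
                   'c': [0.0, 0.0, 2.0]})

# NaN counts as non-zero: row 1 has two "non-zero" entries
counts_raw = df.astype(bool).sum(axis=1)

# Treat NaN as zero before counting
counts_filled = df.fillna(0).astype(bool).sum(axis=1)

print(counts_raw.tolist())     # [0, 2, 1]
print(counts_filled.tolist())  # [0, 1, 1]

# Drop rows with exactly one non-zero value, NaN treated as zero
print(df[counts_filled != 1].index.tolist())  # [0]
```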


Tags: python, dataframe, pandas, data-science