How to compare various specific cells in a DataFrame (in a relative manner)?

Question

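I have a DataFrame like this:

I want to add a new column that checks, for each row, whether "attributes.count" has stayed at zero in the current row and the 5 preceding rows. If so, it should return True. In Excel I would simply use relative references to the last 5 cells, but I have not found anything similar for pandas.

So if I am in row 55, I only want to check whether rows 50 to 55 contain nothing but zeros and, if so, return True.

I tried the .diff() method, but it did not really do the job, because it only compares against the previous row rather than a given number of preceding rows: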

df["Is zero?"] = df["attributes.count"].diff()

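Is there a workaround? Or even a specific method that I do not know about yet? (I am an absolute beginner in both coding and Python, so please forgive my stupidity :D )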

Answer 1

Suppose ['col1','col2','col3'] are the columns you want to check for zeros.

s = 0
# Add 1 for every listed column whose value in that row is zero.
for col in ['col1','col2','col3']:
    s += 1*(df[col] == 0)

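For each row, s is the number of those columns that are zero (so it is 1 when exactly one of them is zero, and 0 when none is). Then simply define df['Is zero'] = s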
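The same per-row count can be sketched without an explicit loop. This is only an illustration: the sample values below are made up, and the "All zero" column is an extra that the answer does not ask for.

```python
import pandas as pd

# Hypothetical example data; the column names follow the answer above.
df = pd.DataFrame({"col1": [0, 1, 0], "col2": [0, 2, 3], "col3": [0, 0, 4]})
cols = ["col1", "col2", "col3"]

# Count, per row, how many of the selected columns equal zero.
zero_count = df[cols].eq(0).sum(axis=1)

# True only where every selected column is zero in that row.
all_zero = df[cols].eq(0).all(axis=1)

df["Is zero"] = zero_count
df["All zero"] = all_zero
```

Note that this compares values across columns within one row, not across neighbouring rows of a single column.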

Answer 2

Simply use rolling and sum.

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"attributes_count": [0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 3, 1]}
)
print(df)
    attributes_count
0                  0
1                  0
2                  0
3                  0
4                  0
5                  1
6                  2
7                  0
8                  0
9                  0
10                 0
11                 0
12                 0
13                 0
14                 3
15                 1

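Then create a new column using a rolling window of 5 periods and a sum. If the sum is zero, every value in the window is zero (assuming the counts are never negative), so the condition is true.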

df["Is zero?"] = df["attributes_count"].rolling(5).sum()
print(df)
    attributes_count  Is zero?
0                  0       nan
1                  0       nan
2                  0       nan
3                  0       nan
4                  0    0.0000
5                  1    1.0000
6                  2    3.0000
7                  0    3.0000
8                  0    3.0000
9                  0    3.0000
10                 0    2.0000
11                 0    0.0000
12                 0    0.0000
13                 0    0.0000
14                 3    3.0000
15                 1    4.0000
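The first four rows are nan because a window of 5 is not yet full there. As a rough sketch, assuming partial windows at the top of the frame should be evaluated as well, the min_periods parameter of rolling can be lowered. The name partial_sum is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"attributes_count": [0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 3, 1]}
)

# min_periods=1 lets the window produce a result as soon as one row is available,
# so the first four rows are summed over the rows seen so far instead of being NaN.
partial_sum = df["attributes_count"].rolling(5, min_periods=1).sum()
```

Whether the first rows should count as "all zero" with fewer than 5 observations is a judgment call; the default keeps them undefined.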

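Unfortunately, the truth values in "Is zero?" are the opposite of what we want: a sum of 0 is falsy, while any non-zero sum is truthy. So we need to turn 0 into 1 and everything else into 0.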

df["Is zero?"] = np.where(df["Is zero?"], 0, 1)
print(df)
   attributes_count  Is zero?
0                  0         0
1                  0         0
2                  0         0
3                  0         0
4                  0         1
5                  1         0
6                  2         0
7                  0         0
8                  0         0
9                  0         0
10                 0         0
11                 0         1
12                 0         1
13                 0         1
14                 3         0
15                 1         0
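If a boolean column is preferred, or if the window should cover the current row plus the 5 previous rows (6 rows in total, as in the "rows 50 to 55" example from the question), one possible variant is sketched below. The window size of 6 is my reading of the question, not something this answer states.

```python
import pandas as pd

df = pd.DataFrame(
    {"attributes_count": [0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 3, 1]}
)

window = 6  # assumption: current row plus the 5 previous rows

# Mark zeros first (1 = zero, 0 = non-zero), then require that every row
# in the window is a zero. Comparing with zero up front also works if the
# column could contain negative values, where a plain sum might cancel out.
is_zero = df["attributes_count"].eq(0).astype(int)
df["Is zero?"] = is_zero.rolling(window).sum().eq(window)
```

Rows whose window is not yet full compare as False here, because NaN is never equal to the window size.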
