pandas: how to iteratively count instances of a category by row and reset the count when the other category appears?

Question

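I have a DataFrame that describes the behavior of a machine. The machine can be in one of two states: production or cleaning. So I have a dummy variable called `production` that is 1 while the machine is producing and 0 while it is not. I want to know the production cycles: how many hours the machine keeps producing before it stops, and how long it stays stopped before the whole process starts again. So I want to create a column that counts how long (how many rows) the machine stays in each state, and that count should reset whenever the other category appears again.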

Example:

production production_cycle
1          5
1          5
1          5
1          5
1          5
0          2
0          2
1          1
0          3
0          3
0          3
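Stated as a literal row-by-row loop, the rule behind this example is: walk forward while the state stays the same, then label every row of that run with the run's length. A minimal plain-Python sketch, assuming the states arrive as a list of 0/1 values (the function name `run_lengths_per_row` is hypothetical):

```python
def run_lengths_per_row(states):
    """Label every row with the length of the consecutive run it belongs to."""
    result = []
    i = 0
    n = len(states)
    while i < n:
        # Advance j until the state changes (or the data ends).
        j = i
        while j < n and states[j] == states[i]:
            j += 1
        # Every row in states[i:j] gets the same value: the run length j - i.
        result.extend([j - i] * (j - i))
        i = j
    return result


production = [1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
print(run_lengths_per_row(production))
# [5, 5, 5, 5, 5, 2, 2, 1, 3, 3, 3]
```

A loop like this is easy to reason about but slow on large frames, which is why a vectorized pandas approach is preferable in practice.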

Answer 1

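You can first detect the turning points by checking where each value differs from the previous one, using `diff`: a nonzero difference marks the start of a new run. The cumulative sum of those turning points then gives the required groups. Finally, `transform` with `count` gives the size of each group: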

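>>> import pandas as pd
>>> df = pd.DataFrame({"production": [1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]})
>>> grouper = df["production"].diff().ne(0).cumsum()
>>> df["production_cycle"] = df.groupby(grouper)["production"].transform("count")
>>> df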

    production  production_cycle
0            1                 5
1            1                 5
2            1                 5
3            1                 5
4            1                 5
5            0                 2
6            0                 2
7            1                 1
8            0                 3
9            0                 3
10           0                 3
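Because the question ultimately asks how long each production and cleaning phase lasts, the same grouping can also collapse each run into a single row. A minimal sketch, assuming one row per hour (so a run's length in rows equals its duration in hours); the names `runs`, `mean_by_state` and the `run` label are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"production": [1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]})

# Same run labels as in the answer; renamed to "run" so the grouping key
# is not confused with the "production" column.
grouper = df["production"].diff().ne(0).cumsum().rename("run")

# One row per run: the state of that run and how many rows it lasted.
runs = df.groupby(grouper).agg(
    state=("production", "first"),
    length=("production", "size"),
).reset_index(drop=True)
print(runs)

# Average run length per state (0 = cleaning, 1 = production).
mean_by_state = runs.groupby("state")["length"].mean()
print(mean_by_state)
```

Named aggregation (`agg(name=(column, func))`) keeps the result columns readable. `size` counts all rows in a run, so it matches the `count` used above as long as `production` has no missing values.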

Here `grouper` is:

>>> grouper

0     1
1     1
2     1
3     1
4     1
5     2
6     2
7     3
8     4
9     4
10    4
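To see why the cumulative sum yields these labels, the intermediate steps can be laid out side by side. A small sketch on the same data (the column names `diff` and `new_run` are illustrative):

```python
import pandas as pd

s = pd.Series([1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0], name="production")

steps = pd.DataFrame({
    "production": s,
    # Difference to the previous row: NaN for the first row, 0 inside a run,
    # +1 or -1 where the state switches.
    "diff": s.diff(),
    # True marks the first row of each run (NaN != 0 is True, so row 0 counts).
    "new_run": s.diff().ne(0),
    # Summing the True values gives a label that increases by 1 at each switch.
    "grouper": s.diff().ne(0).cumsum(),
})
print(steps)
```

Every `True` in `new_run` marks the first row of a run, so `cumsum` bumps the label exactly once per switch. If a counter that restarts at 1 inside each run is wanted instead of the full run length, `s.groupby(steps["grouper"]).cumcount() + 1` gives 1, 2, 3, 4, 5, 1, 2, 1, 1, 2, 3 for this data.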



Tags: python, count, categories, pandas