Count rows with positive values and reset if negative

Question

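I want to add a column to a pandas DataFrame that counts consecutive positive numbers and resets the counter whenever a negative number is found. I could probably loop over it with a 'for' statement, but I just know there is a better solution. I have looked at various similar posts that ask almost the same thing, but I cannot get those solutions to work for my problem.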

I have:

Slope
-25
-15
 17
 6
 0.1
 5
-3
 5
 1
 3
-0.1
-0.2
 1
-9

What I want:

Slope  Count
-25      0
-15      0
 17      1
 6       2
 0.1     3
 5       4
 -3      0
 5       1
 1       2
 3       3
-0.1     0
-0.2     0
 1       1
-9       0

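Please keep in mind that this is a low-skill-level question. If your proposed solution has multiple steps, please explain each one. I would like an answer, but I would prefer to understand the 'how'.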

Answer 1

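IMO, solving this iteratively is the only way, because there is a condition that has to be checked on every row. You can use any kind of iteration, such as for or while. Solving this with map would be cumbersome, because each element's count still depends on the previous element's count.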
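The iterative approach described here can be sketched in plain Python. This is only a minimal sketch, not the answerer's code: the function name `count_positive_runs` is hypothetical, and it assumes that only strictly negative values reset the counter (the sample data contains no zeros, so the question does not settle that case).

```python
def count_positive_runs(values):
    """Return, for each value, the length of the current run of non-negative values."""
    counts = []
    running = 0
    for value in values:
        # a negative value resets the counter; anything else extends the run
        running = 0 if value < 0 else running + 1
        counts.append(running)
    return counts


# sample data from the question
slope = [-25, -15, 17, 6, 0.1, 5, -3, 5, 1, 3, -0.1, -0.2, 1, -9]
counts = count_positive_runs(slope)
print(counts)
```

The resulting list could then be assigned to a new column, for example `df['Count'] = count_positive_runs(df['Slope'])`.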

Answer 2

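We can solve this by looping over all rows and using the loc accessor in pandas. Assume you already have a DataFrame named df with a column named slope. The idea is to add one to the previous row's count as we go, but to multiply the count by 0 whenever we reach a row where slope_i < 0.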

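df['new_col'] = 0  # preset every count to zero

# row 0 has no predecessor, so set it directly (assumes the default 0..n-1 index)
df.loc[0, 'new_col'] = int(df.loc[0, 'slope'] >= 0)
for i in range(1, len(df)):
    # previous count plus one; multiplying by False (0) resets it on a negative slope
    df.loc[i, 'new_col'] = (df.loc[i-1, 'new_col'] + 1) * (df.loc[i, 'slope'] >= 0)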

Answer 3

You first want to mark where new segments (i.e. groups) start:

>>> df['Count'] = df.Slope.lt(0)
>>> df.head(7)
    Slope  Count
0   -25.0   True
1   -15.0   True
2    17.0  False
3     6.0  False
4     0.1  False
5     5.0  False
6    -3.0   True

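Now you need to label each group using a cumulative sum: because True counts as 1 in arithmetic, the cumulative sum labels each segment with an incrementing integer. (This is a very powerful concept in pandas!)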

>>> df['Count'] = df.Count.cumsum()
>>> df.head(7)
    Slope  Count
0   -25.0      1
1   -15.0      2
2    17.0      2
3     6.0      2
4     0.1      2
5     5.0      2
6    -3.0      3
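To see the boolean cumulative sum in isolation, here is a small sketch on made-up flags (the values are illustrative and are not taken from the question's data):

```python
import pandas as pd

# made-up reset flags: True marks a row that starts a new segment
flags = pd.Series([True, True, False, False, True])

# True adds 1 and False adds 0, so the label only changes on True rows
labels = flags.cumsum()
print(labels.tolist())
```

Rows that share a label belong to the same segment, which is what makes the label usable as a groupby key.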

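Now you can access each segment with groupby, and all that is left is to generate an incrementing sequence starting at zero for each group. There are many ways to do this; I just use each group's (reset) index, i.e. reset the index, take the fresh RangeIndex starting at 0, and turn it into a Series: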

>>> df.groupby('Count').apply(lambda x: x.reset_index().index.to_series())
Count   
1      0    0
2      0    0
       1    1
       2    2
       3    3
       4    4
3      0    0
       1    1
       2    2
       3    3
4      0    0
5      0    0
       1    1
6      0    0

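This produces the expected counts, but note that the resulting index does not match the original DataFrame, so we need another reset_index() with drop=True to discard the grouped index before putting the result into our original DataFrame: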

>>> df['Count'] = df.groupby('Count').apply(lambda x:x.reset_index().index.to_series()).reset_index(drop=True)

Voilà:

>>> df
    Slope  Count
0   -25.0      0
1   -15.0      0
2    17.0      1
3     6.0      2
4     0.1      3
5     5.0      4
6    -3.0      0
7     5.0      1
8     1.0      2
9     3.0      3
10   -0.1      0
11   -0.2      0
12    1.0      1
13   -9.0      0
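As one of the "many ways" to number rows inside each group, pandas also offers `GroupBy.cumcount()`, which keeps the original index and so avoids the extra reset_index() steps. The following is a sketch of that alternative, not the code used in this answer. Like the approach above, it assumes the data starts with a negative value; rows before the first negative would be numbered from 0 rather than 1.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({'Slope': [-25, -15, 17, 6, 0.1, 5, -3, 5, 1, 3, -0.1, -0.2, 1, -9]})

# label each segment: the label increases by one at every negative value
segment = df['Slope'].lt(0).cumsum()

# number the rows inside each segment 0, 1, 2, ...; the result is aligned with df's index
df['Count'] = df.groupby(segment).cumcount()
print(df['Count'].tolist())
```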

Answer 4

You can do this with the groupby command. It takes a few steps and could probably be shortened, but it works like this.

First, you create a reset column by finding the negative numbers:

# create reset condition
df['reset'] = df.slope.lt(0)

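Then you use cumsum() on the reset column to create the groups --> at this point each run of positive values gets its own unique group value. The last line here puts all negative numbers into group 0.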

# create groups of positive values
df['group'] = df.reset.cumsum()
df.loc[df['reset'], 'group'] = 0
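To make the grouping step concrete, this sketch runs the two steps so far on the question's sample data and prints the resulting group labels. It only repeats the code above with the data filled in.

```python
import pandas as pd

# sample data from the question
slope = [-25, -15, 17, 6, 0.1, 5, -3, 5, 1, 3, -0.1, -0.2, 1, -9]
df = pd.DataFrame(data=slope, columns=['slope'])

df['reset'] = df.slope.lt(0)       # True on negative rows
df['group'] = df.reset.cumsum()    # label changes at every negative row
df.loc[df['reset'], 'group'] = 0   # move the negative rows themselves into group 0
print(df['group'].tolist())
```

Each run of positive values ends up with its own non-zero label (2, 3 and 5 here), while every negative row carries the label 0.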

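Now you group the positive values and cumulatively sum a column of ones (there must be a better solution than this) to get your result. The last line again cleans up the result for the negative values.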

# sum ones :-D
df['count'] = 1
df['count'] = df.groupby('group')['count'].cumsum()
df.loc[df['reset'], 'count'] = 0

It is not a nice one-liner, but especially for larger datasets it should be faster than iterating over the whole DataFrame.

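To make it easier to copy and paste the whole thing (including some commented-out lines that can replace the lines before them; they make the code shorter but harder to understand):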

import pandas as pd

## create data
slope = [-25, -15, 17, 6, 0.1, 5, -3, 5, 1, 3, -0.1, -0.2, 1, -9]
df = pd.DataFrame(data=slope, columns=['slope'])

## create reset condition
df['reset'] = df.slope.lt(0)

## create groups of positive values
df['group'] = df.reset.cumsum()
df.loc[df['reset'], 'group'] = 0
# df['group'] = df.reset.cumsum().mask(df.reset, 0)


## sum ones :-D
df['count'] = 1
df['count'] = df.groupby('group')['count'].cumsum()
df.loc[df['reset'], 'count'] = 0
# df['count'] = df.groupby('group')['count'].cumsum().mask(df.reset, 0)
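One possible shortening, offered as a sketch rather than as part of the original answer: instead of summing a column of ones and then zeroing the negative rows, sum the "not reset" flags within each group. The intermediate names `reset` and `group` are kept as local variables here rather than as columns.

```python
import pandas as pd

# sample data from the question
slope = [-25, -15, 17, 6, 0.1, 5, -3, 5, 1, 3, -0.1, -0.2, 1, -9]
df = pd.DataFrame(data=slope, columns=['slope'])

reset = df['slope'].lt(0)   # True on the rows that reset the counter
group = reset.cumsum()      # a new group label starts at every reset row

# within each group, add 1 for every non-reset row and 0 for the reset row itself
df['count'] = (~reset).astype(int).groupby(group).cumsum()
print(df['count'].tolist())
```

Because the reset row contributes 0 to its own group's sum, no separate clean-up step is needed, and rows before the first negative value are still counted from 1.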

python

cumsum