Get the maximum occurrence of one specific value per row with pandas

Question

I have the following dataframe:

   1   2   3   4   5   6   7  8  9
0  0   0   1   0   0   0   0  0  1
1  0   0   0   0   1   1   0  1  0
2  1   1   0   1   1   0   0  1  1
...

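For each row, I want to get the longest sequence of the value 0 in that row. So the expected result for this dataframe is an array that looks like this: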

[5,4,2,...]

For example, in the first row the longest sequence of the value 0 has length 5, and so on.

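I have seen a related post and tried, as a start, to get the result for the first row only (even though I would like to do this for the whole dataframe at once), but I got an error: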

s=df_day.iloc[0]
(~s).cumsum()[s].value_counts().max()

TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

When I insert the values manually like this:

s=pd.Series([0,0,1,0,0,0,0,0,1])
(~s).cumsum()[s].value_counts().max()

>>>7

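I get 7, which is the total number of 0s in the row, not the longest sequence. However, I don't understand why the first attempt raises an error, and more importantly, I would like to run this on the whole dataframe and get a result for every row.

My end goal: get the longest uninterrupted run of the value 0 in each row.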

Answer 1

The code below should do the job.

The function longest_streak counts the length of each run of consecutive zeros and returns the maximum; you can use it on the df with apply.

from itertools import groupby

def longest_streak(l):
    # collect the length of every run of consecutive zeros
    lst = []
    for n, c in groupby(l):
        num, count = n, sum(1 for i in c)
        if num == 0:
            lst.append((num, count))
    # default=0 covers rows that contain no zeros at all
    maxx = max([y for x, y in lst], default=0)
    return maxx

df.apply(longest_streak, axis=1)
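As a minimal, self-contained sketch of the same groupby idea, the snippet below rebuilds the sample frame from the question and uses a generator so that a row without any zero yields 0. The names `longest_zero_streak` and `streaks` are illustrative only and are not part of the answer above.

```python
from itertools import groupby

import pandas as pd

# Sample data copied from the question (columns labelled 1..9).
df = pd.DataFrame(
    [[0, 0, 1, 0, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 1, 1, 0, 1, 0],
     [1, 1, 0, 1, 1, 0, 0, 1, 1]],
    columns=range(1, 10),
)

def longest_zero_streak(row):
    # groupby yields one (value, run) pair per block of equal neighbours;
    # keep only the runs of 0 and measure their lengths.
    run_lengths = (sum(1 for _ in run) for value, run in groupby(row) if value == 0)
    # default=0 is returned when the row contains no zero at all.
    return max(run_lengths, default=0)

streaks = df.apply(longest_zero_streak, axis=1)
print(streaks.tolist())  # [5, 4, 2]
```

This runs a Python loop per row, so it is easy to read but likely slower than a fully vectorized approach on large frames.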

Answer 2

Use:

df = df.T.apply(lambda x: (x != x.shift()).astype(int).cumsum().where(x.eq(0)).dropna().value_counts().max())

OUTPUT

0    5
1    4
2    2
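The snippet attempted in the question relies on the same run-labelling trick as this answer, but that idiom expects a boolean mask rather than the raw integers. With an integer Series, `~s` is a bitwise NOT (0 becomes -1, 1 becomes -2) and `[s]` looks the integers up as index labels instead of filtering, which would explain the result of 7. The `invert` TypeError is what NumPy raises for `~` on float data, so the row taken from `df_day` may not have an integer dtype; that is an assumption, since the question does not show the dtypes. A minimal sketch with an explicit mask, where `longest_zero_run` is a hypothetical helper name:

```python
import pandas as pd

# The manually entered row from the question.
s = pd.Series([0, 0, 1, 0, 0, 0, 0, 0, 1])

def longest_zero_run(row):
    is_zero = row.eq(0)            # boolean mask: True where the value is 0
    if not is_zero.any():
        return 0                   # no zeros in this row
    # The label increases at every non-zero cell and stays constant inside
    # a run of zeros, so each zero run carries its own label.
    run_id = (~is_zero).cumsum()
    # Keep the labels of the zero cells only and count the most frequent one.
    return int(run_id[is_zero].value_counts().max())

print(longest_zero_run(s))  # 5
```

Because the helper compares with `eq(0)` first, it should also work for float rows, and it can be applied per row with `df.apply(longest_zero_run, axis=1)`.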

Answer 3

A vectorized solution that counts consecutive 0s per row; to get the maximum, take max over the DataFrame c:

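# m: True where the value is 0
m = df.eq(0)
# b: running count of zeros along each row
b = m.cumsum(axis=1)
# subtract the count reached at the last non-zero cell, so the counter restarts after each break
c = b.sub(b.mask(m).ffill(axis=1).fillna(0)).astype(int)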
print (c)
   1  2  3  4  5  6  7  8  9
0  1  2  0  1  2  3  4  5  0
1  1  2  3  4  0  0  1  0  1
2  0  0  1  0  0  1  2  0  0

df['max_consecutive_0'] = c.max(axis=1)
print (df)
   1  2  3  4  5  6  7  8  9  max_consecutive_0
0  0  0  1  0  0  0  0  0  1                  5
1  0  0  0  0  1  1  0  1  0                  4
2  1  1  0  1  1  0  0  1  1                  2
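Since the title asks about "one specific value", the same steps can be wrapped in a small function that takes the value as a parameter. This is only a sketch under the assumption that every column is numeric and comparable with `eq`; the function name `max_run_per_row` and the variable `last_break` are made up for illustration.

```python
import pandas as pd

def max_run_per_row(frame, value=0):
    m = frame.eq(value)                  # True where the cell equals `value`
    b = m.cumsum(axis=1)                 # running count of matches along each row
    # Running count as it stood at the most recent non-matching cell
    # (0 before the first break), carried forward across the row.
    last_break = b.mask(m).ffill(axis=1).fillna(0)
    c = b.sub(last_break).astype(int)    # length of the current run at each cell
    return c.max(axis=1)

# Sample data copied from the question (columns labelled 1..9).
df = pd.DataFrame(
    [[0, 0, 1, 0, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 1, 1, 0, 1, 0],
     [1, 1, 0, 1, 1, 0, 0, 1, 1]],
    columns=range(1, 10),
)

zeros = max_run_per_row(df)              # longest run of 0 per row
ones = max_run_per_row(df, value=1)      # longest run of 1 per row
print(zeros.tolist())  # [5, 4, 2]
print(ones.tolist())   # [1, 2, 2]
```

Computing the result before assigning a new column such as `max_consecutive_0` keeps that column out of the comparison if the function is called again on the same frame.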

