Is there any way to check the repetition of the value in a B field, taking into account a sorted A field, for each ID group? (See example below)

Question

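Suppose we have a table with thousands of users, each row holding a user ID, a year-month, and a balance ($). Let's simplify it to the following table with 3 users: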

user ID (numeric)	year-month (string)	balance(float)
1	2019-01	500.0
1	2019-02	500.0
1	2019-03	0.0
1	2019-04	500.0
1	2019-05	0.0
1	2019-06	0.0
2	2018-09	1000.0
2	2018-10	1000.0
2	2018-11	750.0
2	2018-12	500.0
2	2019-01	0.0
2	2019-02	0.0
2	2019-03	0.0
2	2019-04	0.0
2	2019-05	0.0
2	2019-06	0.0
2	2019-07	0.0
3	2018-01	200.0
3	2018-02	0.0
3	2018-03	200.0
3	2018-04	0.0

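The main rule is: once the balance reaches 0 in a given month, no later month may have a non-zero balance. This means the only user whose records are reported correctly is ID=2.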

As the final output, I want a table showing how many user IDs satisfy the rule and how many do not:

well_informed	num_cases
YES	1
NO	2

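Because iterating over each user ID's consecutive records and checking the condition is difficult, I have tried several approaches without getting anywhere close to the result.

Solutions in Python-Pandas or SQL both work for the environment I work in. Thanks a lot!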

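EDIT v1: The solutions from @d.b and @Henry Ecker work for the example I provided, but not for my actual problem, because I did not specify that some cases are valid, such as the following: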

user ID (numeric)	year-month (string)	balance(float)
4	2019-02	1000.0
4	2019-03	1000.0
4	2019-04	1000.0
4	2019-05	1000.0
4	2019-06	1000.0
4	2019-07	1000.0
4	2019-08	1000.0
4	2019-09	1000.0
4	2019-10	1000.0
4	2019-11	1000.0
4	2019-12	1000.0

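This user never reaches a balance of 0, so it should be considered correct, but the solutions classify it as incorrect.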
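For reproducibility, here is a minimal sketch that builds both sample tables as one pandas DataFrame. The variable names `samples` and `dat`, and the use of `pd.period_range` to generate the month strings, are assumptions made for illustration; the real data would come from the actual table.

```python
import pandas as pd

# (first month, balances in month order) per user ID, copied from the tables above
samples = {
    1: ("2019-01", [500.0, 500.0, 0.0, 500.0, 0.0, 0.0]),
    2: ("2018-09", [1000.0, 1000.0, 750.0, 500.0] + [0.0] * 7),
    3: ("2018-01", [200.0, 0.0, 200.0, 0.0]),
    4: ("2019-02", [1000.0] * 11),
}

frames = []
for user_id, (start, balances) in samples.items():
    # Consecutive months starting at `start`, one per balance value
    months = pd.period_range(start=start, periods=len(balances), freq="M")
    frames.append(pd.DataFrame({
        "user ID": user_id,
        "year-month": months.strftime("%Y-%m"),  # keep the string format used above
        "balance": balances,
    }))

dat = pd.concat(frames, ignore_index=True)
```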

Answer 1

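For each ID, run-length encode the condition `balance == 0` and check that there is at most one run of zeros and that, if there is one, it is the last run. This assumes the rows are already sorted by year-month within each ID.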

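import pdrle  # third-party run-length encoding helper for pandas

def foo(x):
    # Run-length encode the boolean mask "balance == 0"; each row of `rle`
    # is one run, and `vals` is True for a run of zeros.
    rle = pdrle.encode(x.eq(0))
    # No run of zeros: the balance never reached 0, so the rule holds.
    if rle.vals.sum() == 0:
        return True
    # Exactly one run of zeros: valid only if it is the last run.
    if rle.vals.sum() == 1:
        return rle.vals.tail(1).item()
    # Two or more runs of zeros: a non-zero balance followed a zero.
    return False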


ans = dat.groupby(dat["user ID"], as_index=False).balance.apply(foo)
ans
#     user ID     balance
# 0         1       False
# 1         2        True
# 2         3       False
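If installing `pdrle` is not an option, the same check can be sketched with plain pandas by using a cumulative sum of the zero mask instead of run-length encoding. This is an alternative technique, not the code above; the function name `follows_rule` and the shortened sample data are assumptions for illustration.

```python
import pandas as pd

# Shortened version of the sample data (user 4 never reaches 0)
dat = pd.DataFrame({
    "user ID": [1] * 6 + [2] * 5 + [3] * 4 + [4] * 3,
    "year-month": [
        "2019-01", "2019-02", "2019-03", "2019-04", "2019-05", "2019-06",
        "2018-11", "2018-12", "2019-01", "2019-02", "2019-03",
        "2018-01", "2018-02", "2018-03", "2018-04",
        "2019-02", "2019-03", "2019-04",
    ],
    "balance": [
        500.0, 500.0, 0.0, 500.0, 0.0, 0.0,
        750.0, 500.0, 0.0, 0.0, 0.0,
        200.0, 0.0, 200.0, 0.0,
        1000.0, 1000.0, 1000.0,
    ],
})


def follows_rule(balance: pd.Series) -> bool:
    is_zero = balance.eq(0)
    # True from the first zero onward (count of zeros seen so far > 0)
    seen_zero = is_zero.cumsum().gt(0)
    # A violation is any non-zero balance after a zero has been seen
    return not (seen_zero & ~is_zero).any()


ok = (
    dat.sort_values(["user ID", "year-month"])  # "YYYY-MM" strings sort chronologically
    .groupby("user ID")["balance"]
    .apply(follows_rule)
)
```

Here `ok` is a Series indexed by user ID, holding `True` for users that follow the rule. Sorting explicitly means the check does not depend on the row order of the input.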

As a next step, `ans` can be summarized:

ans.groupby("balance").size()
# balance
# False    2
# True     1
# dtype: int64
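To get the exact `well_informed` / `num_cases` layout requested in the question, the boolean result can be relabeled before counting. The following is a minimal sketch; the `ans` frame here is a hand-built stand-in with the same shape as the output shown above, and the name `summary` is an assumption.

```python
import pandas as pd

# Stand-in for `ans`: one row per user, `balance` holds the result of the rule check
ans = pd.DataFrame({"user ID": [1, 2, 3], "balance": [False, True, False]})

summary = (
    ans["balance"]
    .map({True: "YES", False: "NO"})       # relabel the booleans
    .value_counts()                        # count users per label
    .reindex(["YES", "NO"], fill_value=0)  # fixed row order, 0 if a label is absent
    .rename_axis("well_informed")
    .reset_index(name="num_cases")
)
```

The `reindex` step keeps both rows in the output even when every user falls into the same category.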

Tags: python, sql, group-by, sas, pandas
