Identify values within threshold of others in group in pandas DataFrame

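My question is how to find the values in the 'accuracy' column that are within ±1 of each other for the same 'vin'. For a row to pass, at least 2 values of that 'vin' must lie within ±1 of each other (the row itself and at least one other); if fewer than 2 such values exist, the row should fail.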

Below is my DataFrame:

import pandas as pd

df = pd.DataFrame({'vin':['aaa','aaa','aaa','aaa','bbb','bbb','bbb','bbb','ccc','ccc','ccc','ddd'],
                   'accuracy':[1,2,3,9,22,23,211,212,34,39,40,55]})
df

My expected output would look like the 'Result' column:

df = pd.DataFrame({'vin':['aaa','aaa','aaa','aaa','bbb','bbb','bbb','bbb','ccc','ccc','ccc','ddd'],
                   'value':[1,2,3,9,22,23,211,212,34,39,40,55],'Result':['pass','pass','pass','fail','pass','pass','pass','pass','fail','pass','pass','fail']})
df

Output:

    vin  value Result
0   aaa      1   pass
1   aaa      2   pass
2   aaa      3   pass
3   aaa      9   fail
4   bbb     22   pass
5   bbb     23   pass
6   bbb    211   pass
7   bbb    212   pass
8   ccc     34   fail
9   ccc     39   pass
10  ccc     40   pass
11  ddd     55   fail

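Assuming the data is sorted by 'accuracy' within each 'vin', you can compute the difference between consecutive values per group and check whether it is ≤ 1. That mask marks rows that are close to their previous value; shifting it by -1 within each group marks rows that are close to their next value. Combine the two and pass the result to numpy.where: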

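import numpy as np

# if not sorted
# df = df.sort_values(by=['vin', 'accuracy'])

# True where the gap to the previous value in the same 'vin' is <= 1
mask = df.groupby('vin')['accuracy'].diff().le(1)
# a row passes if it is close to its previous or its next value in the group
df['Result'] = np.where(mask|mask.groupby(df['vin']).shift(-1), 'pass', 'fail')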

Output:

    vin  accuracy Result
0   aaa         1   pass
1   aaa         2   pass
2   aaa         3   pass
3   aaa         9   fail
4   bbb        22   pass
5   bbb        23   pass
6   bbb       211   pass
7   bbb       212   pass
8   ccc        34   fail
9   ccc        39   pass
10  ccc        40   pass
11  ddd        55   fail
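
If the rows are not already sorted, or the threshold needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: the function name flag_within_threshold and its parameters are hypothetical, and it assumes the DataFrame index is unique so that results can be aligned back to the original row order. Instead of shifting the mask, it measures the gap to both the previous and the next value in each group.

```python
import numpy as np
import pandas as pd


def flag_within_threshold(df, group_col="vin", value_col="accuracy", threshold=1):
    """Label each row 'pass' if another value in the same group lies
    within `threshold` of it, else 'fail' (hypothetical helper)."""
    # Sort a copy so that neighbours in value order are adjacent per group.
    ordered = df.sort_values([group_col, value_col])
    grouped = ordered.groupby(group_col)[value_col]
    # Gap to the previous and to the next value in the same group.
    # Group edges give NaN, and NaN <= threshold evaluates to False.
    gap_prev = grouped.diff()
    gap_next = grouped.diff(-1).abs()
    close = gap_prev.le(threshold) | gap_next.le(threshold)
    # Align back to the original row order (assumes a unique index).
    labels = np.where(close.reindex(df.index), "pass", "fail")
    return pd.Series(labels, index=df.index, name="Result")
```

It could then be used as df['Result'] = flag_within_threshold(df), or with threshold=5 to accept larger gaps. Because only adjacent values in sorted order are compared, a single sort per call is enough: if the nearest neighbour is outside the threshold, every other value in the group is too.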