Identify values within threshold of others in group in pandas DataFrame
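My question is how to flag the values in the 'accuracy' column that are within ±1 of each other for the same 'vin'. A value should pass if at least 2 values for that 'vin' lie within ±1 of each other (that is, it has at least one close neighbour); if fewer than 2 such values exist, it should fail.
Below is my dataframe:
import pandas as pd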
df = pd.DataFrame({'vin':['aaa','aaa','aaa','aaa','bbb','bbb','bbb','bbb','ccc','ccc','ccc','ddd'],
'accuracy':[1,2,3,9,22,23,211,212,34,39,40,55]})
df
My expected output would look like the 'Result' column.
df = pd.DataFrame({'vin':['aaa','aaa','aaa','aaa','bbb','bbb','bbb','bbb','ccc','ccc','ccc','ddd'],
'value':[1,2,3,9,22,23,211,212,34,39,40,55],'Result':['pass','pass','pass','fail','pass','pass','pass','pass','fail','pass','pass','fail']})
df
Output:
vin value Result
0 aaa 1 pass
1 aaa 2 pass
2 aaa 3 pass
3 aaa 9 fail
4 bbb 22 pass
5 bbb 23 pass
6 bbb 211 pass
7 bbb 212 pass
8 ccc 34 fail
9 ccc 39 pass
10 ccc 40 pass
11 ddd 55 fail
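Assuming the data is sorted, you can compute the difference between consecutive values within each group, check whether it is ≤ 1, and then combine this mask with its shifted version (so that both members of a close pair are flagged) in numpy.where:
import numpy as np

# if not sorted
# df = df.sort_values(by=['vin', 'accuracy'])
# True where a value is within 1 of the previous value in the same group
mask = df.groupby('vin')['accuracy'].diff().le(1)
# also flag the earlier member of each close pair; fill_value avoids NaN at group ends
df['Result'] = np.where(mask | mask.groupby(df['vin']).shift(-1, fill_value=False), 'pass', 'fail')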
Output:
vin accuracy Result
0 aaa 1 pass
1 aaa 2 pass
2 aaa 3 pass
3 aaa 9 fail
4 bbb 22 pass
5 bbb 23 pass
6 bbb 211 pass
7 bbb 212 pass
8 ccc 34 fail
9 ccc 39 pass
10 ccc 40 pass
11 ddd 55 fail
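
If the rows are not already sorted, or a threshold other than 1 is needed, the same idea can be wrapped in a small helper. This is only a sketch: the function name flag_close_values and its parameters are made up for illustration, and it assumes the DataFrame index is unique so the result can be realigned to the original row order.

```python
import numpy as np
import pandas as pd


def flag_close_values(df, group_col="vin", value_col="accuracy", threshold=1):
    """Return 'pass' for rows that have a same-group neighbour within `threshold`.

    Hypothetical helper; assumes df.index is unique.
    """
    # Sort so that the closest values in each group end up adjacent.
    ordered = df.sort_values([group_col, value_col])
    grouped = ordered.groupby(group_col)[value_col]
    # Gap to the previous and to the next value inside the same group.
    gap_prev = grouped.diff()
    gap_next = grouped.diff(-1).abs()
    # NaN gaps (first or last row of a group) compare as False.
    close = gap_prev.le(threshold) | gap_next.le(threshold)
    result = pd.Series(np.where(close, "pass", "fail"), index=ordered.index)
    # Realign to the caller's original row order.
    return result.reindex(df.index)


df = pd.DataFrame({
    "vin": ["aaa", "aaa", "aaa", "aaa", "bbb", "bbb", "bbb", "bbb",
            "ccc", "ccc", "ccc", "ddd"],
    "accuracy": [1, 2, 3, 9, 22, 23, 211, 212, 34, 39, 40, 55],
})
df["Result"] = flag_close_values(df)
```

Because the helper sorts internally and then reindexes, it should give the same labels whether or not the input is sorted. A group with a single row (such as 'ddd') always fails, since it has no neighbour to compare against.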