I want the customers in the data frame that have more 'false' values than 'true' values. Any suggestions on how to achieve that?

Data frame:

import pandas as pd

df = pd.DataFrame({'A': ['cust1', 'cust1', 'cust2', 'cust1',
                         'cust2', 'cust1', 'cust2', 'cust2', 'cust2', 'cust1'],
                   'B': ['true', 'true', 'true', 'false',
                         'false', 'false', 'false', 'true', 'false', 'true']})

Expected output: ['cust2']

First get the counts with crosstab, then filter the index values by comparing the columns with boolean indexing; Series.gt is used for the greater-than comparison:

df1 = pd.crosstab(df['A'], df['B'])
print (df1)
B      false  true
A                 
cust1      2     3
cust2      3     2

c = df1.index[df1['false'].gt(df1['true'])].tolist()
#if True, False are boolean
#c = df1.index[df1[False].gt(df1[True])].tolist()
print (c)
['cust2']

df[df['B'] == 'false'].groupby(['A']).count().sort_values(by='B', ascending=False).index[0]

Explanation: take only the rows whose value is 'false', group by 'A' and count. Then sort the counts in descending order and take the first index ('A') value.
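
One thing to keep in mind: sorting the 'false' counts returns the single customer with the most 'false' rows, which is not necessarily a customer with more 'false' than 'true'. The sketch below uses made-up data (custA and custB are hypothetical, not from the question) to show where the two criteria differ. The per-customer comparison uses groupby with size and unstack, which is one possible way to build the counts.

```python
import pandas as pd

# Hypothetical data: custA has the most 'false' rows overall,
# but only custB has more 'false' than 'true'.
df = pd.DataFrame({
    'A': ['custA'] * 8 + ['custB'] * 3,
    'B': ['false'] * 3 + ['true'] * 5 + ['false'] * 2 + ['true'],
})

# Sort-based approach: the customer with the largest number of 'false' rows.
top_false = (df[df['B'] == 'false']
             .groupby('A')['B'].count()
             .sort_values(ascending=False)
             .index[0])

# Per-customer comparison of the 'false' and 'true' counts.
counts = df.groupby(['A', 'B']).size().unstack(fill_value=0)
more_false = counts.index[counts['false'] > counts['true']].tolist()

print(top_false)   # custA
print(more_false)  # ['custB']
```

On the data from the question both approaches give cust2, so the difference only shows up when the customer with the most 'false' rows also has even more 'true' rows.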

This looks like a MultiIndex case, so you can use the index to isolate the rows where the 'false' count is larger:

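# dataframe is assumed to be a table of counts with 'false' and 'true' columns
# the result is named `result` to avoid shadowing the built-in list
result = list(dataframe.index[dataframe['false'].gt(dataframe['true'])])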
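
The line above assumes a count table already exists. As a minimal end-to-end sketch, the helper below builds that table with pd.crosstab and then applies the same index filter. The function name customers_with_more_false and its parameters are made up for illustration, and the reindex step is an added safeguard for data where one of the two values never occurs.

```python
import pandas as pd

df = pd.DataFrame({'A': ['cust1', 'cust1', 'cust2', 'cust1',
                         'cust2', 'cust1', 'cust2', 'cust2', 'cust2', 'cust1'],
                   'B': ['true', 'true', 'true', 'false',
                         'false', 'false', 'false', 'true', 'false', 'true']})


def customers_with_more_false(frame, key='A', flag='B'):
    """Return the `key` values whose 'false' count exceeds their 'true' count."""
    # One row per customer, one column per distinct flag value.
    counts = pd.crosstab(frame[key], frame[flag])
    # Make sure both columns exist even if one value never appears.
    counts = counts.reindex(columns=['false', 'true'], fill_value=0)
    # Boolean indexing on the index keeps only the matching customers.
    return list(counts.index[counts['false'].gt(counts['true'])])


result = customers_with_more_false(df)
print(result)  # ['cust2']
```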