I want the customers in the data frame that have more false values than true values. Any suggestions on how to achieve that?
Data frame:
import pandas as pd
df = pd.DataFrame({'A': ['cust1', 'cust1', 'cust2', 'cust1',
                         'cust2', 'cust1', 'cust2', 'cust2', 'cust2', 'cust1'],
                   'B': ['true', 'true', 'true', 'false',
                         'false', 'false', 'false', 'true', 'false', 'true']})
Expected output: ['cust2']
First get the counts with crosstab, then filter the index values with boolean indexing on the columns, using Series.gt for the greater-than comparison:
df1 = pd.crosstab(df['A'], df['B'])
print (df1)
B false true
A
cust1 2 3
cust2 3 2
c = df1.index[df1['false'].gt(df1['true'])].tolist()
# if the values in B are real booleans (True/False) rather than strings:
# c = df1.index[df1[False].gt(df1[True])].tolist()
print (c)
['cust2']
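A minimal sketch of a variant, assuming you are willing to turn the 'true'/'false' strings into 0/1 first: the per-customer mean is then the share of true values, and "more false than true" is the same as that share being below 0.5 (ties are excluded, as with gt above). This swaps crosstab for a groupby mean; the names is_true and share_true are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['cust1', 'cust1', 'cust2', 'cust1',
                         'cust2', 'cust1', 'cust2', 'cust2', 'cust2', 'cust1'],
                   'B': ['true', 'true', 'true', 'false',
                         'false', 'false', 'false', 'true', 'false', 'true']})

# Assumption: B holds the strings 'true'/'false'; map them to 1/0
is_true = (df['B'] == 'true').astype(int)

# Mean of the 0/1 column per customer = fraction of rows that are 'true'
share_true = is_true.groupby(df['A']).mean()

# Fewer than half true means strictly more false than true
c = share_true.index[share_true < 0.5].tolist()
print(c)  # ['cust2']
```

The crosstab version keeps the raw counts visible, which helps when you also want to inspect them; the mean version is shorter when only the filter matters.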
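df[df['B'] == 'false'].groupby('A').count().sort_values(by='B', ascending=False).index[0]
Explanation: keep only the rows whose value is 'false', group by 'A' and count them. Then sort the counts in descending order and take the first index ('A') value. Note that this returns only the single customer with the most 'false' rows; it does not compare each customer's 'false' count with its 'true' count.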
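If you want to keep the groupby style but actually compare the two counts for every customer, one possible sketch is to count both values per customer with value_counts and unstack them into columns. This is an alternative to the answer above, not its original code; the name counts is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['cust1', 'cust1', 'cust2', 'cust1',
                         'cust2', 'cust1', 'cust2', 'cust2', 'cust2', 'cust1'],
                   'B': ['true', 'true', 'true', 'false',
                         'false', 'false', 'false', 'true', 'false', 'true']})

# value_counts gives a Series indexed by (A, B); unstack moves B into columns.
# fill_value=0 covers a customer that never has one of the two values.
counts = df.groupby('A')['B'].value_counts().unstack(fill_value=0)

# Keep every customer whose 'false' count strictly exceeds the 'true' count
c = counts.index[counts['false'] > counts['true']].tolist()
print(c)  # ['cust2']
```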
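If the counts are already in a table indexed by customer, with 'false' and 'true' columns, you can use the index to isolate the customers whose 'false' count is larger:
result = list(dataframe.index[dataframe['false'].gt(dataframe['true'])])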
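One caveat when indexing dataframe['false'] directly: crosstab only creates columns for values that actually occur, so if no row is 'false' the lookup raises a KeyError. A minimal sketch, using a hypothetical input with only 'true' values, reindexes the columns so both always exist:

```python
import pandas as pd

# Hypothetical input: no 'false' rows at all, so crosstab has no 'false' column
df = pd.DataFrame({'A': ['cust1', 'cust2', 'cust2'],
                   'B': ['true', 'true', 'true']})

dataframe = pd.crosstab(df['A'], df['B'])

# Ensure both columns exist, filling missing counts with 0
dataframe = dataframe.reindex(columns=['false', 'true'], fill_value=0)

# Same filter as above; named result to avoid shadowing the built-in list
result = list(dataframe.index[dataframe['false'].gt(dataframe['true'])])
print(result)  # []
```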